[2024-12-15 20:26:23,464][00278] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-15 20:26:23,466][00278] Rollout worker 0 uses device cpu
[2024-12-15 20:26:23,469][00278] Rollout worker 1 uses device cpu
[2024-12-15 20:26:23,470][00278] Rollout worker 2 uses device cpu
[2024-12-15 20:26:23,471][00278] Rollout worker 3 uses device cpu
[2024-12-15 20:26:23,472][00278] Rollout worker 4 uses device cpu
[2024-12-15 20:26:23,473][00278] Rollout worker 5 uses device cpu
[2024-12-15 20:26:23,474][00278] Rollout worker 6 uses device cpu
[2024-12-15 20:26:23,475][00278] Rollout worker 7 uses device cpu
[2024-12-15 20:26:23,627][00278] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-15 20:26:23,629][00278] InferenceWorker_p0-w0: min num requests: 2
[2024-12-15 20:26:23,664][00278] Starting all processes...
[2024-12-15 20:26:23,666][00278] Starting process learner_proc0
[2024-12-15 20:26:23,721][00278] Starting all processes...
[2024-12-15 20:26:23,731][00278] Starting process inference_proc0-0
[2024-12-15 20:26:23,734][00278] Starting process rollout_proc0
[2024-12-15 20:26:23,737][00278] Starting process rollout_proc1
[2024-12-15 20:26:23,738][00278] Starting process rollout_proc2
[2024-12-15 20:26:23,738][00278] Starting process rollout_proc3
[2024-12-15 20:26:23,738][00278] Starting process rollout_proc4
[2024-12-15 20:26:23,738][00278] Starting process rollout_proc5
[2024-12-15 20:26:23,738][00278] Starting process rollout_proc6
[2024-12-15 20:26:23,738][00278] Starting process rollout_proc7
[2024-12-15 20:26:40,968][02481] Worker 4 uses CPU cores [0]
[2024-12-15 20:26:41,042][02478] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-15 20:26:41,044][02478] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-15 20:26:41,165][02478] Num visible devices: 1
[2024-12-15 20:26:41,220][02484] Worker 5 uses CPU cores [1]
[2024-12-15 20:26:41,326][02465] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-15 20:26:41,331][02465] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-15 20:26:41,368][02465] Num visible devices: 1
[2024-12-15 20:26:41,397][02465] Starting seed is not provided
[2024-12-15 20:26:41,398][02465] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-15 20:26:41,399][02465] Initializing actor-critic model on device cuda:0
[2024-12-15 20:26:41,400][02465] RunningMeanStd input shape: (3, 72, 128)
[2024-12-15 20:26:41,403][02465] RunningMeanStd input shape: (1,)
[2024-12-15 20:26:41,473][02482] Worker 2 uses CPU cores [0]
[2024-12-15 20:26:41,474][02465] ConvEncoder: input_channels=3
[2024-12-15 20:26:41,496][02486] Worker 6 uses CPU cores [0]
[2024-12-15 20:26:41,653][02485] Worker 7 uses CPU cores [1]
[2024-12-15 20:26:41,730][02479] Worker 0 uses CPU cores [0]
[2024-12-15 20:26:41,744][02483] Worker 3 uses CPU cores [1]
[2024-12-15 20:26:41,796][02480] Worker 1 uses CPU cores [1]
[2024-12-15 20:26:42,013][02465] Conv encoder output size: 512
[2024-12-15 20:26:42,015][02465] Policy head output size: 512
[2024-12-15 20:26:42,084][02465] Created Actor Critic model with architecture:
[2024-12-15 20:26:42,085][02465] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-15 20:26:42,629][02465] Using optimizer
[2024-12-15 20:26:43,621][00278] Heartbeat connected on Batcher_0
[2024-12-15 20:26:43,628][00278] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-15 20:26:43,636][00278] Heartbeat connected on RolloutWorker_w0
[2024-12-15 20:26:43,639][00278] Heartbeat connected on RolloutWorker_w1
[2024-12-15 20:26:43,644][00278] Heartbeat connected on RolloutWorker_w2
[2024-12-15 20:26:43,647][00278] Heartbeat connected on RolloutWorker_w3
[2024-12-15 20:26:43,653][00278] Heartbeat connected on RolloutWorker_w4
[2024-12-15 20:26:43,657][00278] Heartbeat connected on RolloutWorker_w5
[2024-12-15 20:26:43,660][00278] Heartbeat connected on RolloutWorker_w6
[2024-12-15 20:26:43,664][00278] Heartbeat connected on RolloutWorker_w7
[2024-12-15 20:26:46,314][02465] No checkpoints found
[2024-12-15 20:26:46,314][02465] Did not load from checkpoint, starting from scratch!
[2024-12-15 20:26:46,316][02465] Initialized policy 0 weights for model version 0
[2024-12-15 20:26:46,320][02465] LearnerWorker_p0 finished initialization!
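The architecture dump above can be approximated as a plain PyTorch module. Only the shapes actually named in the log are taken from it: a 3-channel 72x128 observation, a 512-dim encoder/policy-head output, a GRU(512, 512) core, a 1-dim critic head, and a 5-dim action head. The conv kernel sizes, strides, and channel counts below are assumptions, since the log's `RecursiveScriptModule` entries hide those hyperparameters; this is a minimal sketch, not Sample Factory's actual `ActorCriticSharedWeights`.

```python
import torch
import torch.nn as nn

class SketchActorCritic(nn.Module):
    """Rough stand-in for the logged model; conv hyperparameters are guesses."""

    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        # (conv_head): three Conv2d + ELU pairs, as in the log
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
        )
        # Infer the flattened conv output size from a dummy 72x128 observation
        with torch.no_grad():
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
        # (mlp_layers): Linear + ELU projecting to the 512-dim encoder output
        self.mlp = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)                      # (core): GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)               # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)     # seq length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```

A forward pass on a batch of 4 observations yields (4, 5) action logits, a (4, 1) value, and a (1, 4, 512) recurrent state.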
[2024-12-15 20:26:46,320][00278] Heartbeat connected on LearnerWorker_p0
[2024-12-15 20:26:46,322][02465] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-15 20:26:46,553][02478] RunningMeanStd input shape: (3, 72, 128)
[2024-12-15 20:26:46,554][02478] RunningMeanStd input shape: (1,)
[2024-12-15 20:26:46,576][02478] ConvEncoder: input_channels=3
[2024-12-15 20:26:46,739][02478] Conv encoder output size: 512
[2024-12-15 20:26:46,740][02478] Policy head output size: 512
[2024-12-15 20:26:46,813][00278] Inference worker 0-0 is ready!
[2024-12-15 20:26:46,815][00278] All inference workers are ready! Signal rollout workers to start!
[2024-12-15 20:26:46,990][02479] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-15 20:26:46,992][02486] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-15 20:26:46,997][02481] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-15 20:26:46,995][02482] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-15 20:26:47,172][02483] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-15 20:26:47,174][02485] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-15 20:26:47,175][02480] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-15 20:26:47,170][02484] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-15 20:26:47,687][00278] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-15 20:26:49,031][02480] Decorrelating experience for 0 frames...
[2024-12-15 20:26:49,032][02484] Decorrelating experience for 0 frames...
[2024-12-15 20:26:49,040][02483] Decorrelating experience for 0 frames...
[2024-12-15 20:26:49,113][02481] Decorrelating experience for 0 frames...
[2024-12-15 20:26:49,116][02479] Decorrelating experience for 0 frames...
[2024-12-15 20:26:49,119][02486] Decorrelating experience for 0 frames...
[2024-12-15 20:26:50,510][02480] Decorrelating experience for 32 frames...
[2024-12-15 20:26:50,513][02484] Decorrelating experience for 32 frames...
[2024-12-15 20:26:50,679][02485] Decorrelating experience for 0 frames...
[2024-12-15 20:26:51,055][02481] Decorrelating experience for 32 frames...
[2024-12-15 20:26:51,064][02486] Decorrelating experience for 32 frames...
[2024-12-15 20:26:51,063][02482] Decorrelating experience for 0 frames...
[2024-12-15 20:26:52,151][02480] Decorrelating experience for 64 frames...
[2024-12-15 20:26:52,176][02479] Decorrelating experience for 32 frames...
[2024-12-15 20:26:52,438][02484] Decorrelating experience for 64 frames...
[2024-12-15 20:26:52,455][02483] Decorrelating experience for 32 frames...
[2024-12-15 20:26:52,486][02482] Decorrelating experience for 32 frames...
[2024-12-15 20:26:52,686][00278] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-15 20:26:52,809][02481] Decorrelating experience for 64 frames...
[2024-12-15 20:26:53,420][02479] Decorrelating experience for 64 frames...
[2024-12-15 20:26:53,432][02480] Decorrelating experience for 96 frames...
[2024-12-15 20:26:53,570][02486] Decorrelating experience for 64 frames...
[2024-12-15 20:26:53,775][02485] Decorrelating experience for 32 frames...
[2024-12-15 20:26:54,060][02483] Decorrelating experience for 64 frames...
[2024-12-15 20:26:54,926][02482] Decorrelating experience for 64 frames...
[2024-12-15 20:26:55,031][02484] Decorrelating experience for 96 frames...
[2024-12-15 20:26:55,038][02486] Decorrelating experience for 96 frames...
[2024-12-15 20:26:55,215][02479] Decorrelating experience for 96 frames...
[2024-12-15 20:26:55,880][02485] Decorrelating experience for 64 frames...
[2024-12-15 20:26:56,013][02483] Decorrelating experience for 96 frames...
[2024-12-15 20:26:56,080][02481] Decorrelating experience for 96 frames...
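The staggered "Decorrelating experience for N frames..." lines above come from each rollout worker stepping its environments for a while before real collection starts, so the eight workers do not all begin from synchronized episode states. A minimal sketch of the progress points being logged; the 32-frame notch size and the 0/32/64/96 sequence are read directly off the log, while the real Sample Factory scheduling logic is more involved and per-worker:

```python
def decorrelation_log_points(notches: int = 4, frames_per_notch: int = 32) -> list[int]:
    """Cumulative frame counts at which a worker reports decorrelation
    progress: 0, 32, 64, 96 in the log above."""
    return [n * frames_per_notch for n in range(notches)]
```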
[2024-12-15 20:26:56,516][02482] Decorrelating experience for 96 frames...
[2024-12-15 20:26:57,686][00278] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 3.8. Samples: 38. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-15 20:26:57,691][00278] Avg episode reward: [(0, '1.560')]
[2024-12-15 20:26:58,755][02485] Decorrelating experience for 96 frames...
[2024-12-15 20:26:59,185][02465] Signal inference workers to stop experience collection...
[2024-12-15 20:26:59,201][02478] InferenceWorker_p0-w0: stopping experience collection
[2024-12-15 20:27:02,338][02465] Signal inference workers to resume experience collection...
[2024-12-15 20:27:02,341][02478] InferenceWorker_p0-w0: resuming experience collection
[2024-12-15 20:27:02,687][00278] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 178.4. Samples: 2676. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-12-15 20:27:02,693][00278] Avg episode reward: [(0, '2.700')]
[2024-12-15 20:27:07,686][00278] Fps is (10 sec: 2048.0, 60 sec: 1024.1, 300 sec: 1024.1). Total num frames: 20480. Throughput: 0: 196.9. Samples: 3938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-15 20:27:07,689][00278] Avg episode reward: [(0, '3.685')]
[2024-12-15 20:27:12,221][02478] Updated weights for policy 0, policy_version 10 (0.0248)
[2024-12-15 20:27:12,686][00278] Fps is (10 sec: 3686.4, 60 sec: 1638.5, 300 sec: 1638.5). Total num frames: 40960. Throughput: 0: 367.2. Samples: 9180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:27:12,688][00278] Avg episode reward: [(0, '4.123')]
[2024-12-15 20:27:17,686][00278] Fps is (10 sec: 4505.6, 60 sec: 2184.6, 300 sec: 2184.6). Total num frames: 65536. Throughput: 0: 548.9. Samples: 16466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:27:17,693][00278] Avg episode reward: [(0, '4.407')]
[2024-12-15 20:27:21,663][02478] Updated weights for policy 0, policy_version 20 (0.0032)
[2024-12-15 20:27:22,686][00278] Fps is (10 sec: 4095.9, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 561.7. Samples: 19660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:27:22,689][00278] Avg episode reward: [(0, '4.350')]
[2024-12-15 20:27:27,686][00278] Fps is (10 sec: 3686.3, 60 sec: 2560.1, 300 sec: 2560.1). Total num frames: 102400. Throughput: 0: 601.7. Samples: 24066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-15 20:27:27,690][00278] Avg episode reward: [(0, '4.441')]
[2024-12-15 20:27:27,701][02465] Saving new best policy, reward=4.441!
[2024-12-15 20:27:31,874][02478] Updated weights for policy 0, policy_version 30 (0.0021)
[2024-12-15 20:27:32,686][00278] Fps is (10 sec: 4505.7, 60 sec: 2821.8, 300 sec: 2821.8). Total num frames: 126976. Throughput: 0: 695.6. Samples: 31302. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-15 20:27:32,690][00278] Avg episode reward: [(0, '4.424')]
[2024-12-15 20:27:37,686][00278] Fps is (10 sec: 4505.7, 60 sec: 2949.2, 300 sec: 2949.2). Total num frames: 147456. Throughput: 0: 773.0. Samples: 34786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:27:37,688][00278] Avg episode reward: [(0, '4.427')]
[2024-12-15 20:27:42,686][00278] Fps is (10 sec: 3276.8, 60 sec: 2904.5, 300 sec: 2904.5). Total num frames: 159744. Throughput: 0: 879.2. Samples: 39604. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-15 20:27:42,691][00278] Avg episode reward: [(0, '4.434')]
[2024-12-15 20:27:42,859][02478] Updated weights for policy 0, policy_version 40 (0.0037)
[2024-12-15 20:27:47,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3072.1, 300 sec: 3072.1). Total num frames: 184320. Throughput: 0: 965.7. Samples: 46134. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-15 20:27:47,689][00278] Avg episode reward: [(0, '4.387')]
[2024-12-15 20:27:51,482][02478] Updated weights for policy 0, policy_version 50 (0.0037)
[2024-12-15 20:27:52,687][00278] Fps is (10 sec: 4914.7, 60 sec: 3481.5, 300 sec: 3213.8). Total num frames: 208896. Throughput: 0: 1017.9. Samples: 49746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:27:52,689][00278] Avg episode reward: [(0, '4.497')]
[2024-12-15 20:27:52,695][02465] Saving new best policy, reward=4.497!
[2024-12-15 20:27:57,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3159.8). Total num frames: 221184. Throughput: 0: 1026.2. Samples: 55358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:27:57,695][00278] Avg episode reward: [(0, '4.360')]
[2024-12-15 20:28:02,686][00278] Fps is (10 sec: 3277.1, 60 sec: 3959.5, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 989.2. Samples: 60980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:28:02,693][00278] Avg episode reward: [(0, '4.387')]
[2024-12-15 20:28:02,831][02478] Updated weights for policy 0, policy_version 60 (0.0020)
[2024-12-15 20:28:07,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3328.0). Total num frames: 266240. Throughput: 0: 995.7. Samples: 64468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:28:07,692][00278] Avg episode reward: [(0, '4.531')]
[2024-12-15 20:28:07,699][02465] Saving new best policy, reward=4.531!
[2024-12-15 20:28:12,525][02478] Updated weights for policy 0, policy_version 70 (0.0020)
[2024-12-15 20:28:12,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3373.2). Total num frames: 286720. Throughput: 0: 1043.3. Samples: 71016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:28:12,689][00278] Avg episode reward: [(0, '4.637')]
[2024-12-15 20:28:12,693][02465] Saving new best policy, reward=4.637!
[2024-12-15 20:28:17,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3367.9). Total num frames: 303104. Throughput: 0: 985.4. Samples: 75644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-15 20:28:17,691][00278] Avg episode reward: [(0, '4.631')]
[2024-12-15 20:28:17,697][02465] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth...
[2024-12-15 20:28:22,439][02478] Updated weights for policy 0, policy_version 80 (0.0021)
[2024-12-15 20:28:22,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3449.3). Total num frames: 327680. Throughput: 0: 989.6. Samples: 79318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:28:22,689][00278] Avg episode reward: [(0, '4.760')]
[2024-12-15 20:28:22,694][02465] Saving new best policy, reward=4.760!
[2024-12-15 20:28:27,687][00278] Fps is (10 sec: 4505.1, 60 sec: 4095.9, 300 sec: 3481.6). Total num frames: 348160. Throughput: 0: 1038.9. Samples: 86354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-15 20:28:27,690][00278] Avg episode reward: [(0, '4.696')]
[2024-12-15 20:28:32,691][00278] Fps is (10 sec: 3684.7, 60 sec: 3959.2, 300 sec: 3471.7). Total num frames: 364544. Throughput: 0: 997.8. Samples: 91038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-15 20:28:32,693][00278] Avg episode reward: [(0, '4.666')]
[2024-12-15 20:28:33,761][02478] Updated weights for policy 0, policy_version 90 (0.0023)
[2024-12-15 20:28:37,686][00278] Fps is (10 sec: 3686.8, 60 sec: 3959.5, 300 sec: 3500.2). Total num frames: 385024. Throughput: 0: 981.5. Samples: 93912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:28:37,688][00278] Avg episode reward: [(0, '4.607')]
[2024-12-15 20:28:42,154][02478] Updated weights for policy 0, policy_version 100 (0.0020)
[2024-12-15 20:28:42,686][00278] Fps is (10 sec: 4507.7, 60 sec: 4164.3, 300 sec: 3561.8). Total num frames: 409600. Throughput: 0: 1022.5. Samples: 101370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:28:42,696][00278] Avg episode reward: [(0, '4.816')]
[2024-12-15 20:28:42,702][02465] Saving new best policy, reward=4.816!
[2024-12-15 20:28:47,686][00278] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 1021.0. Samples: 106924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:28:47,690][00278] Avg episode reward: [(0, '4.737')]
[2024-12-15 20:28:52,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3571.7). Total num frames: 446464. Throughput: 0: 995.1. Samples: 109246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:28:52,689][00278] Avg episode reward: [(0, '4.771')]
[2024-12-15 20:28:53,257][02478] Updated weights for policy 0, policy_version 110 (0.0032)
[2024-12-15 20:28:57,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3623.4). Total num frames: 471040. Throughput: 0: 1009.5. Samples: 116444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:28:57,693][00278] Avg episode reward: [(0, '4.753')]
[2024-12-15 20:29:02,468][02478] Updated weights for policy 0, policy_version 120 (0.0014)
[2024-12-15 20:29:02,690][00278] Fps is (10 sec: 4504.9, 60 sec: 4164.2, 300 sec: 3640.9). Total num frames: 491520. Throughput: 0: 1053.4. Samples: 123050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:29:02,700][00278] Avg episode reward: [(0, '4.621')]
[2024-12-15 20:29:07,687][00278] Fps is (10 sec: 3276.5, 60 sec: 3959.4, 300 sec: 3598.6). Total num frames: 503808. Throughput: 0: 1021.4. Samples: 125284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:29:07,692][00278] Avg episode reward: [(0, '4.654')]
[2024-12-15 20:29:12,657][02478] Updated weights for policy 0, policy_version 130 (0.0018)
[2024-12-15 20:29:12,686][00278] Fps is (10 sec: 4096.7, 60 sec: 4096.0, 300 sec: 3672.3). Total num frames: 532480. Throughput: 0: 1004.3. Samples: 131544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:29:12,691][00278] Avg episode reward: [(0, '4.680')]
[2024-12-15 20:29:17,687][00278] Fps is (10 sec: 4914.9, 60 sec: 4164.2, 300 sec: 3686.4). Total num frames: 552960. Throughput: 0: 1064.1. Samples: 138920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:29:17,692][00278] Avg episode reward: [(0, '4.712')]
[2024-12-15 20:29:22,688][00278] Fps is (10 sec: 3685.7, 60 sec: 4027.6, 300 sec: 3673.2). Total num frames: 569344. Throughput: 0: 1054.3. Samples: 141358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:29:22,690][00278] Avg episode reward: [(0, '4.675')]
[2024-12-15 20:29:23,398][02478] Updated weights for policy 0, policy_version 140 (0.0031)
[2024-12-15 20:29:27,686][00278] Fps is (10 sec: 3686.8, 60 sec: 4027.8, 300 sec: 3686.4). Total num frames: 589824. Throughput: 0: 1006.8. Samples: 146676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:29:27,689][00278] Avg episode reward: [(0, '5.209')]
[2024-12-15 20:29:27,714][02465] Saving new best policy, reward=5.209!
[2024-12-15 20:29:32,007][02478] Updated weights for policy 0, policy_version 150 (0.0018)
[2024-12-15 20:29:32,686][00278] Fps is (10 sec: 4506.4, 60 sec: 4164.6, 300 sec: 3723.7). Total num frames: 614400. Throughput: 0: 1046.6. Samples: 154020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:29:32,689][00278] Avg episode reward: [(0, '5.301')]
[2024-12-15 20:29:32,693][02465] Saving new best policy, reward=5.301!
[2024-12-15 20:29:37,686][00278] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 3734.6). Total num frames: 634880. Throughput: 0: 1064.2. Samples: 157134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:29:37,690][00278] Avg episode reward: [(0, '5.081')]
[2024-12-15 20:29:42,686][00278] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3721.5). Total num frames: 651264. Throughput: 0: 1006.3. Samples: 161728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-15 20:29:42,693][00278] Avg episode reward: [(0, '4.885')]
[2024-12-15 20:29:43,215][02478] Updated weights for policy 0, policy_version 160 (0.0024)
[2024-12-15 20:29:47,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3754.7). Total num frames: 675840. Throughput: 0: 1019.9. Samples: 168942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:29:47,690][00278] Avg episode reward: [(0, '4.705')]
[2024-12-15 20:29:51,642][02478] Updated weights for policy 0, policy_version 170 (0.0017)
[2024-12-15 20:29:52,686][00278] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 3763.9). Total num frames: 696320. Throughput: 0: 1053.5. Samples: 172692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:29:52,689][00278] Avg episode reward: [(0, '4.913')]
[2024-12-15 20:29:57,688][00278] Fps is (10 sec: 3685.8, 60 sec: 4027.6, 300 sec: 3751.1). Total num frames: 712704. Throughput: 0: 1028.1. Samples: 177812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:29:57,691][00278] Avg episode reward: [(0, '5.056')]
[2024-12-15 20:30:02,189][02478] Updated weights for policy 0, policy_version 180 (0.0023)
[2024-12-15 20:30:02,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3780.9). Total num frames: 737280. Throughput: 0: 1011.0. Samples: 184412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:30:02,692][00278] Avg episode reward: [(0, '5.188')]
[2024-12-15 20:30:07,686][00278] Fps is (10 sec: 4916.0, 60 sec: 4300.9, 300 sec: 3809.3). Total num frames: 761856. Throughput: 0: 1036.0. Samples: 187974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:30:07,693][00278] Avg episode reward: [(0, '5.374')]
[2024-12-15 20:30:07,701][02465] Saving new best policy, reward=5.374!
[2024-12-15 20:30:12,166][02478] Updated weights for policy 0, policy_version 190 (0.0017)
[2024-12-15 20:30:12,686][00278] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 3796.3). Total num frames: 778240. Throughput: 0: 1051.9. Samples: 194012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-15 20:30:12,692][00278] Avg episode reward: [(0, '5.841')]
[2024-12-15 20:30:12,694][02465] Saving new best policy, reward=5.841!
[2024-12-15 20:30:17,686][00278] Fps is (10 sec: 3686.3, 60 sec: 4096.1, 300 sec: 3803.4). Total num frames: 798720. Throughput: 0: 1011.8. Samples: 199552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:30:17,692][00278] Avg episode reward: [(0, '6.172')]
[2024-12-15 20:30:17,700][02465] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth...
[2024-12-15 20:30:17,821][02465] Saving new best policy, reward=6.172!
[2024-12-15 20:30:21,692][02478] Updated weights for policy 0, policy_version 200 (0.0039)
[2024-12-15 20:30:22,686][00278] Fps is (10 sec: 4505.4, 60 sec: 4232.6, 300 sec: 3829.3). Total num frames: 823296. Throughput: 0: 1021.8. Samples: 203114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-15 20:30:22,692][00278] Avg episode reward: [(0, '5.963')]
[2024-12-15 20:30:27,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3816.7). Total num frames: 839680. Throughput: 0: 1071.6. Samples: 209950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:30:27,692][00278] Avg episode reward: [(0, '5.477')]
[2024-12-15 20:30:32,686][00278] Fps is (10 sec: 3277.0, 60 sec: 4027.7, 300 sec: 3804.7). Total num frames: 856064. Throughput: 0: 1013.8. Samples: 214562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-15 20:30:32,688][00278] Avg episode reward: [(0, '5.453')]
[2024-12-15 20:30:32,818][02478] Updated weights for policy 0, policy_version 210 (0.0014)
[2024-12-15 20:30:37,686][00278] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3828.9). Total num frames: 880640. Throughput: 0: 1010.6. Samples: 218170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:30:37,689][00278] Avg episode reward: [(0, '5.937')]
[2024-12-15 20:30:41,146][02478] Updated weights for policy 0, policy_version 220 (0.0019)
[2024-12-15 20:30:42,686][00278] Fps is (10 sec: 4915.2, 60 sec: 4232.6, 300 sec: 3852.0). Total num frames: 905216. Throughput: 0: 1062.7. Samples: 225632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:30:42,696][00278] Avg episode reward: [(0, '6.495')]
[2024-12-15 20:30:42,698][02465] Saving new best policy, reward=6.495!
[2024-12-15 20:30:47,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3840.0). Total num frames: 921600. Throughput: 0: 1023.8. Samples: 230482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-15 20:30:47,689][00278] Avg episode reward: [(0, '6.553')]
[2024-12-15 20:30:47,697][02465] Saving new best policy, reward=6.553!
[2024-12-15 20:30:52,204][02478] Updated weights for policy 0, policy_version 230 (0.0027)
[2024-12-15 20:30:52,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3845.2). Total num frames: 942080. Throughput: 0: 1006.0. Samples: 233246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:30:52,693][00278] Avg episode reward: [(0, '6.398')]
[2024-12-15 20:30:57,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4232.7, 300 sec: 3866.6). Total num frames: 966656. Throughput: 0: 1038.4. Samples: 240740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:30:57,694][00278] Avg episode reward: [(0, '6.152')]
[2024-12-15 20:31:01,406][02478] Updated weights for policy 0, policy_version 240 (0.0023)
[2024-12-15 20:31:02,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3855.1). Total num frames: 983040. Throughput: 0: 1046.6. Samples: 246648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:31:02,690][00278] Avg episode reward: [(0, '5.997')]
[2024-12-15 20:31:07,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3859.7). Total num frames: 1003520. Throughput: 0: 1018.4. Samples: 248940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:31:07,691][00278] Avg episode reward: [(0, '6.293')]
[2024-12-15 20:31:11,460][02478] Updated weights for policy 0, policy_version 250 (0.0021)
[2024-12-15 20:31:12,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3879.6). Total num frames: 1028096. Throughput: 0: 1020.1. Samples: 255854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-15 20:31:12,688][00278] Avg episode reward: [(0, '6.900')]
[2024-12-15 20:31:12,693][02465] Saving new best policy, reward=6.900!
[2024-12-15 20:31:17,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3883.6). Total num frames: 1048576. Throughput: 0: 1069.0. Samples: 262668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:31:17,689][00278] Avg episode reward: [(0, '7.257')]
[2024-12-15 20:31:17,695][02465] Saving new best policy, reward=7.257!
[2024-12-15 20:31:22,413][02478] Updated weights for policy 0, policy_version 260 (0.0035)
[2024-12-15 20:31:22,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3872.6). Total num frames: 1064960. Throughput: 0: 1034.4. Samples: 264718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-15 20:31:22,690][00278] Avg episode reward: [(0, '7.288')]
[2024-12-15 20:31:22,694][02465] Saving new best policy, reward=7.288!
[2024-12-15 20:31:27,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3891.2). Total num frames: 1089536. Throughput: 0: 1005.8. Samples: 270892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:31:27,692][00278] Avg episode reward: [(0, '7.375')]
[2024-12-15 20:31:27,700][02465] Saving new best policy, reward=7.375!
[2024-12-15 20:31:31,257][02478] Updated weights for policy 0, policy_version 270 (0.0028)
[2024-12-15 20:31:32,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3894.8). Total num frames: 1110016. Throughput: 0: 1051.9. Samples: 277816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:31:32,689][00278] Avg episode reward: [(0, '7.580')]
[2024-12-15 20:31:32,696][02465] Saving new best policy, reward=7.580!
[2024-12-15 20:31:37,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3884.2). Total num frames: 1126400. Throughput: 0: 1045.7. Samples: 280304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:31:37,691][00278] Avg episode reward: [(0, '8.391')]
[2024-12-15 20:31:37,702][02465] Saving new best policy, reward=8.391!
[2024-12-15 20:31:42,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 1142784. Throughput: 0: 984.8. Samples: 285056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-15 20:31:42,688][00278] Avg episode reward: [(0, '8.792')]
[2024-12-15 20:31:42,764][02465] Saving new best policy, reward=8.792!
[2024-12-15 20:31:42,775][02478] Updated weights for policy 0, policy_version 280 (0.0039)
[2024-12-15 20:31:47,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1167360. Throughput: 0: 1012.3. Samples: 292202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-15 20:31:47,688][00278] Avg episode reward: [(0, '9.137')]
[2024-12-15 20:31:47,698][02465] Saving new best policy, reward=9.137!
[2024-12-15 20:31:52,084][02478] Updated weights for policy 0, policy_version 290 (0.0015)
[2024-12-15 20:31:52,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1187840. Throughput: 0: 1040.0. Samples: 295742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-15 20:31:52,691][00278] Avg episode reward: [(0, '9.350')]
[2024-12-15 20:31:52,694][02465] Saving new best policy, reward=9.350!
[2024-12-15 20:31:57,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 1204224. Throughput: 0: 982.7. Samples: 300074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-15 20:31:57,692][00278] Avg episode reward: [(0, '9.074')]
[2024-12-15 20:32:02,393][02478] Updated weights for policy 0, policy_version 300 (0.0013)
[2024-12-15 20:32:02,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 1228800. Throughput: 0: 984.7. Samples: 306980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:32:02,692][00278] Avg episode reward: [(0, '8.384')]
[2024-12-15 20:32:07,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 1249280. Throughput: 0: 1018.9. Samples: 310570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:32:07,689][00278] Avg episode reward: [(0, '9.233')]
[2024-12-15 20:32:12,687][00278] Fps is (10 sec: 3276.6, 60 sec: 3891.2, 300 sec: 4054.3). Total num frames: 1261568. Throughput: 0: 991.2. Samples: 315498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-15 20:32:12,691][00278] Avg episode reward: [(0, '10.323')]
[2024-12-15 20:32:12,699][02465] Saving new best policy, reward=10.323!
[2024-12-15 20:32:14,099][02478] Updated weights for policy 0, policy_version 310 (0.0022)
[2024-12-15 20:32:17,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 1286144. Throughput: 0: 962.8. Samples: 321144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:32:17,693][00278] Avg episode reward: [(0, '11.067')]
[2024-12-15 20:32:17,702][02465] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth...
[2024-12-15 20:32:17,835][02465] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth
[2024-12-15 20:32:17,852][02465] Saving new best policy, reward=11.067!
[2024-12-15 20:32:22,686][00278] Fps is (10 sec: 4505.9, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 1306624. Throughput: 0: 982.6. Samples: 324522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:32:22,694][00278] Avg episode reward: [(0, '11.297')]
[2024-12-15 20:32:22,700][02465] Saving new best policy, reward=11.297!
[2024-12-15 20:32:23,017][02478] Updated weights for policy 0, policy_version 320 (0.0026)
[2024-12-15 20:32:27,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4054.3). Total num frames: 1323008. Throughput: 0: 1011.2. Samples: 330560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-15 20:32:27,688][00278] Avg episode reward: [(0, '12.762')]
[2024-12-15 20:32:27,698][02465] Saving new best policy, reward=12.762!
[2024-12-15 20:32:32,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 4040.5). Total num frames: 1339392. Throughput: 0: 959.2. Samples: 335364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:32:32,692][00278] Avg episode reward: [(0, '12.468')]
[2024-12-15 20:32:34,397][02478] Updated weights for policy 0, policy_version 330 (0.0018)
[2024-12-15 20:32:37,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 1363968. Throughput: 0: 962.7. Samples: 339064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-15 20:32:37,689][00278] Avg episode reward: [(0, '12.937')]
[2024-12-15 20:32:37,696][02465] Saving new best policy, reward=12.937!
[2024-12-15 20:32:42,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1384448. Throughput: 0: 1025.8. Samples: 346236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:32:42,688][00278] Avg episode reward: [(0, '13.765')]
[2024-12-15 20:32:42,692][02465] Saving new best policy, reward=13.765!
[2024-12-15 20:32:44,252][02478] Updated weights for policy 0, policy_version 340 (0.0037)
[2024-12-15 20:32:47,691][00278] Fps is (10 sec: 3684.6, 60 sec: 3890.9, 300 sec: 4040.4). Total num frames: 1400832. Throughput: 0: 970.4. Samples: 350654. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-15 20:32:47,696][00278] Avg episode reward: [(0, '13.115')]
[2024-12-15 20:32:52,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 1425408. Throughput: 0: 962.3. Samples: 353874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-15 20:32:52,688][00278] Avg episode reward: [(0, '13.612')]
[2024-12-15 20:32:53,941][02478] Updated weights for policy 0, policy_version 350 (0.0019)
[2024-12-15 20:32:57,686][00278] Fps is (10 sec: 4917.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 1449984. Throughput: 0: 1016.2. Samples: 361228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:32:57,692][00278] Avg episode reward: [(0, '15.557')]
[2024-12-15 20:32:57,699][02465] Saving new best policy, reward=15.557!
[2024-12-15 20:33:02,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 1466368. Throughput: 0: 1008.6. Samples: 366532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:33:02,689][00278] Avg episode reward: [(0, '15.585')]
[2024-12-15 20:33:02,691][02465] Saving new best policy, reward=15.585!
[2024-12-15 20:33:05,107][02478] Updated weights for policy 0, policy_version 360 (0.0031)
[2024-12-15 20:33:07,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4054.3). Total num frames: 1482752. Throughput: 0: 986.0. Samples: 368890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-15 20:33:07,690][00278] Avg episode reward: [(0, '15.715')]
[2024-12-15 20:33:07,701][02465] Saving new best policy, reward=15.715!
[2024-12-15 20:33:12,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1507328. Throughput: 0: 1009.2. Samples: 375976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-15 20:33:12,689][00278] Avg episode reward: [(0, '16.470')]
[2024-12-15 20:33:12,695][02465] Saving new best policy, reward=16.470!
[2024-12-15 20:33:13,832][02478] Updated weights for policy 0, policy_version 370 (0.0028) [2024-12-15 20:33:17,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1527808. Throughput: 0: 1038.3. Samples: 382086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:33:17,690][00278] Avg episode reward: [(0, '15.154')] [2024-12-15 20:33:22,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4054.4). Total num frames: 1544192. Throughput: 0: 1006.9. Samples: 384374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:33:22,690][00278] Avg episode reward: [(0, '14.937')] [2024-12-15 20:33:24,999][02478] Updated weights for policy 0, policy_version 380 (0.0036) [2024-12-15 20:33:27,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.2). Total num frames: 1568768. Throughput: 0: 990.2. Samples: 390796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:33:27,688][00278] Avg episode reward: [(0, '15.539')] [2024-12-15 20:33:32,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1589248. Throughput: 0: 1052.8. Samples: 398024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:33:32,691][00278] Avg episode reward: [(0, '15.872')] [2024-12-15 20:33:34,265][02478] Updated weights for policy 0, policy_version 390 (0.0028) [2024-12-15 20:33:37,687][00278] Fps is (10 sec: 3686.2, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1605632. Throughput: 0: 1029.4. Samples: 400198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:33:37,690][00278] Avg episode reward: [(0, '16.593')] [2024-12-15 20:33:37,701][02465] Saving new best policy, reward=16.593! [2024-12-15 20:33:42,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1626112. Throughput: 0: 988.1. Samples: 405692. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:33:42,693][00278] Avg episode reward: [(0, '18.505')] [2024-12-15 20:33:42,745][02465] Saving new best policy, reward=18.505! [2024-12-15 20:33:44,383][02478] Updated weights for policy 0, policy_version 400 (0.0017) [2024-12-15 20:33:47,686][00278] Fps is (10 sec: 4505.8, 60 sec: 4164.6, 300 sec: 4082.1). Total num frames: 1650688. Throughput: 0: 1032.9. Samples: 413012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:33:47,688][00278] Avg episode reward: [(0, '17.901')] [2024-12-15 20:33:52,689][00278] Fps is (10 sec: 4094.9, 60 sec: 4027.5, 300 sec: 4054.3). Total num frames: 1667072. Throughput: 0: 1049.2. Samples: 416106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:33:52,691][00278] Avg episode reward: [(0, '17.831')] [2024-12-15 20:33:55,600][02478] Updated weights for policy 0, policy_version 410 (0.0030) [2024-12-15 20:33:57,686][00278] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4054.4). Total num frames: 1687552. Throughput: 0: 995.7. Samples: 420784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:33:57,688][00278] Avg episode reward: [(0, '16.781')] [2024-12-15 20:34:02,686][00278] Fps is (10 sec: 4506.8, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 1712128. Throughput: 0: 1026.4. Samples: 428274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:34:02,692][00278] Avg episode reward: [(0, '15.842')] [2024-12-15 20:34:03,695][02478] Updated weights for policy 0, policy_version 420 (0.0028) [2024-12-15 20:34:07,687][00278] Fps is (10 sec: 4505.3, 60 sec: 4164.2, 300 sec: 4068.2). Total num frames: 1732608. Throughput: 0: 1057.7. Samples: 431970. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-15 20:34:07,692][00278] Avg episode reward: [(0, '16.209')] [2024-12-15 20:34:12,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 1748992. Throughput: 0: 1020.1. Samples: 436700. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:34:12,688][00278] Avg episode reward: [(0, '16.907')] [2024-12-15 20:34:14,921][02478] Updated weights for policy 0, policy_version 430 (0.0028) [2024-12-15 20:34:17,686][00278] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1773568. Throughput: 0: 1007.3. Samples: 443354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:34:17,692][00278] Avg episode reward: [(0, '16.031')] [2024-12-15 20:34:17,701][02465] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000433_1773568.pth... [2024-12-15 20:34:17,832][02465] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth [2024-12-15 20:34:22,686][00278] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 1798144. Throughput: 0: 1039.7. Samples: 446984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:34:22,688][00278] Avg episode reward: [(0, '16.587')] [2024-12-15 20:34:23,661][02478] Updated weights for policy 0, policy_version 440 (0.0015) [2024-12-15 20:34:27,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1810432. Throughput: 0: 1045.6. Samples: 452742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:34:27,691][00278] Avg episode reward: [(0, '17.522')] [2024-12-15 20:34:32,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1835008. Throughput: 0: 1008.8. Samples: 458406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:34:32,689][00278] Avg episode reward: [(0, '17.727')] [2024-12-15 20:34:34,123][02478] Updated weights for policy 0, policy_version 450 (0.0024) [2024-12-15 20:34:37,686][00278] Fps is (10 sec: 4915.2, 60 sec: 4232.6, 300 sec: 4096.0). Total num frames: 1859584. Throughput: 0: 1023.6. Samples: 462164. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:34:37,690][00278] Avg episode reward: [(0, '18.440')] [2024-12-15 20:34:42,686][00278] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 1875968. Throughput: 0: 1065.9. Samples: 468750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:34:42,690][00278] Avg episode reward: [(0, '18.703')] [2024-12-15 20:34:42,694][02465] Saving new best policy, reward=18.703! [2024-12-15 20:34:44,757][02478] Updated weights for policy 0, policy_version 460 (0.0019) [2024-12-15 20:34:47,686][00278] Fps is (10 sec: 3276.8, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 1892352. Throughput: 0: 1004.8. Samples: 473488. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:34:47,692][00278] Avg episode reward: [(0, '18.325')] [2024-12-15 20:34:52,686][00278] Fps is (10 sec: 4096.1, 60 sec: 4164.5, 300 sec: 4082.1). Total num frames: 1916928. Throughput: 0: 1004.1. Samples: 477154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:34:52,690][00278] Avg episode reward: [(0, '18.689')] [2024-12-15 20:34:53,552][02478] Updated weights for policy 0, policy_version 470 (0.0018) [2024-12-15 20:34:57,686][00278] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 1941504. Throughput: 0: 1066.0. Samples: 484672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:34:57,693][00278] Avg episode reward: [(0, '19.805')] [2024-12-15 20:34:57,700][02465] Saving new best policy, reward=19.805! [2024-12-15 20:35:02,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1953792. Throughput: 0: 1017.5. Samples: 489140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:35:02,691][00278] Avg episode reward: [(0, '20.151')] [2024-12-15 20:35:02,693][02465] Saving new best policy, reward=20.151! 
[2024-12-15 20:35:04,775][02478] Updated weights for policy 0, policy_version 480 (0.0032) [2024-12-15 20:35:07,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1978368. Throughput: 0: 1005.2. Samples: 492220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:35:07,693][00278] Avg episode reward: [(0, '21.407')] [2024-12-15 20:35:07,703][02465] Saving new best policy, reward=21.407! [2024-12-15 20:35:12,686][00278] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 2002944. Throughput: 0: 1035.3. Samples: 499332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:35:12,689][00278] Avg episode reward: [(0, '20.755')] [2024-12-15 20:35:13,369][02478] Updated weights for policy 0, policy_version 490 (0.0019) [2024-12-15 20:35:17,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.4). Total num frames: 2019328. Throughput: 0: 1034.1. Samples: 504942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:35:17,694][00278] Avg episode reward: [(0, '21.468')] [2024-12-15 20:35:17,709][02465] Saving new best policy, reward=21.468! [2024-12-15 20:35:22,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2035712. Throughput: 0: 1000.4. Samples: 507182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:35:22,688][00278] Avg episode reward: [(0, '21.047')] [2024-12-15 20:35:24,399][02478] Updated weights for policy 0, policy_version 500 (0.0032) [2024-12-15 20:35:27,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 2064384. Throughput: 0: 1016.5. Samples: 514494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:35:27,688][00278] Avg episode reward: [(0, '19.855')] [2024-12-15 20:35:32,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2080768. Throughput: 0: 1056.0. Samples: 521008. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:35:32,690][00278] Avg episode reward: [(0, '19.218')] [2024-12-15 20:35:34,259][02478] Updated weights for policy 0, policy_version 510 (0.0015) [2024-12-15 20:35:37,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2097152. Throughput: 0: 1024.3. Samples: 523248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:35:37,693][00278] Avg episode reward: [(0, '19.567')] [2024-12-15 20:35:42,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2121728. Throughput: 0: 997.4. Samples: 529556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:35:42,692][00278] Avg episode reward: [(0, '18.643')] [2024-12-15 20:35:43,607][02478] Updated weights for policy 0, policy_version 520 (0.0044) [2024-12-15 20:35:47,686][00278] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 2146304. Throughput: 0: 1063.9. Samples: 537014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:35:47,693][00278] Avg episode reward: [(0, '19.510')] [2024-12-15 20:35:52,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2162688. Throughput: 0: 1044.4. Samples: 539220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-15 20:35:52,690][00278] Avg episode reward: [(0, '19.152')] [2024-12-15 20:35:54,528][02478] Updated weights for policy 0, policy_version 530 (0.0020) [2024-12-15 20:35:57,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2183168. Throughput: 0: 1010.7. Samples: 544814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-15 20:35:57,689][00278] Avg episode reward: [(0, '18.489')] [2024-12-15 20:36:02,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 2207744. Throughput: 0: 1050.9. Samples: 552232. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:36:02,688][00278] Avg episode reward: [(0, '18.714')] [2024-12-15 20:36:02,994][02478] Updated weights for policy 0, policy_version 540 (0.0016) [2024-12-15 20:36:07,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2224128. Throughput: 0: 1068.7. Samples: 555274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:36:07,689][00278] Avg episode reward: [(0, '18.830')] [2024-12-15 20:36:12,686][00278] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2244608. Throughput: 0: 1006.4. Samples: 559780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:36:12,688][00278] Avg episode reward: [(0, '18.727')] [2024-12-15 20:36:14,037][02478] Updated weights for policy 0, policy_version 550 (0.0037) [2024-12-15 20:36:17,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 2269184. Throughput: 0: 1026.4. Samples: 567194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:36:17,689][00278] Avg episode reward: [(0, '20.602')] [2024-12-15 20:36:17,700][02465] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000554_2269184.pth... [2024-12-15 20:36:17,819][02465] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth [2024-12-15 20:36:22,688][00278] Fps is (10 sec: 4504.7, 60 sec: 4232.4, 300 sec: 4068.2). Total num frames: 2289664. Throughput: 0: 1056.6. Samples: 570796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:36:22,693][00278] Avg episode reward: [(0, '21.691')] [2024-12-15 20:36:22,698][02465] Saving new best policy, reward=21.691! [2024-12-15 20:36:23,770][02478] Updated weights for policy 0, policy_version 560 (0.0021) [2024-12-15 20:36:27,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2301952. Throughput: 0: 1021.6. Samples: 575530. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:36:27,692][00278] Avg episode reward: [(0, '21.249')] [2024-12-15 20:36:32,686][00278] Fps is (10 sec: 3687.2, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2326528. Throughput: 0: 995.2. Samples: 581796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:36:32,693][00278] Avg episode reward: [(0, '19.903')] [2024-12-15 20:36:33,862][02478] Updated weights for policy 0, policy_version 570 (0.0013) [2024-12-15 20:36:37,687][00278] Fps is (10 sec: 4914.8, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 2351104. Throughput: 0: 1025.3. Samples: 585358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:36:37,689][00278] Avg episode reward: [(0, '18.086')] [2024-12-15 20:36:42,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2363392. Throughput: 0: 1027.3. Samples: 591042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:36:42,692][00278] Avg episode reward: [(0, '17.773')] [2024-12-15 20:36:45,358][02478] Updated weights for policy 0, policy_version 580 (0.0028) [2024-12-15 20:36:47,686][00278] Fps is (10 sec: 3277.1, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2383872. Throughput: 0: 982.3. Samples: 596436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-15 20:36:47,695][00278] Avg episode reward: [(0, '17.212')] [2024-12-15 20:36:52,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2408448. Throughput: 0: 993.7. Samples: 599992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:36:52,688][00278] Avg episode reward: [(0, '18.609')] [2024-12-15 20:36:53,901][02478] Updated weights for policy 0, policy_version 590 (0.0013) [2024-12-15 20:36:57,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2428928. Throughput: 0: 1044.8. Samples: 606796. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:36:57,694][00278] Avg episode reward: [(0, '18.838')] [2024-12-15 20:37:02,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2445312. Throughput: 0: 975.8. Samples: 611104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:37:02,688][00278] Avg episode reward: [(0, '20.058')] [2024-12-15 20:37:05,070][02478] Updated weights for policy 0, policy_version 600 (0.0014) [2024-12-15 20:37:07,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2469888. Throughput: 0: 973.1. Samples: 614584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:37:07,694][00278] Avg episode reward: [(0, '20.222')] [2024-12-15 20:37:12,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2490368. Throughput: 0: 1026.8. Samples: 621736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:37:12,691][00278] Avg episode reward: [(0, '20.406')] [2024-12-15 20:37:14,873][02478] Updated weights for policy 0, policy_version 610 (0.0028) [2024-12-15 20:37:17,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 2506752. Throughput: 0: 996.8. Samples: 626652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:37:17,688][00278] Avg episode reward: [(0, '19.906')] [2024-12-15 20:37:22,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 4082.1). Total num frames: 2527232. Throughput: 0: 976.1. Samples: 629282. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:37:22,692][00278] Avg episode reward: [(0, '20.933')] [2024-12-15 20:37:24,804][02478] Updated weights for policy 0, policy_version 620 (0.0035) [2024-12-15 20:37:27,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 2551808. Throughput: 0: 1010.1. Samples: 636498. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:37:27,688][00278] Avg episode reward: [(0, '20.106')] [2024-12-15 20:37:32,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2568192. Throughput: 0: 1017.5. Samples: 642222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:37:32,688][00278] Avg episode reward: [(0, '20.701')] [2024-12-15 20:37:36,310][02478] Updated weights for policy 0, policy_version 630 (0.0018) [2024-12-15 20:37:37,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 4068.2). Total num frames: 2584576. Throughput: 0: 988.2. Samples: 644460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:37:37,689][00278] Avg episode reward: [(0, '21.768')] [2024-12-15 20:37:37,696][02465] Saving new best policy, reward=21.768! [2024-12-15 20:37:42,686][00278] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.1). Total num frames: 2609152. Throughput: 0: 982.8. Samples: 651024. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:37:42,692][00278] Avg episode reward: [(0, '21.453')] [2024-12-15 20:37:44,952][02478] Updated weights for policy 0, policy_version 640 (0.0018) [2024-12-15 20:37:47,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2629632. Throughput: 0: 1040.5. Samples: 657926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:37:47,696][00278] Avg episode reward: [(0, '21.218')] [2024-12-15 20:37:52,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2646016. Throughput: 0: 1011.0. Samples: 660080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:37:52,688][00278] Avg episode reward: [(0, '22.000')] [2024-12-15 20:37:52,692][02465] Saving new best policy, reward=22.000! [2024-12-15 20:37:56,226][02478] Updated weights for policy 0, policy_version 650 (0.0033) [2024-12-15 20:37:57,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4068.2). 
Total num frames: 2666496. Throughput: 0: 978.1. Samples: 665752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:37:57,689][00278] Avg episode reward: [(0, '22.349')] [2024-12-15 20:37:57,696][02465] Saving new best policy, reward=22.349! [2024-12-15 20:38:02,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2691072. Throughput: 0: 1024.1. Samples: 672736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:38:02,689][00278] Avg episode reward: [(0, '21.872')] [2024-12-15 20:38:06,486][02478] Updated weights for policy 0, policy_version 660 (0.0024) [2024-12-15 20:38:07,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4054.3). Total num frames: 2703360. Throughput: 0: 1023.6. Samples: 675344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:38:07,688][00278] Avg episode reward: [(0, '21.331')] [2024-12-15 20:38:12,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4054.3). Total num frames: 2723840. Throughput: 0: 963.6. Samples: 679858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:38:12,693][00278] Avg episode reward: [(0, '21.613')] [2024-12-15 20:38:16,453][02478] Updated weights for policy 0, policy_version 670 (0.0032) [2024-12-15 20:38:17,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2748416. Throughput: 0: 997.1. Samples: 687092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:38:17,692][00278] Avg episode reward: [(0, '22.417')] [2024-12-15 20:38:17,708][02465] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000671_2748416.pth... [2024-12-15 20:38:17,819][02465] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000433_1773568.pth [2024-12-15 20:38:17,839][02465] Saving new best policy, reward=22.417! [2024-12-15 20:38:22,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2764800. Throughput: 0: 1021.0. 
Samples: 690406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:38:22,695][00278] Avg episode reward: [(0, '22.659')] [2024-12-15 20:38:22,697][02465] Saving new best policy, reward=22.659! [2024-12-15 20:38:27,690][00278] Fps is (10 sec: 3275.5, 60 sec: 3822.7, 300 sec: 4040.4). Total num frames: 2781184. Throughput: 0: 970.5. Samples: 694700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:38:27,695][00278] Avg episode reward: [(0, '23.903')] [2024-12-15 20:38:27,704][02465] Saving new best policy, reward=23.903! [2024-12-15 20:38:28,051][02478] Updated weights for policy 0, policy_version 680 (0.0018) [2024-12-15 20:38:32,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 2805760. Throughput: 0: 967.6. Samples: 701468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:38:32,693][00278] Avg episode reward: [(0, '25.073')] [2024-12-15 20:38:32,700][02465] Saving new best policy, reward=25.073! [2024-12-15 20:38:37,145][02478] Updated weights for policy 0, policy_version 690 (0.0029) [2024-12-15 20:38:37,686][00278] Fps is (10 sec: 4507.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2826240. Throughput: 0: 993.6. Samples: 704792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-15 20:38:37,688][00278] Avg episode reward: [(0, '24.554')] [2024-12-15 20:38:42,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 4026.6). Total num frames: 2838528. Throughput: 0: 975.4. Samples: 709646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:38:42,690][00278] Avg episode reward: [(0, '24.600')] [2024-12-15 20:38:47,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 4040.5). Total num frames: 2859008. Throughput: 0: 942.7. Samples: 715156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:38:47,695][00278] Avg episode reward: [(0, '25.254')] [2024-12-15 20:38:47,704][02465] Saving new best policy, reward=25.254! 
[2024-12-15 20:38:48,914][02478] Updated weights for policy 0, policy_version 700 (0.0019) [2024-12-15 20:38:52,686][00278] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2883584. Throughput: 0: 956.7. Samples: 718394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-15 20:38:52,691][00278] Avg episode reward: [(0, '25.971')] [2024-12-15 20:38:52,696][02465] Saving new best policy, reward=25.971! [2024-12-15 20:38:57,689][00278] Fps is (10 sec: 3685.4, 60 sec: 3822.8, 300 sec: 4012.7). Total num frames: 2895872. Throughput: 0: 987.1. Samples: 724278. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-15 20:38:57,693][00278] Avg episode reward: [(0, '25.653')] [2024-12-15 20:39:00,863][02478] Updated weights for policy 0, policy_version 710 (0.0022) [2024-12-15 20:39:02,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 4012.7). Total num frames: 2916352. Throughput: 0: 927.2. Samples: 728814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:39:02,695][00278] Avg episode reward: [(0, '26.613')] [2024-12-15 20:39:02,703][02465] Saving new best policy, reward=26.613! [2024-12-15 20:39:07,686][00278] Fps is (10 sec: 4097.1, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2936832. Throughput: 0: 931.4. Samples: 732320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:39:07,693][00278] Avg episode reward: [(0, '26.038')] [2024-12-15 20:39:09,723][02478] Updated weights for policy 0, policy_version 720 (0.0020) [2024-12-15 20:39:12,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 2957312. Throughput: 0: 985.2. Samples: 739030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:39:12,690][00278] Avg episode reward: [(0, '26.371')] [2024-12-15 20:39:17,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3971.0). Total num frames: 2969600. Throughput: 0: 925.6. Samples: 743122. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:39:17,689][00278] Avg episode reward: [(0, '27.822')] [2024-12-15 20:39:17,704][02465] Saving new best policy, reward=27.822! [2024-12-15 20:39:21,786][02478] Updated weights for policy 0, policy_version 730 (0.0022) [2024-12-15 20:39:22,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 4012.7). Total num frames: 2994176. Throughput: 0: 915.2. Samples: 745976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:39:22,693][00278] Avg episode reward: [(0, '28.828')] [2024-12-15 20:39:22,698][02465] Saving new best policy, reward=28.828! [2024-12-15 20:39:27,686][00278] Fps is (10 sec: 4505.4, 60 sec: 3891.4, 300 sec: 3998.8). Total num frames: 3014656. Throughput: 0: 957.7. Samples: 752744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:39:27,690][00278] Avg episode reward: [(0, '28.502')] [2024-12-15 20:39:32,687][00278] Fps is (10 sec: 3276.5, 60 sec: 3686.3, 300 sec: 3957.1). Total num frames: 3026944. Throughput: 0: 944.0. Samples: 757636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:39:32,689][00278] Avg episode reward: [(0, '27.610')] [2024-12-15 20:39:32,715][02478] Updated weights for policy 0, policy_version 740 (0.0020) [2024-12-15 20:39:37,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3971.0). Total num frames: 3047424. Throughput: 0: 919.1. Samples: 759754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:39:37,690][00278] Avg episode reward: [(0, '28.427')] [2024-12-15 20:39:42,574][02478] Updated weights for policy 0, policy_version 750 (0.0020) [2024-12-15 20:39:42,686][00278] Fps is (10 sec: 4506.0, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 3072000. Throughput: 0: 940.8. Samples: 766612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:39:42,690][00278] Avg episode reward: [(0, '28.663')] [2024-12-15 20:39:47,686][00278] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3971.0). 
Total num frames: 3088384. Throughput: 0: 969.1. Samples: 772422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:39:47,691][00278] Avg episode reward: [(0, '26.408')] [2024-12-15 20:39:52,686][00278] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3929.4). Total num frames: 3100672. Throughput: 0: 934.1. Samples: 774356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-15 20:39:52,689][00278] Avg episode reward: [(0, '25.405')] [2024-12-15 20:39:54,587][02478] Updated weights for policy 0, policy_version 760 (0.0023) [2024-12-15 20:39:57,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3971.0). Total num frames: 3125248. Throughput: 0: 916.1. Samples: 780254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:39:57,689][00278] Avg episode reward: [(0, '26.417')] [2024-12-15 20:40:02,686][00278] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 3145728. Throughput: 0: 971.8. Samples: 786854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:40:02,689][00278] Avg episode reward: [(0, '25.721')] [2024-12-15 20:40:04,892][02478] Updated weights for policy 0, policy_version 770 (0.0018) [2024-12-15 20:40:07,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3915.5). Total num frames: 3158016. Throughput: 0: 953.5. Samples: 788884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:40:07,688][00278] Avg episode reward: [(0, '25.195')] [2024-12-15 20:40:12,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3929.4). Total num frames: 3178496. Throughput: 0: 913.3. Samples: 793840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:40:12,689][00278] Avg episode reward: [(0, '26.511')] [2024-12-15 20:40:15,820][02478] Updated weights for policy 0, policy_version 780 (0.0029) [2024-12-15 20:40:17,686][00278] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3203072. Throughput: 0: 949.6. Samples: 800366. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:40:17,688][00278] Avg episode reward: [(0, '26.907')] [2024-12-15 20:40:17,699][02465] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000782_3203072.pth... [2024-12-15 20:40:17,820][02465] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000554_2269184.pth [2024-12-15 20:40:22,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 3215360. Throughput: 0: 962.7. Samples: 803074. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-15 20:40:22,694][00278] Avg episode reward: [(0, '27.580')] [2024-12-15 20:40:27,686][00278] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3901.6). Total num frames: 3231744. Throughput: 0: 903.6. Samples: 807272. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:40:27,693][00278] Avg episode reward: [(0, '27.771')] [2024-12-15 20:40:27,922][02478] Updated weights for policy 0, policy_version 790 (0.0017) [2024-12-15 20:40:32,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3929.4). Total num frames: 3256320. Throughput: 0: 927.4. Samples: 814156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:40:32,693][00278] Avg episode reward: [(0, '29.491')] [2024-12-15 20:40:32,699][02465] Saving new best policy, reward=29.491! [2024-12-15 20:40:37,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 3272704. Throughput: 0: 957.9. Samples: 817462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:40:37,690][00278] Avg episode reward: [(0, '29.967')] [2024-12-15 20:40:37,776][02465] Saving new best policy, reward=29.967! [2024-12-15 20:40:37,782][02478] Updated weights for policy 0, policy_version 800 (0.0017) [2024-12-15 20:40:42,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 3289088. Throughput: 0: 920.0. Samples: 821654. 
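The paired `Saving checkpoint_…` / `Removing checkpoint_…` lines show keep-latest-N checkpoint rotation: writing `checkpoint_000000782_3203072.pth` evicts the oldest remaining file. A sketch under the assumption that two checkpoints are kept (consistent with this log, though the keep count is configurable):

```python
def rotate_checkpoints(existing, new_name, keep=2):
    """Add the new checkpoint name and return (kept, removed).
    Names are zero-padded by version, so lexicographic sort is chronological."""
    all_ckpts = sorted(existing + [new_name])
    return all_ckpts[-keep:], all_ckpts[:-keep]

kept, removed = rotate_checkpoints(
    ["checkpoint_000000554_2269184.pth", "checkpoint_000000671_2748416.pth"],
    "checkpoint_000000782_3203072.pth",
)
# removed == ["checkpoint_000000554_2269184.pth"], matching the log above
```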
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:40:42,688][00278] Avg episode reward: [(0, '28.900')] [2024-12-15 20:40:47,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 3309568. Throughput: 0: 900.3. Samples: 827368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:40:47,690][00278] Avg episode reward: [(0, '27.679')] [2024-12-15 20:40:49,237][02478] Updated weights for policy 0, policy_version 810 (0.0019) [2024-12-15 20:40:52,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3330048. Throughput: 0: 928.7. Samples: 830674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:40:52,693][00278] Avg episode reward: [(0, '25.895')] [2024-12-15 20:40:57,686][00278] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 3342336. Throughput: 0: 934.3. Samples: 835886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-15 20:40:57,688][00278] Avg episode reward: [(0, '24.775')] [2024-12-15 20:41:01,417][02478] Updated weights for policy 0, policy_version 820 (0.0020) [2024-12-15 20:41:02,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 3362816. Throughput: 0: 899.3. Samples: 840836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:41:02,689][00278] Avg episode reward: [(0, '24.030')] [2024-12-15 20:41:07,686][00278] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3387392. Throughput: 0: 916.3. Samples: 844306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:41:07,689][00278] Avg episode reward: [(0, '23.514')] [2024-12-15 20:41:10,648][02478] Updated weights for policy 0, policy_version 830 (0.0023) [2024-12-15 20:41:12,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3403776. Throughput: 0: 962.4. Samples: 850582. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:41:12,691][00278] Avg episode reward: [(0, '24.977')] [2024-12-15 20:41:17,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 3420160. Throughput: 0: 899.7. Samples: 854644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:41:17,689][00278] Avg episode reward: [(0, '26.518')] [2024-12-15 20:41:22,191][02478] Updated weights for policy 0, policy_version 840 (0.0018) [2024-12-15 20:41:22,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3440640. Throughput: 0: 901.1. Samples: 858010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:41:22,693][00278] Avg episode reward: [(0, '26.302')] [2024-12-15 20:41:27,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3461120. Throughput: 0: 963.2. Samples: 864996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:41:27,693][00278] Avg episode reward: [(0, '27.728')] [2024-12-15 20:41:32,691][00278] Fps is (10 sec: 3275.1, 60 sec: 3617.8, 300 sec: 3804.4). Total num frames: 3473408. Throughput: 0: 929.7. Samples: 869208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:41:32,703][00278] Avg episode reward: [(0, '27.191')] [2024-12-15 20:41:34,048][02478] Updated weights for policy 0, policy_version 850 (0.0019) [2024-12-15 20:41:37,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3497984. Throughput: 0: 917.1. Samples: 871944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:41:37,688][00278] Avg episode reward: [(0, '26.175')] [2024-12-15 20:41:42,686][00278] Fps is (10 sec: 4507.9, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3518464. Throughput: 0: 955.4. Samples: 878878. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-15 20:41:42,693][00278] Avg episode reward: [(0, '24.654')] [2024-12-15 20:41:42,968][02478] Updated weights for policy 0, policy_version 860 (0.0033) [2024-12-15 20:41:47,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3534848. Throughput: 0: 963.0. Samples: 884172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:41:47,692][00278] Avg episode reward: [(0, '24.883')] [2024-12-15 20:41:52,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 3551232. Throughput: 0: 933.3. Samples: 886304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:41:52,688][00278] Avg episode reward: [(0, '23.939')] [2024-12-15 20:41:54,569][02478] Updated weights for policy 0, policy_version 870 (0.0023) [2024-12-15 20:41:57,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3575808. Throughput: 0: 944.7. Samples: 893094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:41:57,691][00278] Avg episode reward: [(0, '24.690')] [2024-12-15 20:42:02,688][00278] Fps is (10 sec: 4504.8, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 3596288. Throughput: 0: 989.1. Samples: 899154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:42:02,691][00278] Avg episode reward: [(0, '26.271')] [2024-12-15 20:42:05,828][02478] Updated weights for policy 0, policy_version 880 (0.0025) [2024-12-15 20:42:07,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3608576. Throughput: 0: 958.0. Samples: 901120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:42:07,694][00278] Avg episode reward: [(0, '26.738')] [2024-12-15 20:42:12,686][00278] Fps is (10 sec: 3687.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3633152. Throughput: 0: 936.3. Samples: 907130. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:42:12,695][00278] Avg episode reward: [(0, '28.080')] [2024-12-15 20:42:15,122][02478] Updated weights for policy 0, policy_version 890 (0.0015) [2024-12-15 20:42:17,686][00278] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3653632. Throughput: 0: 996.2. Samples: 914030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:42:17,689][00278] Avg episode reward: [(0, '27.829')] [2024-12-15 20:42:17,698][02465] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000892_3653632.pth... [2024-12-15 20:42:17,846][02465] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000671_2748416.pth [2024-12-15 20:42:22,686][00278] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3665920. Throughput: 0: 978.9. Samples: 915994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:42:22,695][00278] Avg episode reward: [(0, '27.607')] [2024-12-15 20:42:26,754][02478] Updated weights for policy 0, policy_version 900 (0.0019) [2024-12-15 20:42:27,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3690496. Throughput: 0: 935.7. Samples: 920984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-15 20:42:27,692][00278] Avg episode reward: [(0, '29.667')] [2024-12-15 20:42:32,686][00278] Fps is (10 sec: 4505.6, 60 sec: 3959.8, 300 sec: 3818.3). Total num frames: 3710976. Throughput: 0: 976.0. Samples: 928092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-15 20:42:32,693][00278] Avg episode reward: [(0, '28.058')] [2024-12-15 20:42:36,483][02478] Updated weights for policy 0, policy_version 910 (0.0034) [2024-12-15 20:42:37,686][00278] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3727360. Throughput: 0: 999.8. Samples: 931294. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:42:37,691][00278] Avg episode reward: [(0, '26.867')] [2024-12-15 20:42:42,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3747840. Throughput: 0: 950.3. Samples: 935856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-15 20:42:42,692][00278] Avg episode reward: [(0, '25.927')] [2024-12-15 20:42:46,439][02478] Updated weights for policy 0, policy_version 920 (0.0027) [2024-12-15 20:42:47,686][00278] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3772416. Throughput: 0: 977.9. Samples: 943156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-15 20:42:47,692][00278] Avg episode reward: [(0, '25.057')] [2024-12-15 20:42:52,686][00278] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 3792896. Throughput: 0: 1013.6. Samples: 946730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:42:52,688][00278] Avg episode reward: [(0, '23.628')] [2024-12-15 20:42:57,248][02478] Updated weights for policy 0, policy_version 930 (0.0032) [2024-12-15 20:42:57,686][00278] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3809280. Throughput: 0: 989.1. Samples: 951640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-15 20:42:57,690][00278] Avg episode reward: [(0, '23.459')] [2024-12-15 20:43:02,686][00278] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3832.2). Total num frames: 3833856. Throughput: 0: 983.1. Samples: 958270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-15 20:43:02,693][00278] Avg episode reward: [(0, '25.319')] [2024-12-15 20:43:05,786][02478] Updated weights for policy 0, policy_version 940 (0.0023) [2024-12-15 20:43:07,686][00278] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3846.1). Total num frames: 3858432. Throughput: 0: 1021.9. Samples: 961980. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-15 20:43:07,694][00278] Avg episode reward: [(0, '26.104')] [2024-12-15 20:43:12,693][00278] Fps is (10 sec: 4093.3, 60 sec: 4027.3, 300 sec: 3818.2). Total num frames: 3874816. Throughput: 0: 1041.7. Samples: 967866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:43:12,697][00278] Avg episode reward: [(0, '25.892')] [2024-12-15 20:43:16,774][02478] Updated weights for policy 0, policy_version 950 (0.0013) [2024-12-15 20:43:17,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 3895296. Throughput: 0: 1004.5. Samples: 973296. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:43:17,690][00278] Avg episode reward: [(0, '27.076')] [2024-12-15 20:43:22,686][00278] Fps is (10 sec: 4098.7, 60 sec: 4164.3, 300 sec: 3846.1). Total num frames: 3915776. Throughput: 0: 1016.6. Samples: 977040. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-15 20:43:22,694][00278] Avg episode reward: [(0, '28.503')] [2024-12-15 20:43:25,717][02478] Updated weights for policy 0, policy_version 960 (0.0014) [2024-12-15 20:43:27,686][00278] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 3832.2). Total num frames: 3936256. Throughput: 0: 1063.6. Samples: 983720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-15 20:43:27,689][00278] Avg episode reward: [(0, '26.662')] [2024-12-15 20:43:32,686][00278] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 3952640. Throughput: 0: 1002.6. Samples: 988272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-15 20:43:32,688][00278] Avg episode reward: [(0, '26.277')] [2024-12-15 20:43:36,381][02478] Updated weights for policy 0, policy_version 970 (0.0034) [2024-12-15 20:43:37,686][00278] Fps is (10 sec: 4096.2, 60 sec: 4164.3, 300 sec: 3860.0). Total num frames: 3977216. Throughput: 0: 1005.2. Samples: 991964. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-15 20:43:37,688][00278] Avg episode reward: [(0, '27.207')] [2024-12-15 20:43:42,686][00278] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 3873.8). Total num frames: 4001792. Throughput: 0: 1058.8. Samples: 999288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-15 20:43:42,692][00278] Avg episode reward: [(0, '27.252')] [2024-12-15 20:43:43,964][02465] Stopping Batcher_0... [2024-12-15 20:43:43,966][02465] Loop batcher_evt_loop terminating... [2024-12-15 20:43:43,966][00278] Component Batcher_0 stopped! [2024-12-15 20:43:43,973][02465] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-15 20:43:44,064][02478] Weights refcount: 2 0 [2024-12-15 20:43:44,082][02478] Stopping InferenceWorker_p0-w0... [2024-12-15 20:43:44,083][02478] Loop inference_proc0-0_evt_loop terminating... [2024-12-15 20:43:44,078][00278] Component InferenceWorker_p0-w0 stopped! [2024-12-15 20:43:44,160][02465] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000782_3203072.pth [2024-12-15 20:43:44,188][02465] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-15 20:43:44,432][00278] Component LearnerWorker_p0 stopped! [2024-12-15 20:43:44,432][02465] Stopping LearnerWorker_p0... [2024-12-15 20:43:44,439][02465] Loop learner_proc0_evt_loop terminating... [2024-12-15 20:43:44,542][02486] Stopping RolloutWorker_w6... [2024-12-15 20:43:44,547][02486] Loop rollout_proc6_evt_loop terminating... [2024-12-15 20:43:44,542][00278] Component RolloutWorker_w6 stopped! [2024-12-15 20:43:44,580][00278] Component RolloutWorker_w2 stopped! [2024-12-15 20:43:44,586][02482] Stopping RolloutWorker_w2... [2024-12-15 20:43:44,596][02482] Loop rollout_proc2_evt_loop terminating... [2024-12-15 20:43:44,605][00278] Component RolloutWorker_w4 stopped! [2024-12-15 20:43:44,611][02481] Stopping RolloutWorker_w4... 
[2024-12-15 20:43:44,612][02481] Loop rollout_proc4_evt_loop terminating... [2024-12-15 20:43:44,617][00278] Component RolloutWorker_w0 stopped! [2024-12-15 20:43:44,617][02479] Stopping RolloutWorker_w0... [2024-12-15 20:43:44,627][02479] Loop rollout_proc0_evt_loop terminating... [2024-12-15 20:43:44,698][02485] Stopping RolloutWorker_w7... [2024-12-15 20:43:44,699][02485] Loop rollout_proc7_evt_loop terminating... [2024-12-15 20:43:44,699][00278] Component RolloutWorker_w7 stopped! [2024-12-15 20:43:44,717][02480] Stopping RolloutWorker_w1... [2024-12-15 20:43:44,718][00278] Component RolloutWorker_w1 stopped! [2024-12-15 20:43:44,727][02484] Stopping RolloutWorker_w5... [2024-12-15 20:43:44,718][02480] Loop rollout_proc1_evt_loop terminating... [2024-12-15 20:43:44,729][00278] Component RolloutWorker_w5 stopped! [2024-12-15 20:43:44,733][02484] Loop rollout_proc5_evt_loop terminating... [2024-12-15 20:43:44,782][00278] Component RolloutWorker_w3 stopped! [2024-12-15 20:43:44,784][00278] Waiting for process learner_proc0 to stop... [2024-12-15 20:43:44,786][02483] Stopping RolloutWorker_w3... [2024-12-15 20:43:44,787][02483] Loop rollout_proc3_evt_loop terminating... [2024-12-15 20:43:46,471][00278] Waiting for process inference_proc0-0 to join... [2024-12-15 20:43:46,593][00278] Waiting for process rollout_proc0 to join... [2024-12-15 20:43:48,870][00278] Waiting for process rollout_proc1 to join... [2024-12-15 20:43:48,879][00278] Waiting for process rollout_proc2 to join... [2024-12-15 20:43:48,882][00278] Waiting for process rollout_proc3 to join... [2024-12-15 20:43:48,887][00278] Waiting for process rollout_proc4 to join... [2024-12-15 20:43:48,891][00278] Waiting for process rollout_proc5 to join... [2024-12-15 20:43:48,895][00278] Waiting for process rollout_proc6 to join... [2024-12-15 20:43:48,898][00278] Waiting for process rollout_proc7 to join... 
[2024-12-15 20:43:48,903][00278] Batcher 0 profile tree view: batching: 26.4105, releasing_batches: 0.0320 [2024-12-15 20:43:48,904][00278] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 404.3615 update_model: 8.5416 weight_update: 0.0021 one_step: 0.0096 handle_policy_step: 560.1052 deserialize: 14.4102, stack: 3.0957, obs_to_device_normalize: 118.9161, forward: 280.8124, send_messages: 28.2729 prepare_outputs: 86.0716 to_cpu: 51.7627 [2024-12-15 20:43:48,909][00278] Learner 0 profile tree view: misc: 0.0050, prepare_batch: 13.6771 train: 73.3869 epoch_init: 0.0133, minibatch_init: 0.0064, losses_postprocess: 0.6890, kl_divergence: 0.6345, after_optimizer: 34.4440 calculate_losses: 25.4639 losses_init: 0.0037, forward_head: 1.2805, bptt_initial: 17.0496, tail: 1.0236, advantages_returns: 0.2828, losses: 3.6655 bptt: 1.8253 bptt_forward_core: 1.7034 update: 11.5296 clip: 0.8750 [2024-12-15 20:43:48,911][00278] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3353, enqueue_policy_requests: 94.4144, env_step: 797.5351, overhead: 12.7211, complete_rollouts: 6.6299 save_policy_outputs: 19.9375 split_output_tensors: 8.0333 [2024-12-15 20:43:48,914][00278] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3148, enqueue_policy_requests: 94.3721, env_step: 792.0238, overhead: 12.3217, complete_rollouts: 6.8114 save_policy_outputs: 20.2472 split_output_tensors: 8.1897 [2024-12-15 20:43:48,915][00278] Loop Runner_EvtLoop terminating... [2024-12-15 20:43:48,916][00278] Runner profile tree view: main_loop: 1045.2524 [2024-12-15 20:43:48,917][00278] Collected {0: 4005888}, FPS: 3832.5 [2024-12-15 20:43:55,549][00278] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-15 20:43:55,550][00278] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-15 20:43:55,553][00278] Adding new argument 'no_render'=True that is not in the saved config file! 
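The final summary `Collected {0: 4005888}, FPS: 3832.5` follows directly from the profile above: total collected frames divided by the `main_loop` wall-clock time. A worked check of that arithmetic:

```python
total_frames = 4005888         # from "Collected {0: 4005888}"
main_loop_seconds = 1045.2524  # from "Runner profile tree view: main_loop"

fps = total_frames / main_loop_seconds
print(round(fps, 1))  # 3832.5, matching the reported overall FPS
```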
[2024-12-15 20:43:55,555][00278] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-15 20:43:55,557][00278] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-15 20:43:55,559][00278] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-15 20:43:55,560][00278] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-12-15 20:43:55,561][00278] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-15 20:43:55,562][00278] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-12-15 20:43:55,563][00278] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-12-15 20:43:55,564][00278] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-15 20:43:55,565][00278] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-15 20:43:55,566][00278] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-15 20:43:55,567][00278] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-15 20:43:55,568][00278] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-15 20:43:55,602][00278] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-15 20:43:55,606][00278] RunningMeanStd input shape: (3, 72, 128) [2024-12-15 20:43:55,608][00278] RunningMeanStd input shape: (1,) [2024-12-15 20:43:55,624][00278] ConvEncoder: input_channels=3 [2024-12-15 20:43:55,747][00278] Conv encoder output size: 512 [2024-12-15 20:43:55,749][00278] Policy head output size: 512 [2024-12-15 20:43:56,017][00278] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-15 20:43:56,799][00278] Num frames 100... 
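`Using frameskip 1 and render_action_repeat=4 for evaluation` means each chosen action is repeated for 4 rendered frames. A simplified action-repeat wrapper sketch (the toy env interface is an assumption, not ViZDoom's actual API):

```python
class ActionRepeat:
    """Repeat each agent action for `repeat` underlying env steps,
    summing rewards: a simplified stand-in for render_action_repeat=4."""
    def __init__(self, env, repeat=4):
        self.env = env
        self.repeat = repeat

    def step(self, action):
        total_reward, done = 0.0, False
        for _ in range(self.repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break  # do not step past the end of the episode
        return obs, total_reward, done

class CountingEnv:
    """Toy env: +1 reward per frame, episode ends after 10 frames."""
    def __init__(self):
        self.frames = 0
    def step(self, action):
        self.frames += 1
        return self.frames, 1.0, self.frames >= 10

env = ActionRepeat(CountingEnv(), repeat=4)
obs, r, done = env.step(0)  # advances 4 underlying frames at once
```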
[2024-12-15 20:43:56,925][00278] Num frames 200... [2024-12-15 20:43:57,096][00278] Num frames 300... [2024-12-15 20:43:57,263][00278] Num frames 400... [2024-12-15 20:43:57,430][00278] Num frames 500... [2024-12-15 20:43:57,599][00278] Num frames 600... [2024-12-15 20:43:57,773][00278] Num frames 700... [2024-12-15 20:43:57,936][00278] Num frames 800... [2024-12-15 20:43:58,101][00278] Num frames 900... [2024-12-15 20:43:58,266][00278] Num frames 1000... [2024-12-15 20:43:58,443][00278] Num frames 1100... [2024-12-15 20:43:58,537][00278] Avg episode rewards: #0: 23.200, true rewards: #0: 11.200 [2024-12-15 20:43:58,539][00278] Avg episode reward: 23.200, avg true_objective: 11.200 [2024-12-15 20:43:58,685][00278] Num frames 1200... [2024-12-15 20:43:58,866][00278] Num frames 1300... [2024-12-15 20:43:59,039][00278] Num frames 1400... [2024-12-15 20:43:59,122][00278] Avg episode rewards: #0: 13.065, true rewards: #0: 7.065 [2024-12-15 20:43:59,123][00278] Avg episode reward: 13.065, avg true_objective: 7.065 [2024-12-15 20:43:59,269][00278] Num frames 1500... [2024-12-15 20:43:59,409][00278] Num frames 1600... [2024-12-15 20:43:59,531][00278] Num frames 1700... [2024-12-15 20:43:59,659][00278] Num frames 1800... [2024-12-15 20:43:59,784][00278] Num frames 1900... [2024-12-15 20:43:59,904][00278] Num frames 2000... [2024-12-15 20:44:00,059][00278] Avg episode rewards: #0: 13.283, true rewards: #0: 6.950 [2024-12-15 20:44:00,060][00278] Avg episode reward: 13.283, avg true_objective: 6.950 [2024-12-15 20:44:00,082][00278] Num frames 2100... [2024-12-15 20:44:00,202][00278] Num frames 2200... [2024-12-15 20:44:00,324][00278] Num frames 2300... [2024-12-15 20:44:00,448][00278] Num frames 2400... [2024-12-15 20:44:00,567][00278] Num frames 2500... [2024-12-15 20:44:00,705][00278] Num frames 2600... [2024-12-15 20:44:00,823][00278] Num frames 2700... [2024-12-15 20:44:00,948][00278] Num frames 2800... [2024-12-15 20:44:01,070][00278] Num frames 2900... 
[2024-12-15 20:44:01,190][00278] Num frames 3000... [2024-12-15 20:44:01,313][00278] Num frames 3100... [2024-12-15 20:44:01,435][00278] Num frames 3200... [2024-12-15 20:44:01,553][00278] Num frames 3300... [2024-12-15 20:44:01,683][00278] Num frames 3400... [2024-12-15 20:44:01,765][00278] Avg episode rewards: #0: 18.045, true rewards: #0: 8.545 [2024-12-15 20:44:01,767][00278] Avg episode reward: 18.045, avg true_objective: 8.545 [2024-12-15 20:44:01,868][00278] Num frames 3500... [2024-12-15 20:44:01,986][00278] Num frames 3600... [2024-12-15 20:44:02,106][00278] Num frames 3700... [2024-12-15 20:44:02,224][00278] Num frames 3800... [2024-12-15 20:44:02,345][00278] Num frames 3900... [2024-12-15 20:44:02,515][00278] Avg episode rewards: #0: 16.188, true rewards: #0: 7.988 [2024-12-15 20:44:02,516][00278] Avg episode reward: 16.188, avg true_objective: 7.988 [2024-12-15 20:44:02,526][00278] Num frames 4000... [2024-12-15 20:44:02,645][00278] Num frames 4100... [2024-12-15 20:44:02,780][00278] Num frames 4200... [2024-12-15 20:44:02,899][00278] Num frames 4300... [2024-12-15 20:44:03,020][00278] Num frames 4400... [2024-12-15 20:44:03,141][00278] Num frames 4500... [2024-12-15 20:44:03,262][00278] Num frames 4600... [2024-12-15 20:44:03,382][00278] Num frames 4700... [2024-12-15 20:44:03,502][00278] Num frames 4800... [2024-12-15 20:44:03,620][00278] Num frames 4900... [2024-12-15 20:44:03,752][00278] Num frames 5000... [2024-12-15 20:44:03,875][00278] Num frames 5100... [2024-12-15 20:44:03,997][00278] Num frames 5200... [2024-12-15 20:44:04,138][00278] Avg episode rewards: #0: 18.457, true rewards: #0: 8.790 [2024-12-15 20:44:04,140][00278] Avg episode reward: 18.457, avg true_objective: 8.790 [2024-12-15 20:44:04,172][00278] Num frames 5300... [2024-12-15 20:44:04,291][00278] Num frames 5400... [2024-12-15 20:44:04,415][00278] Num frames 5500... [2024-12-15 20:44:04,534][00278] Num frames 5600... [2024-12-15 20:44:04,659][00278] Num frames 5700... 
[2024-12-15 20:44:04,785][00278] Num frames 5800... [2024-12-15 20:44:04,913][00278] Num frames 5900... [2024-12-15 20:44:05,028][00278] Num frames 6000... [2024-12-15 20:44:05,149][00278] Num frames 6100... [2024-12-15 20:44:05,268][00278] Num frames 6200... [2024-12-15 20:44:05,393][00278] Num frames 6300... [2024-12-15 20:44:05,514][00278] Num frames 6400... [2024-12-15 20:44:05,636][00278] Num frames 6500... [2024-12-15 20:44:05,760][00278] Num frames 6600... [2024-12-15 20:44:05,890][00278] Num frames 6700... [2024-12-15 20:44:06,010][00278] Num frames 6800... [2024-12-15 20:44:06,136][00278] Num frames 6900... [2024-12-15 20:44:06,258][00278] Num frames 7000... [2024-12-15 20:44:06,392][00278] Num frames 7100... [2024-12-15 20:44:06,530][00278] Num frames 7200... [2024-12-15 20:44:06,678][00278] Num frames 7300... [2024-12-15 20:44:06,821][00278] Avg episode rewards: #0: 23.963, true rewards: #0: 10.534 [2024-12-15 20:44:06,822][00278] Avg episode reward: 23.963, avg true_objective: 10.534 [2024-12-15 20:44:06,864][00278] Num frames 7400... [2024-12-15 20:44:06,985][00278] Num frames 7500... [2024-12-15 20:44:07,106][00278] Num frames 7600... [2024-12-15 20:44:07,224][00278] Num frames 7700... [2024-12-15 20:44:07,348][00278] Num frames 7800... [2024-12-15 20:44:07,468][00278] Num frames 7900... [2024-12-15 20:44:07,588][00278] Num frames 8000... [2024-12-15 20:44:07,714][00278] Num frames 8100... [2024-12-15 20:44:07,811][00278] Avg episode rewards: #0: 23.044, true rewards: #0: 10.169 [2024-12-15 20:44:07,812][00278] Avg episode reward: 23.044, avg true_objective: 10.169 [2024-12-15 20:44:07,900][00278] Num frames 8200... [2024-12-15 20:44:08,016][00278] Num frames 8300... [2024-12-15 20:44:08,138][00278] Num frames 8400... [2024-12-15 20:44:08,255][00278] Num frames 8500... [2024-12-15 20:44:08,379][00278] Num frames 8600... [2024-12-15 20:44:08,498][00278] Num frames 8700... [2024-12-15 20:44:08,622][00278] Num frames 8800... 
[2024-12-15 20:44:08,750][00278] Num frames 8900... [2024-12-15 20:44:08,871][00278] Num frames 9000... [2024-12-15 20:44:08,999][00278] Num frames 9100... [2024-12-15 20:44:09,136][00278] Num frames 9200... [2024-12-15 20:44:09,253][00278] Num frames 9300... [2024-12-15 20:44:09,406][00278] Num frames 9400... [2024-12-15 20:44:09,581][00278] Num frames 9500... [2024-12-15 20:44:09,754][00278] Num frames 9600... [2024-12-15 20:44:09,902][00278] Avg episode rewards: #0: 24.061, true rewards: #0: 10.728 [2024-12-15 20:44:09,907][00278] Avg episode reward: 24.061, avg true_objective: 10.728 [2024-12-15 20:44:10,001][00278] Num frames 9700... [2024-12-15 20:44:10,168][00278] Num frames 9800... [2024-12-15 20:44:10,332][00278] Num frames 9900... [2024-12-15 20:44:10,493][00278] Num frames 10000... [2024-12-15 20:44:10,661][00278] Num frames 10100... [2024-12-15 20:44:10,833][00278] Num frames 10200... [2024-12-15 20:44:11,010][00278] Num frames 10300... [2024-12-15 20:44:11,183][00278] Num frames 10400... [2024-12-15 20:44:11,361][00278] Num frames 10500... [2024-12-15 20:44:11,532][00278] Num frames 10600... [2024-12-15 20:44:11,711][00278] Num frames 10700... [2024-12-15 20:44:11,852][00278] Num frames 10800... [2024-12-15 20:44:11,970][00278] Num frames 10900... [2024-12-15 20:44:12,097][00278] Num frames 11000... [2024-12-15 20:44:12,220][00278] Num frames 11100... [2024-12-15 20:44:12,343][00278] Num frames 11200... [2024-12-15 20:44:12,466][00278] Num frames 11300... [2024-12-15 20:44:12,586][00278] Num frames 11400... [2024-12-15 20:44:12,714][00278] Num frames 11500... [2024-12-15 20:44:12,836][00278] Num frames 11600... [2024-12-15 20:44:12,958][00278] Num frames 11700... [2024-12-15 20:44:13,082][00278] Avg episode rewards: #0: 26.755, true rewards: #0: 11.755 [2024-12-15 20:44:13,084][00278] Avg episode reward: 26.755, avg true_objective: 11.755 [2024-12-15 20:45:17,691][00278] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
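The `Avg episode rewards` lines in the eval loop are a running mean over finished episodes: 23.200 after one episode, 13.065 after two, 13.283 after three. A sketch of that bookkeeping (the per-episode rewards below are back-solved from the logged running means, purely for illustration):

```python
def running_means(episode_rewards):
    """Mean over the first k episodes, for each k, as the eval log reports it."""
    means, total = [], 0.0
    for i, r in enumerate(episode_rewards, start=1):
        total += r
        means.append(round(total / i, 3))
    return means

print(running_means([23.2, 2.93, 13.719]))  # [23.2, 13.065, 13.283]
```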
[2024-12-15 20:48:25,413][00278] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-15 20:48:25,414][00278] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-15 20:48:25,416][00278] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-15 20:48:25,418][00278] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-15 20:48:25,420][00278] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-15 20:48:25,422][00278] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-15 20:48:25,424][00278] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-12-15 20:48:25,426][00278] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-15 20:48:25,427][00278] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-12-15 20:48:25,429][00278] Adding new argument 'hf_repository'='lodist/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-12-15 20:48:25,430][00278] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-15 20:48:25,432][00278] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-15 20:48:25,433][00278] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-15 20:48:25,434][00278] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
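The `Overriding arg …` / `Adding new argument …` lines show how the enjoy script reconciles the saved `config.json` with command-line arguments: keys already in the saved config are overridden, unknown keys are added with a warning. A simplified sketch of that merge (not Sample Factory's actual code; the sample keys mirror the log):

```python
def merge_config(saved_cfg, cli_args):
    """Layer CLI args over a saved config, logging each override/addition."""
    cfg = dict(saved_cfg)
    for key, value in cli_args.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg

cfg = merge_config({"num_workers": 8},
                   {"num_workers": 1, "no_render": True, "save_video": True})
# cfg now has num_workers=1 plus the two new eval-only flags
```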
[2024-12-15 20:48:25,436][00278] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-15 20:48:25,463][00278] RunningMeanStd input shape: (3, 72, 128) [2024-12-15 20:48:25,465][00278] RunningMeanStd input shape: (1,) [2024-12-15 20:48:25,477][00278] ConvEncoder: input_channels=3 [2024-12-15 20:48:25,513][00278] Conv encoder output size: 512 [2024-12-15 20:48:25,514][00278] Policy head output size: 512 [2024-12-15 20:48:25,535][00278] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-15 20:48:25,970][00278] Num frames 100... [2024-12-15 20:48:26,091][00278] Num frames 200... [2024-12-15 20:48:26,213][00278] Num frames 300... [2024-12-15 20:48:26,337][00278] Num frames 400... [2024-12-15 20:48:26,456][00278] Num frames 500... [2024-12-15 20:48:26,577][00278] Num frames 600... [2024-12-15 20:48:26,750][00278] Avg episode rewards: #0: 13.890, true rewards: #0: 6.890 [2024-12-15 20:48:26,752][00278] Avg episode reward: 13.890, avg true_objective: 6.890 [2024-12-15 20:48:26,767][00278] Num frames 700... [2024-12-15 20:48:26,888][00278] Num frames 800... [2024-12-15 20:48:27,010][00278] Num frames 900... [2024-12-15 20:48:27,130][00278] Num frames 1000... [2024-12-15 20:48:27,273][00278] Avg episode rewards: #0: 10.365, true rewards: #0: 5.365 [2024-12-15 20:48:27,275][00278] Avg episode reward: 10.365, avg true_objective: 5.365 [2024-12-15 20:48:27,311][00278] Num frames 1100... [2024-12-15 20:48:27,429][00278] Num frames 1200... [2024-12-15 20:48:27,554][00278] Num frames 1300... [2024-12-15 20:48:27,681][00278] Num frames 1400... [2024-12-15 20:48:27,815][00278] Num frames 1500... [2024-12-15 20:48:27,939][00278] Num frames 1600... [2024-12-15 20:48:28,058][00278] Num frames 1700... [2024-12-15 20:48:28,176][00278] Num frames 1800... [2024-12-15 20:48:28,299][00278] Num frames 1900... [2024-12-15 20:48:28,424][00278] Num frames 2000... 
[2024-12-15 20:48:28,498][00278] Avg episode rewards: #0: 15.380, true rewards: #0: 6.713
[2024-12-15 20:48:28,500][00278] Avg episode reward: 15.380, avg true_objective: 6.713
[2024-12-15 20:48:28,601][00278] Num frames 2100...
[2024-12-15 20:48:28,736][00278] Num frames 2200...
[2024-12-15 20:48:28,858][00278] Num frames 2300...
[2024-12-15 20:48:28,979][00278] Num frames 2400...
[2024-12-15 20:48:29,097][00278] Num frames 2500...
[2024-12-15 20:48:29,217][00278] Num frames 2600...
[2024-12-15 20:48:29,344][00278] Num frames 2700...
[2024-12-15 20:48:29,463][00278] Num frames 2800...
[2024-12-15 20:48:29,583][00278] Num frames 2900...
[2024-12-15 20:48:29,711][00278] Num frames 3000...
[2024-12-15 20:48:29,839][00278] Num frames 3100...
[2024-12-15 20:48:29,958][00278] Num frames 3200...
[2024-12-15 20:48:30,079][00278] Num frames 3300...
[2024-12-15 20:48:30,196][00278] Num frames 3400...
[2024-12-15 20:48:30,317][00278] Num frames 3500...
[2024-12-15 20:48:30,436][00278] Num frames 3600...
[2024-12-15 20:48:30,562][00278] Avg episode rewards: #0: 21.400, true rewards: #0: 9.150
[2024-12-15 20:48:30,564][00278] Avg episode reward: 21.400, avg true_objective: 9.150
[2024-12-15 20:48:30,613][00278] Num frames 3700...
[2024-12-15 20:48:30,737][00278] Num frames 3800...
[2024-12-15 20:48:30,891][00278] Num frames 3900...
[2024-12-15 20:48:31,061][00278] Num frames 4000...
[2024-12-15 20:48:31,225][00278] Num frames 4100...
[2024-12-15 20:48:31,393][00278] Num frames 4200...
[2024-12-15 20:48:31,555][00278] Num frames 4300...
[2024-12-15 20:48:31,729][00278] Num frames 4400...
[2024-12-15 20:48:31,888][00278] Avg episode rewards: #0: 20.120, true rewards: #0: 8.920
[2024-12-15 20:48:31,889][00278] Avg episode reward: 20.120, avg true_objective: 8.920
[2024-12-15 20:48:31,957][00278] Num frames 4500...
[2024-12-15 20:48:32,120][00278] Num frames 4600...
[2024-12-15 20:48:32,286][00278] Num frames 4700...
[2024-12-15 20:48:32,459][00278] Num frames 4800...
[2024-12-15 20:48:32,626][00278] Num frames 4900...
[2024-12-15 20:48:32,797][00278] Num frames 5000...
[2024-12-15 20:48:32,977][00278] Num frames 5100...
[2024-12-15 20:48:33,150][00278] Num frames 5200...
[2024-12-15 20:48:33,324][00278] Num frames 5300...
[2024-12-15 20:48:33,451][00278] Num frames 5400...
[2024-12-15 20:48:33,568][00278] Num frames 5500...
[2024-12-15 20:48:33,695][00278] Num frames 5600...
[2024-12-15 20:48:33,814][00278] Num frames 5700...
[2024-12-15 20:48:33,942][00278] Num frames 5800...
[2024-12-15 20:48:34,061][00278] Num frames 5900...
[2024-12-15 20:48:34,184][00278] Num frames 6000...
[2024-12-15 20:48:34,303][00278] Num frames 6100...
[2024-12-15 20:48:34,424][00278] Num frames 6200...
[2024-12-15 20:48:34,543][00278] Num frames 6300...
[2024-12-15 20:48:34,674][00278] Num frames 6400...
[2024-12-15 20:48:34,796][00278] Num frames 6500...
[2024-12-15 20:48:34,920][00278] Avg episode rewards: #0: 26.765, true rewards: #0: 10.932
[2024-12-15 20:48:34,922][00278] Avg episode reward: 26.765, avg true_objective: 10.932
[2024-12-15 20:48:34,978][00278] Num frames 6600...
[2024-12-15 20:48:35,096][00278] Num frames 6700...
[2024-12-15 20:48:35,215][00278] Num frames 6800...
[2024-12-15 20:48:35,332][00278] Num frames 6900...
[2024-12-15 20:48:35,448][00278] Num frames 7000...
[2024-12-15 20:48:35,570][00278] Num frames 7100...
[2024-12-15 20:48:35,695][00278] Num frames 7200...
[2024-12-15 20:48:35,824][00278] Avg episode rewards: #0: 24.518, true rewards: #0: 10.376
[2024-12-15 20:48:35,827][00278] Avg episode reward: 24.518, avg true_objective: 10.376
[2024-12-15 20:48:35,874][00278] Num frames 7300...
[2024-12-15 20:48:36,005][00278] Num frames 7400...
[2024-12-15 20:48:36,121][00278] Num frames 7500...
[2024-12-15 20:48:36,240][00278] Num frames 7600...
[2024-12-15 20:48:36,362][00278] Num frames 7700...
[2024-12-15 20:48:36,480][00278] Num frames 7800...
[2024-12-15 20:48:36,604][00278] Num frames 7900...
[2024-12-15 20:48:36,784][00278] Avg episode rewards: #0: 22.999, true rewards: #0: 9.999
[2024-12-15 20:48:36,786][00278] Avg episode reward: 22.999, avg true_objective: 9.999
[2024-12-15 20:48:36,789][00278] Num frames 8000...
[2024-12-15 20:48:36,912][00278] Num frames 8100...
[2024-12-15 20:48:37,041][00278] Num frames 8200...
[2024-12-15 20:48:37,169][00278] Num frames 8300...
[2024-12-15 20:48:37,291][00278] Num frames 8400...
[2024-12-15 20:48:37,423][00278] Num frames 8500...
[2024-12-15 20:48:37,551][00278] Num frames 8600...
[2024-12-15 20:48:37,690][00278] Num frames 8700...
[2024-12-15 20:48:37,815][00278] Num frames 8800...
[2024-12-15 20:48:37,941][00278] Num frames 8900...
[2024-12-15 20:48:38,075][00278] Num frames 9000...
[2024-12-15 20:48:38,199][00278] Num frames 9100...
[2024-12-15 20:48:38,319][00278] Num frames 9200...
[2024-12-15 20:48:38,445][00278] Num frames 9300...
[2024-12-15 20:48:38,529][00278] Avg episode rewards: #0: 24.359, true rewards: #0: 10.359
[2024-12-15 20:48:38,531][00278] Avg episode reward: 24.359, avg true_objective: 10.359
[2024-12-15 20:48:38,629][00278] Num frames 9400...
[2024-12-15 20:48:38,770][00278] Num frames 9500...
[2024-12-15 20:48:38,897][00278] Num frames 9600...
[2024-12-15 20:48:39,031][00278] Num frames 9700...
[2024-12-15 20:48:39,150][00278] Num frames 9800...
[2024-12-15 20:48:39,274][00278] Num frames 9900...
[2024-12-15 20:48:39,399][00278] Num frames 10000...
[2024-12-15 20:48:39,532][00278] Num frames 10100...
[2024-12-15 20:48:39,658][00278] Num frames 10200...
[2024-12-15 20:48:39,788][00278] Num frames 10300...
[2024-12-15 20:48:39,910][00278] Num frames 10400...
[2024-12-15 20:48:40,037][00278] Num frames 10500...
[2024-12-15 20:48:40,167][00278] Num frames 10600...
[2024-12-15 20:48:40,295][00278] Num frames 10700...
[2024-12-15 20:48:40,429][00278] Num frames 10800...
[2024-12-15 20:48:40,550][00278] Num frames 10900...
[2024-12-15 20:48:40,683][00278] Num frames 11000...
[2024-12-15 20:48:40,807][00278] Num frames 11100...
[2024-12-15 20:48:40,935][00278] Num frames 11200...
[2024-12-15 20:48:41,063][00278] Num frames 11300...
[2024-12-15 20:48:41,192][00278] Num frames 11400...
[2024-12-15 20:48:41,268][00278] Avg episode rewards: #0: 27.516, true rewards: #0: 11.416
[2024-12-15 20:48:41,269][00278] Avg episode reward: 27.516, avg true_objective: 11.416
[2024-12-15 20:49:44,106][00278] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
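The "Avg episode rewards" entries scattered through the log are cumulative means over all episodes finished so far, printed each time an episode ends (ten episodes total, per `max_num_episodes=10`). The bookkeeping can be sketched as below; the helper name is an assumption, and "true rewards" refers to the unshaped objective the log reports alongside the shaped reward.

```python
def running_avg_reports(episode_rewards, episode_true_rewards):
    """After each finished episode, report the mean shaped reward and
    mean 'true' objective over all episodes so far, as in the log.
    Illustrative helper, not the library's actual code."""
    reports = []
    for i in range(1, len(episode_rewards) + 1):
        avg_reward = sum(episode_rewards[:i]) / i
        avg_true = sum(episode_true_rewards[:i]) / i
        reports.append((round(avg_reward, 3), round(avg_true, 3)))
    return reports
```

For example, the log's first two episodes (13.890 then a lower-scoring one) average to 10.365, which is exactly the second "Avg episode reward" entry.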