[2024-10-08 03:44:08,357][04317] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-10-08 03:44:08,359][04317] Rollout worker 0 uses device cpu
[2024-10-08 03:44:08,360][04317] Rollout worker 1 uses device cpu
[2024-10-08 03:44:08,362][04317] Rollout worker 2 uses device cpu
[2024-10-08 03:44:08,363][04317] Rollout worker 3 uses device cpu
[2024-10-08 03:44:08,365][04317] Rollout worker 4 uses device cpu
[2024-10-08 03:44:08,366][04317] Rollout worker 5 uses device cpu
[2024-10-08 03:44:08,367][04317] Rollout worker 6 uses device cpu
[2024-10-08 03:44:08,368][04317] Rollout worker 7 uses device cpu
[2024-10-08 03:44:08,423][04317] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-08 03:44:08,424][04317] InferenceWorker_p0-w0: min num requests: 2
[2024-10-08 03:44:08,458][04317] Starting all processes...
[2024-10-08 03:44:08,459][04317] Starting process learner_proc0
[2024-10-08 03:44:08,508][04317] Starting all processes...
[2024-10-08 03:44:08,513][04317] Starting process inference_proc0-0
[2024-10-08 03:44:08,514][04317] Starting process rollout_proc0
[2024-10-08 03:44:08,516][04317] Starting process rollout_proc1
[2024-10-08 03:44:08,520][04317] Starting process rollout_proc2
[2024-10-08 03:44:08,520][04317] Starting process rollout_proc3
[2024-10-08 03:44:08,521][04317] Starting process rollout_proc4
[2024-10-08 03:44:08,521][04317] Starting process rollout_proc5
[2024-10-08 03:44:08,521][04317] Starting process rollout_proc6
[2024-10-08 03:44:08,521][04317] Starting process rollout_proc7
[2024-10-08 03:44:10,457][07032] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2024-10-08 03:44:10,614][07027] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2024-10-08 03:44:10,633][07031] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2024-10-08 03:44:10,680][07008] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-08 03:44:10,681][07008] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-10-08 03:44:10,699][07008] Num visible devices: 1
[2024-10-08 03:44:10,731][07008] Starting seed is not provided
[2024-10-08 03:44:10,732][07008] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-08 03:44:10,732][07008] Initializing actor-critic model on device cuda:0
[2024-10-08 03:44:10,732][07008] RunningMeanStd input shape: (3, 72, 128)
[2024-10-08 03:44:10,734][07008] RunningMeanStd input shape: (1,)
[2024-10-08 03:44:10,754][07008] ConvEncoder: input_channels=3
[2024-10-08 03:44:10,764][07029] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2024-10-08 03:44:10,803][07021] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-08 03:44:10,803][07021] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-10-08 03:44:10,810][07022] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2024-10-08 03:44:10,819][07021] Num visible devices: 1
[2024-10-08 03:44:10,846][07030] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2024-10-08 03:44:10,909][07044] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2024-10-08 03:44:10,940][07028] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2024-10-08 03:44:10,947][07008] Conv encoder output size: 512
[2024-10-08 03:44:10,948][07008] Policy head output size: 512
[2024-10-08 03:44:10,965][07008] Created Actor Critic model with architecture:
[2024-10-08 03:44:10,965][07008] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-10-08 03:44:12,800][07008] Using optimizer
[2024-10-08 03:44:12,800][07008] No checkpoints found
[2024-10-08 03:44:12,801][07008] Did not load from checkpoint, starting from scratch!
[2024-10-08 03:44:12,801][07008] Initialized policy 0 weights for model version 0
[2024-10-08 03:44:12,803][07008] LearnerWorker_p0 finished initialization!
[2024-10-08 03:44:12,803][07008] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-08 03:44:12,878][07021] RunningMeanStd input shape: (3, 72, 128)
[2024-10-08 03:44:12,879][07021] RunningMeanStd input shape: (1,)
[2024-10-08 03:44:12,891][07021] ConvEncoder: input_channels=3
[2024-10-08 03:44:12,995][07021] Conv encoder output size: 512
[2024-10-08 03:44:12,995][07021] Policy head output size: 512
[2024-10-08 03:44:14,748][04317] Inference worker 0-0 is ready!
[2024-10-08 03:44:14,749][04317] All inference workers are ready! Signal rollout workers to start!
[2024-10-08 03:44:14,763][07031] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-08 03:44:14,764][07032] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-08 03:44:14,764][07030] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-08 03:44:14,764][07028] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-08 03:44:14,768][07029] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-08 03:44:14,768][07027] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-08 03:44:14,769][07022] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-08 03:44:14,769][07044] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-08 03:44:15,090][07029] Decorrelating experience for 0 frames...
[2024-10-08 03:44:15,090][07032] Decorrelating experience for 0 frames...
[2024-10-08 03:44:15,090][07022] Decorrelating experience for 0 frames...
[2024-10-08 03:44:15,091][07027] Decorrelating experience for 0 frames...
[2024-10-08 03:44:15,091][07031] Decorrelating experience for 0 frames...
[2024-10-08 03:44:15,091][07030] Decorrelating experience for 0 frames...
[2024-10-08 03:44:15,341][07028] Decorrelating experience for 0 frames...
[2024-10-08 03:44:15,354][07022] Decorrelating experience for 32 frames...
[2024-10-08 03:44:15,355][07029] Decorrelating experience for 32 frames...
[2024-10-08 03:44:15,357][07027] Decorrelating experience for 32 frames...
[2024-10-08 03:44:15,358][07044] Decorrelating experience for 0 frames...
[2024-10-08 03:44:15,505][04317] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-08 03:44:15,590][07030] Decorrelating experience for 32 frames...
[2024-10-08 03:44:15,610][07044] Decorrelating experience for 32 frames...
[2024-10-08 03:44:15,612][07028] Decorrelating experience for 32 frames...
[2024-10-08 03:44:15,643][07032] Decorrelating experience for 32 frames... [2024-10-08 03:44:15,671][07022] Decorrelating experience for 64 frames... [2024-10-08 03:44:15,767][07031] Decorrelating experience for 32 frames... [2024-10-08 03:44:15,788][07027] Decorrelating experience for 64 frames... [2024-10-08 03:44:15,883][07029] Decorrelating experience for 64 frames... [2024-10-08 03:44:15,917][07028] Decorrelating experience for 64 frames... [2024-10-08 03:44:15,919][07044] Decorrelating experience for 64 frames... [2024-10-08 03:44:15,962][07022] Decorrelating experience for 96 frames... [2024-10-08 03:44:16,082][07031] Decorrelating experience for 64 frames... [2024-10-08 03:44:16,099][07027] Decorrelating experience for 96 frames... [2024-10-08 03:44:16,160][07030] Decorrelating experience for 64 frames... [2024-10-08 03:44:16,185][07029] Decorrelating experience for 96 frames... [2024-10-08 03:44:16,377][07031] Decorrelating experience for 96 frames... [2024-10-08 03:44:16,417][07028] Decorrelating experience for 96 frames... [2024-10-08 03:44:16,446][07030] Decorrelating experience for 96 frames... [2024-10-08 03:44:16,472][07044] Decorrelating experience for 96 frames... [2024-10-08 03:44:16,727][07032] Decorrelating experience for 64 frames... [2024-10-08 03:44:17,117][07032] Decorrelating experience for 96 frames... [2024-10-08 03:44:17,315][07008] Signal inference workers to stop experience collection... [2024-10-08 03:44:17,330][07021] InferenceWorker_p0-w0: stopping experience collection [2024-10-08 03:44:18,345][07008] Signal inference workers to resume experience collection... [2024-10-08 03:44:18,346][07021] InferenceWorker_p0-w0: resuming experience collection [2024-10-08 03:44:20,505][04317] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6553.6). Total num frames: 32768. Throughput: 0: 592.8. Samples: 2964. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-08 03:44:20,508][04317] Avg episode reward: [(0, '3.746')] [2024-10-08 03:44:20,783][07021] Updated weights for policy 0, policy_version 10 (0.0369) [2024-10-08 03:44:22,786][07021] Updated weights for policy 0, policy_version 20 (0.0012) [2024-10-08 03:44:24,901][07021] Updated weights for policy 0, policy_version 30 (0.0012) [2024-10-08 03:44:25,505][04317] Fps is (10 sec: 13516.8, 60 sec: 13516.8, 300 sec: 13516.8). Total num frames: 135168. Throughput: 0: 2542.6. Samples: 25426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-08 03:44:25,507][04317] Avg episode reward: [(0, '4.579')] [2024-10-08 03:44:25,515][07008] Saving new best policy, reward=4.579! [2024-10-08 03:44:26,869][07021] Updated weights for policy 0, policy_version 40 (0.0012) [2024-10-08 03:44:28,417][04317] Heartbeat connected on Batcher_0 [2024-10-08 03:44:28,430][04317] Heartbeat connected on InferenceWorker_p0-w0 [2024-10-08 03:44:28,432][04317] Heartbeat connected on RolloutWorker_w0 [2024-10-08 03:44:28,436][04317] Heartbeat connected on RolloutWorker_w1 [2024-10-08 03:44:28,440][04317] Heartbeat connected on RolloutWorker_w2 [2024-10-08 03:44:28,443][04317] Heartbeat connected on RolloutWorker_w3 [2024-10-08 03:44:28,446][04317] Heartbeat connected on LearnerWorker_p0 [2024-10-08 03:44:28,448][04317] Heartbeat connected on RolloutWorker_w4 [2024-10-08 03:44:28,452][04317] Heartbeat connected on RolloutWorker_w5 [2024-10-08 03:44:28,454][04317] Heartbeat connected on RolloutWorker_w6 [2024-10-08 03:44:28,459][04317] Heartbeat connected on RolloutWorker_w7 [2024-10-08 03:44:28,843][07021] Updated weights for policy 0, policy_version 50 (0.0012) [2024-10-08 03:44:30,505][04317] Fps is (10 sec: 20480.0, 60 sec: 15837.9, 300 sec: 15837.9). Total num frames: 237568. Throughput: 0: 3761.3. Samples: 56420. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-08 03:44:30,507][04317] Avg episode reward: [(0, '4.508')] [2024-10-08 03:44:30,856][07021] Updated weights for policy 0, policy_version 60 (0.0011) [2024-10-08 03:44:32,980][07021] Updated weights for policy 0, policy_version 70 (0.0012) [2024-10-08 03:44:35,035][07021] Updated weights for policy 0, policy_version 80 (0.0011) [2024-10-08 03:44:35,505][04317] Fps is (10 sec: 20070.1, 60 sec: 16793.5, 300 sec: 16793.5). Total num frames: 335872. Throughput: 0: 3565.4. Samples: 71308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:44:35,508][04317] Avg episode reward: [(0, '4.285')] [2024-10-08 03:44:37,035][07021] Updated weights for policy 0, policy_version 90 (0.0011) [2024-10-08 03:44:39,028][07021] Updated weights for policy 0, policy_version 100 (0.0011) [2024-10-08 03:44:40,505][04317] Fps is (10 sec: 20070.4, 60 sec: 17530.9, 300 sec: 17530.9). Total num frames: 438272. Throughput: 0: 4072.9. Samples: 101822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:44:40,508][04317] Avg episode reward: [(0, '4.472')] [2024-10-08 03:44:41,013][07021] Updated weights for policy 0, policy_version 110 (0.0011) [2024-10-08 03:44:42,979][07021] Updated weights for policy 0, policy_version 120 (0.0011) [2024-10-08 03:44:45,049][07021] Updated weights for policy 0, policy_version 130 (0.0011) [2024-10-08 03:44:45,505][04317] Fps is (10 sec: 20480.4, 60 sec: 18022.4, 300 sec: 18022.4). Total num frames: 540672. Throughput: 0: 4413.2. Samples: 132396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:44:45,507][04317] Avg episode reward: [(0, '4.779')] [2024-10-08 03:44:45,516][07008] Saving new best policy, reward=4.779! 
[2024-10-08 03:44:47,184][07021] Updated weights for policy 0, policy_version 140 (0.0012) [2024-10-08 03:44:49,265][07021] Updated weights for policy 0, policy_version 150 (0.0011) [2024-10-08 03:44:50,505][04317] Fps is (10 sec: 20070.4, 60 sec: 18256.5, 300 sec: 18256.5). Total num frames: 638976. Throughput: 0: 4196.3. Samples: 146872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:44:50,507][04317] Avg episode reward: [(0, '4.530')] [2024-10-08 03:44:51,250][07021] Updated weights for policy 0, policy_version 160 (0.0011) [2024-10-08 03:44:53,333][07021] Updated weights for policy 0, policy_version 170 (0.0012) [2024-10-08 03:44:55,487][07021] Updated weights for policy 0, policy_version 180 (0.0012) [2024-10-08 03:44:55,505][04317] Fps is (10 sec: 19660.7, 60 sec: 18432.0, 300 sec: 18432.0). Total num frames: 737280. Throughput: 0: 4420.9. Samples: 176836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:44:55,507][04317] Avg episode reward: [(0, '4.479')] [2024-10-08 03:44:57,496][07021] Updated weights for policy 0, policy_version 190 (0.0011) [2024-10-08 03:44:59,493][07021] Updated weights for policy 0, policy_version 200 (0.0011) [2024-10-08 03:45:00,505][04317] Fps is (10 sec: 19660.8, 60 sec: 18568.6, 300 sec: 18568.6). Total num frames: 835584. Throughput: 0: 4598.9. Samples: 206950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:45:00,507][04317] Avg episode reward: [(0, '4.344')] [2024-10-08 03:45:01,606][07021] Updated weights for policy 0, policy_version 210 (0.0011) [2024-10-08 03:45:03,666][07021] Updated weights for policy 0, policy_version 220 (0.0011) [2024-10-08 03:45:05,505][04317] Fps is (10 sec: 20070.3, 60 sec: 18759.7, 300 sec: 18759.7). Total num frames: 937984. Throughput: 0: 4857.5. Samples: 221550. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 03:45:05,508][04317] Avg episode reward: [(0, '4.417')] [2024-10-08 03:45:05,651][07021] Updated weights for policy 0, policy_version 230 (0.0011) [2024-10-08 03:45:07,624][07021] Updated weights for policy 0, policy_version 240 (0.0011) [2024-10-08 03:45:09,616][07021] Updated weights for policy 0, policy_version 250 (0.0011) [2024-10-08 03:45:10,505][04317] Fps is (10 sec: 20480.2, 60 sec: 18916.1, 300 sec: 18916.1). Total num frames: 1040384. Throughput: 0: 5043.5. Samples: 252382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:45:10,507][04317] Avg episode reward: [(0, '4.859')] [2024-10-08 03:45:10,510][07008] Saving new best policy, reward=4.859! [2024-10-08 03:45:11,606][07021] Updated weights for policy 0, policy_version 260 (0.0011) [2024-10-08 03:45:13,636][07021] Updated weights for policy 0, policy_version 270 (0.0012) [2024-10-08 03:45:15,505][04317] Fps is (10 sec: 20070.4, 60 sec: 18978.1, 300 sec: 18978.1). Total num frames: 1138688. Throughput: 0: 5030.4. Samples: 282786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:45:15,508][04317] Avg episode reward: [(0, '4.682')] [2024-10-08 03:45:15,735][07021] Updated weights for policy 0, policy_version 280 (0.0012) [2024-10-08 03:45:17,787][07021] Updated weights for policy 0, policy_version 290 (0.0011) [2024-10-08 03:45:19,788][07021] Updated weights for policy 0, policy_version 300 (0.0011) [2024-10-08 03:45:20,505][04317] Fps is (10 sec: 20069.9, 60 sec: 20138.6, 300 sec: 19093.6). Total num frames: 1241088. Throughput: 0: 5030.0. Samples: 297656. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:45:20,508][04317] Avg episode reward: [(0, '4.792')] [2024-10-08 03:45:21,798][07021] Updated weights for policy 0, policy_version 310 (0.0011) [2024-10-08 03:45:23,776][07021] Updated weights for policy 0, policy_version 320 (0.0011) [2024-10-08 03:45:25,505][04317] Fps is (10 sec: 20480.1, 60 sec: 20138.7, 300 sec: 19192.7). Total num frames: 1343488. Throughput: 0: 5035.3. Samples: 328412. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:45:25,507][04317] Avg episode reward: [(0, '4.972')] [2024-10-08 03:45:25,515][07008] Saving new best policy, reward=4.972! [2024-10-08 03:45:25,823][07021] Updated weights for policy 0, policy_version 330 (0.0012) [2024-10-08 03:45:27,865][07021] Updated weights for policy 0, policy_version 340 (0.0012) [2024-10-08 03:45:29,985][07021] Updated weights for policy 0, policy_version 350 (0.0011) [2024-10-08 03:45:30,505][04317] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 19223.9). Total num frames: 1441792. Throughput: 0: 5016.9. Samples: 358158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:45:30,507][04317] Avg episode reward: [(0, '4.786')] [2024-10-08 03:45:32,006][07021] Updated weights for policy 0, policy_version 360 (0.0011) [2024-10-08 03:45:34,012][07021] Updated weights for policy 0, policy_version 370 (0.0011) [2024-10-08 03:45:35,505][04317] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 19302.4). Total num frames: 1544192. Throughput: 0: 5031.2. Samples: 373278. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:45:35,507][04317] Avg episode reward: [(0, '4.970')] [2024-10-08 03:45:36,002][07021] Updated weights for policy 0, policy_version 380 (0.0011) [2024-10-08 03:45:37,997][07021] Updated weights for policy 0, policy_version 390 (0.0011) [2024-10-08 03:45:39,995][07021] Updated weights for policy 0, policy_version 400 (0.0011) [2024-10-08 03:45:40,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20138.7, 300 sec: 19371.7). Total num frames: 1646592. Throughput: 0: 5049.9. Samples: 404080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:45:40,507][04317] Avg episode reward: [(0, '5.066')] [2024-10-08 03:45:40,509][07008] Saving new best policy, reward=5.066! [2024-10-08 03:45:42,046][07021] Updated weights for policy 0, policy_version 410 (0.0011) [2024-10-08 03:45:44,122][07021] Updated weights for policy 0, policy_version 420 (0.0011) [2024-10-08 03:45:45,505][04317] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19387.7). Total num frames: 1744896. Throughput: 0: 5047.1. Samples: 434070. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 03:45:45,508][04317] Avg episode reward: [(0, '5.450')] [2024-10-08 03:45:45,515][07008] Saving new best policy, reward=5.450! [2024-10-08 03:45:46,159][07021] Updated weights for policy 0, policy_version 430 (0.0011) [2024-10-08 03:45:48,154][07021] Updated weights for policy 0, policy_version 440 (0.0011) [2024-10-08 03:45:50,136][07021] Updated weights for policy 0, policy_version 450 (0.0011) [2024-10-08 03:45:50,505][04317] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 19445.2). Total num frames: 1847296. Throughput: 0: 5062.3. Samples: 449354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-10-08 03:45:50,507][04317] Avg episode reward: [(0, '7.312')] [2024-10-08 03:45:50,534][07008] Saving new best policy, reward=7.312! 
[2024-10-08 03:45:52,122][07021] Updated weights for policy 0, policy_version 460 (0.0011) [2024-10-08 03:45:54,103][07021] Updated weights for policy 0, policy_version 470 (0.0011) [2024-10-08 03:45:55,505][04317] Fps is (10 sec: 20889.5, 60 sec: 20275.2, 300 sec: 19537.9). Total num frames: 1953792. Throughput: 0: 5064.3. Samples: 480278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:45:55,508][04317] Avg episode reward: [(0, '8.058')] [2024-10-08 03:45:55,515][07008] Saving new best policy, reward=8.058! [2024-10-08 03:45:56,098][07021] Updated weights for policy 0, policy_version 480 (0.0011) [2024-10-08 03:45:58,150][07021] Updated weights for policy 0, policy_version 490 (0.0012) [2024-10-08 03:46:00,178][07021] Updated weights for policy 0, policy_version 500 (0.0011) [2024-10-08 03:46:00,505][04317] Fps is (10 sec: 20479.8, 60 sec: 20275.2, 300 sec: 19543.8). Total num frames: 2052096. Throughput: 0: 5058.8. Samples: 510434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:46:00,508][04317] Avg episode reward: [(0, '8.725')] [2024-10-08 03:46:00,510][07008] Saving new best policy, reward=8.725! [2024-10-08 03:46:02,167][07021] Updated weights for policy 0, policy_version 510 (0.0011) [2024-10-08 03:46:04,129][07021] Updated weights for policy 0, policy_version 520 (0.0011) [2024-10-08 03:46:05,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20343.5, 300 sec: 19623.6). Total num frames: 2158592. Throughput: 0: 5072.6. Samples: 525922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-08 03:46:05,508][04317] Avg episode reward: [(0, '10.800')] [2024-10-08 03:46:05,515][07008] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000527_2158592.pth... [2024-10-08 03:46:05,584][07008] Saving new best policy, reward=10.800! 
[2024-10-08 03:46:06,099][07021] Updated weights for policy 0, policy_version 530 (0.0011) [2024-10-08 03:46:08,054][07021] Updated weights for policy 0, policy_version 540 (0.0012) [2024-10-08 03:46:10,077][07021] Updated weights for policy 0, policy_version 550 (0.0011) [2024-10-08 03:46:10,505][04317] Fps is (10 sec: 20889.7, 60 sec: 20343.4, 300 sec: 19660.8). Total num frames: 2260992. Throughput: 0: 5081.2. Samples: 557066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-08 03:46:10,508][04317] Avg episode reward: [(0, '10.034')] [2024-10-08 03:46:12,195][07021] Updated weights for policy 0, policy_version 560 (0.0011) [2024-10-08 03:46:14,225][07021] Updated weights for policy 0, policy_version 570 (0.0011) [2024-10-08 03:46:15,505][04317] Fps is (10 sec: 20070.4, 60 sec: 20343.5, 300 sec: 19660.8). Total num frames: 2359296. Throughput: 0: 5086.8. Samples: 587064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:46:15,508][04317] Avg episode reward: [(0, '12.551')] [2024-10-08 03:46:15,515][07008] Saving new best policy, reward=12.551! [2024-10-08 03:46:16,207][07021] Updated weights for policy 0, policy_version 580 (0.0011) [2024-10-08 03:46:18,183][07021] Updated weights for policy 0, policy_version 590 (0.0011) [2024-10-08 03:46:20,143][07021] Updated weights for policy 0, policy_version 600 (0.0011) [2024-10-08 03:46:20,505][04317] Fps is (10 sec: 20070.4, 60 sec: 20343.5, 300 sec: 19693.6). Total num frames: 2461696. Throughput: 0: 5093.4. Samples: 602482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-08 03:46:20,508][04317] Avg episode reward: [(0, '13.357')] [2024-10-08 03:46:20,534][07008] Saving new best policy, reward=13.357! [2024-10-08 03:46:22,131][07021] Updated weights for policy 0, policy_version 610 (0.0011) [2024-10-08 03:46:24,133][07021] Updated weights for policy 0, policy_version 620 (0.0011) [2024-10-08 03:46:25,505][04317] Fps is (10 sec: 20480.2, 60 sec: 20343.5, 300 sec: 19723.8). 
Total num frames: 2564096. Throughput: 0: 5098.9. Samples: 633530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-08 03:46:25,510][04317] Avg episode reward: [(0, '14.219')] [2024-10-08 03:46:25,519][07008] Saving new best policy, reward=14.219! [2024-10-08 03:46:26,237][07021] Updated weights for policy 0, policy_version 630 (0.0011) [2024-10-08 03:46:28,266][07021] Updated weights for policy 0, policy_version 640 (0.0011) [2024-10-08 03:46:30,247][07021] Updated weights for policy 0, policy_version 650 (0.0011) [2024-10-08 03:46:30,505][04317] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 19751.8). Total num frames: 2666496. Throughput: 0: 5101.1. Samples: 663618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-08 03:46:30,508][04317] Avg episode reward: [(0, '14.796')] [2024-10-08 03:46:30,510][07008] Saving new best policy, reward=14.796! [2024-10-08 03:46:32,196][07021] Updated weights for policy 0, policy_version 660 (0.0011) [2024-10-08 03:46:34,172][07021] Updated weights for policy 0, policy_version 670 (0.0012) [2024-10-08 03:46:35,505][04317] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 19777.8). Total num frames: 2768896. Throughput: 0: 5110.9. Samples: 679344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-08 03:46:35,507][04317] Avg episode reward: [(0, '16.917')] [2024-10-08 03:46:35,515][07008] Saving new best policy, reward=16.917! [2024-10-08 03:46:36,141][07021] Updated weights for policy 0, policy_version 680 (0.0011) [2024-10-08 03:46:38,172][07021] Updated weights for policy 0, policy_version 690 (0.0011) [2024-10-08 03:46:40,264][07021] Updated weights for policy 0, policy_version 700 (0.0011) [2024-10-08 03:46:40,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20411.7, 300 sec: 19802.0). Total num frames: 2871296. Throughput: 0: 5107.3. Samples: 710108. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:46:40,508][04317] Avg episode reward: [(0, '17.158')] [2024-10-08 03:46:40,510][07008] Saving new best policy, reward=17.158! [2024-10-08 03:46:42,273][07021] Updated weights for policy 0, policy_version 710 (0.0011) [2024-10-08 03:46:44,254][07021] Updated weights for policy 0, policy_version 720 (0.0011) [2024-10-08 03:46:45,505][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 19824.7). Total num frames: 2973696. Throughput: 0: 5114.0. Samples: 740564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:46:45,507][04317] Avg episode reward: [(0, '17.717')] [2024-10-08 03:46:45,516][07008] Saving new best policy, reward=17.717! [2024-10-08 03:46:46,237][07021] Updated weights for policy 0, policy_version 730 (0.0011) [2024-10-08 03:46:48,181][07021] Updated weights for policy 0, policy_version 740 (0.0011) [2024-10-08 03:46:50,161][07021] Updated weights for policy 0, policy_version 750 (0.0011) [2024-10-08 03:46:50,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19845.8). Total num frames: 3076096. Throughput: 0: 5115.9. Samples: 756138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:46:50,508][04317] Avg episode reward: [(0, '17.503')] [2024-10-08 03:46:52,150][07021] Updated weights for policy 0, policy_version 760 (0.0011) [2024-10-08 03:46:54,257][07021] Updated weights for policy 0, policy_version 770 (0.0012) [2024-10-08 03:46:55,505][04317] Fps is (10 sec: 20070.2, 60 sec: 20343.5, 300 sec: 19840.0). Total num frames: 3174400. Throughput: 0: 5103.7. Samples: 786734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:46:55,507][04317] Avg episode reward: [(0, '20.499')] [2024-10-08 03:46:55,515][07008] Saving new best policy, reward=20.499! 
[2024-10-08 03:46:56,298][07021] Updated weights for policy 0, policy_version 780 (0.0011) [2024-10-08 03:46:58,263][07021] Updated weights for policy 0, policy_version 790 (0.0012) [2024-10-08 03:47:00,228][07021] Updated weights for policy 0, policy_version 800 (0.0011) [2024-10-08 03:47:00,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19884.2). Total num frames: 3280896. Throughput: 0: 5120.5. Samples: 817488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:47:00,508][04317] Avg episode reward: [(0, '20.218')] [2024-10-08 03:47:02,224][07021] Updated weights for policy 0, policy_version 810 (0.0011) [2024-10-08 03:47:04,180][07021] Updated weights for policy 0, policy_version 820 (0.0011) [2024-10-08 03:47:05,505][04317] Fps is (10 sec: 20889.7, 60 sec: 20411.7, 300 sec: 19901.7). Total num frames: 3383296. Throughput: 0: 5123.0. Samples: 833016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-08 03:47:05,507][04317] Avg episode reward: [(0, '17.323')] [2024-10-08 03:47:06,164][07021] Updated weights for policy 0, policy_version 830 (0.0011) [2024-10-08 03:47:08,220][07021] Updated weights for policy 0, policy_version 840 (0.0011) [2024-10-08 03:47:10,279][07021] Updated weights for policy 0, policy_version 850 (0.0011) [2024-10-08 03:47:10,505][04317] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 19918.3). Total num frames: 3485696. Throughput: 0: 5108.6. Samples: 863418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:47:10,507][04317] Avg episode reward: [(0, '21.221')] [2024-10-08 03:47:10,510][07008] Saving new best policy, reward=21.221! [2024-10-08 03:47:12,259][07021] Updated weights for policy 0, policy_version 860 (0.0011) [2024-10-08 03:47:14,225][07021] Updated weights for policy 0, policy_version 870 (0.0011) [2024-10-08 03:47:15,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19933.9). Total num frames: 3588096. Throughput: 0: 5129.9. Samples: 894462. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:47:15,508][04317] Avg episode reward: [(0, '21.242')] [2024-10-08 03:47:15,516][07008] Saving new best policy, reward=21.242! [2024-10-08 03:47:16,208][07021] Updated weights for policy 0, policy_version 880 (0.0012) [2024-10-08 03:47:18,209][07021] Updated weights for policy 0, policy_version 890 (0.0011) [2024-10-08 03:47:20,203][07021] Updated weights for policy 0, policy_version 900 (0.0011) [2024-10-08 03:47:20,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19948.6). Total num frames: 3690496. Throughput: 0: 5124.3. Samples: 909936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:47:20,507][04317] Avg episode reward: [(0, '21.015')] [2024-10-08 03:47:22,270][07021] Updated weights for policy 0, policy_version 910 (0.0011) [2024-10-08 03:47:24,295][07021] Updated weights for policy 0, policy_version 920 (0.0012) [2024-10-08 03:47:25,505][04317] Fps is (10 sec: 20479.9, 60 sec: 20480.0, 300 sec: 19962.6). Total num frames: 3792896. Throughput: 0: 5107.0. Samples: 939922. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:47:25,508][04317] Avg episode reward: [(0, '22.854')] [2024-10-08 03:47:25,516][07008] Saving new best policy, reward=22.854! [2024-10-08 03:47:26,256][07021] Updated weights for policy 0, policy_version 930 (0.0012) [2024-10-08 03:47:28,270][07021] Updated weights for policy 0, policy_version 940 (0.0011) [2024-10-08 03:47:30,224][07021] Updated weights for policy 0, policy_version 950 (0.0012) [2024-10-08 03:47:30,505][04317] Fps is (10 sec: 20479.9, 60 sec: 20480.0, 300 sec: 19975.9). Total num frames: 3895296. Throughput: 0: 5121.6. Samples: 971038. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:47:30,508][04317] Avg episode reward: [(0, '20.259')] [2024-10-08 03:47:32,193][07021] Updated weights for policy 0, policy_version 960 (0.0010) [2024-10-08 03:47:34,180][07021] Updated weights for policy 0, policy_version 970 (0.0011) [2024-10-08 03:47:35,505][04317] Fps is (10 sec: 20480.2, 60 sec: 20480.0, 300 sec: 19988.5). Total num frames: 3997696. Throughput: 0: 5123.8. Samples: 986708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:47:35,508][04317] Avg episode reward: [(0, '19.869')] [2024-10-08 03:47:35,855][07008] Stopping Batcher_0... [2024-10-08 03:47:35,855][07008] Loop batcher_evt_loop terminating... [2024-10-08 03:47:35,856][07008] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-10-08 03:47:35,855][04317] Component Batcher_0 stopped! [2024-10-08 03:47:35,871][07021] Weights refcount: 2 0 [2024-10-08 03:47:35,873][07021] Stopping InferenceWorker_p0-w0... [2024-10-08 03:47:35,873][07021] Loop inference_proc0-0_evt_loop terminating... [2024-10-08 03:47:35,873][04317] Component InferenceWorker_p0-w0 stopped! [2024-10-08 03:47:35,911][07044] Stopping RolloutWorker_w7... [2024-10-08 03:47:35,912][07044] Loop rollout_proc7_evt_loop terminating... [2024-10-08 03:47:35,912][04317] Component RolloutWorker_w7 stopped! [2024-10-08 03:47:35,917][07031] Stopping RolloutWorker_w5... [2024-10-08 03:47:35,918][07031] Loop rollout_proc5_evt_loop terminating... [2024-10-08 03:47:35,918][07027] Stopping RolloutWorker_w0... [2024-10-08 03:47:35,918][07029] Stopping RolloutWorker_w2... [2024-10-08 03:47:35,918][07027] Loop rollout_proc0_evt_loop terminating... [2024-10-08 03:47:35,918][07029] Loop rollout_proc2_evt_loop terminating... [2024-10-08 03:47:35,917][04317] Component RolloutWorker_w5 stopped! [2024-10-08 03:47:35,919][07022] Stopping RolloutWorker_w1... [2024-10-08 03:47:35,920][07022] Loop rollout_proc1_evt_loop terminating... 
[2024-10-08 03:47:35,919][04317] Component RolloutWorker_w0 stopped! [2024-10-08 03:47:35,921][07030] Stopping RolloutWorker_w4... [2024-10-08 03:47:35,921][07032] Stopping RolloutWorker_w6... [2024-10-08 03:47:35,922][07030] Loop rollout_proc4_evt_loop terminating... [2024-10-08 03:47:35,922][07032] Loop rollout_proc6_evt_loop terminating... [2024-10-08 03:47:35,923][07028] Stopping RolloutWorker_w3... [2024-10-08 03:47:35,921][04317] Component RolloutWorker_w2 stopped! [2024-10-08 03:47:35,923][07028] Loop rollout_proc3_evt_loop terminating... [2024-10-08 03:47:35,923][04317] Component RolloutWorker_w1 stopped! [2024-10-08 03:47:35,925][04317] Component RolloutWorker_w4 stopped! [2024-10-08 03:47:35,926][04317] Component RolloutWorker_w6 stopped! [2024-10-08 03:47:35,928][04317] Component RolloutWorker_w3 stopped! [2024-10-08 03:47:35,938][07008] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-10-08 03:47:36,053][07008] Stopping LearnerWorker_p0... [2024-10-08 03:47:36,054][07008] Loop learner_proc0_evt_loop terminating... [2024-10-08 03:47:36,053][04317] Component LearnerWorker_p0 stopped! [2024-10-08 03:47:36,056][04317] Waiting for process learner_proc0 to stop... [2024-10-08 03:47:36,785][04317] Waiting for process inference_proc0-0 to join... [2024-10-08 03:47:36,788][04317] Waiting for process rollout_proc0 to join... [2024-10-08 03:47:36,790][04317] Waiting for process rollout_proc1 to join... [2024-10-08 03:47:36,792][04317] Waiting for process rollout_proc2 to join... [2024-10-08 03:47:36,794][04317] Waiting for process rollout_proc3 to join... [2024-10-08 03:47:36,796][04317] Waiting for process rollout_proc4 to join... [2024-10-08 03:47:36,798][04317] Waiting for process rollout_proc5 to join... [2024-10-08 03:47:36,800][04317] Waiting for process rollout_proc6 to join... [2024-10-08 03:47:36,801][04317] Waiting for process rollout_proc7 to join... 
[2024-10-08 03:47:36,803][04317] Batcher 0 profile tree view: batching: 15.9929, releasing_batches: 0.0206 [2024-10-08 03:47:36,804][04317] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 3.7280 update_model: 3.1113 weight_update: 0.0011 one_step: 0.0023 handle_policy_step: 183.7676 deserialize: 7.5591, stack: 1.2014, obs_to_device_normalize: 44.3436, forward: 85.3254, send_messages: 13.4925 prepare_outputs: 23.7374 to_cpu: 15.2000 [2024-10-08 03:47:36,807][04317] Learner 0 profile tree view: misc: 0.0049, prepare_batch: 8.9619 train: 19.3725 epoch_init: 0.0057, minibatch_init: 0.0063, losses_postprocess: 0.3112, kl_divergence: 0.4220, after_optimizer: 1.4131 calculate_losses: 7.8730 losses_init: 0.0033, forward_head: 0.9183, bptt_initial: 3.5001, tail: 0.6437, advantages_returns: 0.1728, losses: 1.0377 bptt: 1.4136 bptt_forward_core: 1.3582 update: 8.9947 clip: 1.1159 [2024-10-08 03:47:36,808][04317] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.1452, enqueue_policy_requests: 7.6561, env_step: 125.3836, overhead: 6.2686, complete_rollouts: 0.2347 save_policy_outputs: 10.3628 split_output_tensors: 3.5845 [2024-10-08 03:47:36,809][04317] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.1455, enqueue_policy_requests: 7.6012, env_step: 125.4226, overhead: 6.2732, complete_rollouts: 0.2301 save_policy_outputs: 10.2584 split_output_tensors: 3.5595 [2024-10-08 03:47:36,815][04317] Loop Runner_EvtLoop terminating... [2024-10-08 03:47:36,816][04317] Runner profile tree view: main_loop: 208.3583 [2024-10-08 03:47:36,817][04317] Collected {0: 4005888}, FPS: 19226.0 [2024-10-08 03:47:47,529][04317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-10-08 03:47:47,531][04317] Overriding arg 'num_workers' with value 1 passed from command line [2024-10-08 03:47:47,532][04317] Adding new argument 'no_render'=True that is not in the saved config file! 
[2024-10-08 03:47:47,535][04317] Adding new argument 'save_video'=True that is not in the saved config file! [2024-10-08 03:47:47,536][04317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-10-08 03:47:47,537][04317] Adding new argument 'video_name'=None that is not in the saved config file! [2024-10-08 03:47:47,539][04317] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-10-08 03:47:47,540][04317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-10-08 03:47:47,541][04317] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-10-08 03:47:47,542][04317] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-10-08 03:47:47,544][04317] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-10-08 03:47:47,546][04317] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-10-08 03:47:47,547][04317] Adding new argument 'train_script'=None that is not in the saved config file! [2024-10-08 03:47:47,548][04317] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-10-08 03:47:47,550][04317] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-08 03:47:47,562][04317] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-08 03:47:47,565][04317] RunningMeanStd input shape: (3, 72, 128) [2024-10-08 03:47:47,567][04317] RunningMeanStd input shape: (1,) [2024-10-08 03:47:47,581][04317] ConvEncoder: input_channels=3 [2024-10-08 03:47:47,715][04317] Conv encoder output size: 512 [2024-10-08 03:47:47,717][04317] Policy head output size: 512 [2024-10-08 03:47:49,581][04317] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-10-08 03:47:50,495][04317] Num frames 100... 
[2024-10-08 03:47:50,611][04317] Num frames 200... [2024-10-08 03:47:50,725][04317] Num frames 300... [2024-10-08 03:47:50,842][04317] Num frames 400... [2024-10-08 03:47:50,955][04317] Num frames 500... [2024-10-08 03:47:51,097][04317] Avg episode rewards: #0: 10.760, true rewards: #0: 5.760 [2024-10-08 03:47:51,098][04317] Avg episode reward: 10.760, avg true_objective: 5.760 [2024-10-08 03:47:51,129][04317] Num frames 600... [2024-10-08 03:47:51,241][04317] Num frames 700... [2024-10-08 03:47:51,365][04317] Num frames 800... [2024-10-08 03:47:51,480][04317] Num frames 900... [2024-10-08 03:47:51,597][04317] Num frames 1000... [2024-10-08 03:47:51,711][04317] Num frames 1100... [2024-10-08 03:47:51,827][04317] Num frames 1200... [2024-10-08 03:47:51,942][04317] Num frames 1300... [2024-10-08 03:47:52,056][04317] Num frames 1400... [2024-10-08 03:47:52,171][04317] Num frames 1500... [2024-10-08 03:47:52,288][04317] Num frames 1600... [2024-10-08 03:47:52,406][04317] Num frames 1700... [2024-10-08 03:47:52,523][04317] Num frames 1800... [2024-10-08 03:47:52,641][04317] Num frames 1900... [2024-10-08 03:47:52,736][04317] Avg episode rewards: #0: 21.670, true rewards: #0: 9.670 [2024-10-08 03:47:52,737][04317] Avg episode reward: 21.670, avg true_objective: 9.670 [2024-10-08 03:47:52,814][04317] Num frames 2000... [2024-10-08 03:47:52,926][04317] Num frames 2100... [2024-10-08 03:47:53,083][04317] Avg episode rewards: #0: 15.300, true rewards: #0: 7.300 [2024-10-08 03:47:53,085][04317] Avg episode reward: 15.300, avg true_objective: 7.300 [2024-10-08 03:47:53,098][04317] Num frames 2200... [2024-10-08 03:47:53,210][04317] Num frames 2300... [2024-10-08 03:47:53,327][04317] Num frames 2400... [2024-10-08 03:47:53,440][04317] Num frames 2500... [2024-10-08 03:47:53,558][04317] Num frames 2600... [2024-10-08 03:47:53,674][04317] Num frames 2700... [2024-10-08 03:47:53,787][04317] Num frames 2800... [2024-10-08 03:47:53,899][04317] Num frames 2900... 
[2024-10-08 03:47:54,016][04317] Num frames 3000... [2024-10-08 03:47:54,132][04317] Num frames 3100... [2024-10-08 03:47:54,247][04317] Num frames 3200... [2024-10-08 03:47:54,364][04317] Num frames 3300... [2024-10-08 03:47:54,481][04317] Num frames 3400... [2024-10-08 03:47:54,599][04317] Num frames 3500... [2024-10-08 03:47:54,714][04317] Num frames 3600... [2024-10-08 03:47:54,829][04317] Num frames 3700... [2024-10-08 03:47:54,944][04317] Num frames 3800... [2024-10-08 03:47:55,062][04317] Num frames 3900... [2024-10-08 03:47:55,175][04317] Num frames 4000... [2024-10-08 03:47:55,292][04317] Num frames 4100... [2024-10-08 03:47:55,407][04317] Num frames 4200... [2024-10-08 03:47:55,564][04317] Avg episode rewards: #0: 25.725, true rewards: #0: 10.725 [2024-10-08 03:47:55,565][04317] Avg episode reward: 25.725, avg true_objective: 10.725 [2024-10-08 03:47:55,578][04317] Num frames 4300... [2024-10-08 03:47:55,690][04317] Num frames 4400... [2024-10-08 03:47:55,803][04317] Num frames 4500... [2024-10-08 03:47:55,916][04317] Num frames 4600... [2024-10-08 03:47:56,032][04317] Num frames 4700... [2024-10-08 03:47:56,132][04317] Avg episode rewards: #0: 22.076, true rewards: #0: 9.476 [2024-10-08 03:47:56,133][04317] Avg episode reward: 22.076, avg true_objective: 9.476 [2024-10-08 03:47:56,206][04317] Num frames 4800... [2024-10-08 03:47:56,323][04317] Num frames 4900... [2024-10-08 03:47:56,441][04317] Num frames 5000... [2024-10-08 03:47:56,599][04317] Avg episode rewards: #0: 19.150, true rewards: #0: 8.483 [2024-10-08 03:47:56,600][04317] Avg episode reward: 19.150, avg true_objective: 8.483 [2024-10-08 03:47:56,615][04317] Num frames 5100... [2024-10-08 03:47:56,729][04317] Num frames 5200... [2024-10-08 03:47:56,842][04317] Num frames 5300... [2024-10-08 03:47:56,955][04317] Num frames 5400... [2024-10-08 03:47:57,072][04317] Num frames 5500... [2024-10-08 03:47:57,184][04317] Num frames 5600... 
[2024-10-08 03:47:57,316][04317] Avg episode rewards: #0: 17.951, true rewards: #0: 8.094 [2024-10-08 03:47:57,317][04317] Avg episode reward: 17.951, avg true_objective: 8.094 [2024-10-08 03:47:57,359][04317] Num frames 5700... [2024-10-08 03:47:57,474][04317] Num frames 5800... [2024-10-08 03:47:57,589][04317] Num frames 5900... [2024-10-08 03:47:57,703][04317] Num frames 6000... [2024-10-08 03:47:57,818][04317] Num frames 6100... [2024-10-08 03:47:57,933][04317] Num frames 6200... [2024-10-08 03:47:58,036][04317] Avg episode rewards: #0: 16.802, true rewards: #0: 7.802 [2024-10-08 03:47:58,037][04317] Avg episode reward: 16.802, avg true_objective: 7.802 [2024-10-08 03:47:58,104][04317] Num frames 6300... [2024-10-08 03:47:58,220][04317] Num frames 6400... [2024-10-08 03:47:58,341][04317] Num frames 6500... [2024-10-08 03:47:58,459][04317] Num frames 6600... [2024-10-08 03:47:58,575][04317] Num frames 6700... [2024-10-08 03:47:58,691][04317] Num frames 6800... [2024-10-08 03:47:58,813][04317] Num frames 6900... [2024-10-08 03:47:58,939][04317] Num frames 7000... [2024-10-08 03:47:59,068][04317] Num frames 7100... [2024-10-08 03:47:59,188][04317] Num frames 7200... [2024-10-08 03:47:59,309][04317] Num frames 7300... [2024-10-08 03:47:59,431][04317] Num frames 7400... [2024-10-08 03:47:59,550][04317] Num frames 7500... [2024-10-08 03:47:59,667][04317] Num frames 7600... [2024-10-08 03:47:59,791][04317] Num frames 7700... [2024-10-08 03:47:59,955][04317] Avg episode rewards: #0: 19.323, true rewards: #0: 8.657 [2024-10-08 03:47:59,957][04317] Avg episode reward: 19.323, avg true_objective: 8.657 [2024-10-08 03:47:59,970][04317] Num frames 7800... [2024-10-08 03:48:00,083][04317] Num frames 7900... [2024-10-08 03:48:00,195][04317] Num frames 8000... [2024-10-08 03:48:00,311][04317] Num frames 8100... [2024-10-08 03:48:00,429][04317] Num frames 8200... [2024-10-08 03:48:00,546][04317] Num frames 8300... [2024-10-08 03:48:00,662][04317] Num frames 8400... 
[2024-10-08 03:48:00,780][04317] Num frames 8500... [2024-10-08 03:48:00,898][04317] Num frames 8600... [2024-10-08 03:48:01,015][04317] Num frames 8700... [2024-10-08 03:48:01,132][04317] Num frames 8800... [2024-10-08 03:48:01,251][04317] Num frames 8900... [2024-10-08 03:48:01,376][04317] Num frames 9000... [2024-10-08 03:48:01,497][04317] Num frames 9100... [2024-10-08 03:48:01,620][04317] Num frames 9200... [2024-10-08 03:48:01,745][04317] Num frames 9300... [2024-10-08 03:48:01,864][04317] Num frames 9400... [2024-10-08 03:48:01,985][04317] Avg episode rewards: #0: 21.455, true rewards: #0: 9.455 [2024-10-08 03:48:01,986][04317] Avg episode reward: 21.455, avg true_objective: 9.455 [2024-10-08 03:48:24,521][04317] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-10-08 03:50:12,684][04317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-10-08 03:50:12,686][04317] Overriding arg 'num_workers' with value 1 passed from command line [2024-10-08 03:50:12,687][04317] Adding new argument 'no_render'=True that is not in the saved config file! [2024-10-08 03:50:12,689][04317] Adding new argument 'save_video'=True that is not in the saved config file! [2024-10-08 03:50:12,691][04317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-10-08 03:50:12,692][04317] Adding new argument 'video_name'=None that is not in the saved config file! [2024-10-08 03:50:12,694][04317] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-10-08 03:50:12,695][04317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-10-08 03:50:12,697][04317] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-10-08 03:50:12,699][04317] Adding new argument 'hf_repository'='EntropicLettuce/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
[2024-10-08 03:50:12,700][04317] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-10-08 03:50:12,702][04317] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-10-08 03:50:12,703][04317] Adding new argument 'train_script'=None that is not in the saved config file! [2024-10-08 03:50:12,705][04317] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-10-08 03:50:12,707][04317] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-08 03:50:12,714][04317] RunningMeanStd input shape: (3, 72, 128) [2024-10-08 03:50:12,716][04317] RunningMeanStd input shape: (1,) [2024-10-08 03:50:12,728][04317] ConvEncoder: input_channels=3 [2024-10-08 03:50:12,764][04317] Conv encoder output size: 512 [2024-10-08 03:50:12,765][04317] Policy head output size: 512 [2024-10-08 03:50:12,786][04317] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-10-08 03:50:13,263][04317] Num frames 100... [2024-10-08 03:50:13,376][04317] Num frames 200... [2024-10-08 03:50:13,488][04317] Num frames 300... [2024-10-08 03:50:13,599][04317] Num frames 400... [2024-10-08 03:50:13,716][04317] Num frames 500... [2024-10-08 03:50:13,830][04317] Num frames 600... [2024-10-08 03:50:13,942][04317] Num frames 700... [2024-10-08 03:50:14,056][04317] Num frames 800... [2024-10-08 03:50:14,168][04317] Num frames 900... [2024-10-08 03:50:14,285][04317] Num frames 1000... [2024-10-08 03:50:14,398][04317] Num frames 1100... [2024-10-08 03:50:14,513][04317] Num frames 1200... [2024-10-08 03:50:14,631][04317] Num frames 1300... [2024-10-08 03:50:14,746][04317] Num frames 1400... [2024-10-08 03:50:14,860][04317] Num frames 1500... [2024-10-08 03:50:14,977][04317] Num frames 1600... [2024-10-08 03:50:15,094][04317] Num frames 1700... 
[2024-10-08 03:50:15,181][04317] Avg episode rewards: #0: 47.279, true rewards: #0: 17.280 [2024-10-08 03:50:15,182][04317] Avg episode reward: 47.279, avg true_objective: 17.280 [2024-10-08 03:50:15,265][04317] Num frames 1800... [2024-10-08 03:50:15,378][04317] Num frames 1900... [2024-10-08 03:50:15,491][04317] Num frames 2000... [2024-10-08 03:50:15,603][04317] Num frames 2100... [2024-10-08 03:50:15,723][04317] Num frames 2200... [2024-10-08 03:50:15,839][04317] Num frames 2300... [2024-10-08 03:50:15,952][04317] Num frames 2400... [2024-10-08 03:50:16,070][04317] Num frames 2500... [2024-10-08 03:50:16,185][04317] Num frames 2600... [2024-10-08 03:50:16,301][04317] Num frames 2700... [2024-10-08 03:50:16,427][04317] Num frames 2800... [2024-10-08 03:50:16,541][04317] Num frames 2900... [2024-10-08 03:50:16,653][04317] Num frames 3000... [2024-10-08 03:50:16,790][04317] Avg episode rewards: #0: 38.845, true rewards: #0: 15.345 [2024-10-08 03:50:16,791][04317] Avg episode reward: 38.845, avg true_objective: 15.345 [2024-10-08 03:50:16,828][04317] Num frames 3100... [2024-10-08 03:50:16,944][04317] Num frames 3200... [2024-10-08 03:50:17,061][04317] Num frames 3300... [2024-10-08 03:50:17,174][04317] Num frames 3400... [2024-10-08 03:50:17,289][04317] Num frames 3500... [2024-10-08 03:50:17,401][04317] Num frames 3600... [2024-10-08 03:50:17,515][04317] Num frames 3700... [2024-10-08 03:50:17,630][04317] Num frames 3800... [2024-10-08 03:50:17,755][04317] Num frames 3900... [2024-10-08 03:50:17,877][04317] Num frames 4000... [2024-10-08 03:50:17,991][04317] Num frames 4100... [2024-10-08 03:50:18,110][04317] Num frames 4200... [2024-10-08 03:50:18,223][04317] Num frames 4300... [2024-10-08 03:50:18,335][04317] Num frames 4400... [2024-10-08 03:50:18,455][04317] Num frames 4500... [2024-10-08 03:50:18,578][04317] Num frames 4600... [2024-10-08 03:50:18,694][04317] Num frames 4700... [2024-10-08 03:50:18,813][04317] Num frames 4800... 
[2024-10-08 03:50:18,932][04317] Num frames 4900... [2024-10-08 03:50:19,089][04317] Avg episode rewards: #0: 42.296, true rewards: #0: 16.630 [2024-10-08 03:50:19,091][04317] Avg episode reward: 42.296, avg true_objective: 16.630 [2024-10-08 03:50:19,107][04317] Num frames 5000... [2024-10-08 03:50:19,232][04317] Num frames 5100... [2024-10-08 03:50:19,352][04317] Num frames 5200... [2024-10-08 03:50:19,468][04317] Num frames 5300... [2024-10-08 03:50:19,582][04317] Num frames 5400... [2024-10-08 03:50:19,697][04317] Num frames 5500... [2024-10-08 03:50:19,813][04317] Num frames 5600... [2024-10-08 03:50:19,925][04317] Num frames 5700... [2024-10-08 03:50:20,051][04317] Num frames 5800... [2024-10-08 03:50:20,171][04317] Num frames 5900... [2024-10-08 03:50:20,286][04317] Num frames 6000... [2024-10-08 03:50:20,404][04317] Num frames 6100... [2024-10-08 03:50:20,522][04317] Num frames 6200... [2024-10-08 03:50:20,640][04317] Num frames 6300... [2024-10-08 03:50:20,754][04317] Num frames 6400... [2024-10-08 03:50:20,870][04317] Num frames 6500... [2024-10-08 03:50:20,985][04317] Num frames 6600... [2024-10-08 03:50:21,102][04317] Num frames 6700... [2024-10-08 03:50:21,224][04317] Num frames 6800... [2024-10-08 03:50:21,342][04317] Num frames 6900... [2024-10-08 03:50:21,460][04317] Num frames 7000... [2024-10-08 03:50:21,621][04317] Avg episode rewards: #0: 44.722, true rewards: #0: 17.723 [2024-10-08 03:50:21,623][04317] Avg episode reward: 44.722, avg true_objective: 17.723 [2024-10-08 03:50:21,637][04317] Num frames 7100... [2024-10-08 03:50:21,750][04317] Num frames 7200... [2024-10-08 03:50:21,862][04317] Num frames 7300... [2024-10-08 03:50:21,975][04317] Num frames 7400... [2024-10-08 03:50:22,089][04317] Num frames 7500... [2024-10-08 03:50:22,222][04317] Avg episode rewards: #0: 37.537, true rewards: #0: 15.138 [2024-10-08 03:50:22,223][04317] Avg episode reward: 37.537, avg true_objective: 15.138 [2024-10-08 03:50:22,258][04317] Num frames 7600... 
[2024-10-08 03:50:22,376][04317] Num frames 7700... [2024-10-08 03:50:22,492][04317] Num frames 7800... [2024-10-08 03:50:22,608][04317] Num frames 7900... [2024-10-08 03:50:22,724][04317] Num frames 8000... [2024-10-08 03:50:22,838][04317] Num frames 8100... [2024-10-08 03:50:22,951][04317] Num frames 8200... [2024-10-08 03:50:23,068][04317] Num frames 8300... [2024-10-08 03:50:23,184][04317] Num frames 8400... [2024-10-08 03:50:23,301][04317] Num frames 8500... [2024-10-08 03:50:23,418][04317] Num frames 8600... [2024-10-08 03:50:23,537][04317] Num frames 8700... [2024-10-08 03:50:23,656][04317] Num frames 8800... [2024-10-08 03:50:23,774][04317] Num frames 8900... [2024-10-08 03:50:23,890][04317] Num frames 9000... [2024-10-08 03:50:24,006][04317] Num frames 9100... [2024-10-08 03:50:24,125][04317] Num frames 9200... [2024-10-08 03:50:24,241][04317] Num frames 9300... [2024-10-08 03:50:24,357][04317] Num frames 9400... [2024-10-08 03:50:24,474][04317] Num frames 9500... [2024-10-08 03:50:24,592][04317] Num frames 9600... [2024-10-08 03:50:24,730][04317] Avg episode rewards: #0: 40.448, true rewards: #0: 16.115 [2024-10-08 03:50:24,731][04317] Avg episode reward: 40.448, avg true_objective: 16.115 [2024-10-08 03:50:24,770][04317] Num frames 9700... [2024-10-08 03:50:24,884][04317] Num frames 9800... [2024-10-08 03:50:24,999][04317] Num frames 9900... [2024-10-08 03:50:25,115][04317] Num frames 10000... [2024-10-08 03:50:25,230][04317] Avg episode rewards: #0: 35.218, true rewards: #0: 14.361 [2024-10-08 03:50:25,232][04317] Avg episode reward: 35.218, avg true_objective: 14.361 [2024-10-08 03:50:25,288][04317] Num frames 10100... [2024-10-08 03:50:25,404][04317] Num frames 10200... [2024-10-08 03:50:25,519][04317] Num frames 10300... [2024-10-08 03:50:25,636][04317] Num frames 10400... [2024-10-08 03:50:25,754][04317] Num frames 10500... [2024-10-08 03:50:25,871][04317] Num frames 10600... [2024-10-08 03:50:25,986][04317] Num frames 10700... 
[2024-10-08 03:50:26,101][04317] Num frames 10800... [2024-10-08 03:50:26,217][04317] Num frames 10900... [2024-10-08 03:50:26,336][04317] Num frames 11000... [2024-10-08 03:50:26,442][04317] Avg episode rewards: #0: 33.306, true rewards: #0: 13.806 [2024-10-08 03:50:26,444][04317] Avg episode reward: 33.306, avg true_objective: 13.806 [2024-10-08 03:50:26,509][04317] Num frames 11100... [2024-10-08 03:50:26,625][04317] Num frames 11200... [2024-10-08 03:50:26,740][04317] Num frames 11300... [2024-10-08 03:50:26,856][04317] Num frames 11400... [2024-10-08 03:50:26,974][04317] Num frames 11500... [2024-10-08 03:50:27,088][04317] Num frames 11600... [2024-10-08 03:50:27,203][04317] Num frames 11700... [2024-10-08 03:50:27,279][04317] Avg episode rewards: #0: 31.018, true rewards: #0: 13.019 [2024-10-08 03:50:27,280][04317] Avg episode reward: 31.018, avg true_objective: 13.019 [2024-10-08 03:50:27,377][04317] Num frames 11800... [2024-10-08 03:50:27,491][04317] Num frames 11900... [2024-10-08 03:50:27,607][04317] Num frames 12000... [2024-10-08 03:50:27,723][04317] Num frames 12100... [2024-10-08 03:50:27,841][04317] Num frames 12200... [2024-10-08 03:50:27,957][04317] Num frames 12300... [2024-10-08 03:50:28,070][04317] Num frames 12400... [2024-10-08 03:50:28,186][04317] Num frames 12500... [2024-10-08 03:50:28,303][04317] Num frames 12600... [2024-10-08 03:50:28,420][04317] Num frames 12700... [2024-10-08 03:50:28,535][04317] Num frames 12800... [2024-10-08 03:50:28,652][04317] Num frames 12900... [2024-10-08 03:50:28,768][04317] Num frames 13000... [2024-10-08 03:50:28,883][04317] Num frames 13100... [2024-10-08 03:50:28,997][04317] Num frames 13200... [2024-10-08 03:50:29,117][04317] Num frames 13300... [2024-10-08 03:50:29,237][04317] Num frames 13400... [2024-10-08 03:50:29,358][04317] Num frames 13500... [2024-10-08 03:50:29,480][04317] Num frames 13600... [2024-10-08 03:50:29,597][04317] Num frames 13700... 
[2024-10-08 03:50:29,717][04317] Num frames 13800... [2024-10-08 03:50:29,792][04317] Avg episode rewards: #0: 33.717, true rewards: #0: 13.817 [2024-10-08 03:50:29,794][04317] Avg episode reward: 33.717, avg true_objective: 13.817 [2024-10-08 03:51:02,121][04317] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-10-08 03:51:06,565][04317] The model has been pushed to https://huggingface.co./EntropicLettuce/rl_course_vizdoom_health_gathering_supreme [2024-10-08 03:51:57,185][04317] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json [2024-10-08 03:51:57,187][04317] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json [2024-10-08 03:51:57,188][04317] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line [2024-10-08 03:51:57,189][04317] Overriding arg 'train_dir' with value 'train_dir' passed from command line [2024-10-08 03:51:57,191][04317] Overriding arg 'num_workers' with value 1 passed from command line [2024-10-08 03:51:57,193][04317] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! [2024-10-08 03:51:57,194][04317] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! [2024-10-08 03:51:57,195][04317] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! [2024-10-08 03:51:57,197][04317] Adding new argument 'no_render'=True that is not in the saved config file! [2024-10-08 03:51:57,198][04317] Adding new argument 'save_video'=True that is not in the saved config file! [2024-10-08 03:51:57,200][04317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-10-08 03:51:57,201][04317] Adding new argument 'video_name'=None that is not in the saved config file! 
[2024-10-08 03:51:57,202][04317] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-10-08 03:51:57,205][04317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-10-08 03:51:57,206][04317] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-10-08 03:51:57,207][04317] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-10-08 03:51:57,209][04317] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-10-08 03:51:57,210][04317] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-10-08 03:51:57,212][04317] Adding new argument 'train_script'=None that is not in the saved config file! [2024-10-08 03:51:57,213][04317] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-10-08 03:51:57,214][04317] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-08 03:51:57,221][04317] RunningMeanStd input shape: (3, 72, 128) [2024-10-08 03:51:57,223][04317] RunningMeanStd input shape: (1,) [2024-10-08 03:51:57,234][04317] ConvEncoder: input_channels=3 [2024-10-08 03:51:57,280][04317] Conv encoder output size: 512 [2024-10-08 03:51:57,281][04317] Policy head output size: 512 [2024-10-08 03:51:57,304][04317] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth... [2024-10-08 03:51:57,802][04317] Num frames 100... [2024-10-08 03:51:57,918][04317] Num frames 200... [2024-10-08 03:51:58,035][04317] Num frames 300... [2024-10-08 03:51:58,150][04317] Num frames 400... [2024-10-08 03:51:58,264][04317] Num frames 500... [2024-10-08 03:51:58,383][04317] Num frames 600... [2024-10-08 03:51:58,500][04317] Num frames 700... [2024-10-08 03:51:58,618][04317] Num frames 800... [2024-10-08 03:51:58,734][04317] Num frames 900... [2024-10-08 03:51:58,851][04317] Num frames 1000... 
[2024-10-08 03:51:58,967][04317] Num frames 1100... [2024-10-08 03:51:59,084][04317] Num frames 1200... [2024-10-08 03:51:59,200][04317] Num frames 1300... [2024-10-08 03:51:59,316][04317] Num frames 1400... [2024-10-08 03:51:59,438][04317] Num frames 1500... [2024-10-08 03:51:59,557][04317] Num frames 1600... [2024-10-08 03:51:59,674][04317] Num frames 1700... [2024-10-08 03:51:59,791][04317] Num frames 1800... [2024-10-08 03:51:59,909][04317] Num frames 1900... [2024-10-08 03:52:00,027][04317] Num frames 2000... [2024-10-08 03:52:00,153][04317] Num frames 2100... [2024-10-08 03:52:00,205][04317] Avg episode rewards: #0: 62.998, true rewards: #0: 21.000 [2024-10-08 03:52:00,207][04317] Avg episode reward: 62.998, avg true_objective: 21.000 [2024-10-08 03:52:00,324][04317] Num frames 2200... [2024-10-08 03:52:00,447][04317] Num frames 2300... [2024-10-08 03:52:00,567][04317] Num frames 2400... [2024-10-08 03:52:00,682][04317] Num frames 2500... [2024-10-08 03:52:00,804][04317] Num frames 2600... [2024-10-08 03:52:00,917][04317] Num frames 2700... [2024-10-08 03:52:01,032][04317] Num frames 2800... [2024-10-08 03:52:01,150][04317] Num frames 2900... [2024-10-08 03:52:01,267][04317] Num frames 3000... [2024-10-08 03:52:01,391][04317] Num frames 3100... [2024-10-08 03:52:01,510][04317] Num frames 3200... [2024-10-08 03:52:01,628][04317] Num frames 3300... [2024-10-08 03:52:01,745][04317] Num frames 3400... [2024-10-08 03:52:01,865][04317] Num frames 3500... [2024-10-08 03:52:01,986][04317] Num frames 3600... [2024-10-08 03:52:02,104][04317] Num frames 3700... [2024-10-08 03:52:02,222][04317] Num frames 3800... [2024-10-08 03:52:02,345][04317] Num frames 3900... [2024-10-08 03:52:02,465][04317] Num frames 4000... [2024-10-08 03:52:02,585][04317] Num frames 4100... [2024-10-08 03:52:02,703][04317] Num frames 4200... 
[2024-10-08 03:52:02,756][04317] Avg episode rewards: #0: 62.499, true rewards: #0: 21.000 [2024-10-08 03:52:02,758][04317] Avg episode reward: 62.499, avg true_objective: 21.000 [2024-10-08 03:52:02,875][04317] Num frames 4300... [2024-10-08 03:52:02,994][04317] Num frames 4400... [2024-10-08 03:52:03,115][04317] Num frames 4500... [2024-10-08 03:52:03,240][04317] Num frames 4600... [2024-10-08 03:52:03,364][04317] Num frames 4700... [2024-10-08 03:52:03,488][04317] Num frames 4800... [2024-10-08 03:52:03,617][04317] Num frames 4900... [2024-10-08 03:52:03,743][04317] Num frames 5000... [2024-10-08 03:52:03,860][04317] Num frames 5100... [2024-10-08 03:52:03,991][04317] Num frames 5200... [2024-10-08 03:52:04,116][04317] Num frames 5300... [2024-10-08 03:52:04,240][04317] Num frames 5400... [2024-10-08 03:52:04,315][04317] Avg episode rewards: #0: 52.052, true rewards: #0: 18.053 [2024-10-08 03:52:04,317][04317] Avg episode reward: 52.052, avg true_objective: 18.053 [2024-10-08 03:52:04,423][04317] Num frames 5500... [2024-10-08 03:52:04,543][04317] Num frames 5600... [2024-10-08 03:52:04,670][04317] Num frames 5700... [2024-10-08 03:52:04,790][04317] Num frames 5800... [2024-10-08 03:52:04,904][04317] Num frames 5900... [2024-10-08 03:52:05,030][04317] Num frames 6000... [2024-10-08 03:52:05,154][04317] Num frames 6100... [2024-10-08 03:52:05,273][04317] Num frames 6200... [2024-10-08 03:52:05,395][04317] Num frames 6300... [2024-10-08 03:52:05,514][04317] Num frames 6400... [2024-10-08 03:52:05,638][04317] Num frames 6500... [2024-10-08 03:52:05,760][04317] Num frames 6600... [2024-10-08 03:52:05,878][04317] Num frames 6700... [2024-10-08 03:52:05,996][04317] Num frames 6800... [2024-10-08 03:52:06,114][04317] Num frames 6900... [2024-10-08 03:52:06,230][04317] Num frames 7000... [2024-10-08 03:52:06,348][04317] Num frames 7100... [2024-10-08 03:52:06,464][04317] Num frames 7200... [2024-10-08 03:52:06,581][04317] Num frames 7300... 
[2024-10-08 03:52:06,699][04317] Num frames 7400... [2024-10-08 03:52:06,817][04317] Num frames 7500... [2024-10-08 03:52:06,891][04317] Avg episode rewards: #0: 54.289, true rewards: #0: 18.790 [2024-10-08 03:52:06,892][04317] Avg episode reward: 54.289, avg true_objective: 18.790 [2024-10-08 03:52:06,992][04317] Num frames 7600... [2024-10-08 03:52:07,110][04317] Num frames 7700... [2024-10-08 03:52:07,226][04317] Num frames 7800... [2024-10-08 03:52:07,342][04317] Num frames 7900... [2024-10-08 03:52:07,460][04317] Num frames 8000... [2024-10-08 03:52:07,576][04317] Num frames 8100... [2024-10-08 03:52:07,700][04317] Num frames 8200... [2024-10-08 03:52:07,821][04317] Num frames 8300... [2024-10-08 03:52:07,939][04317] Num frames 8400... [2024-10-08 03:52:08,058][04317] Num frames 8500... [2024-10-08 03:52:08,176][04317] Num frames 8600... [2024-10-08 03:52:08,294][04317] Num frames 8700... [2024-10-08 03:52:08,414][04317] Num frames 8800... [2024-10-08 03:52:08,536][04317] Num frames 8900... [2024-10-08 03:52:08,655][04317] Num frames 9000... [2024-10-08 03:52:08,771][04317] Num frames 9100... [2024-10-08 03:52:08,887][04317] Num frames 9200... [2024-10-08 03:52:09,007][04317] Num frames 9300... [2024-10-08 03:52:09,128][04317] Num frames 9400... [2024-10-08 03:52:09,247][04317] Num frames 9500... [2024-10-08 03:52:09,368][04317] Num frames 9600... [2024-10-08 03:52:09,442][04317] Avg episode rewards: #0: 57.431, true rewards: #0: 19.232 [2024-10-08 03:52:09,443][04317] Avg episode reward: 57.431, avg true_objective: 19.232 [2024-10-08 03:52:09,546][04317] Num frames 9700... [2024-10-08 03:52:09,664][04317] Num frames 9800... [2024-10-08 03:52:09,782][04317] Num frames 9900... [2024-10-08 03:52:09,900][04317] Num frames 10000... [2024-10-08 03:52:10,018][04317] Num frames 10100... [2024-10-08 03:52:10,136][04317] Num frames 10200... [2024-10-08 03:52:10,260][04317] Num frames 10300... [2024-10-08 03:52:10,387][04317] Num frames 10400... 
[2024-10-08 03:52:10,504][04317] Num frames 10500... [2024-10-08 03:52:10,620][04317] Num frames 10600... [2024-10-08 03:52:10,737][04317] Num frames 10700... [2024-10-08 03:52:10,854][04317] Num frames 10800... [2024-10-08 03:52:10,972][04317] Num frames 10900... [2024-10-08 03:52:11,088][04317] Num frames 11000... [2024-10-08 03:52:11,202][04317] Num frames 11100... [2024-10-08 03:52:11,319][04317] Num frames 11200... [2024-10-08 03:52:11,444][04317] Num frames 11300... [2024-10-08 03:52:11,562][04317] Num frames 11400... [2024-10-08 03:52:11,681][04317] Num frames 11500... [2024-10-08 03:52:11,801][04317] Num frames 11600... [2024-10-08 03:52:11,925][04317] Num frames 11700... [2024-10-08 03:52:12,000][04317] Avg episode rewards: #0: 58.859, true rewards: #0: 19.527 [2024-10-08 03:52:12,002][04317] Avg episode reward: 58.859, avg true_objective: 19.527 [2024-10-08 03:52:12,102][04317] Num frames 11800... [2024-10-08 03:52:12,218][04317] Num frames 11900... [2024-10-08 03:52:12,335][04317] Num frames 12000... [2024-10-08 03:52:12,456][04317] Num frames 12100... [2024-10-08 03:52:12,572][04317] Num frames 12200... [2024-10-08 03:52:12,689][04317] Num frames 12300... [2024-10-08 03:52:12,808][04317] Num frames 12400... [2024-10-08 03:52:12,928][04317] Num frames 12500... [2024-10-08 03:52:13,046][04317] Num frames 12600... [2024-10-08 03:52:13,165][04317] Num frames 12700... [2024-10-08 03:52:13,284][04317] Num frames 12800... [2024-10-08 03:52:13,403][04317] Num frames 12900... [2024-10-08 03:52:13,519][04317] Num frames 13000... [2024-10-08 03:52:13,634][04317] Num frames 13100... [2024-10-08 03:52:13,754][04317] Num frames 13200... [2024-10-08 03:52:13,872][04317] Num frames 13300... [2024-10-08 03:52:13,992][04317] Num frames 13400... [2024-10-08 03:52:14,111][04317] Num frames 13500... [2024-10-08 03:52:14,230][04317] Num frames 13600... [2024-10-08 03:52:14,352][04317] Num frames 13700... [2024-10-08 03:52:14,473][04317] Num frames 13800... 
[2024-10-08 03:52:14,547][04317] Avg episode rewards: #0: 59.308, true rewards: #0: 19.737 [2024-10-08 03:52:14,548][04317] Avg episode reward: 59.308, avg true_objective: 19.737 [2024-10-08 03:52:14,647][04317] Num frames 13900... [2024-10-08 03:52:14,766][04317] Num frames 14000... [2024-10-08 03:52:14,883][04317] Num frames 14100... [2024-10-08 03:52:15,006][04317] Num frames 14200... [2024-10-08 03:52:15,124][04317] Num frames 14300... [2024-10-08 03:52:15,250][04317] Num frames 14400... [2024-10-08 03:52:15,379][04317] Num frames 14500... [2024-10-08 03:52:15,504][04317] Num frames 14600... [2024-10-08 03:52:15,621][04317] Num frames 14700... [2024-10-08 03:52:15,739][04317] Num frames 14800... [2024-10-08 03:52:15,856][04317] Num frames 14900... [2024-10-08 03:52:15,978][04317] Num frames 15000... [2024-10-08 03:52:16,101][04317] Num frames 15100... [2024-10-08 03:52:16,219][04317] Num frames 15200... [2024-10-08 03:52:16,340][04317] Num frames 15300... [2024-10-08 03:52:16,462][04317] Num frames 15400... [2024-10-08 03:52:16,582][04317] Num frames 15500... [2024-10-08 03:52:16,709][04317] Num frames 15600... [2024-10-08 03:52:16,838][04317] Num frames 15700... [2024-10-08 03:52:16,958][04317] Num frames 15800... [2024-10-08 03:52:17,077][04317] Num frames 15900... [2024-10-08 03:52:17,152][04317] Avg episode rewards: #0: 59.894, true rewards: #0: 19.895 [2024-10-08 03:52:17,153][04317] Avg episode reward: 59.894, avg true_objective: 19.895 [2024-10-08 03:52:17,252][04317] Num frames 16000... [2024-10-08 03:52:17,373][04317] Num frames 16100... [2024-10-08 03:52:17,491][04317] Num frames 16200... [2024-10-08 03:52:17,612][04317] Num frames 16300... [2024-10-08 03:52:17,731][04317] Num frames 16400... [2024-10-08 03:52:17,853][04317] Num frames 16500... [2024-10-08 03:52:17,980][04317] Num frames 16600... [2024-10-08 03:52:18,106][04317] Num frames 16700... [2024-10-08 03:52:18,222][04317] Num frames 16800... 
[2024-10-08 03:52:18,342][04317] Num frames 16900... [2024-10-08 03:52:18,462][04317] Num frames 17000... [2024-10-08 03:52:18,581][04317] Num frames 17100... [2024-10-08 03:52:18,703][04317] Num frames 17200... [2024-10-08 03:52:18,830][04317] Num frames 17300... [2024-10-08 03:52:18,952][04317] Num frames 17400... [2024-10-08 03:52:19,072][04317] Num frames 17500... [2024-10-08 03:52:19,196][04317] Num frames 17600... [2024-10-08 03:52:19,324][04317] Num frames 17700... [2024-10-08 03:52:19,451][04317] Num frames 17800... [2024-10-08 03:52:19,578][04317] Num frames 17900... [2024-10-08 03:52:19,705][04317] Num frames 18000... [2024-10-08 03:52:19,781][04317] Avg episode rewards: #0: 59.906, true rewards: #0: 20.018 [2024-10-08 03:52:19,783][04317] Avg episode reward: 59.906, avg true_objective: 20.018 [2024-10-08 03:52:19,890][04317] Num frames 18100... [2024-10-08 03:52:20,017][04317] Num frames 18200... [2024-10-08 03:52:20,145][04317] Num frames 18300... [2024-10-08 03:52:20,268][04317] Num frames 18400... [2024-10-08 03:52:20,395][04317] Num frames 18500... [2024-10-08 03:52:20,522][04317] Num frames 18600... [2024-10-08 03:52:20,649][04317] Num frames 18700... [2024-10-08 03:52:20,772][04317] Num frames 18800... [2024-10-08 03:52:20,892][04317] Num frames 18900... [2024-10-08 03:52:21,012][04317] Num frames 19000... [2024-10-08 03:52:21,131][04317] Num frames 19100... [2024-10-08 03:52:21,248][04317] Num frames 19200... [2024-10-08 03:52:21,370][04317] Num frames 19300... [2024-10-08 03:52:21,491][04317] Num frames 19400... [2024-10-08 03:52:21,618][04317] Num frames 19500... [2024-10-08 03:52:21,744][04317] Num frames 19600... [2024-10-08 03:52:21,869][04317] Num frames 19700... [2024-10-08 03:52:21,998][04317] Num frames 19800... [2024-10-08 03:52:22,128][04317] Num frames 19900... [2024-10-08 03:52:22,257][04317] Num frames 20000... [2024-10-08 03:52:22,379][04317] Num frames 20100... 
[2024-10-08 03:52:22,453][04317] Avg episode rewards: #0: 59.815, true rewards: #0: 20.116 [2024-10-08 03:52:22,454][04317] Avg episode reward: 59.815, avg true_objective: 20.116 [2024-10-08 03:53:09,881][04317] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4! [2024-10-08 03:56:17,887][04317] Environment doom_basic already registered, overwriting... [2024-10-08 03:56:17,889][04317] Environment doom_two_colors_easy already registered, overwriting... [2024-10-08 03:56:17,890][04317] Environment doom_two_colors_hard already registered, overwriting... [2024-10-08 03:56:17,892][04317] Environment doom_dm already registered, overwriting... [2024-10-08 03:56:17,893][04317] Environment doom_dwango5 already registered, overwriting... [2024-10-08 03:56:17,895][04317] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-10-08 03:56:17,896][04317] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-10-08 03:56:17,897][04317] Environment doom_my_way_home already registered, overwriting... [2024-10-08 03:56:17,899][04317] Environment doom_deadly_corridor already registered, overwriting... [2024-10-08 03:56:17,900][04317] Environment doom_defend_the_center already registered, overwriting... [2024-10-08 03:56:17,901][04317] Environment doom_defend_the_line already registered, overwriting... [2024-10-08 03:56:17,902][04317] Environment doom_health_gathering already registered, overwriting... [2024-10-08 03:56:17,903][04317] Environment doom_health_gathering_supreme already registered, overwriting... [2024-10-08 03:56:17,904][04317] Environment doom_battle already registered, overwriting... [2024-10-08 03:56:17,906][04317] Environment doom_battle2 already registered, overwriting... [2024-10-08 03:56:17,907][04317] Environment doom_duel_bots already registered, overwriting... [2024-10-08 03:56:17,908][04317] Environment doom_deathmatch_bots already registered, overwriting... 
[2024-10-08 03:56:17,910][04317] Environment doom_duel already registered, overwriting... [2024-10-08 03:56:17,911][04317] Environment doom_deathmatch_full already registered, overwriting... [2024-10-08 03:56:17,912][04317] Environment doom_benchmark already registered, overwriting... [2024-10-08 03:56:17,913][04317] register_encoder_factory: [2024-10-08 03:56:17,924][04317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-10-08 03:56:17,925][04317] Overriding arg 'train_for_env_steps' with value 12000000 passed from command line [2024-10-08 03:56:17,931][04317] Experiment dir /content/train_dir/default_experiment already exists! [2024-10-08 03:56:17,931][04317] Resuming existing experiment from /content/train_dir/default_experiment... [2024-10-08 03:56:17,932][04317] Weights and Biases integration disabled [2024-10-08 03:56:17,936][04317] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-10-08 03:56:19,417][04317] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 
obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=12000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 
4000000} git_hash=unknown git_repo_name=not a git repository [2024-10-08 03:56:19,419][04317] Saving configuration to /content/train_dir/default_experiment/config.json... [2024-10-08 03:56:19,422][04317] Rollout worker 0 uses device cpu [2024-10-08 03:56:19,423][04317] Rollout worker 1 uses device cpu [2024-10-08 03:56:19,425][04317] Rollout worker 2 uses device cpu [2024-10-08 03:56:19,425][04317] Rollout worker 3 uses device cpu [2024-10-08 03:56:19,427][04317] Rollout worker 4 uses device cpu [2024-10-08 03:56:19,429][04317] Rollout worker 5 uses device cpu [2024-10-08 03:56:19,430][04317] Rollout worker 6 uses device cpu [2024-10-08 03:56:19,432][04317] Rollout worker 7 uses device cpu [2024-10-08 03:56:19,473][04317] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-10-08 03:56:19,474][04317] InferenceWorker_p0-w0: min num requests: 2 [2024-10-08 03:56:19,507][04317] Starting all processes... [2024-10-08 03:56:19,508][04317] Starting process learner_proc0 [2024-10-08 03:56:19,557][04317] Starting all processes... 
[2024-10-08 03:56:19,561][04317] Starting process inference_proc0-0 [2024-10-08 03:56:19,561][04317] Starting process rollout_proc0 [2024-10-08 03:56:19,562][04317] Starting process rollout_proc1 [2024-10-08 03:56:19,563][04317] Starting process rollout_proc2 [2024-10-08 03:56:19,564][04317] Starting process rollout_proc3 [2024-10-08 03:56:19,565][04317] Starting process rollout_proc4 [2024-10-08 03:56:19,567][04317] Starting process rollout_proc5 [2024-10-08 03:56:19,576][04317] Starting process rollout_proc6 [2024-10-08 03:56:19,582][04317] Starting process rollout_proc7 [2024-10-08 03:56:21,574][11573] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2024-10-08 03:56:21,709][11578] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2024-10-08 03:56:21,847][11574] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2024-10-08 03:56:21,852][11575] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2024-10-08 03:56:21,854][11559] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-10-08 03:56:21,854][11559] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-10-08 03:56:21,867][11580] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2024-10-08 03:56:21,870][11559] Num visible devices: 1 [2024-10-08 03:56:21,872][11572] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-10-08 03:56:21,872][11572] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-10-08 03:56:21,895][11572] Num visible devices: 1 [2024-10-08 03:56:21,896][11576] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2024-10-08 03:56:21,926][11559] Starting seed is not provided [2024-10-08 03:56:21,926][11559] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-10-08 03:56:21,926][11559] Initializing actor-critic model on device cuda:0 [2024-10-08 03:56:21,927][11559] RunningMeanStd input shape: (3, 72, 
128) [2024-10-08 03:56:21,928][11559] RunningMeanStd input shape: (1,) [2024-10-08 03:56:21,943][11559] ConvEncoder: input_channels=3 [2024-10-08 03:56:21,949][11577] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2024-10-08 03:56:22,006][11579] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2024-10-08 03:56:22,060][11559] Conv encoder output size: 512 [2024-10-08 03:56:22,061][11559] Policy head output size: 512 [2024-10-08 03:56:22,076][11559] Created Actor Critic model with architecture: [2024-10-08 03:56:22,076][11559] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-10-08 03:56:23,901][11559] Using optimizer [2024-10-08 03:56:23,901][11559] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
[2024-10-08 03:56:23,935][11559] Loading model from checkpoint [2024-10-08 03:56:23,939][11559] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2024-10-08 03:56:23,939][11559] Initialized policy 0 weights for model version 978 [2024-10-08 03:56:23,941][11559] LearnerWorker_p0 finished initialization! [2024-10-08 03:56:23,941][11559] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-10-08 03:56:24,027][11572] RunningMeanStd input shape: (3, 72, 128) [2024-10-08 03:56:24,028][11572] RunningMeanStd input shape: (1,) [2024-10-08 03:56:24,040][11572] ConvEncoder: input_channels=3 [2024-10-08 03:56:24,144][11572] Conv encoder output size: 512 [2024-10-08 03:56:24,145][11572] Policy head output size: 512 [2024-10-08 03:56:25,891][04317] Inference worker 0-0 is ready! [2024-10-08 03:56:25,892][04317] All inference workers are ready! Signal rollout workers to start! [2024-10-08 03:56:25,907][11577] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-08 03:56:25,907][11576] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-08 03:56:25,907][11580] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-08 03:56:25,907][11575] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-08 03:56:25,912][11573] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-08 03:56:25,912][11579] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-08 03:56:25,912][11578] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-08 03:56:25,912][11574] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-08 03:56:26,199][11576] Decorrelating experience for 0 frames... [2024-10-08 03:56:26,202][11573] Decorrelating experience for 0 frames... [2024-10-08 03:56:26,202][11577] Decorrelating experience for 0 frames... [2024-10-08 03:56:26,202][11580] Decorrelating experience for 0 frames... [2024-10-08 03:56:26,204][11575] Decorrelating experience for 0 frames... 
[2024-10-08 03:56:26,205][11579] Decorrelating experience for 0 frames... [2024-10-08 03:56:26,455][11580] Decorrelating experience for 32 frames... [2024-10-08 03:56:26,455][11576] Decorrelating experience for 32 frames... [2024-10-08 03:56:26,457][11573] Decorrelating experience for 32 frames... [2024-10-08 03:56:26,458][11577] Decorrelating experience for 32 frames... [2024-10-08 03:56:26,696][11579] Decorrelating experience for 32 frames... [2024-10-08 03:56:26,710][11575] Decorrelating experience for 32 frames... [2024-10-08 03:56:26,743][11577] Decorrelating experience for 64 frames... [2024-10-08 03:56:26,763][11576] Decorrelating experience for 64 frames... [2024-10-08 03:56:26,765][11573] Decorrelating experience for 64 frames... [2024-10-08 03:56:26,950][11580] Decorrelating experience for 64 frames... [2024-10-08 03:56:26,990][11579] Decorrelating experience for 64 frames... [2024-10-08 03:56:27,016][11575] Decorrelating experience for 64 frames... [2024-10-08 03:56:27,056][11576] Decorrelating experience for 96 frames... [2024-10-08 03:56:27,225][11578] Decorrelating experience for 0 frames... [2024-10-08 03:56:27,230][11577] Decorrelating experience for 96 frames... [2024-10-08 03:56:27,284][11575] Decorrelating experience for 96 frames... [2024-10-08 03:56:27,330][11573] Decorrelating experience for 96 frames... [2024-10-08 03:56:27,473][11578] Decorrelating experience for 32 frames... [2024-10-08 03:56:27,510][11574] Decorrelating experience for 0 frames... [2024-10-08 03:56:27,575][11579] Decorrelating experience for 96 frames... [2024-10-08 03:56:27,790][11574] Decorrelating experience for 32 frames... [2024-10-08 03:56:27,799][11578] Decorrelating experience for 64 frames... [2024-10-08 03:56:27,809][11580] Decorrelating experience for 96 frames... [2024-10-08 03:56:27,936][04317] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-10-08 03:56:27,939][04317] Avg episode reward: [(0, '0.320')] [2024-10-08 03:56:28,168][11574] Decorrelating experience for 64 frames... [2024-10-08 03:56:28,173][11578] Decorrelating experience for 96 frames... [2024-10-08 03:56:28,310][11559] Signal inference workers to stop experience collection... [2024-10-08 03:56:28,315][11572] InferenceWorker_p0-w0: stopping experience collection [2024-10-08 03:56:28,486][11574] Decorrelating experience for 96 frames... [2024-10-08 03:56:29,363][11559] Signal inference workers to resume experience collection... [2024-10-08 03:56:29,364][11572] InferenceWorker_p0-w0: resuming experience collection [2024-10-08 03:56:31,711][11572] Updated weights for policy 0, policy_version 988 (0.0367) [2024-10-08 03:56:32,936][04317] Fps is (10 sec: 13107.3, 60 sec: 13107.3, 300 sec: 13107.3). Total num frames: 4071424. Throughput: 0: 2352.4. Samples: 11762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-08 03:56:32,939][04317] Avg episode reward: [(0, '14.564')] [2024-10-08 03:56:33,778][11572] Updated weights for policy 0, policy_version 998 (0.0011) [2024-10-08 03:56:35,844][11572] Updated weights for policy 0, policy_version 1008 (0.0012) [2024-10-08 03:56:37,840][11572] Updated weights for policy 0, policy_version 1018 (0.0012) [2024-10-08 03:56:37,936][04317] Fps is (10 sec: 16384.0, 60 sec: 16384.0, 300 sec: 16384.0). Total num frames: 4169728. Throughput: 0: 4189.4. Samples: 41894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:56:37,938][04317] Avg episode reward: [(0, '23.495')] [2024-10-08 03:56:37,945][11559] Saving new best policy, reward=23.495! 
[2024-10-08 03:56:39,464][04317] Heartbeat connected on Batcher_0 [2024-10-08 03:56:39,468][04317] Heartbeat connected on LearnerWorker_p0 [2024-10-08 03:56:39,477][04317] Heartbeat connected on InferenceWorker_p0-w0 [2024-10-08 03:56:39,482][04317] Heartbeat connected on RolloutWorker_w0 [2024-10-08 03:56:39,485][04317] Heartbeat connected on RolloutWorker_w1 [2024-10-08 03:56:39,488][04317] Heartbeat connected on RolloutWorker_w2 [2024-10-08 03:56:39,494][04317] Heartbeat connected on RolloutWorker_w3 [2024-10-08 03:56:39,497][04317] Heartbeat connected on RolloutWorker_w4 [2024-10-08 03:56:39,500][04317] Heartbeat connected on RolloutWorker_w5 [2024-10-08 03:56:39,503][04317] Heartbeat connected on RolloutWorker_w6 [2024-10-08 03:56:39,509][04317] Heartbeat connected on RolloutWorker_w7 [2024-10-08 03:56:39,798][11572] Updated weights for policy 0, policy_version 1028 (0.0011) [2024-10-08 03:56:41,888][11572] Updated weights for policy 0, policy_version 1038 (0.0012) [2024-10-08 03:56:42,936][04317] Fps is (10 sec: 20070.4, 60 sec: 17749.4, 300 sec: 17749.4). Total num frames: 4272128. Throughput: 0: 3788.4. Samples: 56826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:56:42,938][04317] Avg episode reward: [(0, '22.151')] [2024-10-08 03:56:43,864][11572] Updated weights for policy 0, policy_version 1048 (0.0011) [2024-10-08 03:56:45,832][11572] Updated weights for policy 0, policy_version 1058 (0.0011) [2024-10-08 03:56:47,936][04317] Fps is (10 sec: 20070.3, 60 sec: 18227.2, 300 sec: 18227.2). Total num frames: 4370432. Throughput: 0: 4394.3. Samples: 87886. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:56:47,940][04317] Avg episode reward: [(0, '19.779')] [2024-10-08 03:56:47,993][11572] Updated weights for policy 0, policy_version 1068 (0.0011) [2024-10-08 03:56:50,094][11572] Updated weights for policy 0, policy_version 1078 (0.0011) [2024-10-08 03:56:52,090][11572] Updated weights for policy 0, policy_version 1088 (0.0012) [2024-10-08 03:56:52,936][04317] Fps is (10 sec: 20070.3, 60 sec: 18677.8, 300 sec: 18677.8). Total num frames: 4472832. Throughput: 0: 4704.7. Samples: 117618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:56:52,939][04317] Avg episode reward: [(0, '19.246')] [2024-10-08 03:56:54,057][11572] Updated weights for policy 0, policy_version 1098 (0.0011) [2024-10-08 03:56:56,005][11572] Updated weights for policy 0, policy_version 1108 (0.0011) [2024-10-08 03:56:57,936][04317] Fps is (10 sec: 20480.0, 60 sec: 18978.1, 300 sec: 18978.1). Total num frames: 4575232. Throughput: 0: 4442.5. Samples: 133276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:56:57,938][04317] Avg episode reward: [(0, '20.329')] [2024-10-08 03:56:57,959][11572] Updated weights for policy 0, policy_version 1118 (0.0011) [2024-10-08 03:56:59,914][11572] Updated weights for policy 0, policy_version 1128 (0.0011) [2024-10-08 03:57:01,928][11572] Updated weights for policy 0, policy_version 1138 (0.0011) [2024-10-08 03:57:02,937][04317] Fps is (10 sec: 20479.6, 60 sec: 19192.6, 300 sec: 19192.6). Total num frames: 4677632. Throughput: 0: 4697.9. Samples: 164428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:57:02,939][04317] Avg episode reward: [(0, '21.221')] [2024-10-08 03:57:03,986][11572] Updated weights for policy 0, policy_version 1148 (0.0011) [2024-10-08 03:57:05,965][11572] Updated weights for policy 0, policy_version 1158 (0.0011) [2024-10-08 03:57:07,936][04317] Fps is (10 sec: 20480.2, 60 sec: 19353.6, 300 sec: 19353.6). Total num frames: 4780032. 
Throughput: 0: 4875.9. Samples: 195036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:57:07,938][04317] Avg episode reward: [(0, '22.147')] [2024-10-08 03:57:07,947][11572] Updated weights for policy 0, policy_version 1168 (0.0011) [2024-10-08 03:57:09,921][11572] Updated weights for policy 0, policy_version 1178 (0.0011) [2024-10-08 03:57:11,904][11572] Updated weights for policy 0, policy_version 1188 (0.0011) [2024-10-08 03:57:12,936][04317] Fps is (10 sec: 20889.7, 60 sec: 19569.7, 300 sec: 19569.7). Total num frames: 4886528. Throughput: 0: 4681.1. Samples: 210648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:57:12,939][04317] Avg episode reward: [(0, '22.034')] [2024-10-08 03:57:13,883][11572] Updated weights for policy 0, policy_version 1198 (0.0011) [2024-10-08 03:57:15,919][11572] Updated weights for policy 0, policy_version 1208 (0.0012) [2024-10-08 03:57:17,936][04317] Fps is (10 sec: 20479.8, 60 sec: 19578.9, 300 sec: 19578.9). Total num frames: 4984832. Throughput: 0: 5100.6. Samples: 241290. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 03:57:17,939][04317] Avg episode reward: [(0, '23.041')] [2024-10-08 03:57:17,977][11572] Updated weights for policy 0, policy_version 1218 (0.0012) [2024-10-08 03:57:20,011][11572] Updated weights for policy 0, policy_version 1228 (0.0011) [2024-10-08 03:57:21,955][11572] Updated weights for policy 0, policy_version 1238 (0.0011) [2024-10-08 03:57:22,936][04317] Fps is (10 sec: 20480.3, 60 sec: 19735.3, 300 sec: 19735.3). Total num frames: 5091328. Throughput: 0: 5116.0. Samples: 272114. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 03:57:22,938][04317] Avg episode reward: [(0, '23.406')] [2024-10-08 03:57:23,909][11572] Updated weights for policy 0, policy_version 1248 (0.0011) [2024-10-08 03:57:25,870][11572] Updated weights for policy 0, policy_version 1258 (0.0012) [2024-10-08 03:57:27,831][11572] Updated weights for policy 0, policy_version 1268 (0.0011) [2024-10-08 03:57:27,936][04317] Fps is (10 sec: 20889.7, 60 sec: 19797.3, 300 sec: 19797.3). Total num frames: 5193728. Throughput: 0: 5131.5. Samples: 287742. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 03:57:27,939][04317] Avg episode reward: [(0, '22.520')] [2024-10-08 03:57:29,825][11572] Updated weights for policy 0, policy_version 1278 (0.0012) [2024-10-08 03:57:31,902][11572] Updated weights for policy 0, policy_version 1288 (0.0011) [2024-10-08 03:57:32,936][04317] Fps is (10 sec: 20480.2, 60 sec: 20411.8, 300 sec: 19849.9). Total num frames: 5296128. Throughput: 0: 5118.4. Samples: 318214. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:57:32,939][04317] Avg episode reward: [(0, '21.058')] [2024-10-08 03:57:33,913][11572] Updated weights for policy 0, policy_version 1298 (0.0011) [2024-10-08 03:57:35,855][11572] Updated weights for policy 0, policy_version 1308 (0.0011) [2024-10-08 03:57:37,819][11572] Updated weights for policy 0, policy_version 1318 (0.0012) [2024-10-08 03:57:37,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19894.9). Total num frames: 5398528. Throughput: 0: 5151.2. Samples: 349420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:57:37,938][04317] Avg episode reward: [(0, '23.475')] [2024-10-08 03:57:39,779][11572] Updated weights for policy 0, policy_version 1328 (0.0011) [2024-10-08 03:57:41,763][11572] Updated weights for policy 0, policy_version 1338 (0.0011) [2024-10-08 03:57:42,936][04317] Fps is (10 sec: 20479.8, 60 sec: 20480.0, 300 sec: 19933.9). Total num frames: 5500928. 
Throughput: 0: 5150.7. Samples: 365058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 03:57:42,938][04317] Avg episode reward: [(0, '21.367')] [2024-10-08 03:57:43,842][11572] Updated weights for policy 0, policy_version 1348 (0.0012) [2024-10-08 03:57:45,945][11572] Updated weights for policy 0, policy_version 1358 (0.0012) [2024-10-08 03:57:47,936][04317] Fps is (10 sec: 20070.3, 60 sec: 20480.0, 300 sec: 19916.8). Total num frames: 5599232. Throughput: 0: 5121.6. Samples: 394898. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-10-08 03:57:47,938][04317] Avg episode reward: [(0, '21.273')] [2024-10-08 03:57:47,955][11572] Updated weights for policy 0, policy_version 1368 (0.0011) [2024-10-08 03:57:49,898][11572] Updated weights for policy 0, policy_version 1378 (0.0011) [2024-10-08 03:57:51,853][11572] Updated weights for policy 0, policy_version 1388 (0.0012) [2024-10-08 03:57:52,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20548.3, 300 sec: 19998.1). Total num frames: 5705728. Throughput: 0: 5137.0. Samples: 426202. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 03:57:52,939][04317] Avg episode reward: [(0, '22.656')] [2024-10-08 03:57:53,822][11572] Updated weights for policy 0, policy_version 1398 (0.0012) [2024-10-08 03:57:55,755][11572] Updated weights for policy 0, policy_version 1408 (0.0011) [2024-10-08 03:57:57,738][11572] Updated weights for policy 0, policy_version 1418 (0.0012) [2024-10-08 03:57:57,936][04317] Fps is (10 sec: 21299.4, 60 sec: 20616.5, 300 sec: 20070.4). Total num frames: 5812224. Throughput: 0: 5141.0. Samples: 441992. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 03:57:57,938][04317] Avg episode reward: [(0, '25.252')] [2024-10-08 03:57:57,946][11559] Saving new best policy, reward=25.252! 
[2024-10-08 03:57:59,789][11572] Updated weights for policy 0, policy_version 1428 (0.0012) [2024-10-08 03:58:01,804][11572] Updated weights for policy 0, policy_version 1438 (0.0011) [2024-10-08 03:58:02,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20548.3, 300 sec: 20048.8). Total num frames: 5910528. Throughput: 0: 5130.9. Samples: 472182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:58:02,938][04317] Avg episode reward: [(0, '23.229')] [2024-10-08 03:58:03,788][11572] Updated weights for policy 0, policy_version 1448 (0.0011) [2024-10-08 03:58:05,745][11572] Updated weights for policy 0, policy_version 1458 (0.0011) [2024-10-08 03:58:07,696][11572] Updated weights for policy 0, policy_version 1468 (0.0011) [2024-10-08 03:58:07,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20616.5, 300 sec: 20111.4). Total num frames: 6017024. Throughput: 0: 5141.4. Samples: 503478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:58:07,939][04317] Avg episode reward: [(0, '23.110')] [2024-10-08 03:58:09,655][11572] Updated weights for policy 0, policy_version 1478 (0.0011) [2024-10-08 03:58:11,644][11572] Updated weights for policy 0, policy_version 1488 (0.0011) [2024-10-08 03:58:12,937][04317] Fps is (10 sec: 20889.0, 60 sec: 20548.2, 300 sec: 20128.9). Total num frames: 6119424. Throughput: 0: 5144.6. Samples: 519252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:58:12,939][04317] Avg episode reward: [(0, '24.684')] [2024-10-08 03:58:13,756][11572] Updated weights for policy 0, policy_version 1498 (0.0011) [2024-10-08 03:58:15,764][11572] Updated weights for policy 0, policy_version 1508 (0.0012) [2024-10-08 03:58:17,722][11572] Updated weights for policy 0, policy_version 1518 (0.0011) [2024-10-08 03:58:17,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20616.6, 300 sec: 20144.9). Total num frames: 6221824. Throughput: 0: 5140.4. Samples: 549534. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:58:17,938][04317] Avg episode reward: [(0, '21.926')] [2024-10-08 03:58:17,946][11559] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001519_6221824.pth... [2024-10-08 03:58:18,015][11559] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000527_2158592.pth [2024-10-08 03:58:19,708][11572] Updated weights for policy 0, policy_version 1528 (0.0011) [2024-10-08 03:58:21,672][11572] Updated weights for policy 0, policy_version 1538 (0.0011) [2024-10-08 03:58:22,936][04317] Fps is (10 sec: 20480.6, 60 sec: 20548.2, 300 sec: 20159.4). Total num frames: 6324224. Throughput: 0: 5138.9. Samples: 580672. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:58:22,938][04317] Avg episode reward: [(0, '23.383')] [2024-10-08 03:58:23,642][11572] Updated weights for policy 0, policy_version 1548 (0.0011) [2024-10-08 03:58:25,603][11572] Updated weights for policy 0, policy_version 1558 (0.0011) [2024-10-08 03:58:27,682][11572] Updated weights for policy 0, policy_version 1568 (0.0012) [2024-10-08 03:58:27,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20548.3, 300 sec: 20172.8). Total num frames: 6426624. Throughput: 0: 5136.9. Samples: 596220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:58:27,938][04317] Avg episode reward: [(0, '21.375')] [2024-10-08 03:58:29,699][11572] Updated weights for policy 0, policy_version 1578 (0.0011) [2024-10-08 03:58:31,644][11572] Updated weights for policy 0, policy_version 1588 (0.0011) [2024-10-08 03:58:32,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20548.2, 300 sec: 20185.1). Total num frames: 6529024. Throughput: 0: 5151.0. Samples: 626694. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:58:32,938][04317] Avg episode reward: [(0, '22.632')] [2024-10-08 03:58:33,614][11572] Updated weights for policy 0, policy_version 1598 (0.0012) [2024-10-08 03:58:35,577][11572] Updated weights for policy 0, policy_version 1608 (0.0011) [2024-10-08 03:58:37,531][11572] Updated weights for policy 0, policy_version 1618 (0.0011) [2024-10-08 03:58:37,936][04317] Fps is (10 sec: 20889.8, 60 sec: 20616.6, 300 sec: 20227.9). Total num frames: 6635520. Throughput: 0: 5153.1. Samples: 658090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:58:37,939][04317] Avg episode reward: [(0, '21.081')] [2024-10-08 03:58:39,476][11572] Updated weights for policy 0, policy_version 1628 (0.0011) [2024-10-08 03:58:41,565][11572] Updated weights for policy 0, policy_version 1638 (0.0012) [2024-10-08 03:58:42,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20548.3, 300 sec: 20206.9). Total num frames: 6733824. Throughput: 0: 5143.7. Samples: 673460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:58:42,939][04317] Avg episode reward: [(0, '22.563')] [2024-10-08 03:58:43,616][11572] Updated weights for policy 0, policy_version 1648 (0.0012) [2024-10-08 03:58:45,577][11572] Updated weights for policy 0, policy_version 1658 (0.0011) [2024-10-08 03:58:47,531][11572] Updated weights for policy 0, policy_version 1668 (0.0012) [2024-10-08 03:58:47,937][04317] Fps is (10 sec: 20478.8, 60 sec: 20684.6, 300 sec: 20245.9). Total num frames: 6840320. Throughput: 0: 5150.1. Samples: 703940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:58:47,941][04317] Avg episode reward: [(0, '24.330')] [2024-10-08 03:58:49,526][11572] Updated weights for policy 0, policy_version 1678 (0.0012) [2024-10-08 03:58:51,483][11572] Updated weights for policy 0, policy_version 1688 (0.0011) [2024-10-08 03:58:52,936][04317] Fps is (10 sec: 20889.6, 60 sec: 20616.5, 300 sec: 20254.0). Total num frames: 6942720. 
Throughput: 0: 5147.2. Samples: 735100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:58:52,938][04317] Avg episode reward: [(0, '24.165')] [2024-10-08 03:58:53,440][11572] Updated weights for policy 0, policy_version 1698 (0.0012) [2024-10-08 03:58:55,510][11572] Updated weights for policy 0, policy_version 1708 (0.0012) [2024-10-08 03:58:57,601][11572] Updated weights for policy 0, policy_version 1718 (0.0012) [2024-10-08 03:58:57,936][04317] Fps is (10 sec: 20071.4, 60 sec: 20480.0, 300 sec: 20234.2). Total num frames: 7041024. Throughput: 0: 5129.5. Samples: 750078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 03:58:57,938][04317] Avg episode reward: [(0, '23.100')] [2024-10-08 03:58:59,590][11572] Updated weights for policy 0, policy_version 1728 (0.0011) [2024-10-08 03:59:01,542][11572] Updated weights for policy 0, policy_version 1738 (0.0011) [2024-10-08 03:59:02,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20616.6, 300 sec: 20268.6). Total num frames: 7147520. Throughput: 0: 5141.2. Samples: 780888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:59:02,939][04317] Avg episode reward: [(0, '23.079')] [2024-10-08 03:59:03,476][11572] Updated weights for policy 0, policy_version 1748 (0.0012) [2024-10-08 03:59:05,456][11572] Updated weights for policy 0, policy_version 1758 (0.0011) [2024-10-08 03:59:07,418][11572] Updated weights for policy 0, policy_version 1768 (0.0011) [2024-10-08 03:59:07,936][04317] Fps is (10 sec: 20889.4, 60 sec: 20548.2, 300 sec: 20275.2). Total num frames: 7249920. Throughput: 0: 5146.9. Samples: 812284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:59:07,939][04317] Avg episode reward: [(0, '23.355')] [2024-10-08 03:59:09,442][11572] Updated weights for policy 0, policy_version 1778 (0.0011) [2024-10-08 03:59:11,547][11572] Updated weights for policy 0, policy_version 1788 (0.0011) [2024-10-08 03:59:12,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20548.4, 300 sec: 20281.4). 
Total num frames: 7352320. Throughput: 0: 5129.9. Samples: 827066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 03:59:12,939][04317] Avg episode reward: [(0, '24.926')] [2024-10-08 03:59:13,510][11572] Updated weights for policy 0, policy_version 1798 (0.0011) [2024-10-08 03:59:15,471][11572] Updated weights for policy 0, policy_version 1808 (0.0012) [2024-10-08 03:59:17,450][11572] Updated weights for policy 0, policy_version 1818 (0.0012) [2024-10-08 03:59:17,936][04317] Fps is (10 sec: 20480.3, 60 sec: 20548.3, 300 sec: 20287.3). Total num frames: 7454720. Throughput: 0: 5142.8. Samples: 858120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:59:17,938][04317] Avg episode reward: [(0, '24.797')] [2024-10-08 03:59:19,427][11572] Updated weights for policy 0, policy_version 1828 (0.0012) [2024-10-08 03:59:21,399][11572] Updated weights for policy 0, policy_version 1838 (0.0012) [2024-10-08 03:59:22,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20548.3, 300 sec: 20292.8). Total num frames: 7557120. Throughput: 0: 5134.9. Samples: 889160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:59:22,938][04317] Avg episode reward: [(0, '22.900')] [2024-10-08 03:59:23,422][11572] Updated weights for policy 0, policy_version 1848 (0.0012) [2024-10-08 03:59:25,493][11572] Updated weights for policy 0, policy_version 1858 (0.0011) [2024-10-08 03:59:27,504][11572] Updated weights for policy 0, policy_version 1868 (0.0011) [2024-10-08 03:59:27,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20548.3, 300 sec: 20298.0). Total num frames: 7659520. Throughput: 0: 5121.8. Samples: 903942. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:59:27,939][04317] Avg episode reward: [(0, '22.327')] [2024-10-08 03:59:29,471][11572] Updated weights for policy 0, policy_version 1878 (0.0011) [2024-10-08 03:59:31,445][11572] Updated weights for policy 0, policy_version 1888 (0.0011) [2024-10-08 03:59:32,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20548.3, 300 sec: 20302.9). Total num frames: 7761920. Throughput: 0: 5134.1. Samples: 934970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:59:32,938][04317] Avg episode reward: [(0, '24.365')] [2024-10-08 03:59:33,424][11572] Updated weights for policy 0, policy_version 1898 (0.0012) [2024-10-08 03:59:35,432][11572] Updated weights for policy 0, policy_version 1908 (0.0011) [2024-10-08 03:59:37,434][11572] Updated weights for policy 0, policy_version 1918 (0.0011) [2024-10-08 03:59:37,936][04317] Fps is (10 sec: 20479.8, 60 sec: 20480.0, 300 sec: 20307.5). Total num frames: 7864320. Throughput: 0: 5121.7. Samples: 965578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:59:37,938][04317] Avg episode reward: [(0, '24.042')] [2024-10-08 03:59:39,534][11572] Updated weights for policy 0, policy_version 1928 (0.0011) [2024-10-08 03:59:41,544][11572] Updated weights for policy 0, policy_version 1938 (0.0011) [2024-10-08 03:59:42,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20548.3, 300 sec: 20312.0). Total num frames: 7966720. Throughput: 0: 5120.1. Samples: 980480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:59:42,939][04317] Avg episode reward: [(0, '25.167')] [2024-10-08 03:59:43,521][11572] Updated weights for policy 0, policy_version 1948 (0.0012) [2024-10-08 03:59:45,534][11572] Updated weights for policy 0, policy_version 1958 (0.0011) [2024-10-08 03:59:47,477][11572] Updated weights for policy 0, policy_version 1968 (0.0012) [2024-10-08 03:59:47,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.2, 300 sec: 20316.2). Total num frames: 8069120. 
Throughput: 0: 5125.1. Samples: 1011516. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:59:47,938][04317] Avg episode reward: [(0, '21.983')] [2024-10-08 03:59:49,482][11572] Updated weights for policy 0, policy_version 1978 (0.0011) [2024-10-08 03:59:51,516][11572] Updated weights for policy 0, policy_version 1988 (0.0011) [2024-10-08 03:59:52,936][04317] Fps is (10 sec: 20070.3, 60 sec: 20411.7, 300 sec: 20300.2). Total num frames: 8167424. Throughput: 0: 5101.8. Samples: 1041864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 03:59:52,939][04317] Avg episode reward: [(0, '25.703')] [2024-10-08 03:59:52,941][11559] Saving new best policy, reward=25.703! [2024-10-08 03:59:53,627][11572] Updated weights for policy 0, policy_version 1998 (0.0012) [2024-10-08 03:59:55,635][11572] Updated weights for policy 0, policy_version 2008 (0.0012) [2024-10-08 03:59:57,590][11572] Updated weights for policy 0, policy_version 2018 (0.0011) [2024-10-08 03:59:57,936][04317] Fps is (10 sec: 20070.4, 60 sec: 20480.0, 300 sec: 20304.5). Total num frames: 8269824. Throughput: 0: 5109.3. Samples: 1056984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 03:59:57,938][04317] Avg episode reward: [(0, '24.339')] [2024-10-08 03:59:59,570][11572] Updated weights for policy 0, policy_version 2028 (0.0011) [2024-10-08 04:00:01,521][11572] Updated weights for policy 0, policy_version 2038 (0.0011) [2024-10-08 04:00:02,936][04317] Fps is (10 sec: 20889.8, 60 sec: 20480.0, 300 sec: 20327.6). Total num frames: 8376320. Throughput: 0: 5115.6. Samples: 1088324. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 04:00:02,939][04317] Avg episode reward: [(0, '23.672')] [2024-10-08 04:00:03,492][11572] Updated weights for policy 0, policy_version 2048 (0.0011) [2024-10-08 04:00:05,511][11572] Updated weights for policy 0, policy_version 2058 (0.0011) [2024-10-08 04:00:07,620][11572] Updated weights for policy 0, policy_version 2068 (0.0012) [2024-10-08 04:00:07,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20411.8, 300 sec: 20312.4). Total num frames: 8474624. Throughput: 0: 5097.0. Samples: 1118526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 04:00:07,939][04317] Avg episode reward: [(0, '22.304')] [2024-10-08 04:00:09,626][11572] Updated weights for policy 0, policy_version 2078 (0.0011) [2024-10-08 04:00:11,610][11572] Updated weights for policy 0, policy_version 2088 (0.0011) [2024-10-08 04:00:12,936][04317] Fps is (10 sec: 20070.4, 60 sec: 20411.8, 300 sec: 20316.2). Total num frames: 8577024. Throughput: 0: 5109.4. Samples: 1133864. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 04:00:12,939][04317] Avg episode reward: [(0, '26.880')] [2024-10-08 04:00:12,941][11559] Saving new best policy, reward=26.880! [2024-10-08 04:00:13,594][11572] Updated weights for policy 0, policy_version 2098 (0.0011) [2024-10-08 04:00:15,552][11572] Updated weights for policy 0, policy_version 2108 (0.0011) [2024-10-08 04:00:17,525][11572] Updated weights for policy 0, policy_version 2118 (0.0011) [2024-10-08 04:00:17,936][04317] Fps is (10 sec: 20889.5, 60 sec: 20480.0, 300 sec: 20337.5). Total num frames: 8683520. Throughput: 0: 5108.3. Samples: 1164842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-08 04:00:17,939][04317] Avg episode reward: [(0, '22.555')] [2024-10-08 04:00:17,947][11559] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002120_8683520.pth... 
[2024-10-08 04:00:18,018][11559] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2024-10-08 04:00:19,564][11572] Updated weights for policy 0, policy_version 2128 (0.0012) [2024-10-08 04:00:21,654][11572] Updated weights for policy 0, policy_version 2138 (0.0012) [2024-10-08 04:00:22,936][04317] Fps is (10 sec: 20479.8, 60 sec: 20411.7, 300 sec: 20323.1). Total num frames: 8781824. Throughput: 0: 5090.2. Samples: 1194638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-08 04:00:22,939][04317] Avg episode reward: [(0, '25.120')] [2024-10-08 04:00:23,672][11572] Updated weights for policy 0, policy_version 2148 (0.0011) [2024-10-08 04:00:25,652][11572] Updated weights for policy 0, policy_version 2158 (0.0011) [2024-10-08 04:00:27,604][11572] Updated weights for policy 0, policy_version 2168 (0.0011) [2024-10-08 04:00:27,936][04317] Fps is (10 sec: 20070.3, 60 sec: 20411.7, 300 sec: 20326.4). Total num frames: 8884224. Throughput: 0: 5106.4. Samples: 1210268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-08 04:00:27,938][04317] Avg episode reward: [(0, '24.576')] [2024-10-08 04:00:29,594][11572] Updated weights for policy 0, policy_version 2178 (0.0012) [2024-10-08 04:00:31,552][11572] Updated weights for policy 0, policy_version 2188 (0.0011) [2024-10-08 04:00:32,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20411.7, 300 sec: 20329.5). Total num frames: 8986624. Throughput: 0: 5111.4. Samples: 1241530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-08 04:00:32,939][04317] Avg episode reward: [(0, '23.833')] [2024-10-08 04:00:33,582][11572] Updated weights for policy 0, policy_version 2198 (0.0011) [2024-10-08 04:00:35,694][11572] Updated weights for policy 0, policy_version 2208 (0.0012) [2024-10-08 04:00:37,746][11572] Updated weights for policy 0, policy_version 2218 (0.0011) [2024-10-08 04:00:37,936][04317] Fps is (10 sec: 20480.2, 60 sec: 20411.7, 300 sec: 20332.5). Total num frames: 9089024. 
Throughput: 0: 5103.4. Samples: 1271518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-08 04:00:37,939][04317] Avg episode reward: [(0, '22.379')] [2024-10-08 04:00:39,725][11572] Updated weights for policy 0, policy_version 2228 (0.0012) [2024-10-08 04:00:41,675][11572] Updated weights for policy 0, policy_version 2238 (0.0011) [2024-10-08 04:00:42,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20411.7, 300 sec: 20335.4). Total num frames: 9191424. Throughput: 0: 5114.3. Samples: 1287126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-10-08 04:00:42,939][04317] Avg episode reward: [(0, '23.347')] [2024-10-08 04:00:43,623][11572] Updated weights for policy 0, policy_version 2248 (0.0011) [2024-10-08 04:00:45,580][11572] Updated weights for policy 0, policy_version 2258 (0.0011) [2024-10-08 04:00:47,590][11572] Updated weights for policy 0, policy_version 2268 (0.0012) [2024-10-08 04:00:47,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 20338.2). Total num frames: 9293824. Throughput: 0: 5116.3. Samples: 1318560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-10-08 04:00:47,938][04317] Avg episode reward: [(0, '25.236')] [2024-10-08 04:00:49,700][11572] Updated weights for policy 0, policy_version 2278 (0.0012) [2024-10-08 04:00:51,695][11572] Updated weights for policy 0, policy_version 2288 (0.0011) [2024-10-08 04:00:52,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 20340.9). Total num frames: 9396224. Throughput: 0: 5109.5. Samples: 1348452. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:00:52,938][04317] Avg episode reward: [(0, '24.036')] [2024-10-08 04:00:53,643][11572] Updated weights for policy 0, policy_version 2298 (0.0011) [2024-10-08 04:00:55,599][11572] Updated weights for policy 0, policy_version 2308 (0.0011) [2024-10-08 04:00:57,567][11572] Updated weights for policy 0, policy_version 2318 (0.0012) [2024-10-08 04:00:57,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20343.5). Total num frames: 9498624. Throughput: 0: 5115.9. Samples: 1364082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-08 04:00:57,939][04317] Avg episode reward: [(0, '27.558')] [2024-10-08 04:00:57,958][11559] Saving new best policy, reward=27.558! [2024-10-08 04:00:59,558][11572] Updated weights for policy 0, policy_version 2328 (0.0012) [2024-10-08 04:01:01,556][11572] Updated weights for policy 0, policy_version 2338 (0.0012) [2024-10-08 04:01:02,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 20345.9). Total num frames: 9601024. Throughput: 0: 5117.3. Samples: 1395122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 04:01:02,938][04317] Avg episode reward: [(0, '24.142')] [2024-10-08 04:01:03,643][11572] Updated weights for policy 0, policy_version 2348 (0.0011) [2024-10-08 04:01:05,713][11572] Updated weights for policy 0, policy_version 2358 (0.0012) [2024-10-08 04:01:07,669][11572] Updated weights for policy 0, policy_version 2368 (0.0011) [2024-10-08 04:01:07,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 20348.3). Total num frames: 9703424. Throughput: 0: 5125.6. Samples: 1425292. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:01:07,939][04317] Avg episode reward: [(0, '24.929')] [2024-10-08 04:01:09,647][11572] Updated weights for policy 0, policy_version 2378 (0.0012) [2024-10-08 04:01:11,609][11572] Updated weights for policy 0, policy_version 2388 (0.0011) [2024-10-08 04:01:12,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20350.7). Total num frames: 9805824. Throughput: 0: 5126.0. Samples: 1440936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:01:12,938][04317] Avg episode reward: [(0, '26.500')] [2024-10-08 04:01:13,542][11572] Updated weights for policy 0, policy_version 2398 (0.0011) [2024-10-08 04:01:15,512][11572] Updated weights for policy 0, policy_version 2408 (0.0011) [2024-10-08 04:01:17,620][11572] Updated weights for policy 0, policy_version 2418 (0.0011) [2024-10-08 04:01:17,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20411.8, 300 sec: 20352.9). Total num frames: 9908224. Throughput: 0: 5118.2. Samples: 1471850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:01:17,939][04317] Avg episode reward: [(0, '26.629')] [2024-10-08 04:01:19,724][11572] Updated weights for policy 0, policy_version 2428 (0.0011) [2024-10-08 04:01:21,697][11572] Updated weights for policy 0, policy_version 2438 (0.0011) [2024-10-08 04:01:22,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20355.0). Total num frames: 10010624. Throughput: 0: 5128.2. Samples: 1502288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 04:01:22,939][04317] Avg episode reward: [(0, '25.941')] [2024-10-08 04:01:23,639][11572] Updated weights for policy 0, policy_version 2448 (0.0011) [2024-10-08 04:01:25,617][11572] Updated weights for policy 0, policy_version 2458 (0.0011) [2024-10-08 04:01:27,569][11572] Updated weights for policy 0, policy_version 2468 (0.0011) [2024-10-08 04:01:27,937][04317] Fps is (10 sec: 20478.5, 60 sec: 20479.8, 300 sec: 20480.0). Total num frames: 10113024. 
Throughput: 0: 5128.7. Samples: 1517922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:01:27,939][04317] Avg episode reward: [(0, '27.799')] [2024-10-08 04:01:27,948][11559] Saving new best policy, reward=27.799! [2024-10-08 04:01:29,531][11572] Updated weights for policy 0, policy_version 2478 (0.0011) [2024-10-08 04:01:31,607][11572] Updated weights for policy 0, policy_version 2488 (0.0012) [2024-10-08 04:01:32,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20493.9). Total num frames: 10215424. Throughput: 0: 5110.3. Samples: 1548524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:01:32,939][04317] Avg episode reward: [(0, '24.315')] [2024-10-08 04:01:33,671][11572] Updated weights for policy 0, policy_version 2498 (0.0011) [2024-10-08 04:01:35,648][11572] Updated weights for policy 0, policy_version 2508 (0.0012) [2024-10-08 04:01:37,601][11572] Updated weights for policy 0, policy_version 2518 (0.0011) [2024-10-08 04:01:37,937][04317] Fps is (10 sec: 20481.0, 60 sec: 20479.9, 300 sec: 20493.9). Total num frames: 10317824. Throughput: 0: 5130.6. Samples: 1579332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:01:37,939][04317] Avg episode reward: [(0, '24.389')] [2024-10-08 04:01:39,574][11572] Updated weights for policy 0, policy_version 2528 (0.0012) [2024-10-08 04:01:41,530][11572] Updated weights for policy 0, policy_version 2538 (0.0011) [2024-10-08 04:01:42,936][04317] Fps is (10 sec: 20889.7, 60 sec: 20548.3, 300 sec: 20521.7). Total num frames: 10424320. Throughput: 0: 5131.1. Samples: 1594982. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:01:42,938][04317] Avg episode reward: [(0, '22.886')] [2024-10-08 04:01:43,497][11572] Updated weights for policy 0, policy_version 2548 (0.0011) [2024-10-08 04:01:45,553][11572] Updated weights for policy 0, policy_version 2558 (0.0012) [2024-10-08 04:01:47,647][11572] Updated weights for policy 0, policy_version 2568 (0.0012) [2024-10-08 04:01:47,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 20507.8). Total num frames: 10522624. Throughput: 0: 5114.9. Samples: 1625292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:01:47,938][04317] Avg episode reward: [(0, '24.352')] [2024-10-08 04:01:49,668][11572] Updated weights for policy 0, policy_version 2578 (0.0011) [2024-10-08 04:01:51,632][11572] Updated weights for policy 0, policy_version 2588 (0.0012) [2024-10-08 04:01:52,936][04317] Fps is (10 sec: 20070.4, 60 sec: 20480.0, 300 sec: 20507.8). Total num frames: 10625024. Throughput: 0: 5127.6. Samples: 1656032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:01:52,938][04317] Avg episode reward: [(0, '26.593')] [2024-10-08 04:01:53,611][11572] Updated weights for policy 0, policy_version 2598 (0.0011) [2024-10-08 04:01:55,585][11572] Updated weights for policy 0, policy_version 2608 (0.0011) [2024-10-08 04:01:57,539][11572] Updated weights for policy 0, policy_version 2618 (0.0011) [2024-10-08 04:01:57,936][04317] Fps is (10 sec: 20889.8, 60 sec: 20548.3, 300 sec: 20521.7). Total num frames: 10731520. Throughput: 0: 5127.3. Samples: 1671666. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 04:01:57,939][04317] Avg episode reward: [(0, '26.743')] [2024-10-08 04:01:59,616][11572] Updated weights for policy 0, policy_version 2628 (0.0011) [2024-10-08 04:02:01,695][11572] Updated weights for policy 0, policy_version 2638 (0.0011) [2024-10-08 04:02:02,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 20507.8). Total num frames: 10829824. 
Throughput: 0: 5109.9. Samples: 1701796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:02:02,939][04317] Avg episode reward: [(0, '24.577')] [2024-10-08 04:02:03,664][11572] Updated weights for policy 0, policy_version 2648 (0.0011) [2024-10-08 04:02:05,605][11572] Updated weights for policy 0, policy_version 2658 (0.0011) [2024-10-08 04:02:07,574][11572] Updated weights for policy 0, policy_version 2668 (0.0011) [2024-10-08 04:02:07,936][04317] Fps is (10 sec: 20070.5, 60 sec: 20480.0, 300 sec: 20493.9). Total num frames: 10932224. Throughput: 0: 5132.2. Samples: 1733238. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 04:02:07,940][04317] Avg episode reward: [(0, '24.281')] [2024-10-08 04:02:09,504][11572] Updated weights for policy 0, policy_version 2678 (0.0011) [2024-10-08 04:02:11,463][11572] Updated weights for policy 0, policy_version 2688 (0.0012) [2024-10-08 04:02:12,936][04317] Fps is (10 sec: 20889.5, 60 sec: 20548.3, 300 sec: 20521.7). Total num frames: 11038720. Throughput: 0: 5133.7. Samples: 1748934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:02:12,939][04317] Avg episode reward: [(0, '23.592')] [2024-10-08 04:02:13,476][11572] Updated weights for policy 0, policy_version 2698 (0.0012) [2024-10-08 04:02:15,564][11572] Updated weights for policy 0, policy_version 2708 (0.0012) [2024-10-08 04:02:17,540][11572] Updated weights for policy 0, policy_version 2718 (0.0011) [2024-10-08 04:02:17,937][04317] Fps is (10 sec: 20479.6, 60 sec: 20479.9, 300 sec: 20493.9). Total num frames: 11137024. Throughput: 0: 5124.2. Samples: 1779114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 04:02:17,939][04317] Avg episode reward: [(0, '26.850')] [2024-10-08 04:02:17,956][11559] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002720_11141120.pth... 
[2024-10-08 04:02:18,026][11559] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001519_6221824.pth [2024-10-08 04:02:19,538][11572] Updated weights for policy 0, policy_version 2728 (0.0011) [2024-10-08 04:02:21,496][11572] Updated weights for policy 0, policy_version 2738 (0.0010) [2024-10-08 04:02:22,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20548.3, 300 sec: 20507.8). Total num frames: 11243520. Throughput: 0: 5131.4. Samples: 1810242. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 04:02:22,939][04317] Avg episode reward: [(0, '23.670')] [2024-10-08 04:02:23,453][11572] Updated weights for policy 0, policy_version 2748 (0.0012) [2024-10-08 04:02:25,434][11572] Updated weights for policy 0, policy_version 2758 (0.0011) [2024-10-08 04:02:27,458][11572] Updated weights for policy 0, policy_version 2768 (0.0011) [2024-10-08 04:02:27,936][04317] Fps is (10 sec: 20889.7, 60 sec: 20548.5, 300 sec: 20507.8). Total num frames: 11345920. Throughput: 0: 5130.5. Samples: 1825854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-08 04:02:27,939][04317] Avg episode reward: [(0, '26.611')] [2024-10-08 04:02:29,549][11572] Updated weights for policy 0, policy_version 2778 (0.0011) [2024-10-08 04:02:31,561][11572] Updated weights for policy 0, policy_version 2788 (0.0011) [2024-10-08 04:02:32,943][04317] Fps is (10 sec: 20466.7, 60 sec: 20546.1, 300 sec: 20507.3). Total num frames: 11448320. Throughput: 0: 5125.5. Samples: 1855974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:02:32,947][04317] Avg episode reward: [(0, '26.837')] [2024-10-08 04:02:33,534][11572] Updated weights for policy 0, policy_version 2798 (0.0011) [2024-10-08 04:02:35,514][11572] Updated weights for policy 0, policy_version 2808 (0.0011) [2024-10-08 04:02:37,466][11572] Updated weights for policy 0, policy_version 2818 (0.0011) [2024-10-08 04:02:37,936][04317] Fps is (10 sec: 20480.2, 60 sec: 20548.3, 300 sec: 20507.8). 
Total num frames: 11550720. Throughput: 0: 5137.2. Samples: 1887204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:02:37,938][04317] Avg episode reward: [(0, '25.663')] [2024-10-08 04:02:39,443][11572] Updated weights for policy 0, policy_version 2828 (0.0011) [2024-10-08 04:02:41,492][11572] Updated weights for policy 0, policy_version 2838 (0.0011) [2024-10-08 04:02:42,936][04317] Fps is (10 sec: 20083.2, 60 sec: 20411.7, 300 sec: 20507.8). Total num frames: 11649024. Throughput: 0: 5131.3. Samples: 1902576. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-08 04:02:42,939][04317] Avg episode reward: [(0, '24.921')] [2024-10-08 04:02:43,607][11572] Updated weights for policy 0, policy_version 2848 (0.0012) [2024-10-08 04:02:45,610][11572] Updated weights for policy 0, policy_version 2858 (0.0012) [2024-10-08 04:02:47,575][11572] Updated weights for policy 0, policy_version 2868 (0.0011) [2024-10-08 04:02:47,936][04317] Fps is (10 sec: 20070.4, 60 sec: 20480.0, 300 sec: 20493.9). Total num frames: 11751424. Throughput: 0: 5131.0. Samples: 1932692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-10-08 04:02:47,939][04317] Avg episode reward: [(0, '25.724')] [2024-10-08 04:02:49,580][11572] Updated weights for policy 0, policy_version 2878 (0.0011) [2024-10-08 04:02:51,543][11572] Updated weights for policy 0, policy_version 2888 (0.0011) [2024-10-08 04:02:52,936][04317] Fps is (10 sec: 20889.7, 60 sec: 20548.3, 300 sec: 20493.9). Total num frames: 11857920. Throughput: 0: 5121.6. Samples: 1963710. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-10-08 04:02:52,938][04317] Avg episode reward: [(0, '26.737')]
[2024-10-08 04:02:53,498][11572] Updated weights for policy 0, policy_version 2898 (0.0011)
[2024-10-08 04:02:55,504][11572] Updated weights for policy 0, policy_version 2908 (0.0011)
[2024-10-08 04:02:57,596][11572] Updated weights for policy 0, policy_version 2918 (0.0011)
[2024-10-08 04:02:57,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20411.7, 300 sec: 20493.9). Total num frames: 11956224. Throughput: 0: 5110.2. Samples: 1978892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-08 04:02:57,938][04317] Avg episode reward: [(0, '26.981')]
[2024-10-08 04:02:59,600][11572] Updated weights for policy 0, policy_version 2928 (0.0012)
[2024-10-08 04:03:00,187][11559] Stopping Batcher_0...
[2024-10-08 04:03:00,188][11559] Loop batcher_evt_loop terminating...
[2024-10-08 04:03:00,188][11559] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth...
[2024-10-08 04:03:00,187][04317] Component Batcher_0 stopped!
[2024-10-08 04:03:00,203][11572] Weights refcount: 2 0
[2024-10-08 04:03:00,205][11572] Stopping InferenceWorker_p0-w0...
[2024-10-08 04:03:00,205][11572] Loop inference_proc0-0_evt_loop terminating...
[2024-10-08 04:03:00,205][04317] Component InferenceWorker_p0-w0 stopped!
[2024-10-08 04:03:00,234][11576] Stopping RolloutWorker_w3...
[2024-10-08 04:03:00,235][11576] Loop rollout_proc3_evt_loop terminating...
[2024-10-08 04:03:00,235][11580] Stopping RolloutWorker_w7...
[2024-10-08 04:03:00,235][11574] Stopping RolloutWorker_w1...
[2024-10-08 04:03:00,235][11575] Stopping RolloutWorker_w2...
[2024-10-08 04:03:00,235][11580] Loop rollout_proc7_evt_loop terminating...
[2024-10-08 04:03:00,235][11574] Loop rollout_proc1_evt_loop terminating...
[2024-10-08 04:03:00,235][11575] Loop rollout_proc2_evt_loop terminating...
[2024-10-08 04:03:00,236][11573] Stopping RolloutWorker_w0...
[2024-10-08 04:03:00,237][11573] Loop rollout_proc0_evt_loop terminating...
[2024-10-08 04:03:00,234][04317] Component RolloutWorker_w3 stopped!
[2024-10-08 04:03:00,238][11578] Stopping RolloutWorker_w5...
[2024-10-08 04:03:00,239][11577] Stopping RolloutWorker_w4...
[2024-10-08 04:03:00,239][11578] Loop rollout_proc5_evt_loop terminating...
[2024-10-08 04:03:00,237][04317] Component RolloutWorker_w7 stopped!
[2024-10-08 04:03:00,239][11577] Loop rollout_proc4_evt_loop terminating...
[2024-10-08 04:03:00,239][04317] Component RolloutWorker_w1 stopped!
[2024-10-08 04:03:00,240][11579] Stopping RolloutWorker_w6...
[2024-10-08 04:03:00,242][11579] Loop rollout_proc6_evt_loop terminating...
[2024-10-08 04:03:00,241][04317] Component RolloutWorker_w2 stopped!
[2024-10-08 04:03:00,243][04317] Component RolloutWorker_w0 stopped!
[2024-10-08 04:03:00,244][04317] Component RolloutWorker_w5 stopped!
[2024-10-08 04:03:00,246][04317] Component RolloutWorker_w4 stopped!
[2024-10-08 04:03:00,247][04317] Component RolloutWorker_w6 stopped!
[2024-10-08 04:03:00,269][11559] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002120_8683520.pth
[2024-10-08 04:03:00,279][11559] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth...
[2024-10-08 04:03:00,402][11559] Stopping LearnerWorker_p0...
[2024-10-08 04:03:00,403][11559] Loop learner_proc0_evt_loop terminating...
[2024-10-08 04:03:00,402][04317] Component LearnerWorker_p0 stopped!
[2024-10-08 04:03:00,406][04317] Waiting for process learner_proc0 to stop...
[2024-10-08 04:03:01,041][04317] Waiting for process inference_proc0-0 to join...
[2024-10-08 04:03:01,043][04317] Waiting for process rollout_proc0 to join...
[2024-10-08 04:03:01,046][04317] Waiting for process rollout_proc1 to join...
[2024-10-08 04:03:01,048][04317] Waiting for process rollout_proc2 to join...
[2024-10-08 04:03:01,050][04317] Waiting for process rollout_proc3 to join...
[2024-10-08 04:03:01,053][04317] Waiting for process rollout_proc4 to join... [2024-10-08 04:03:01,055][04317] Waiting for process rollout_proc5 to join... [2024-10-08 04:03:01,057][04317] Waiting for process rollout_proc6 to join... [2024-10-08 04:03:01,061][04317] Waiting for process rollout_proc7 to join... [2024-10-08 04:03:01,063][04317] Batcher 0 profile tree view: batching: 31.8651, releasing_batches: 0.0427 [2024-10-08 04:03:01,065][04317] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 6.1122 update_model: 5.8496 weight_update: 0.0012 one_step: 0.0025 handle_policy_step: 362.9040 deserialize: 15.0768, stack: 2.3418, obs_to_device_normalize: 88.5838, forward: 166.3535, send_messages: 26.4312 prepare_outputs: 47.7377 to_cpu: 30.6777 [2024-10-08 04:03:01,066][04317] Learner 0 profile tree view: misc: 0.0094, prepare_batch: 15.7864 train: 46.5480 epoch_init: 0.0110, minibatch_init: 0.0112, losses_postprocess: 0.6159, kl_divergence: 0.8342, after_optimizer: 1.1864 calculate_losses: 15.2727 losses_init: 0.0070, forward_head: 1.8093, bptt_initial: 6.5186, tail: 1.2817, advantages_returns: 0.3389, losses: 2.1307 bptt: 2.8336 bptt_forward_core: 2.7219 update: 27.9222 clip: 2.2075 [2024-10-08 04:03:01,068][04317] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.2941, enqueue_policy_requests: 15.2562, env_step: 250.2268, overhead: 12.3881, complete_rollouts: 0.4594 save_policy_outputs: 20.7463 split_output_tensors: 7.1634 [2024-10-08 04:03:01,069][04317] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.2953, enqueue_policy_requests: 15.3496, env_step: 249.8908, overhead: 12.3257, complete_rollouts: 0.4593 save_policy_outputs: 20.5782 split_output_tensors: 7.1420 [2024-10-08 04:03:01,071][04317] Loop Runner_EvtLoop terminating... 
[2024-10-08 04:03:01,072][04317] Runner profile tree view: main_loop: 401.5654 [2024-10-08 04:03:01,074][04317] Collected {0: 12005376}, FPS: 19920.8 [2024-10-08 04:03:09,445][04317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-10-08 04:03:09,446][04317] Overriding arg 'num_workers' with value 1 passed from command line [2024-10-08 04:03:09,447][04317] Adding new argument 'no_render'=True that is not in the saved config file! [2024-10-08 04:03:09,449][04317] Adding new argument 'save_video'=True that is not in the saved config file! [2024-10-08 04:03:09,450][04317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-10-08 04:03:09,451][04317] Adding new argument 'video_name'=None that is not in the saved config file! [2024-10-08 04:03:09,452][04317] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-10-08 04:03:09,454][04317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-10-08 04:03:09,455][04317] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-10-08 04:03:09,456][04317] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-10-08 04:03:09,457][04317] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-10-08 04:03:09,458][04317] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-10-08 04:03:09,461][04317] Adding new argument 'train_script'=None that is not in the saved config file! [2024-10-08 04:03:09,462][04317] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-10-08 04:03:09,464][04317] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-08 04:03:09,470][04317] RunningMeanStd input shape: (3, 72, 128) [2024-10-08 04:03:09,472][04317] RunningMeanStd input shape: (1,) [2024-10-08 04:03:09,483][04317] ConvEncoder: input_channels=3 [2024-10-08 04:03:09,520][04317] Conv encoder output size: 512 [2024-10-08 04:03:09,521][04317] Policy head output size: 512 [2024-10-08 04:03:09,542][04317] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... [2024-10-08 04:03:10,048][04317] Num frames 100... [2024-10-08 04:03:10,171][04317] Num frames 200... [2024-10-08 04:03:10,290][04317] Num frames 300... [2024-10-08 04:03:10,411][04317] Num frames 400... [2024-10-08 04:03:10,535][04317] Num frames 500... [2024-10-08 04:03:10,657][04317] Num frames 600... [2024-10-08 04:03:10,774][04317] Num frames 700... [2024-10-08 04:03:10,870][04317] Avg episode rewards: #0: 13.360, true rewards: #0: 7.360 [2024-10-08 04:03:10,872][04317] Avg episode reward: 13.360, avg true_objective: 7.360 [2024-10-08 04:03:10,945][04317] Num frames 800... [2024-10-08 04:03:11,059][04317] Num frames 900... [2024-10-08 04:03:11,177][04317] Num frames 1000... [2024-10-08 04:03:11,291][04317] Num frames 1100... [2024-10-08 04:03:11,406][04317] Num frames 1200... [2024-10-08 04:03:11,528][04317] Num frames 1300... [2024-10-08 04:03:11,650][04317] Num frames 1400... [2024-10-08 04:03:11,767][04317] Num frames 1500... [2024-10-08 04:03:11,880][04317] Num frames 1600... [2024-10-08 04:03:11,997][04317] Num frames 1700... [2024-10-08 04:03:12,113][04317] Num frames 1800... [2024-10-08 04:03:12,227][04317] Num frames 1900... [2024-10-08 04:03:12,343][04317] Num frames 2000... [2024-10-08 04:03:12,460][04317] Num frames 2100... [2024-10-08 04:03:12,575][04317] Num frames 2200... [2024-10-08 04:03:12,693][04317] Num frames 2300... 
[2024-10-08 04:03:12,789][04317] Avg episode rewards: #0: 25.180, true rewards: #0: 11.680 [2024-10-08 04:03:12,790][04317] Avg episode reward: 25.180, avg true_objective: 11.680 [2024-10-08 04:03:12,864][04317] Num frames 2400... [2024-10-08 04:03:12,978][04317] Num frames 2500... [2024-10-08 04:03:13,095][04317] Num frames 2600... [2024-10-08 04:03:13,210][04317] Num frames 2700... [2024-10-08 04:03:13,322][04317] Num frames 2800... [2024-10-08 04:03:13,438][04317] Num frames 2900... [2024-10-08 04:03:13,553][04317] Num frames 3000... [2024-10-08 04:03:13,671][04317] Num frames 3100... [2024-10-08 04:03:13,786][04317] Num frames 3200... [2024-10-08 04:03:13,901][04317] Num frames 3300... [2024-10-08 04:03:14,018][04317] Num frames 3400... [2024-10-08 04:03:14,138][04317] Num frames 3500... [2024-10-08 04:03:14,253][04317] Num frames 3600... [2024-10-08 04:03:14,368][04317] Num frames 3700... [2024-10-08 04:03:14,487][04317] Num frames 3800... [2024-10-08 04:03:14,603][04317] Num frames 3900... [2024-10-08 04:03:14,725][04317] Avg episode rewards: #0: 30.187, true rewards: #0: 13.187 [2024-10-08 04:03:14,727][04317] Avg episode reward: 30.187, avg true_objective: 13.187 [2024-10-08 04:03:14,778][04317] Num frames 4000... [2024-10-08 04:03:14,892][04317] Num frames 4100... [2024-10-08 04:03:15,006][04317] Num frames 4200... [2024-10-08 04:03:15,122][04317] Num frames 4300... [2024-10-08 04:03:15,239][04317] Num frames 4400... [2024-10-08 04:03:15,356][04317] Num frames 4500... [2024-10-08 04:03:15,471][04317] Num frames 4600... [2024-10-08 04:03:15,589][04317] Num frames 4700... [2024-10-08 04:03:15,708][04317] Num frames 4800... [2024-10-08 04:03:15,825][04317] Num frames 4900... [2024-10-08 04:03:15,944][04317] Num frames 5000... [2024-10-08 04:03:16,062][04317] Num frames 5100... [2024-10-08 04:03:16,176][04317] Num frames 5200... [2024-10-08 04:03:16,291][04317] Num frames 5300... 
[2024-10-08 04:03:16,418][04317] Avg episode rewards: #0: 30.410, true rewards: #0: 13.410 [2024-10-08 04:03:16,419][04317] Avg episode reward: 30.410, avg true_objective: 13.410 [2024-10-08 04:03:16,465][04317] Num frames 5400... [2024-10-08 04:03:16,578][04317] Num frames 5500... [2024-10-08 04:03:16,695][04317] Num frames 5600... [2024-10-08 04:03:16,809][04317] Num frames 5700... [2024-10-08 04:03:16,921][04317] Num frames 5800... [2024-10-08 04:03:17,037][04317] Num frames 5900... [2024-10-08 04:03:17,151][04317] Num frames 6000... [2024-10-08 04:03:17,264][04317] Num frames 6100... [2024-10-08 04:03:17,379][04317] Num frames 6200... [2024-10-08 04:03:17,493][04317] Num frames 6300... [2024-10-08 04:03:17,611][04317] Num frames 6400... [2024-10-08 04:03:17,730][04317] Num frames 6500... [2024-10-08 04:03:17,868][04317] Avg episode rewards: #0: 30.142, true rewards: #0: 13.142 [2024-10-08 04:03:17,869][04317] Avg episode reward: 30.142, avg true_objective: 13.142 [2024-10-08 04:03:17,905][04317] Num frames 6600... [2024-10-08 04:03:18,019][04317] Num frames 6700... [2024-10-08 04:03:18,134][04317] Num frames 6800... [2024-10-08 04:03:18,251][04317] Num frames 6900... [2024-10-08 04:03:18,366][04317] Num frames 7000... [2024-10-08 04:03:18,483][04317] Num frames 7100... [2024-10-08 04:03:18,600][04317] Num frames 7200... [2024-10-08 04:03:18,718][04317] Num frames 7300... [2024-10-08 04:03:18,834][04317] Num frames 7400... [2024-10-08 04:03:18,949][04317] Num frames 7500... [2024-10-08 04:03:19,068][04317] Num frames 7600... [2024-10-08 04:03:19,184][04317] Num frames 7700... [2024-10-08 04:03:19,298][04317] Num frames 7800... [2024-10-08 04:03:19,377][04317] Avg episode rewards: #0: 29.865, true rewards: #0: 13.032 [2024-10-08 04:03:19,379][04317] Avg episode reward: 29.865, avg true_objective: 13.032 [2024-10-08 04:03:19,477][04317] Num frames 7900... [2024-10-08 04:03:19,594][04317] Num frames 8000... 
[2024-10-08 04:03:19,752][04317] Avg episode rewards: #0: 26.411, true rewards: #0: 11.554 [2024-10-08 04:03:19,754][04317] Avg episode reward: 26.411, avg true_objective: 11.554 [2024-10-08 04:03:19,770][04317] Num frames 8100... [2024-10-08 04:03:19,884][04317] Num frames 8200... [2024-10-08 04:03:20,000][04317] Num frames 8300... [2024-10-08 04:03:20,114][04317] Num frames 8400... [2024-10-08 04:03:20,236][04317] Num frames 8500... [2024-10-08 04:03:20,352][04317] Num frames 8600... [2024-10-08 04:03:20,469][04317] Num frames 8700... [2024-10-08 04:03:20,583][04317] Num frames 8800... [2024-10-08 04:03:20,708][04317] Num frames 8900... [2024-10-08 04:03:20,824][04317] Num frames 9000... [2024-10-08 04:03:20,898][04317] Avg episode rewards: #0: 25.645, true rewards: #0: 11.270 [2024-10-08 04:03:20,900][04317] Avg episode reward: 25.645, avg true_objective: 11.270 [2024-10-08 04:03:20,999][04317] Num frames 9100... [2024-10-08 04:03:21,121][04317] Num frames 9200... [2024-10-08 04:03:21,243][04317] Num frames 9300... [2024-10-08 04:03:21,367][04317] Num frames 9400... [2024-10-08 04:03:21,484][04317] Num frames 9500... [2024-10-08 04:03:21,603][04317] Num frames 9600... [2024-10-08 04:03:21,724][04317] Num frames 9700... [2024-10-08 04:03:21,844][04317] Num frames 9800... [2024-10-08 04:03:21,962][04317] Num frames 9900... [2024-10-08 04:03:22,067][04317] Avg episode rewards: #0: 25.493, true rewards: #0: 11.049 [2024-10-08 04:03:22,069][04317] Avg episode reward: 25.493, avg true_objective: 11.049 [2024-10-08 04:03:22,137][04317] Num frames 10000... [2024-10-08 04:03:22,250][04317] Num frames 10100... [2024-10-08 04:03:22,365][04317] Num frames 10200... [2024-10-08 04:03:22,482][04317] Num frames 10300... [2024-10-08 04:03:22,597][04317] Num frames 10400... [2024-10-08 04:03:22,719][04317] Num frames 10500... [2024-10-08 04:03:22,833][04317] Num frames 10600... [2024-10-08 04:03:22,948][04317] Num frames 10700... 
[2024-10-08 04:03:23,065][04317] Num frames 10800... [2024-10-08 04:03:23,183][04317] Num frames 10900... [2024-10-08 04:03:23,298][04317] Num frames 11000... [2024-10-08 04:03:23,414][04317] Num frames 11100... [2024-10-08 04:03:23,530][04317] Num frames 11200... [2024-10-08 04:03:23,649][04317] Num frames 11300... [2024-10-08 04:03:23,767][04317] Num frames 11400... [2024-10-08 04:03:23,881][04317] Num frames 11500... [2024-10-08 04:03:23,997][04317] Num frames 11600... [2024-10-08 04:03:24,112][04317] Num frames 11700... [2024-10-08 04:03:24,229][04317] Num frames 11800... [2024-10-08 04:03:24,348][04317] Num frames 11900... [2024-10-08 04:03:24,476][04317] Avg episode rewards: #0: 28.460, true rewards: #0: 11.960 [2024-10-08 04:03:24,477][04317] Avg episode reward: 28.460, avg true_objective: 11.960 [2024-10-08 04:03:52,768][04317] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-10-08 04:04:55,645][04317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-10-08 04:04:55,646][04317] Overriding arg 'num_workers' with value 1 passed from command line [2024-10-08 04:04:55,647][04317] Adding new argument 'no_render'=True that is not in the saved config file! [2024-10-08 04:04:55,649][04317] Adding new argument 'save_video'=True that is not in the saved config file! [2024-10-08 04:04:55,650][04317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-10-08 04:04:55,652][04317] Adding new argument 'video_name'=None that is not in the saved config file! [2024-10-08 04:04:55,653][04317] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-10-08 04:04:55,654][04317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-10-08 04:04:55,655][04317] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
[2024-10-08 04:04:55,657][04317] Adding new argument 'hf_repository'='EntropicLettuce/rl_course_vizdoom_health_gathering_supreme_b' that is not in the saved config file! [2024-10-08 04:04:55,659][04317] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-10-08 04:04:55,660][04317] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-10-08 04:04:55,661][04317] Adding new argument 'train_script'=None that is not in the saved config file! [2024-10-08 04:04:55,663][04317] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-10-08 04:04:55,664][04317] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-08 04:04:55,672][04317] RunningMeanStd input shape: (3, 72, 128) [2024-10-08 04:04:55,674][04317] RunningMeanStd input shape: (1,) [2024-10-08 04:04:55,685][04317] ConvEncoder: input_channels=3 [2024-10-08 04:04:55,722][04317] Conv encoder output size: 512 [2024-10-08 04:04:55,724][04317] Policy head output size: 512 [2024-10-08 04:04:55,744][04317] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... [2024-10-08 04:04:56,235][04317] Num frames 100... [2024-10-08 04:04:56,350][04317] Num frames 200... [2024-10-08 04:04:56,469][04317] Num frames 300... [2024-10-08 04:04:56,592][04317] Num frames 400... [2024-10-08 04:04:56,710][04317] Num frames 500... [2024-10-08 04:04:56,821][04317] Num frames 600... [2024-10-08 04:04:56,932][04317] Num frames 700... [2024-10-08 04:04:57,028][04317] Avg episode rewards: #0: 14.370, true rewards: #0: 7.370 [2024-10-08 04:04:57,030][04317] Avg episode reward: 14.370, avg true_objective: 7.370 [2024-10-08 04:04:57,104][04317] Num frames 800... [2024-10-08 04:04:57,218][04317] Num frames 900... [2024-10-08 04:04:57,333][04317] Num frames 1000... [2024-10-08 04:04:57,447][04317] Num frames 1100... [2024-10-08 04:04:57,560][04317] Num frames 1200... 
[2024-10-08 04:04:57,675][04317] Num frames 1300... [2024-10-08 04:04:57,788][04317] Num frames 1400... [2024-10-08 04:04:57,903][04317] Num frames 1500... [2024-10-08 04:04:58,016][04317] Num frames 1600... [2024-10-08 04:04:58,133][04317] Num frames 1700... [2024-10-08 04:04:58,258][04317] Avg episode rewards: #0: 16.305, true rewards: #0: 8.805 [2024-10-08 04:04:58,259][04317] Avg episode reward: 16.305, avg true_objective: 8.805 [2024-10-08 04:04:58,305][04317] Num frames 1800... [2024-10-08 04:04:58,421][04317] Num frames 1900... [2024-10-08 04:04:58,536][04317] Num frames 2000... [2024-10-08 04:04:58,656][04317] Num frames 2100... [2024-10-08 04:04:58,770][04317] Num frames 2200... [2024-10-08 04:04:58,881][04317] Num frames 2300... [2024-10-08 04:04:58,994][04317] Num frames 2400... [2024-10-08 04:04:59,088][04317] Avg episode rewards: #0: 14.443, true rewards: #0: 8.110 [2024-10-08 04:04:59,090][04317] Avg episode reward: 14.443, avg true_objective: 8.110 [2024-10-08 04:04:59,167][04317] Num frames 2500... [2024-10-08 04:04:59,281][04317] Num frames 2600... [2024-10-08 04:04:59,395][04317] Num frames 2700... [2024-10-08 04:04:59,509][04317] Num frames 2800... [2024-10-08 04:04:59,584][04317] Avg episode rewards: #0: 11.793, true rewards: #0: 7.042 [2024-10-08 04:04:59,585][04317] Avg episode reward: 11.793, avg true_objective: 7.042 [2024-10-08 04:04:59,684][04317] Num frames 2900... [2024-10-08 04:04:59,796][04317] Num frames 3000... [2024-10-08 04:04:59,909][04317] Num frames 3100... [2024-10-08 04:05:00,026][04317] Num frames 3200... [2024-10-08 04:05:00,102][04317] Avg episode rewards: #0: 11.436, true rewards: #0: 6.436 [2024-10-08 04:05:00,104][04317] Avg episode reward: 11.436, avg true_objective: 6.436 [2024-10-08 04:05:00,198][04317] Num frames 3300... [2024-10-08 04:05:00,315][04317] Num frames 3400... [2024-10-08 04:05:00,428][04317] Num frames 3500... [2024-10-08 04:05:00,546][04317] Num frames 3600... 
[2024-10-08 04:05:00,668][04317] Num frames 3700... [2024-10-08 04:05:00,784][04317] Num frames 3800... [2024-10-08 04:05:00,898][04317] Num frames 3900... [2024-10-08 04:05:01,013][04317] Num frames 4000... [2024-10-08 04:05:01,125][04317] Num frames 4100... [2024-10-08 04:05:01,239][04317] Num frames 4200... [2024-10-08 04:05:01,377][04317] Avg episode rewards: #0: 13.123, true rewards: #0: 7.123 [2024-10-08 04:05:01,378][04317] Avg episode reward: 13.123, avg true_objective: 7.123 [2024-10-08 04:05:01,410][04317] Num frames 4300... [2024-10-08 04:05:01,525][04317] Num frames 4400... [2024-10-08 04:05:01,645][04317] Num frames 4500... [2024-10-08 04:05:01,761][04317] Num frames 4600... [2024-10-08 04:05:01,880][04317] Num frames 4700... [2024-10-08 04:05:02,035][04317] Avg episode rewards: #0: 12.694, true rewards: #0: 6.837 [2024-10-08 04:05:02,037][04317] Avg episode reward: 12.694, avg true_objective: 6.837 [2024-10-08 04:05:02,055][04317] Num frames 4800... [2024-10-08 04:05:02,170][04317] Num frames 4900... [2024-10-08 04:05:02,287][04317] Num frames 5000... [2024-10-08 04:05:02,409][04317] Num frames 5100... [2024-10-08 04:05:02,528][04317] Num frames 5200... [2024-10-08 04:05:02,648][04317] Num frames 5300... [2024-10-08 04:05:02,769][04317] Num frames 5400... [2024-10-08 04:05:02,890][04317] Num frames 5500... [2024-10-08 04:05:03,011][04317] Num frames 5600... [2024-10-08 04:05:03,124][04317] Num frames 5700... [2024-10-08 04:05:03,231][04317] Avg episode rewards: #0: 13.308, true rewards: #0: 7.182 [2024-10-08 04:05:03,233][04317] Avg episode reward: 13.308, avg true_objective: 7.182 [2024-10-08 04:05:03,299][04317] Num frames 5800... [2024-10-08 04:05:03,422][04317] Num frames 5900... [2024-10-08 04:05:03,545][04317] Num frames 6000... [2024-10-08 04:05:03,667][04317] Num frames 6100... [2024-10-08 04:05:03,789][04317] Num frames 6200... [2024-10-08 04:05:03,915][04317] Num frames 6300... [2024-10-08 04:05:04,039][04317] Num frames 6400... 
[2024-10-08 04:05:04,164][04317] Num frames 6500... [2024-10-08 04:05:04,288][04317] Num frames 6600... [2024-10-08 04:05:04,410][04317] Num frames 6700... [2024-10-08 04:05:04,535][04317] Num frames 6800... [2024-10-08 04:05:04,654][04317] Num frames 6900... [2024-10-08 04:05:04,776][04317] Num frames 7000... [2024-10-08 04:05:04,897][04317] Num frames 7100... [2024-10-08 04:05:05,011][04317] Num frames 7200... [2024-10-08 04:05:05,135][04317] Avg episode rewards: #0: 15.622, true rewards: #0: 8.067 [2024-10-08 04:05:05,137][04317] Avg episode reward: 15.622, avg true_objective: 8.067 [2024-10-08 04:05:05,189][04317] Num frames 7300... [2024-10-08 04:05:05,309][04317] Num frames 7400... [2024-10-08 04:05:05,429][04317] Num frames 7500... [2024-10-08 04:05:05,551][04317] Num frames 7600... [2024-10-08 04:05:05,671][04317] Num frames 7700... [2024-10-08 04:05:05,788][04317] Num frames 7800... [2024-10-08 04:05:05,906][04317] Num frames 7900... [2024-10-08 04:05:06,029][04317] Num frames 8000... [2024-10-08 04:05:06,199][04317] Avg episode rewards: #0: 15.793, true rewards: #0: 8.093 [2024-10-08 04:05:06,201][04317] Avg episode reward: 15.793, avg true_objective: 8.093 [2024-10-08 04:05:25,177][04317] Replay video saved to /content/train_dir/default_experiment/replay.mp4!