[2024-10-13 23:01:23,393][00667] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-10-13 23:01:23,397][00667] Rollout worker 0 uses device cpu
[2024-10-13 23:01:23,398][00667] Rollout worker 1 uses device cpu
[2024-10-13 23:01:23,400][00667] Rollout worker 2 uses device cpu
[2024-10-13 23:01:23,401][00667] Rollout worker 3 uses device cpu
[2024-10-13 23:01:23,405][00667] Rollout worker 4 uses device cpu
[2024-10-13 23:01:23,407][00667] Rollout worker 5 uses device cpu
[2024-10-13 23:01:23,409][00667] Rollout worker 6 uses device cpu
[2024-10-13 23:01:23,410][00667] Rollout worker 7 uses device cpu
[2024-10-13 23:01:23,582][00667] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-13 23:01:23,583][00667] InferenceWorker_p0-w0: min num requests: 2
[2024-10-13 23:01:23,621][00667] Starting all processes...
[2024-10-13 23:01:23,623][00667] Starting process learner_proc0
[2024-10-13 23:01:24,354][00667] Starting all processes...
[2024-10-13 23:01:24,366][00667] Starting process inference_proc0-0
[2024-10-13 23:01:24,370][00667] Starting process rollout_proc0
[2024-10-13 23:01:24,370][00667] Starting process rollout_proc1
[2024-10-13 23:01:24,370][00667] Starting process rollout_proc2
[2024-10-13 23:01:24,370][00667] Starting process rollout_proc3
[2024-10-13 23:01:24,370][00667] Starting process rollout_proc4
[2024-10-13 23:01:24,370][00667] Starting process rollout_proc5
[2024-10-13 23:01:24,370][00667] Starting process rollout_proc6
[2024-10-13 23:01:24,370][00667] Starting process rollout_proc7
[2024-10-13 23:01:41,833][03315] Worker 3 uses CPU cores [1]
[2024-10-13 23:01:42,712][03318] Worker 4 uses CPU cores [0]
[2024-10-13 23:01:42,714][03300] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-13 23:01:42,737][03300] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-10-13 23:01:42,753][03316] Worker 1 uses CPU cores [1]
[2024-10-13 23:01:42,832][03320] Worker 5 uses CPU cores [1]
[2024-10-13 23:01:42,833][03300] Num visible devices: 1
[2024-10-13 23:01:42,903][03300] Starting seed is not provided
[2024-10-13 23:01:42,904][03300] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-13 23:01:42,905][03300] Initializing actor-critic model on device cuda:0
[2024-10-13 23:01:42,905][03300] RunningMeanStd input shape: (3, 72, 128)
[2024-10-13 23:01:42,909][03300] RunningMeanStd input shape: (1,)
[2024-10-13 23:01:43,007][03300] ConvEncoder: input_channels=3
[2024-10-13 23:01:43,047][03321] Worker 7 uses CPU cores [1]
[2024-10-13 23:01:43,068][03314] Worker 0 uses CPU cores [0]
[2024-10-13 23:01:43,158][03317] Worker 2 uses CPU cores [0]
[2024-10-13 23:01:43,214][03319] Worker 6 uses CPU cores [0]
[2024-10-13 23:01:43,491][03313] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-13 23:01:43,491][03313] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-10-13 23:01:43,544][03313] Num visible devices: 1
[2024-10-13 23:01:43,574][00667] Heartbeat connected on Batcher_0
[2024-10-13 23:01:43,583][00667] Heartbeat connected on InferenceWorker_p0-w0
[2024-10-13 23:01:43,593][00667] Heartbeat connected on RolloutWorker_w0
[2024-10-13 23:01:43,596][00667] Heartbeat connected on RolloutWorker_w1
[2024-10-13 23:01:43,607][00667] Heartbeat connected on RolloutWorker_w2
[2024-10-13 23:01:43,610][00667] Heartbeat connected on RolloutWorker_w3
[2024-10-13 23:01:43,611][00667] Heartbeat connected on RolloutWorker_w4
[2024-10-13 23:01:43,612][00667] Heartbeat connected on RolloutWorker_w5
[2024-10-13 23:01:43,628][00667] Heartbeat connected on RolloutWorker_w6
[2024-10-13 23:01:43,630][00667] Heartbeat connected on RolloutWorker_w7
[2024-10-13 23:01:43,768][03300] Conv encoder output size: 512
[2024-10-13 23:01:43,774][03300] Policy head output size: 512
[2024-10-13 23:01:43,914][03300] Created Actor Critic model with architecture:
[2024-10-13 23:01:43,914][03300] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-10-13 23:01:44,423][03300] Using optimizer
[2024-10-13 23:01:45,145][03300] No checkpoints found
[2024-10-13 23:01:45,146][03300] Did not load from checkpoint, starting from scratch!
[2024-10-13 23:01:45,146][03300] Initialized policy 0 weights for model version 0
[2024-10-13 23:01:45,151][03300] LearnerWorker_p0 finished initialization!
[2024-10-13 23:01:45,152][03300] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-13 23:01:45,159][00667] Heartbeat connected on LearnerWorker_p0
[2024-10-13 23:01:45,361][03313] RunningMeanStd input shape: (3, 72, 128)
[2024-10-13 23:01:45,362][03313] RunningMeanStd input shape: (1,)
[2024-10-13 23:01:45,375][03313] ConvEncoder: input_channels=3
[2024-10-13 23:01:45,484][03313] Conv encoder output size: 512
[2024-10-13 23:01:45,484][03313] Policy head output size: 512
[2024-10-13 23:01:45,541][00667] Inference worker 0-0 is ready!
[2024-10-13 23:01:45,542][00667] All inference workers are ready! Signal rollout workers to start!
[2024-10-13 23:01:45,749][03320] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-13 23:01:45,751][03315] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-13 23:01:45,747][03321] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-13 23:01:45,754][03316] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-13 23:01:45,761][03319] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-13 23:01:45,758][03314] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-13 23:01:45,765][03317] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-13 23:01:45,766][03318] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-13 23:01:47,235][03319] Decorrelating experience for 0 frames...
[2024-10-13 23:01:47,236][03317] Decorrelating experience for 0 frames...
[2024-10-13 23:01:47,240][03314] Decorrelating experience for 0 frames...
[2024-10-13 23:01:47,435][03315] Decorrelating experience for 0 frames...
[2024-10-13 23:01:47,437][03321] Decorrelating experience for 0 frames...
[2024-10-13 23:01:47,434][03320] Decorrelating experience for 0 frames...
[2024-10-13 23:01:47,446][03316] Decorrelating experience for 0 frames...
[2024-10-13 23:01:48,155][00667] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-13 23:01:48,463][03319] Decorrelating experience for 32 frames...
[2024-10-13 23:01:48,462][03317] Decorrelating experience for 32 frames...
[2024-10-13 23:01:48,481][03318] Decorrelating experience for 0 frames...
[2024-10-13 23:01:48,937][03315] Decorrelating experience for 32 frames...
[2024-10-13 23:01:48,946][03321] Decorrelating experience for 32 frames...
[2024-10-13 23:01:48,954][03320] Decorrelating experience for 32 frames...
[2024-10-13 23:01:48,958][03316] Decorrelating experience for 32 frames...
[2024-10-13 23:01:50,025][03314] Decorrelating experience for 32 frames...
[2024-10-13 23:01:50,043][03318] Decorrelating experience for 32 frames...
[2024-10-13 23:01:50,501][03319] Decorrelating experience for 64 frames...
[2024-10-13 23:01:50,566][03315] Decorrelating experience for 64 frames...
[2024-10-13 23:01:50,585][03316] Decorrelating experience for 64 frames...
[2024-10-13 23:01:50,593][03317] Decorrelating experience for 64 frames...
[2024-10-13 23:01:50,606][03320] Decorrelating experience for 64 frames...
[2024-10-13 23:01:51,512][03321] Decorrelating experience for 64 frames...
[2024-10-13 23:01:51,630][03320] Decorrelating experience for 96 frames...
[2024-10-13 23:01:52,047][03318] Decorrelating experience for 64 frames...
[2024-10-13 23:01:52,138][03314] Decorrelating experience for 64 frames...
[2024-10-13 23:01:52,195][03319] Decorrelating experience for 96 frames...
[2024-10-13 23:01:52,300][03317] Decorrelating experience for 96 frames...
[2024-10-13 23:01:53,155][00667] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-13 23:01:53,496][03316] Decorrelating experience for 96 frames...
[2024-10-13 23:01:53,505][03318] Decorrelating experience for 96 frames...
[2024-10-13 23:01:53,704][03314] Decorrelating experience for 96 frames...
[2024-10-13 23:01:53,763][03315] Decorrelating experience for 96 frames...
[2024-10-13 23:01:53,933][03321] Decorrelating experience for 96 frames...
[2024-10-13 23:01:58,157][00667] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 161.4. Samples: 1614. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-13 23:01:58,159][00667] Avg episode reward: [(0, '1.953')]
[2024-10-13 23:01:59,102][03300] Signal inference workers to stop experience collection...
[2024-10-13 23:01:59,164][03313] InferenceWorker_p0-w0: stopping experience collection
[2024-10-13 23:02:02,668][03300] Signal inference workers to resume experience collection...
[2024-10-13 23:02:02,670][03313] InferenceWorker_p0-w0: resuming experience collection
[2024-10-13 23:02:03,155][00667] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 148.9. Samples: 2234. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-10-13 23:02:03,157][00667] Avg episode reward: [(0, '2.293')]
[2024-10-13 23:02:08,155][00667] Fps is (10 sec: 2867.8, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 294.2. Samples: 5884. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-13 23:02:08,157][00667] Avg episode reward: [(0, '3.691')]
[2024-10-13 23:02:11,193][03313] Updated weights for policy 0, policy_version 10 (0.0184)
[2024-10-13 23:02:13,155][00667] Fps is (10 sec: 4096.0, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 454.6. Samples: 11364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-13 23:02:13,157][00667] Avg episode reward: [(0, '4.130')]
[2024-10-13 23:02:18,155][00667] Fps is (10 sec: 2867.2, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 438.5. Samples: 13154. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-13 23:02:18,158][00667] Avg episode reward: [(0, '4.301')]
[2024-10-13 23:02:23,155][00667] Fps is (10 sec: 3276.8, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 524.6. Samples: 18360. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-13 23:02:23,160][00667] Avg episode reward: [(0, '4.430')]
[2024-10-13 23:02:23,979][03313] Updated weights for policy 0, policy_version 20 (0.0031)
[2024-10-13 23:02:28,155][00667] Fps is (10 sec: 4096.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 616.3. Samples: 24652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:02:28,159][00667] Avg episode reward: [(0, '4.237')]
[2024-10-13 23:02:33,155][00667] Fps is (10 sec: 3276.7, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 110592. Throughput: 0: 594.4. Samples: 26746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:02:33,160][00667] Avg episode reward: [(0, '4.328')]
[2024-10-13 23:02:33,162][03300] Saving new best policy, reward=4.328!
[2024-10-13 23:02:37,146][03313] Updated weights for policy 0, policy_version 30 (0.0053)
[2024-10-13 23:02:38,155][00667] Fps is (10 sec: 2867.2, 60 sec: 2539.5, 300 sec: 2539.5). Total num frames: 126976. Throughput: 0: 684.7. Samples: 30812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:02:38,159][00667] Avg episode reward: [(0, '4.473')]
[2024-10-13 23:02:38,170][03300] Saving new best policy, reward=4.473!
[2024-10-13 23:02:43,158][00667] Fps is (10 sec: 3275.7, 60 sec: 2606.4, 300 sec: 2606.4). Total num frames: 143360. Throughput: 0: 780.6. Samples: 36744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:02:43,163][00667] Avg episode reward: [(0, '4.473')]
[2024-10-13 23:02:48,157][00667] Fps is (10 sec: 3276.1, 60 sec: 2662.3, 300 sec: 2662.3). Total num frames: 159744. Throughput: 0: 833.8. Samples: 39756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:02:48,163][00667] Avg episode reward: [(0, '4.498')]
[2024-10-13 23:02:48,174][03300] Saving new best policy, reward=4.498!
[2024-10-13 23:02:48,689][03313] Updated weights for policy 0, policy_version 40 (0.0032)
[2024-10-13 23:02:53,155][00667] Fps is (10 sec: 2868.2, 60 sec: 2867.2, 300 sec: 2646.6). Total num frames: 172032. Throughput: 0: 831.0. Samples: 43278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:02:53,157][00667] Avg episode reward: [(0, '4.451')]
[2024-10-13 23:02:58,155][00667] Fps is (10 sec: 3277.5, 60 sec: 3208.7, 300 sec: 2750.2). Total num frames: 192512. Throughput: 0: 836.8. Samples: 49022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:02:58,157][00667] Avg episode reward: [(0, '4.413')]
[2024-10-13 23:03:00,123][03313] Updated weights for policy 0, policy_version 50 (0.0016)
[2024-10-13 23:03:03,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2839.9). Total num frames: 212992. Throughput: 0: 863.7. Samples: 52020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:03:03,163][00667] Avg episode reward: [(0, '4.408')]
[2024-10-13 23:03:08,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 2816.0). Total num frames: 225280. Throughput: 0: 843.5. Samples: 56318. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-10-13 23:03:08,157][00667] Avg episode reward: [(0, '4.576')]
[2024-10-13 23:03:08,176][03300] Saving new best policy, reward=4.576!
[2024-10-13 23:03:13,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2843.1). Total num frames: 241664. Throughput: 0: 812.0. Samples: 61192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:03:13,160][00667] Avg episode reward: [(0, '4.338')]
[2024-10-13 23:03:13,376][03313] Updated weights for policy 0, policy_version 60 (0.0043)
[2024-10-13 23:03:18,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2958.2). Total num frames: 266240. Throughput: 0: 834.3. Samples: 64290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:03:18,159][00667] Avg episode reward: [(0, '4.403')]
[2024-10-13 23:03:18,173][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000065_266240.pth...
[2024-10-13 23:03:23,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2931.9). Total num frames: 278528. Throughput: 0: 863.0. Samples: 69648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:03:23,158][00667] Avg episode reward: [(0, '4.505')]
[2024-10-13 23:03:25,842][03313] Updated weights for policy 0, policy_version 70 (0.0030)
[2024-10-13 23:03:28,155][00667] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 2908.2). Total num frames: 290816. Throughput: 0: 818.3. Samples: 73566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:03:28,157][00667] Avg episode reward: [(0, '4.379')]
[2024-10-13 23:03:33,162][00667] Fps is (10 sec: 3274.4, 60 sec: 3344.7, 300 sec: 2964.5). Total num frames: 311296. Throughput: 0: 815.0. Samples: 76436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:03:33,164][00667] Avg episode reward: [(0, '4.220')]
[2024-10-13 23:03:38,158][00667] Fps is (10 sec: 2866.3, 60 sec: 3208.4, 300 sec: 2904.4). Total num frames: 319488. Throughput: 0: 827.1. Samples: 80498. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:03:38,160][00667] Avg episode reward: [(0, '4.198')]
[2024-10-13 23:03:40,612][03313] Updated weights for policy 0, policy_version 80 (0.0038)
[2024-10-13 23:03:43,155][00667] Fps is (10 sec: 2049.5, 60 sec: 3140.5, 300 sec: 2885.0). Total num frames: 331776. Throughput: 0: 767.8. Samples: 83572. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:03:43,162][00667] Avg episode reward: [(0, '4.322')]
[2024-10-13 23:03:48,155][00667] Fps is (10 sec: 2868.1, 60 sec: 3140.4, 300 sec: 2901.3). Total num frames: 348160. Throughput: 0: 744.4. Samples: 85518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:03:48,157][00667] Avg episode reward: [(0, '4.372')]
[2024-10-13 23:03:52,642][03313] Updated weights for policy 0, policy_version 90 (0.0038)
[2024-10-13 23:03:53,155][00667] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 2949.1). Total num frames: 368640. Throughput: 0: 788.4. Samples: 91794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:03:53,160][00667] Avg episode reward: [(0, '4.461')]
[2024-10-13 23:03:58,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 385024. Throughput: 0: 795.0. Samples: 96968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:03:58,158][00667] Avg episode reward: [(0, '4.484')]
[2024-10-13 23:04:03,155][00667] Fps is (10 sec: 2867.3, 60 sec: 3072.0, 300 sec: 2943.1). Total num frames: 397312. Throughput: 0: 768.7. Samples: 98882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:04:03,157][00667] Avg episode reward: [(0, '4.505')]
[2024-10-13 23:04:05,084][03313] Updated weights for policy 0, policy_version 100 (0.0026)
[2024-10-13 23:04:08,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3013.5). Total num frames: 421888. Throughput: 0: 770.1. Samples: 104304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:04:08,156][00667] Avg episode reward: [(0, '4.571')]
[2024-10-13 23:04:13,156][00667] Fps is (10 sec: 4095.5, 60 sec: 3276.7, 300 sec: 3022.5). Total num frames: 438272. Throughput: 0: 817.1. Samples: 110338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:04:13,162][00667] Avg episode reward: [(0, '4.565')]
[2024-10-13 23:04:17,098][03313] Updated weights for policy 0, policy_version 110 (0.0042)
[2024-10-13 23:04:18,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3003.7). Total num frames: 450560. Throughput: 0: 794.6. Samples: 112186. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:04:18,160][00667] Avg episode reward: [(0, '4.579')]
[2024-10-13 23:04:18,172][03300] Saving new best policy, reward=4.579!
[2024-10-13 23:04:23,155][00667] Fps is (10 sec: 3277.2, 60 sec: 3208.5, 300 sec: 3039.0). Total num frames: 471040. Throughput: 0: 802.9. Samples: 116628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:04:23,161][00667] Avg episode reward: [(0, '4.588')]
[2024-10-13 23:04:23,165][03300] Saving new best policy, reward=4.588!
[2024-10-13 23:04:28,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3046.4). Total num frames: 487424. Throughput: 0: 863.1. Samples: 122410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:04:28,160][00667] Avg episode reward: [(0, '4.491')]
[2024-10-13 23:04:28,675][03313] Updated weights for policy 0, policy_version 120 (0.0026)
[2024-10-13 23:04:33,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3140.6, 300 sec: 3028.6). Total num frames: 499712. Throughput: 0: 872.2. Samples: 124766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:04:33,159][00667] Avg episode reward: [(0, '4.474')]
[2024-10-13 23:04:38,155][00667] Fps is (10 sec: 2457.6, 60 sec: 3208.7, 300 sec: 3011.8). Total num frames: 512000. Throughput: 0: 806.5. Samples: 128088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:04:38,162][00667] Avg episode reward: [(0, '4.623')]
[2024-10-13 23:04:38,172][03300] Saving new best policy, reward=4.623!
[2024-10-13 23:04:42,933][03313] Updated weights for policy 0, policy_version 130 (0.0029)
[2024-10-13 23:04:43,157][00667] Fps is (10 sec: 3276.1, 60 sec: 3344.9, 300 sec: 3042.7). Total num frames: 532480. Throughput: 0: 810.3. Samples: 133434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:04:43,167][00667] Avg episode reward: [(0, '4.778')]
[2024-10-13 23:04:43,171][03300] Saving new best policy, reward=4.778!
[2024-10-13 23:04:48,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3049.2). Total num frames: 548864. Throughput: 0: 828.9. Samples: 136184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:04:48,159][00667] Avg episode reward: [(0, '4.767')] [2024-10-13 23:04:53,155][00667] Fps is (10 sec: 2867.8, 60 sec: 3208.5, 300 sec: 3033.2). Total num frames: 561152. Throughput: 0: 801.7. Samples: 140380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-13 23:04:53,159][00667] Avg episode reward: [(0, '4.582')] [2024-10-13 23:04:56,278][03313] Updated weights for policy 0, policy_version 140 (0.0025) [2024-10-13 23:04:58,155][00667] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3039.7). Total num frames: 577536. Throughput: 0: 773.3. Samples: 145138. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-13 23:04:58,158][00667] Avg episode reward: [(0, '4.415')] [2024-10-13 23:05:03,155][00667] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3066.7). Total num frames: 598016. Throughput: 0: 799.3. Samples: 148156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:05:03,157][00667] Avg episode reward: [(0, '4.479')] [2024-10-13 23:05:07,459][03313] Updated weights for policy 0, policy_version 150 (0.0039) [2024-10-13 23:05:08,155][00667] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 3072.0). Total num frames: 614400. Throughput: 0: 819.4. Samples: 153500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:05:08,158][00667] Avg episode reward: [(0, '4.495')] [2024-10-13 23:05:13,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3057.0). Total num frames: 626688. Throughput: 0: 781.5. Samples: 157576. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-13 23:05:13,161][00667] Avg episode reward: [(0, '4.622')] [2024-10-13 23:05:18,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3081.8). Total num frames: 647168. Throughput: 0: 797.4. Samples: 160650. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-13 23:05:18,159][00667] Avg episode reward: [(0, '4.467')] [2024-10-13 23:05:18,168][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000158_647168.pth... [2024-10-13 23:05:19,406][03313] Updated weights for policy 0, policy_version 160 (0.0025) [2024-10-13 23:05:23,158][00667] Fps is (10 sec: 4094.5, 60 sec: 3276.6, 300 sec: 3105.3). Total num frames: 667648. Throughput: 0: 860.8. Samples: 166826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:05:23,161][00667] Avg episode reward: [(0, '4.530')] [2024-10-13 23:05:28,155][00667] Fps is (10 sec: 2867.1, 60 sec: 3140.3, 300 sec: 3072.0). Total num frames: 675840. Throughput: 0: 806.4. Samples: 169722. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-13 23:05:28,158][00667] Avg episode reward: [(0, '4.436')] [2024-10-13 23:05:32,623][03313] Updated weights for policy 0, policy_version 170 (0.0038) [2024-10-13 23:05:33,155][00667] Fps is (10 sec: 2868.3, 60 sec: 3276.8, 300 sec: 3094.8). Total num frames: 696320. Throughput: 0: 804.5. Samples: 172388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-13 23:05:33,157][00667] Avg episode reward: [(0, '4.425')] [2024-10-13 23:05:38,155][00667] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3116.5). Total num frames: 716800. Throughput: 0: 850.9. Samples: 178672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:05:38,162][00667] Avg episode reward: [(0, '4.475')] [2024-10-13 23:05:43,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3102.5). Total num frames: 729088. Throughput: 0: 845.9. Samples: 183204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:05:43,159][00667] Avg episode reward: [(0, '4.622')] [2024-10-13 23:05:45,113][03313] Updated weights for policy 0, policy_version 180 (0.0025) [2024-10-13 23:05:48,155][00667] Fps is (10 sec: 3276.7, 60 sec: 3345.0, 300 sec: 3123.2). Total num frames: 749568. 
Throughput: 0: 824.8. Samples: 185272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:05:48,162][00667] Avg episode reward: [(0, '4.694')] [2024-10-13 23:05:53,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3126.3). Total num frames: 765952. Throughput: 0: 839.6. Samples: 191282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:05:53,157][00667] Avg episode reward: [(0, '4.665')] [2024-10-13 23:05:58,087][03313] Updated weights for policy 0, policy_version 190 (0.0031) [2024-10-13 23:05:58,160][00667] Fps is (10 sec: 2865.8, 60 sec: 3344.8, 300 sec: 3112.9). Total num frames: 778240. Throughput: 0: 826.0. Samples: 194750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:05:58,164][00667] Avg episode reward: [(0, '4.516')] [2024-10-13 23:06:03,155][00667] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3100.1). Total num frames: 790528. Throughput: 0: 796.1. Samples: 196474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:06:03,162][00667] Avg episode reward: [(0, '4.507')] [2024-10-13 23:06:08,155][00667] Fps is (10 sec: 2458.8, 60 sec: 3140.3, 300 sec: 3087.8). Total num frames: 802816. Throughput: 0: 756.1. Samples: 200848. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-13 23:06:08,163][00667] Avg episode reward: [(0, '4.400')] [2024-10-13 23:06:12,158][03313] Updated weights for policy 0, policy_version 200 (0.0051) [2024-10-13 23:06:13,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3106.8). Total num frames: 823296. Throughput: 0: 803.2. Samples: 205866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:06:13,159][00667] Avg episode reward: [(0, '4.414')] [2024-10-13 23:06:18,155][00667] Fps is (10 sec: 3276.9, 60 sec: 3140.3, 300 sec: 3094.8). Total num frames: 835584. Throughput: 0: 803.8. Samples: 208558. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-13 23:06:18,157][00667] Avg episode reward: [(0, '4.322')] [2024-10-13 23:06:23,155][00667] Fps is (10 sec: 2867.1, 60 sec: 3072.2, 300 sec: 3098.1). Total num frames: 851968. Throughput: 0: 748.4. Samples: 212352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:06:23,160][00667] Avg episode reward: [(0, '4.336')] [2024-10-13 23:06:24,612][03313] Updated weights for policy 0, policy_version 210 (0.0030) [2024-10-13 23:06:28,155][00667] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3115.9). Total num frames: 872448. Throughput: 0: 790.7. Samples: 218786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:06:28,162][00667] Avg episode reward: [(0, '4.506')] [2024-10-13 23:06:33,160][00667] Fps is (10 sec: 4094.0, 60 sec: 3276.5, 300 sec: 3133.0). Total num frames: 892928. Throughput: 0: 817.9. Samples: 222082. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-13 23:06:33,162][00667] Avg episode reward: [(0, '4.480')] [2024-10-13 23:06:35,403][03313] Updated weights for policy 0, policy_version 220 (0.0031) [2024-10-13 23:06:38,155][00667] Fps is (10 sec: 3276.9, 60 sec: 3140.3, 300 sec: 3121.4). Total num frames: 905216. Throughput: 0: 780.8. Samples: 226418. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-13 23:06:38,159][00667] Avg episode reward: [(0, '4.474')] [2024-10-13 23:06:43,155][00667] Fps is (10 sec: 3278.5, 60 sec: 3276.8, 300 sec: 3138.0). Total num frames: 925696. Throughput: 0: 823.4. Samples: 231800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:06:43,160][00667] Avg episode reward: [(0, '4.527')] [2024-10-13 23:06:46,500][03313] Updated weights for policy 0, policy_version 230 (0.0031) [2024-10-13 23:06:48,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 946176. Throughput: 0: 858.3. Samples: 235096. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:06:48,162][00667] Avg episode reward: [(0, '4.457')] [2024-10-13 23:06:53,155][00667] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 962560. Throughput: 0: 890.2. Samples: 240908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:06:53,164][00667] Avg episode reward: [(0, '4.430')] [2024-10-13 23:06:57,971][03313] Updated weights for policy 0, policy_version 240 (0.0014) [2024-10-13 23:06:58,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.6, 300 sec: 3318.5). Total num frames: 983040. Throughput: 0: 891.8. Samples: 245998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:06:58,160][00667] Avg episode reward: [(0, '4.429')] [2024-10-13 23:07:03,155][00667] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3304.6). Total num frames: 1003520. Throughput: 0: 909.4. Samples: 249482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:07:03,160][00667] Avg episode reward: [(0, '4.333')] [2024-10-13 23:07:07,328][03313] Updated weights for policy 0, policy_version 250 (0.0019) [2024-10-13 23:07:08,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3318.5). Total num frames: 1024000. Throughput: 0: 972.8. Samples: 256126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-13 23:07:08,160][00667] Avg episode reward: [(0, '4.516')] [2024-10-13 23:07:13,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3332.3). Total num frames: 1040384. Throughput: 0: 925.7. Samples: 260442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:07:13,157][00667] Avg episode reward: [(0, '4.641')] [2024-10-13 23:07:18,157][00667] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3332.3). Total num frames: 1060864. Throughput: 0: 921.0. Samples: 263524. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-13 23:07:18,159][00667] Avg episode reward: [(0, '4.592')] [2024-10-13 23:07:18,171][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000259_1060864.pth... [2024-10-13 23:07:18,301][03300] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000065_266240.pth [2024-10-13 23:07:18,588][03313] Updated weights for policy 0, policy_version 260 (0.0029) [2024-10-13 23:07:23,155][00667] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3346.2). Total num frames: 1085440. Throughput: 0: 975.9. Samples: 270332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:07:23,157][00667] Avg episode reward: [(0, '4.537')] [2024-10-13 23:07:28,157][00667] Fps is (10 sec: 3686.4, 60 sec: 3754.6, 300 sec: 3346.2). Total num frames: 1097728. Throughput: 0: 964.4. Samples: 275200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-13 23:07:28,159][00667] Avg episode reward: [(0, '4.824')] [2024-10-13 23:07:28,176][03300] Saving new best policy, reward=4.824! [2024-10-13 23:07:30,020][03313] Updated weights for policy 0, policy_version 270 (0.0057) [2024-10-13 23:07:33,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3360.1). Total num frames: 1118208. Throughput: 0: 946.1. Samples: 277672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:07:33,160][00667] Avg episode reward: [(0, '5.156')] [2024-10-13 23:07:33,165][03300] Saving new best policy, reward=5.156! [2024-10-13 23:07:38,155][00667] Fps is (10 sec: 4506.5, 60 sec: 3959.5, 300 sec: 3387.9). Total num frames: 1142784. Throughput: 0: 972.1. Samples: 284652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:07:38,156][00667] Avg episode reward: [(0, '5.064')] [2024-10-13 23:07:38,622][03313] Updated weights for policy 0, policy_version 280 (0.0030) [2024-10-13 23:07:43,155][00667] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3387.9). Total num frames: 1159168. 
Throughput: 0: 990.2. Samples: 290556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:07:43,159][00667] Avg episode reward: [(0, '5.056')]
[2024-10-13 23:07:48,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3401.8). Total num frames: 1175552. Throughput: 0: 960.0. Samples: 292682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:07:48,161][00667] Avg episode reward: [(0, '5.189')]
[2024-10-13 23:07:48,172][03300] Saving new best policy, reward=5.189!
[2024-10-13 23:07:50,100][03313] Updated weights for policy 0, policy_version 290 (0.0031)
[2024-10-13 23:07:53,155][00667] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3415.6). Total num frames: 1200128. Throughput: 0: 959.9. Samples: 299320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:07:53,157][00667] Avg episode reward: [(0, '5.105')]
[2024-10-13 23:07:58,155][00667] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3415.6). Total num frames: 1220608. Throughput: 0: 1016.0. Samples: 306160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:07:58,157][00667] Avg episode reward: [(0, '5.038')]
[2024-10-13 23:07:59,823][03313] Updated weights for policy 0, policy_version 300 (0.0031)
[2024-10-13 23:08:03,160][00667] Fps is (10 sec: 3684.3, 60 sec: 3890.8, 300 sec: 3429.5). Total num frames: 1236992. Throughput: 0: 993.8. Samples: 308248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:08:03,166][00667] Avg episode reward: [(0, '5.351')]
[2024-10-13 23:08:03,171][03300] Saving new best policy, reward=5.351!
[2024-10-13 23:08:08,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3443.4). Total num frames: 1257472. Throughput: 0: 958.8. Samples: 313478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:08:08,163][00667] Avg episode reward: [(0, '5.457')]
[2024-10-13 23:08:08,175][03300] Saving new best policy, reward=5.457!
[2024-10-13 23:08:11,132][03313] Updated weights for policy 0, policy_version 310 (0.0042)
[2024-10-13 23:08:13,155][00667] Fps is (10 sec: 3688.5, 60 sec: 3891.2, 300 sec: 3415.6). Total num frames: 1273856. Throughput: 0: 973.8. Samples: 319020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:08:13,158][00667] Avg episode reward: [(0, '5.194')]
[2024-10-13 23:08:18,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3754.8, 300 sec: 3415.6). Total num frames: 1286144. Throughput: 0: 962.3. Samples: 320974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:08:18,160][00667] Avg episode reward: [(0, '5.216')]
[2024-10-13 23:08:23,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3429.5). Total num frames: 1302528. Throughput: 0: 898.2. Samples: 325072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:08:23,157][00667] Avg episode reward: [(0, '5.173')]
[2024-10-13 23:08:24,494][03313] Updated weights for policy 0, policy_version 320 (0.0015)
[2024-10-13 23:08:28,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3443.5). Total num frames: 1327104. Throughput: 0: 918.8. Samples: 331904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:08:28,158][00667] Avg episode reward: [(0, '5.162')]
[2024-10-13 23:08:33,156][00667] Fps is (10 sec: 4505.0, 60 sec: 3822.9, 300 sec: 3485.1). Total num frames: 1347584. Throughput: 0: 950.8. Samples: 335470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:08:33,158][00667] Avg episode reward: [(0, '5.315')]
[2024-10-13 23:08:33,712][03313] Updated weights for policy 0, policy_version 330 (0.0033)
[2024-10-13 23:08:38,159][00667] Fps is (10 sec: 3684.8, 60 sec: 3686.1, 300 sec: 3498.9). Total num frames: 1363968. Throughput: 0: 916.8. Samples: 340582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:08:38,162][00667] Avg episode reward: [(0, '5.285')]
[2024-10-13 23:08:43,155][00667] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3512.8).
Total num frames: 1384448. Throughput: 0: 889.7. Samples: 346196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:08:43,161][00667] Avg episode reward: [(0, '5.436')]
[2024-10-13 23:08:44,672][03313] Updated weights for policy 0, policy_version 340 (0.0047)
[2024-10-13 23:08:48,155][00667] Fps is (10 sec: 4507.5, 60 sec: 3891.2, 300 sec: 3526.7). Total num frames: 1409024. Throughput: 0: 921.7. Samples: 349720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:08:48,162][00667] Avg episode reward: [(0, '5.360')]
[2024-10-13 23:08:53,161][00667] Fps is (10 sec: 3684.0, 60 sec: 3686.0, 300 sec: 3512.8). Total num frames: 1421312. Throughput: 0: 928.8. Samples: 355280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:08:53,164][00667] Avg episode reward: [(0, '5.215')]
[2024-10-13 23:08:57,270][03313] Updated weights for policy 0, policy_version 350 (0.0022)
[2024-10-13 23:08:58,155][00667] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1433600. Throughput: 0: 893.9. Samples: 359246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:08:58,157][00667] Avg episode reward: [(0, '5.290')]
[2024-10-13 23:09:03,155][00667] Fps is (10 sec: 3278.9, 60 sec: 3618.5, 300 sec: 3499.0). Total num frames: 1454080. Throughput: 0: 916.6. Samples: 362220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:09:03,162][00667] Avg episode reward: [(0, '5.446')]
[2024-10-13 23:09:08,159][00667] Fps is (10 sec: 3684.8, 60 sec: 3549.6, 300 sec: 3498.9). Total num frames: 1470464. Throughput: 0: 956.2. Samples: 368104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:09:08,161][00667] Avg episode reward: [(0, '5.275')]
[2024-10-13 23:09:08,262][03313] Updated weights for policy 0, policy_version 360 (0.0016)
[2024-10-13 23:09:13,159][00667] Fps is (10 sec: 2866.0, 60 sec: 3481.4, 300 sec: 3498.9). Total num frames: 1482752. Throughput: 0: 883.6. Samples: 371668.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:09:13,162][00667] Avg episode reward: [(0, '5.110')]
[2024-10-13 23:09:18,155][00667] Fps is (10 sec: 3278.2, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 1503232. Throughput: 0: 861.8. Samples: 374248. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:09:18,157][00667] Avg episode reward: [(0, '5.624')]
[2024-10-13 23:09:18,172][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000367_1503232.pth...
[2024-10-13 23:09:18,308][03300] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000158_647168.pth
[2024-10-13 23:09:18,328][03300] Saving new best policy, reward=5.624!
[2024-10-13 23:09:20,950][03313] Updated weights for policy 0, policy_version 370 (0.0031)
[2024-10-13 23:09:23,155][00667] Fps is (10 sec: 3688.0, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 1519616. Throughput: 0: 876.4. Samples: 380014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:09:23,161][00667] Avg episode reward: [(0, '5.872')]
[2024-10-13 23:09:23,185][03300] Saving new best policy, reward=5.872!
[2024-10-13 23:09:28,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 1531904. Throughput: 0: 844.8. Samples: 384210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:09:28,157][00667] Avg episode reward: [(0, '6.155')]
[2024-10-13 23:09:28,197][03300] Saving new best policy, reward=6.155!
[2024-10-13 23:09:33,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 1548288. Throughput: 0: 805.2. Samples: 385952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:09:33,160][00667] Avg episode reward: [(0, '5.800')]
[2024-10-13 23:09:34,448][03313] Updated weights for policy 0, policy_version 380 (0.0031)
[2024-10-13 23:09:38,155][00667] Fps is (10 sec: 3686.2, 60 sec: 3413.6, 300 sec: 3512.9). Total num frames: 1568768. Throughput: 0: 811.1.
Samples: 391774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:09:38,163][00667] Avg episode reward: [(0, '6.058')]
[2024-10-13 23:09:43,158][00667] Fps is (10 sec: 3685.1, 60 sec: 3344.9, 300 sec: 3512.8). Total num frames: 1585152. Throughput: 0: 839.9. Samples: 397044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:09:43,171][00667] Avg episode reward: [(0, '5.903')]
[2024-10-13 23:09:47,267][03313] Updated weights for policy 0, policy_version 390 (0.0042)
[2024-10-13 23:09:48,155][00667] Fps is (10 sec: 2867.3, 60 sec: 3140.3, 300 sec: 3512.8). Total num frames: 1597440. Throughput: 0: 812.6. Samples: 398786. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:09:48,158][00667] Avg episode reward: [(0, '5.907')]
[2024-10-13 23:09:53,155][00667] Fps is (10 sec: 3278.0, 60 sec: 3277.1, 300 sec: 3526.7). Total num frames: 1617920. Throughput: 0: 793.9. Samples: 403828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:09:53,162][00667] Avg episode reward: [(0, '5.803')]
[2024-10-13 23:09:58,155][00667] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 1634304. Throughput: 0: 844.2. Samples: 409654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:09:58,161][00667] Avg episode reward: [(0, '5.724')]
[2024-10-13 23:09:58,189][03313] Updated weights for policy 0, policy_version 400 (0.0041)
[2024-10-13 23:10:03,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3499.0). Total num frames: 1646592. Throughput: 0: 828.4. Samples: 411528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:10:03,160][00667] Avg episode reward: [(0, '5.591')]
[2024-10-13 23:10:08,155][00667] Fps is (10 sec: 2867.3, 60 sec: 3208.8, 300 sec: 3512.8). Total num frames: 1662976. Throughput: 0: 785.4. Samples: 415358.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:10:08,158][00667] Avg episode reward: [(0, '6.109')]
[2024-10-13 23:10:11,484][03313] Updated weights for policy 0, policy_version 410 (0.0028)
[2024-10-13 23:10:13,155][00667] Fps is (10 sec: 3686.3, 60 sec: 3345.3, 300 sec: 3512.8). Total num frames: 1683456. Throughput: 0: 825.0. Samples: 421334. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:10:13,157][00667] Avg episode reward: [(0, '6.704')]
[2024-10-13 23:10:13,170][03300] Saving new best policy, reward=6.704!
[2024-10-13 23:10:18,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3499.0). Total num frames: 1699840. Throughput: 0: 848.4. Samples: 424130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:10:18,159][00667] Avg episode reward: [(0, '6.725')]
[2024-10-13 23:10:18,171][03300] Saving new best policy, reward=6.725!
[2024-10-13 23:10:23,155][00667] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3512.8). Total num frames: 1712128. Throughput: 0: 800.9. Samples: 427816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:10:23,156][00667] Avg episode reward: [(0, '6.489')]
[2024-10-13 23:10:24,735][03313] Updated weights for policy 0, policy_version 420 (0.0035)
[2024-10-13 23:10:28,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 1732608. Throughput: 0: 807.7. Samples: 433388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:10:28,159][00667] Avg episode reward: [(0, '6.599')]
[2024-10-13 23:10:33,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3471.2). Total num frames: 1740800. Throughput: 0: 811.2. Samples: 435292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:10:33,157][00667] Avg episode reward: [(0, '6.794')]
[2024-10-13 23:10:33,189][03300] Saving new best policy, reward=6.794!
[2024-10-13 23:10:38,155][00667] Fps is (10 sec: 2047.9, 60 sec: 3072.0, 300 sec: 3471.2). Total num frames: 1753088.
Throughput: 0: 765.5. Samples: 438276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:10:38,163][00667] Avg episode reward: [(0, '7.186')]
[2024-10-13 23:10:38,173][03300] Saving new best policy, reward=7.186!
[2024-10-13 23:10:40,927][03313] Updated weights for policy 0, policy_version 430 (0.0058)
[2024-10-13 23:10:43,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3072.2, 300 sec: 3457.3). Total num frames: 1769472. Throughput: 0: 725.8. Samples: 442316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:10:43,160][00667] Avg episode reward: [(0, '7.457')]
[2024-10-13 23:10:43,162][03300] Saving new best policy, reward=7.457!
[2024-10-13 23:10:48,156][00667] Fps is (10 sec: 3276.7, 60 sec: 3140.2, 300 sec: 3457.3). Total num frames: 1785856. Throughput: 0: 748.9. Samples: 445228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-13 23:10:48,157][00667] Avg episode reward: [(0, '7.346')]
[2024-10-13 23:10:51,535][03313] Updated weights for policy 0, policy_version 440 (0.0030)
[2024-10-13 23:10:53,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3471.2). Total num frames: 1802240. Throughput: 0: 793.6. Samples: 451072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-13 23:10:53,163][00667] Avg episode reward: [(0, '7.603')]
[2024-10-13 23:10:53,165][03300] Saving new best policy, reward=7.603!
[2024-10-13 23:10:58,158][00667] Fps is (10 sec: 2866.6, 60 sec: 3003.6, 300 sec: 3471.2). Total num frames: 1814528. Throughput: 0: 738.1. Samples: 454552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:10:58,160][00667] Avg episode reward: [(0, '7.066')]
[2024-10-13 23:11:03,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3499.0). Total num frames: 1835008. Throughput: 0: 737.9. Samples: 457334.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:11:03,157][00667] Avg episode reward: [(0, '6.992')]
[2024-10-13 23:11:04,194][03313] Updated weights for policy 0, policy_version 450 (0.0024)
[2024-10-13 23:11:08,155][00667] Fps is (10 sec: 4097.2, 60 sec: 3208.5, 300 sec: 3499.0). Total num frames: 1855488. Throughput: 0: 792.3. Samples: 463468. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:11:08,162][00667] Avg episode reward: [(0, '6.628')]
[2024-10-13 23:11:13,155][00667] Fps is (10 sec: 3276.7, 60 sec: 3072.0, 300 sec: 3499.0). Total num frames: 1867776. Throughput: 0: 760.5. Samples: 467610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:11:13,157][00667] Avg episode reward: [(0, '6.979')]
[2024-10-13 23:11:17,729][03313] Updated weights for policy 0, policy_version 460 (0.0031)
[2024-10-13 23:11:18,159][00667] Fps is (10 sec: 2866.0, 60 sec: 3071.8, 300 sec: 3498.9). Total num frames: 1884160. Throughput: 0: 758.3. Samples: 469418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:11:18,162][00667] Avg episode reward: [(0, '7.331')]
[2024-10-13 23:11:18,172][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000460_1884160.pth...
[2024-10-13 23:11:18,295][03300] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000259_1060864.pth
[2024-10-13 23:11:23,155][00667] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 3499.0). Total num frames: 1904640. Throughput: 0: 823.6. Samples: 475338. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:11:23,158][00667] Avg episode reward: [(0, '7.267')]
[2024-10-13 23:11:28,155][00667] Fps is (10 sec: 3688.0, 60 sec: 3140.3, 300 sec: 3485.1). Total num frames: 1921024. Throughput: 0: 850.2. Samples: 480574.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:11:28,159][00667] Avg episode reward: [(0, '7.297')]
[2024-10-13 23:11:29,445][03313] Updated weights for policy 0, policy_version 470 (0.0018)
[2024-10-13 23:11:33,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3485.1). Total num frames: 1933312. Throughput: 0: 825.3. Samples: 482368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:11:33,157][00667] Avg episode reward: [(0, '7.345')]
[2024-10-13 23:11:38,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 1953792. Throughput: 0: 810.6. Samples: 487550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:11:38,162][00667] Avg episode reward: [(0, '7.669')]
[2024-10-13 23:11:38,174][03300] Saving new best policy, reward=7.669!
[2024-10-13 23:11:41,243][03313] Updated weights for policy 0, policy_version 480 (0.0028)
[2024-10-13 23:11:43,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3471.2). Total num frames: 1970176. Throughput: 0: 864.1. Samples: 493432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:11:43,159][00667] Avg episode reward: [(0, '7.637')]
[2024-10-13 23:11:48,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3457.3). Total num frames: 1982464. Throughput: 0: 844.8. Samples: 495350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:11:48,157][00667] Avg episode reward: [(0, '7.831')]
[2024-10-13 23:11:48,171][03300] Saving new best policy, reward=7.831!
[2024-10-13 23:11:53,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3443.4). Total num frames: 1998848. Throughput: 0: 805.3. Samples: 499708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:11:53,156][00667] Avg episode reward: [(0, '8.009')]
[2024-10-13 23:11:53,216][03300] Saving new best policy, reward=8.009!
[2024-10-13 23:11:54,261][03313] Updated weights for policy 0, policy_version 490 (0.0014)
[2024-10-13 23:11:58,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3443.4). Total num frames: 2019328. Throughput: 0: 844.3. Samples: 505602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:11:58,159][00667] Avg episode reward: [(0, '8.682')]
[2024-10-13 23:11:58,168][03300] Saving new best policy, reward=8.682!
[2024-10-13 23:12:03,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 2035712. Throughput: 0: 862.8. Samples: 508240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:12:03,159][00667] Avg episode reward: [(0, '8.989')]
[2024-10-13 23:12:03,165][03300] Saving new best policy, reward=8.989!
[2024-10-13 23:12:07,535][03313] Updated weights for policy 0, policy_version 500 (0.0028)
[2024-10-13 23:12:08,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3415.6). Total num frames: 2048000. Throughput: 0: 808.7. Samples: 511730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:12:08,157][00667] Avg episode reward: [(0, '9.095')]
[2024-10-13 23:12:08,168][03300] Saving new best policy, reward=9.095!
[2024-10-13 23:12:13,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3415.7). Total num frames: 2068480. Throughput: 0: 818.0. Samples: 517384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:12:13,160][00667] Avg episode reward: [(0, '8.799')]
[2024-10-13 23:12:18,157][00667] Fps is (10 sec: 3685.6, 60 sec: 3345.2, 300 sec: 3387.9). Total num frames: 2084864. Throughput: 0: 845.6. Samples: 520422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:12:18,161][00667] Avg episode reward: [(0, '8.200')]
[2024-10-13 23:12:18,638][03313] Updated weights for policy 0, policy_version 510 (0.0014)
[2024-10-13 23:12:23,160][00667] Fps is (10 sec: 2865.7, 60 sec: 3208.3, 300 sec: 3387.8). Total num frames: 2097152. Throughput: 0: 821.5. Samples: 524520.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:12:23,164][00667] Avg episode reward: [(0, '8.289')]
[2024-10-13 23:12:28,155][00667] Fps is (10 sec: 3277.5, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 2117632. Throughput: 0: 807.1. Samples: 529750. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:12:28,161][00667] Avg episode reward: [(0, '8.267')]
[2024-10-13 23:12:30,566][03313] Updated weights for policy 0, policy_version 520 (0.0027)
[2024-10-13 23:12:33,155][00667] Fps is (10 sec: 4098.2, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 2138112. Throughput: 0: 833.2. Samples: 532844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:12:33,156][00667] Avg episode reward: [(0, '8.453')]
[2024-10-13 23:12:38,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 2150400. Throughput: 0: 845.9. Samples: 537772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:12:38,161][00667] Avg episode reward: [(0, '8.836')]
[2024-10-13 23:12:43,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 2166784. Throughput: 0: 809.7. Samples: 542040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:12:43,161][00667] Avg episode reward: [(0, '8.696')]
[2024-10-13 23:12:43,837][03313] Updated weights for policy 0, policy_version 530 (0.0041)
[2024-10-13 23:12:48,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 2183168. Throughput: 0: 815.8. Samples: 544950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:12:48,157][00667] Avg episode reward: [(0, '8.808')]
[2024-10-13 23:12:53,156][00667] Fps is (10 sec: 2866.8, 60 sec: 3276.7, 300 sec: 3304.6). Total num frames: 2195456. Throughput: 0: 826.2. Samples: 548908.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:12:53,158][00667] Avg episode reward: [(0, '8.627')]
[2024-10-13 23:12:58,155][00667] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 3276.9). Total num frames: 2203648. Throughput: 0: 770.4. Samples: 552054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:12:58,157][00667] Avg episode reward: [(0, '8.264')]
[2024-10-13 23:12:59,775][03313] Updated weights for policy 0, policy_version 540 (0.0020)
[2024-10-13 23:13:03,155][00667] Fps is (10 sec: 2867.6, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 2224128. Throughput: 0: 753.9. Samples: 554346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:13:03,160][00667] Avg episode reward: [(0, '8.771')]
[2024-10-13 23:13:08,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 2244608. Throughput: 0: 796.0. Samples: 560336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:13:08,160][00667] Avg episode reward: [(0, '7.778')]
[2024-10-13 23:13:10,368][03313] Updated weights for policy 0, policy_version 550 (0.0031)
[2024-10-13 23:13:13,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3290.7). Total num frames: 2256896. Throughput: 0: 783.0. Samples: 564984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:13:13,160][00667] Avg episode reward: [(0, '7.905')]
[2024-10-13 23:13:18,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3140.4, 300 sec: 3290.7). Total num frames: 2273280. Throughput: 0: 754.8. Samples: 566808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:13:18,159][00667] Avg episode reward: [(0, '8.070')]
[2024-10-13 23:13:18,170][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000555_2273280.pth...
[2024-10-13 23:13:18,301][03300] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000367_1503232.pth
[2024-10-13 23:13:23,115][03313] Updated weights for policy 0, policy_version 560 (0.0035)
[2024-10-13 23:13:23,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3277.1, 300 sec: 3276.8). Total num frames: 2293760. Throughput: 0: 768.9. Samples: 572374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:13:23,157][00667] Avg episode reward: [(0, '9.063')]
[2024-10-13 23:13:28,156][00667] Fps is (10 sec: 3686.1, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2310144. Throughput: 0: 797.0. Samples: 577904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:13:28,160][00667] Avg episode reward: [(0, '10.087')]
[2024-10-13 23:13:28,177][03300] Saving new best policy, reward=10.087!
[2024-10-13 23:13:33,155][00667] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3235.2). Total num frames: 2318336. Throughput: 0: 768.6. Samples: 579536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:13:33,159][00667] Avg episode reward: [(0, '11.137')]
[2024-10-13 23:13:33,166][03300] Saving new best policy, reward=11.137!
[2024-10-13 23:13:36,577][03313] Updated weights for policy 0, policy_version 570 (0.0025)
[2024-10-13 23:13:38,155][00667] Fps is (10 sec: 2867.5, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 2338816. Throughput: 0: 785.4. Samples: 584248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:13:38,163][00667] Avg episode reward: [(0, '12.617')]
[2024-10-13 23:13:38,174][03300] Saving new best policy, reward=12.617!
[2024-10-13 23:13:43,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2359296. Throughput: 0: 843.3. Samples: 590004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:13:43,159][00667] Avg episode reward: [(0, '13.544')]
[2024-10-13 23:13:43,161][03300] Saving new best policy, reward=13.544!
[2024-10-13 23:13:48,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 2371584. Throughput: 0: 838.0. Samples: 592058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:13:48,158][00667] Avg episode reward: [(0, '13.498')]
[2024-10-13 23:13:49,567][03313] Updated weights for policy 0, policy_version 580 (0.0023)
[2024-10-13 23:13:53,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3235.1). Total num frames: 2387968. Throughput: 0: 795.3. Samples: 596126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:13:53,158][00667] Avg episode reward: [(0, '13.885')]
[2024-10-13 23:13:53,165][03300] Saving new best policy, reward=13.885!
[2024-10-13 23:13:58,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3235.1). Total num frames: 2408448. Throughput: 0: 827.6. Samples: 602226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:13:58,157][00667] Avg episode reward: [(0, '14.284')]
[2024-10-13 23:13:58,169][03300] Saving new best policy, reward=14.284!
[2024-10-13 23:14:00,099][03313] Updated weights for policy 0, policy_version 590 (0.0021)
[2024-10-13 23:14:03,157][00667] Fps is (10 sec: 3685.7, 60 sec: 3345.0, 300 sec: 3235.2). Total num frames: 2424832. Throughput: 0: 852.1. Samples: 605154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:14:03,164][00667] Avg episode reward: [(0, '14.697')]
[2024-10-13 23:14:03,174][03300] Saving new best policy, reward=14.697!
[2024-10-13 23:14:08,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3235.2). Total num frames: 2437120. Throughput: 0: 807.1. Samples: 608692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:14:08,159][00667] Avg episode reward: [(0, '15.098')]
[2024-10-13 23:14:08,171][03300] Saving new best policy, reward=15.098!
[2024-10-13 23:14:13,155][00667] Fps is (10 sec: 2867.7, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 2453504. Throughput: 0: 803.2. Samples: 614046.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:14:13,157][00667] Avg episode reward: [(0, '15.859')]
[2024-10-13 23:14:13,160][03300] Saving new best policy, reward=15.859!
[2024-10-13 23:14:13,510][03313] Updated weights for policy 0, policy_version 600 (0.0027)
[2024-10-13 23:14:18,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 2473984. Throughput: 0: 831.7. Samples: 616962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:14:18,163][00667] Avg episode reward: [(0, '15.068')]
[2024-10-13 23:14:23,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 2486272. Throughput: 0: 825.2. Samples: 621382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:14:23,160][00667] Avg episode reward: [(0, '15.409')]
[2024-10-13 23:14:26,659][03313] Updated weights for policy 0, policy_version 610 (0.0042)
[2024-10-13 23:14:28,155][00667] Fps is (10 sec: 2867.1, 60 sec: 3208.6, 300 sec: 3235.1). Total num frames: 2502656. Throughput: 0: 804.9. Samples: 626224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:14:28,158][00667] Avg episode reward: [(0, '15.005')]
[2024-10-13 23:14:33,155][00667] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3235.1). Total num frames: 2523136. Throughput: 0: 826.6. Samples: 629256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-13 23:14:33,157][00667] Avg episode reward: [(0, '14.336')]
[2024-10-13 23:14:38,157][00667] Fps is (10 sec: 3276.2, 60 sec: 3276.7, 300 sec: 3221.3). Total num frames: 2535424. Throughput: 0: 852.3. Samples: 634480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:14:38,162][00667] Avg episode reward: [(0, '14.314')]
[2024-10-13 23:14:38,251][03313] Updated weights for policy 0, policy_version 620 (0.0034)
[2024-10-13 23:14:43,155][00667] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 2551808. Throughput: 0: 805.8. Samples: 638486.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:14:43,162][00667] Avg episode reward: [(0, '15.364')]
[2024-10-13 23:14:48,155][00667] Fps is (10 sec: 3687.2, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 2572288. Throughput: 0: 807.1. Samples: 641472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-10-13 23:14:48,157][00667] Avg episode reward: [(0, '14.688')]
[2024-10-13 23:14:49,839][03313] Updated weights for policy 0, policy_version 630 (0.0017)
[2024-10-13 23:14:53,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 2588672. Throughput: 0: 858.9. Samples: 647342. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:14:53,157][00667] Avg episode reward: [(0, '16.732')]
[2024-10-13 23:14:53,166][03300] Saving new best policy, reward=16.732!
[2024-10-13 23:14:58,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 2600960. Throughput: 0: 818.7. Samples: 650886. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-13 23:14:58,162][00667] Avg episode reward: [(0, '16.635')]
[2024-10-13 23:15:03,110][03313] Updated weights for policy 0, policy_version 640 (0.0045)
[2024-10-13 23:15:03,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3249.0). Total num frames: 2621440. Throughput: 0: 810.4. Samples: 653428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-13 23:15:03,164][00667] Avg episode reward: [(0, '16.729')]
[2024-10-13 23:15:08,159][00667] Fps is (10 sec: 3684.9, 60 sec: 3344.8, 300 sec: 3235.1). Total num frames: 2637824. Throughput: 0: 833.9. Samples: 658912. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:15:08,166][00667] Avg episode reward: [(0, '16.619')]
[2024-10-13 23:15:13,158][00667] Fps is (10 sec: 2456.8, 60 sec: 3208.4, 300 sec: 3207.3). Total num frames: 2646016. Throughput: 0: 795.2. Samples: 662012.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-13 23:15:13,160][00667] Avg episode reward: [(0, '16.380')]
[2024-10-13 23:15:18,155][00667] Fps is (10 sec: 2048.8, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 2658304. Throughput: 0: 761.4. Samples: 663518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:15:18,158][00667] Avg episode reward: [(0, '14.483')]
[2024-10-13 23:15:18,169][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000649_2658304.pth...
[2024-10-13 23:15:18,306][03300] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000460_1884160.pth
[2024-10-13 23:15:19,241][03313] Updated weights for policy 0, policy_version 650 (0.0049)
[2024-10-13 23:15:23,155][00667] Fps is (10 sec: 2868.1, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 2674688. Throughput: 0: 757.6. Samples: 668568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:15:23,156][00667] Avg episode reward: [(0, '13.126')]
[2024-10-13 23:15:28,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 2695168. Throughput: 0: 805.0. Samples: 674710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:15:28,160][00667] Avg episode reward: [(0, '12.931')]
[2024-10-13 23:15:30,152][03313] Updated weights for policy 0, policy_version 660 (0.0024)
[2024-10-13 23:15:33,156][00667] Fps is (10 sec: 3276.3, 60 sec: 3071.9, 300 sec: 3235.1). Total num frames: 2707456. Throughput: 0: 784.6. Samples: 676782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:15:33,162][00667] Avg episode reward: [(0, '13.220')]
[2024-10-13 23:15:38,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3208.7, 300 sec: 3249.0). Total num frames: 2727936. Throughput: 0: 748.1. Samples: 681008.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:15:38,157][00667] Avg episode reward: [(0, '13.909')] [2024-10-13 23:15:42,313][03313] Updated weights for policy 0, policy_version 670 (0.0022) [2024-10-13 23:15:43,155][00667] Fps is (10 sec: 3687.0, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 2744320. Throughput: 0: 799.2. Samples: 686852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:15:43,163][00667] Avg episode reward: [(0, '14.839')] [2024-10-13 23:15:48,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 2760704. Throughput: 0: 807.7. Samples: 689774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:15:48,163][00667] Avg episode reward: [(0, '15.791')] [2024-10-13 23:15:53,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3249.1). Total num frames: 2772992. Throughput: 0: 768.0. Samples: 693468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:15:53,162][00667] Avg episode reward: [(0, '17.181')] [2024-10-13 23:15:53,167][03300] Saving new best policy, reward=17.181! [2024-10-13 23:15:55,387][03313] Updated weights for policy 0, policy_version 680 (0.0036) [2024-10-13 23:15:58,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 2793472. Throughput: 0: 823.6. Samples: 699072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:15:58,161][00667] Avg episode reward: [(0, '18.817')] [2024-10-13 23:15:58,172][03300] Saving new best policy, reward=18.817! [2024-10-13 23:16:03,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 2813952. Throughput: 0: 856.3. Samples: 702050. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-13 23:16:03,157][00667] Avg episode reward: [(0, '18.647')] [2024-10-13 23:16:07,382][03313] Updated weights for policy 0, policy_version 690 (0.0032) [2024-10-13 23:16:08,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3140.5, 300 sec: 3249.0). 
Total num frames: 2826240. Throughput: 0: 843.1. Samples: 706508. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-13 23:16:08,162][00667] Avg episode reward: [(0, '19.140')] [2024-10-13 23:16:08,175][03300] Saving new best policy, reward=19.140! [2024-10-13 23:16:13,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3277.0, 300 sec: 3249.1). Total num frames: 2842624. Throughput: 0: 809.5. Samples: 711136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:16:13,159][00667] Avg episode reward: [(0, '18.765')] [2024-10-13 23:16:18,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3249.0). Total num frames: 2863104. Throughput: 0: 830.0. Samples: 714132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-13 23:16:18,158][00667] Avg episode reward: [(0, '19.504')] [2024-10-13 23:16:18,169][03300] Saving new best policy, reward=19.504! [2024-10-13 23:16:18,757][03313] Updated weights for policy 0, policy_version 700 (0.0047) [2024-10-13 23:16:23,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 2875392. Throughput: 0: 849.8. Samples: 719250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:16:23,160][00667] Avg episode reward: [(0, '19.633')] [2024-10-13 23:16:23,165][03300] Saving new best policy, reward=19.633! [2024-10-13 23:16:28,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2891776. Throughput: 0: 806.7. Samples: 723152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:16:28,158][00667] Avg episode reward: [(0, '18.390')] [2024-10-13 23:16:32,107][03313] Updated weights for policy 0, policy_version 710 (0.0030) [2024-10-13 23:16:33,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3249.0). Total num frames: 2912256. Throughput: 0: 808.4. Samples: 726150. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:16:33,161][00667] Avg episode reward: [(0, '20.172')] [2024-10-13 23:16:33,167][03300] Saving new best policy, reward=20.172! [2024-10-13 23:16:38,157][00667] Fps is (10 sec: 3685.5, 60 sec: 3344.9, 300 sec: 3249.0). Total num frames: 2928640. Throughput: 0: 856.8. Samples: 732028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-13 23:16:38,161][00667] Avg episode reward: [(0, '19.783')] [2024-10-13 23:16:43,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2940928. Throughput: 0: 810.8. Samples: 735556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:16:43,160][00667] Avg episode reward: [(0, '21.097')] [2024-10-13 23:16:43,164][03300] Saving new best policy, reward=21.097! [2024-10-13 23:16:45,295][03313] Updated weights for policy 0, policy_version 720 (0.0020) [2024-10-13 23:16:48,155][00667] Fps is (10 sec: 2867.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2957312. Throughput: 0: 801.0. Samples: 738096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:16:48,162][00667] Avg episode reward: [(0, '21.036')] [2024-10-13 23:16:53,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3262.9). Total num frames: 2981888. Throughput: 0: 842.9. Samples: 744438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:16:53,157][00667] Avg episode reward: [(0, '19.842')] [2024-10-13 23:16:56,211][03313] Updated weights for policy 0, policy_version 730 (0.0030) [2024-10-13 23:16:58,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 2994176. Throughput: 0: 838.7. Samples: 748878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:16:58,160][00667] Avg episode reward: [(0, '19.736')] [2024-10-13 23:17:03,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3010560. Throughput: 0: 811.6. Samples: 750654. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:17:03,157][00667] Avg episode reward: [(0, '19.685')] [2024-10-13 23:17:08,084][03313] Updated weights for policy 0, policy_version 740 (0.0028) [2024-10-13 23:17:08,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 3031040. Throughput: 0: 831.2. Samples: 756654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:17:08,161][00667] Avg episode reward: [(0, '19.715')] [2024-10-13 23:17:13,155][00667] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3249.1). Total num frames: 3043328. Throughput: 0: 861.2. Samples: 761904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:17:13,160][00667] Avg episode reward: [(0, '20.081')] [2024-10-13 23:17:18,155][00667] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3249.1). Total num frames: 3055616. Throughput: 0: 833.2. Samples: 763644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:17:18,160][00667] Avg episode reward: [(0, '19.347')] [2024-10-13 23:17:18,173][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000746_3055616.pth... [2024-10-13 23:17:18,294][03300] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000555_2273280.pth [2024-10-13 23:17:21,656][03313] Updated weights for policy 0, policy_version 750 (0.0024) [2024-10-13 23:17:23,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 3076096. Throughput: 0: 813.7. Samples: 768642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:17:23,163][00667] Avg episode reward: [(0, '21.671')] [2024-10-13 23:17:23,166][03300] Saving new best policy, reward=21.671! [2024-10-13 23:17:28,156][00667] Fps is (10 sec: 3276.4, 60 sec: 3276.7, 300 sec: 3221.2). Total num frames: 3088384. Throughput: 0: 828.6. Samples: 772844. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:17:28,162][00667] Avg episode reward: [(0, '22.511')] [2024-10-13 23:17:28,184][03300] Saving new best policy, reward=22.511! [2024-10-13 23:17:33,155][00667] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 3096576. Throughput: 0: 804.9. Samples: 774316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:17:33,159][00667] Avg episode reward: [(0, '23.599')] [2024-10-13 23:17:33,188][03300] Saving new best policy, reward=23.599! [2024-10-13 23:17:37,697][03313] Updated weights for policy 0, policy_version 760 (0.0047) [2024-10-13 23:17:38,157][00667] Fps is (10 sec: 2457.4, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 3112960. Throughput: 0: 742.6. Samples: 777856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-13 23:17:38,161][00667] Avg episode reward: [(0, '22.933')] [2024-10-13 23:17:43,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3133440. Throughput: 0: 770.5. Samples: 783552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-13 23:17:43,162][00667] Avg episode reward: [(0, '24.102')] [2024-10-13 23:17:43,168][03300] Saving new best policy, reward=24.102! [2024-10-13 23:17:48,155][00667] Fps is (10 sec: 3687.2, 60 sec: 3208.5, 300 sec: 3235.2). Total num frames: 3149824. Throughput: 0: 794.9. Samples: 786426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:17:48,161][00667] Avg episode reward: [(0, '23.548')] [2024-10-13 23:17:49,445][03313] Updated weights for policy 0, policy_version 770 (0.0027) [2024-10-13 23:17:53,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3249.0). Total num frames: 3162112. Throughput: 0: 750.2. Samples: 790412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:17:53,157][00667] Avg episode reward: [(0, '22.298')] [2024-10-13 23:17:58,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 3182592. 
Throughput: 0: 748.2. Samples: 795572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:17:58,161][00667] Avg episode reward: [(0, '22.115')] [2024-10-13 23:18:01,362][03313] Updated weights for policy 0, policy_version 780 (0.0025) [2024-10-13 23:18:03,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 3198976. Throughput: 0: 775.8. Samples: 798554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:18:03,162][00667] Avg episode reward: [(0, '21.860')] [2024-10-13 23:18:08,158][00667] Fps is (10 sec: 2866.2, 60 sec: 3003.6, 300 sec: 3235.1). Total num frames: 3211264. Throughput: 0: 771.7. Samples: 803372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:18:08,161][00667] Avg episode reward: [(0, '22.189')] [2024-10-13 23:18:13,157][00667] Fps is (10 sec: 2866.6, 60 sec: 3071.9, 300 sec: 3235.1). Total num frames: 3227648. Throughput: 0: 770.2. Samples: 807502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:18:13,161][00667] Avg episode reward: [(0, '21.328')] [2024-10-13 23:18:14,561][03313] Updated weights for policy 0, policy_version 790 (0.0025) [2024-10-13 23:18:18,155][00667] Fps is (10 sec: 3687.6, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3248128. Throughput: 0: 808.0. Samples: 810678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:18:18,158][00667] Avg episode reward: [(0, '21.684')] [2024-10-13 23:18:23,155][00667] Fps is (10 sec: 3687.1, 60 sec: 3140.3, 300 sec: 3235.2). Total num frames: 3264512. Throughput: 0: 859.6. Samples: 816538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:18:23,162][00667] Avg episode reward: [(0, '20.857')] [2024-10-13 23:18:27,083][03313] Updated weights for policy 0, policy_version 800 (0.0050) [2024-10-13 23:18:28,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 3276800. Throughput: 0: 811.8. Samples: 820082. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:18:28,157][00667] Avg episode reward: [(0, '21.155')] [2024-10-13 23:18:33,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 3297280. Throughput: 0: 811.1. Samples: 822924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:18:33,163][00667] Avg episode reward: [(0, '20.042')] [2024-10-13 23:18:37,505][03313] Updated weights for policy 0, policy_version 810 (0.0036) [2024-10-13 23:18:38,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3249.0). Total num frames: 3317760. Throughput: 0: 856.7. Samples: 828962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:18:38,161][00667] Avg episode reward: [(0, '20.028')] [2024-10-13 23:18:43,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3330048. Throughput: 0: 831.1. Samples: 832972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-13 23:18:43,157][00667] Avg episode reward: [(0, '20.190')] [2024-10-13 23:18:48,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3346432. Throughput: 0: 808.4. Samples: 834932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-13 23:18:48,161][00667] Avg episode reward: [(0, '19.929')] [2024-10-13 23:18:50,871][03313] Updated weights for policy 0, policy_version 820 (0.0041) [2024-10-13 23:18:53,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3249.0). Total num frames: 3366912. Throughput: 0: 835.4. Samples: 840960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:18:53,158][00667] Avg episode reward: [(0, '20.867')] [2024-10-13 23:18:58,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3249.1). Total num frames: 3383296. Throughput: 0: 856.2. Samples: 846030. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:18:58,159][00667] Avg episode reward: [(0, '20.495')] [2024-10-13 23:19:03,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3395584. Throughput: 0: 826.2. Samples: 847858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:19:03,157][00667] Avg episode reward: [(0, '20.505')] [2024-10-13 23:19:03,685][03313] Updated weights for policy 0, policy_version 830 (0.0030) [2024-10-13 23:19:08,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3413.5, 300 sec: 3262.9). Total num frames: 3416064. Throughput: 0: 818.6. Samples: 853374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:19:08,157][00667] Avg episode reward: [(0, '21.743')] [2024-10-13 23:19:13,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3249.0). Total num frames: 3432448. Throughput: 0: 865.8. Samples: 859044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:19:13,160][00667] Avg episode reward: [(0, '22.567')] [2024-10-13 23:19:15,385][03313] Updated weights for policy 0, policy_version 840 (0.0021) [2024-10-13 23:19:18,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3444736. Throughput: 0: 841.6. Samples: 860796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:19:18,157][00667] Avg episode reward: [(0, '22.017')] [2024-10-13 23:19:18,174][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000841_3444736.pth... [2024-10-13 23:19:18,344][03300] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000649_2658304.pth [2024-10-13 23:19:23,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3465216. Throughput: 0: 811.4. Samples: 865474. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:19:23,157][00667] Avg episode reward: [(0, '22.977')] [2024-10-13 23:19:27,113][03313] Updated weights for policy 0, policy_version 850 (0.0024) [2024-10-13 23:19:28,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3262.9). Total num frames: 3485696. Throughput: 0: 854.9. Samples: 871444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:19:28,161][00667] Avg episode reward: [(0, '22.020')] [2024-10-13 23:19:33,155][00667] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3497984. Throughput: 0: 865.2. Samples: 873864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:19:33,160][00667] Avg episode reward: [(0, '21.094')] [2024-10-13 23:19:38,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3514368. Throughput: 0: 820.7. Samples: 877892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:19:38,161][00667] Avg episode reward: [(0, '20.312')] [2024-10-13 23:19:40,032][03313] Updated weights for policy 0, policy_version 860 (0.0027) [2024-10-13 23:19:43,155][00667] Fps is (10 sec: 3276.7, 60 sec: 3345.0, 300 sec: 3249.0). Total num frames: 3530752. Throughput: 0: 834.0. Samples: 883560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:19:43,158][00667] Avg episode reward: [(0, '19.615')] [2024-10-13 23:19:48,155][00667] Fps is (10 sec: 2867.1, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 3543040. Throughput: 0: 834.9. Samples: 885428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:19:48,157][00667] Avg episode reward: [(0, '19.899')] [2024-10-13 23:19:53,155][00667] Fps is (10 sec: 2048.1, 60 sec: 3072.0, 300 sec: 3221.3). Total num frames: 3551232. Throughput: 0: 779.9. Samples: 888468. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:19:53,162][00667] Avg episode reward: [(0, '19.096')] [2024-10-13 23:19:55,552][03313] Updated weights for policy 0, policy_version 870 (0.0026) [2024-10-13 23:19:58,155][00667] Fps is (10 sec: 2867.3, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 3571712. Throughput: 0: 765.3. Samples: 893482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:19:58,161][00667] Avg episode reward: [(0, '19.375')] [2024-10-13 23:20:03,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3235.2). Total num frames: 3592192. Throughput: 0: 795.1. Samples: 896576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:20:03,157][00667] Avg episode reward: [(0, '19.489')] [2024-10-13 23:20:06,745][03313] Updated weights for policy 0, policy_version 880 (0.0023) [2024-10-13 23:20:08,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3249.1). Total num frames: 3604480. Throughput: 0: 801.6. Samples: 901546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-13 23:20:08,160][00667] Avg episode reward: [(0, '19.807')] [2024-10-13 23:20:13,155][00667] Fps is (10 sec: 2867.3, 60 sec: 3140.3, 300 sec: 3262.9). Total num frames: 3620864. Throughput: 0: 765.6. Samples: 905896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-13 23:20:13,163][00667] Avg episode reward: [(0, '20.471')] [2024-10-13 23:20:18,155][00667] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3641344. Throughput: 0: 776.7. Samples: 908814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-13 23:20:18,157][00667] Avg episode reward: [(0, '21.209')] [2024-10-13 23:20:18,736][03313] Updated weights for policy 0, policy_version 890 (0.0023) [2024-10-13 23:20:23,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3657728. Throughput: 0: 813.4. Samples: 914494. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:20:23,164][00667] Avg episode reward: [(0, '20.437')] [2024-10-13 23:20:28,155][00667] Fps is (10 sec: 2867.3, 60 sec: 3072.0, 300 sec: 3262.9). Total num frames: 3670016. Throughput: 0: 770.3. Samples: 918222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:20:28,157][00667] Avg episode reward: [(0, '21.215')] [2024-10-13 23:20:31,556][03313] Updated weights for policy 0, policy_version 900 (0.0043) [2024-10-13 23:20:33,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3690496. Throughput: 0: 795.4. Samples: 921222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:20:33,157][00667] Avg episode reward: [(0, '20.618')] [2024-10-13 23:20:38,155][00667] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3710976. Throughput: 0: 863.6. Samples: 927332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-13 23:20:38,161][00667] Avg episode reward: [(0, '21.368')] [2024-10-13 23:20:43,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 3262.9). Total num frames: 3723264. Throughput: 0: 840.8. Samples: 931318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:20:43,162][00667] Avg episode reward: [(0, '21.616')] [2024-10-13 23:20:44,399][03313] Updated weights for policy 0, policy_version 910 (0.0035) [2024-10-13 23:20:48,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3739648. Throughput: 0: 820.0. Samples: 933474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:20:48,162][00667] Avg episode reward: [(0, '21.448')] [2024-10-13 23:20:53,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 3760128. Throughput: 0: 845.8. Samples: 939608. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:20:53,157][00667] Avg episode reward: [(0, '21.959')] [2024-10-13 23:20:54,925][03313] Updated weights for policy 0, policy_version 920 (0.0035) [2024-10-13 23:20:58,156][00667] Fps is (10 sec: 3276.3, 60 sec: 3345.0, 300 sec: 3249.0). Total num frames: 3772416. Throughput: 0: 852.3. Samples: 944252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:20:58,159][00667] Avg episode reward: [(0, '22.458')] [2024-10-13 23:21:03,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3788800. Throughput: 0: 827.8. Samples: 946064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-13 23:21:03,162][00667] Avg episode reward: [(0, '22.751')] [2024-10-13 23:21:07,568][03313] Updated weights for policy 0, policy_version 930 (0.0018) [2024-10-13 23:21:08,155][00667] Fps is (10 sec: 3687.0, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 3809280. Throughput: 0: 830.4. Samples: 951864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:21:08,162][00667] Avg episode reward: [(0, '21.070')] [2024-10-13 23:21:13,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 3825664. Throughput: 0: 875.6. Samples: 957626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-13 23:21:13,159][00667] Avg episode reward: [(0, '20.861')] [2024-10-13 23:21:18,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3837952. Throughput: 0: 848.1. Samples: 959388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-13 23:21:18,161][00667] Avg episode reward: [(0, '19.707')] [2024-10-13 23:21:18,171][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000937_3837952.pth... 
[2024-10-13 23:21:18,309][03300] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000746_3055616.pth
[2024-10-13 23:21:20,400][03313] Updated weights for policy 0, policy_version 940 (0.0020)
[2024-10-13 23:21:23,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3858432. Throughput: 0: 821.6. Samples: 964304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:21:23,160][00667] Avg episode reward: [(0, '19.397')]
[2024-10-13 23:21:28,158][00667] Fps is (10 sec: 4094.8, 60 sec: 3481.4, 300 sec: 3276.8). Total num frames: 3878912. Throughput: 0: 866.3. Samples: 970306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:21:28,163][00667] Avg episode reward: [(0, '19.878')]
[2024-10-13 23:21:32,217][03313] Updated weights for policy 0, policy_version 950 (0.0029)
[2024-10-13 23:21:33,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3891200. Throughput: 0: 862.7. Samples: 972296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:21:33,157][00667] Avg episode reward: [(0, '20.656')]
[2024-10-13 23:21:38,155][00667] Fps is (10 sec: 2868.0, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3907584. Throughput: 0: 823.3. Samples: 976656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:21:38,161][00667] Avg episode reward: [(0, '20.974')]
[2024-10-13 23:21:43,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 3928064. Throughput: 0: 856.1. Samples: 982774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:21:43,161][00667] Avg episode reward: [(0, '21.901')]
[2024-10-13 23:21:43,441][03313] Updated weights for policy 0, policy_version 960 (0.0021)
[2024-10-13 23:21:48,155][00667] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 3944448. Throughput: 0: 875.3. Samples: 985452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:21:48,160][00667] Avg episode reward: [(0, '22.777')]
[2024-10-13 23:21:53,155][00667] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3960832. Throughput: 0: 828.0. Samples: 989126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-13 23:21:53,163][00667] Avg episode reward: [(0, '22.113')]
[2024-10-13 23:21:55,968][03313] Updated weights for policy 0, policy_version 970 (0.0016)
[2024-10-13 23:21:58,157][00667] Fps is (10 sec: 3685.7, 60 sec: 3481.6, 300 sec: 3290.7). Total num frames: 3981312. Throughput: 0: 839.7. Samples: 995412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:21:58,159][00667] Avg episode reward: [(0, '21.386')]
[2024-10-13 23:22:03,155][00667] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 3989504. Throughput: 0: 843.9. Samples: 997364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-13 23:22:03,158][00667] Avg episode reward: [(0, '21.886')]
[2024-10-13 23:22:08,155][00667] Fps is (10 sec: 2048.3, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 4001792. Throughput: 0: 802.9. Samples: 1000436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-13 23:22:08,160][00667] Avg episode reward: [(0, '22.675')]
[2024-10-13 23:22:09,213][03300] Stopping Batcher_0...
[2024-10-13 23:22:09,213][03300] Loop batcher_evt_loop terminating...
[2024-10-13 23:22:09,215][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-13 23:22:09,213][00667] Component Batcher_0 stopped!
[2024-10-13 23:22:09,317][03313] Weights refcount: 2 0
[2024-10-13 23:22:09,324][03313] Stopping InferenceWorker_p0-w0...
[2024-10-13 23:22:09,328][03313] Loop inference_proc0-0_evt_loop terminating...
[2024-10-13 23:22:09,325][00667] Component InferenceWorker_p0-w0 stopped!
[2024-10-13 23:22:09,374][03300] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000841_3444736.pth
[2024-10-13 23:22:09,410][03300] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-13 23:22:09,727][03300] Stopping LearnerWorker_p0...
[2024-10-13 23:22:09,728][03300] Loop learner_proc0_evt_loop terminating...
[2024-10-13 23:22:09,727][00667] Component LearnerWorker_p0 stopped!
[2024-10-13 23:22:09,850][00667] Component RolloutWorker_w4 stopped!
[2024-10-13 23:22:09,858][03318] Stopping RolloutWorker_w4...
[2024-10-13 23:22:09,858][03318] Loop rollout_proc4_evt_loop terminating...
[2024-10-13 23:22:09,861][03316] Stopping RolloutWorker_w1...
[2024-10-13 23:22:09,861][03316] Loop rollout_proc1_evt_loop terminating...
[2024-10-13 23:22:09,861][00667] Component RolloutWorker_w1 stopped!
[2024-10-13 23:22:09,872][00667] Component RolloutWorker_w2 stopped!
[2024-10-13 23:22:09,875][03317] Stopping RolloutWorker_w2...
[2024-10-13 23:22:09,876][03317] Loop rollout_proc2_evt_loop terminating...
[2024-10-13 23:22:09,890][03321] Stopping RolloutWorker_w7...
[2024-10-13 23:22:09,888][00667] Component RolloutWorker_w6 stopped!
[2024-10-13 23:22:09,893][03319] Stopping RolloutWorker_w6...
[2024-10-13 23:22:09,893][03319] Loop rollout_proc6_evt_loop terminating...
[2024-10-13 23:22:09,890][03321] Loop rollout_proc7_evt_loop terminating...
[2024-10-13 23:22:09,891][00667] Component RolloutWorker_w7 stopped!
[2024-10-13 23:22:09,905][03320] Stopping RolloutWorker_w5...
[2024-10-13 23:22:09,905][03320] Loop rollout_proc5_evt_loop terminating...
[2024-10-13 23:22:09,912][03314] Stopping RolloutWorker_w0...
[2024-10-13 23:22:09,913][03314] Loop rollout_proc0_evt_loop terminating...
[2024-10-13 23:22:09,913][03315] Stopping RolloutWorker_w3...
[2024-10-13 23:22:09,904][00667] Component RolloutWorker_w5 stopped!
[2024-10-13 23:22:09,918][00667] Component RolloutWorker_w0 stopped!
[2024-10-13 23:22:09,922][00667] Component RolloutWorker_w3 stopped!
[2024-10-13 23:22:09,923][00667] Waiting for process learner_proc0 to stop...
[2024-10-13 23:22:09,913][03315] Loop rollout_proc3_evt_loop terminating...
[2024-10-13 23:22:11,574][00667] Waiting for process inference_proc0-0 to join...
[2024-10-13 23:22:11,578][00667] Waiting for process rollout_proc0 to join...
[2024-10-13 23:22:13,729][00667] Waiting for process rollout_proc1 to join...
[2024-10-13 23:22:13,734][00667] Waiting for process rollout_proc2 to join...
[2024-10-13 23:22:13,737][00667] Waiting for process rollout_proc3 to join...
[2024-10-13 23:22:13,740][00667] Waiting for process rollout_proc4 to join...
[2024-10-13 23:22:13,749][00667] Waiting for process rollout_proc5 to join...
[2024-10-13 23:22:13,754][00667] Waiting for process rollout_proc6 to join...
[2024-10-13 23:22:13,758][00667] Waiting for process rollout_proc7 to join...
[2024-10-13 23:22:13,762][00667] Batcher 0 profile tree view:
batching: 26.9591, releasing_batches: 0.0399
[2024-10-13 23:22:13,764][00667] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 431.0935
update_model: 11.5196
  weight_update: 0.0015
one_step: 0.0218
handle_policy_step: 726.7763
  deserialize: 18.4750, stack: 3.6660, obs_to_device_normalize: 143.0186, forward: 394.5493, send_messages: 35.1427
  prepare_outputs: 97.3036
    to_cpu: 53.8976
[2024-10-13 23:22:13,767][00667] Learner 0 profile tree view:
misc: 0.0057, prepare_batch: 14.1341
train: 77.4666
  epoch_init: 0.0142, minibatch_init: 0.0069, losses_postprocess: 0.6566, kl_divergence: 0.6593, after_optimizer: 35.3312
  calculate_losses: 27.5710
    losses_init: 0.0070, forward_head: 1.5271, bptt_initial: 18.2778, tail: 1.1719, advantages_returns: 0.3316, losses: 3.8082
    bptt: 2.1144
      bptt_forward_core: 2.0094
  update: 12.4909
    clip: 1.0055
[2024-10-13 23:22:13,768][00667] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3487, enqueue_policy_requests: 120.4805, env_step: 940.1816, overhead: 20.4897, complete_rollouts: 8.7837
save_policy_outputs: 24.8368
  split_output_tensors: 10.0857
[2024-10-13 23:22:13,770][00667] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4578, enqueue_policy_requests: 119.0921, env_step: 942.6492, overhead: 19.1267, complete_rollouts: 7.2794
save_policy_outputs: 25.3761
  split_output_tensors: 10.1596
[2024-10-13 23:22:13,772][00667] Loop Runner_EvtLoop terminating...
[2024-10-13 23:22:13,773][00667] Runner profile tree view:
main_loop: 1250.1522
[2024-10-13 23:22:13,774][00667] Collected {0: 4005888}, FPS: 3204.3
[2024-10-13 23:22:14,071][00667] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-13 23:22:14,073][00667] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-13 23:22:14,076][00667] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-13 23:22:14,078][00667] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-13 23:22:14,080][00667] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-13 23:22:14,081][00667] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-13 23:22:14,082][00667] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-10-13 23:22:14,084][00667] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-13 23:22:14,085][00667] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-10-13 23:22:14,086][00667] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-10-13 23:22:14,087][00667] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-13 23:22:14,088][00667] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-13 23:22:14,090][00667] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-13 23:22:14,091][00667] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-13 23:22:14,092][00667] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-13 23:22:14,135][00667] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-13 23:22:14,139][00667] RunningMeanStd input shape: (3, 72, 128)
[2024-10-13 23:22:14,142][00667] RunningMeanStd input shape: (1,)
[2024-10-13 23:22:14,158][00667] ConvEncoder: input_channels=3
[2024-10-13 23:22:14,267][00667] Conv encoder output size: 512
[2024-10-13 23:22:14,270][00667] Policy head output size: 512
[2024-10-13 23:22:14,568][00667] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-13 23:22:15,479][00667] Num frames 100...
[2024-10-13 23:22:15,621][00667] Num frames 200...
[2024-10-13 23:22:15,752][00667] Num frames 300...
[2024-10-13 23:22:15,883][00667] Num frames 400...
[2024-10-13 23:22:16,017][00667] Num frames 500...
[2024-10-13 23:22:16,150][00667] Num frames 600...
[2024-10-13 23:22:16,279][00667] Num frames 700...
[2024-10-13 23:22:16,357][00667] Avg episode rewards: #0: 14.150, true rewards: #0: 7.150
[2024-10-13 23:22:16,359][00667] Avg episode reward: 14.150, avg true_objective: 7.150
[2024-10-13 23:22:16,484][00667] Num frames 800...
[2024-10-13 23:22:16,609][00667] Num frames 900...
[2024-10-13 23:22:16,734][00667] Num frames 1000...
[2024-10-13 23:22:16,864][00667] Num frames 1100...
[2024-10-13 23:22:16,993][00667] Num frames 1200...
[2024-10-13 23:22:17,121][00667] Num frames 1300...
[2024-10-13 23:22:17,245][00667] Num frames 1400...
[2024-10-13 23:22:17,369][00667] Avg episode rewards: #0: 13.255, true rewards: #0: 7.255
[2024-10-13 23:22:17,370][00667] Avg episode reward: 13.255, avg true_objective: 7.255
[2024-10-13 23:22:17,439][00667] Num frames 1500...
[2024-10-13 23:22:17,575][00667] Num frames 1600...
[2024-10-13 23:22:17,713][00667] Num frames 1700...
[2024-10-13 23:22:17,838][00667] Num frames 1800...
[2024-10-13 23:22:18,016][00667] Avg episode rewards: #0: 11.330, true rewards: #0: 6.330
[2024-10-13 23:22:18,018][00667] Avg episode reward: 11.330, avg true_objective: 6.330
[2024-10-13 23:22:18,022][00667] Num frames 1900...
[2024-10-13 23:22:18,156][00667] Num frames 2000...
[2024-10-13 23:22:18,283][00667] Num frames 2100...
[2024-10-13 23:22:18,438][00667] Num frames 2200...
[2024-10-13 23:22:18,581][00667] Num frames 2300...
[2024-10-13 23:22:18,710][00667] Num frames 2400...
[2024-10-13 23:22:18,844][00667] Num frames 2500...
[2024-10-13 23:22:18,980][00667] Num frames 2600...
[2024-10-13 23:22:19,116][00667] Num frames 2700...
[2024-10-13 23:22:19,293][00667] Avg episode rewards: #0: 13.238, true rewards: #0: 6.987
[2024-10-13 23:22:19,295][00667] Avg episode reward: 13.238, avg true_objective: 6.987
[2024-10-13 23:22:19,310][00667] Num frames 2800...
[2024-10-13 23:22:19,448][00667] Num frames 2900...
[2024-10-13 23:22:19,594][00667] Num frames 3000...
[2024-10-13 23:22:19,722][00667] Num frames 3100...
[2024-10-13 23:22:19,852][00667] Num frames 3200...
[2024-10-13 23:22:19,984][00667] Num frames 3300...
[2024-10-13 23:22:20,114][00667] Num frames 3400...
[2024-10-13 23:22:20,248][00667] Num frames 3500...
[2024-10-13 23:22:20,447][00667] Num frames 3600...
[2024-10-13 23:22:20,675][00667] Avg episode rewards: #0: 14.582, true rewards: #0: 7.382
[2024-10-13 23:22:20,678][00667] Avg episode reward: 14.582, avg true_objective: 7.382
[2024-10-13 23:22:20,709][00667] Num frames 3700...
[2024-10-13 23:22:20,890][00667] Num frames 3800...
[2024-10-13 23:22:21,071][00667] Num frames 3900...
[2024-10-13 23:22:21,249][00667] Num frames 4000...
[2024-10-13 23:22:21,442][00667] Num frames 4100...
[2024-10-13 23:22:21,645][00667] Num frames 4200...
[2024-10-13 23:22:21,765][00667] Avg episode rewards: #0: 13.725, true rewards: #0: 7.058
[2024-10-13 23:22:21,767][00667] Avg episode reward: 13.725, avg true_objective: 7.058
[2024-10-13 23:22:21,893][00667] Num frames 4300...
[2024-10-13 23:22:22,084][00667] Num frames 4400...
[2024-10-13 23:22:22,272][00667] Num frames 4500...
[2024-10-13 23:22:22,472][00667] Num frames 4600...
[2024-10-13 23:22:22,573][00667] Avg episode rewards: #0: 12.313, true rewards: #0: 6.599
[2024-10-13 23:22:22,575][00667] Avg episode reward: 12.313, avg true_objective: 6.599
[2024-10-13 23:22:22,687][00667] Num frames 4700...
[2024-10-13 23:22:22,818][00667] Num frames 4800...
[2024-10-13 23:22:22,945][00667] Num frames 4900...
[2024-10-13 23:22:23,074][00667] Num frames 5000...
[2024-10-13 23:22:23,207][00667] Num frames 5100...
[2024-10-13 23:22:23,343][00667] Num frames 5200...
[2024-10-13 23:22:23,476][00667] Num frames 5300...
[2024-10-13 23:22:23,648][00667] Avg episode rewards: #0: 12.359, true rewards: #0: 6.734
[2024-10-13 23:22:23,649][00667] Avg episode reward: 12.359, avg true_objective: 6.734
[2024-10-13 23:22:23,673][00667] Num frames 5400...
[2024-10-13 23:22:23,805][00667] Num frames 5500...
[2024-10-13 23:22:23,935][00667] Num frames 5600...
[2024-10-13 23:22:24,067][00667] Num frames 5700...
[2024-10-13 23:22:24,198][00667] Num frames 5800...
[2024-10-13 23:22:24,331][00667] Num frames 5900...
[2024-10-13 23:22:24,466][00667] Num frames 6000...
[2024-10-13 23:22:24,593][00667] Num frames 6100...
[2024-10-13 23:22:24,730][00667] Num frames 6200...
[2024-10-13 23:22:24,862][00667] Num frames 6300...
[2024-10-13 23:22:24,991][00667] Num frames 6400...
[2024-10-13 23:22:25,126][00667] Num frames 6500...
[2024-10-13 23:22:25,262][00667] Num frames 6600...
[2024-10-13 23:22:25,404][00667] Num frames 6700...
[2024-10-13 23:22:25,534][00667] Num frames 6800...
[2024-10-13 23:22:25,669][00667] Num frames 6900...
[2024-10-13 23:22:25,811][00667] Num frames 7000...
[2024-10-13 23:22:25,913][00667] Avg episode rewards: #0: 15.484, true rewards: #0: 7.818
[2024-10-13 23:22:25,914][00667] Avg episode reward: 15.484, avg true_objective: 7.818
[2024-10-13 23:22:26,004][00667] Num frames 7100...
[2024-10-13 23:22:26,138][00667] Num frames 7200...
[2024-10-13 23:22:26,269][00667] Num frames 7300...
[2024-10-13 23:22:26,416][00667] Num frames 7400...
[2024-10-13 23:22:26,545][00667] Num frames 7500...
[2024-10-13 23:22:26,673][00667] Num frames 7600...
[2024-10-13 23:22:26,810][00667] Num frames 7700...
[2024-10-13 23:22:26,945][00667] Num frames 7800...
[2024-10-13 23:22:27,079][00667] Num frames 7900...
[2024-10-13 23:22:27,220][00667] Num frames 8000...
[2024-10-13 23:22:27,371][00667] Num frames 8100...
[2024-10-13 23:22:27,513][00667] Num frames 8200...
[2024-10-13 23:22:27,650][00667] Num frames 8300...
[2024-10-13 23:22:27,790][00667] Num frames 8400...
[2024-10-13 23:22:27,929][00667] Num frames 8500...
[2024-10-13 23:22:28,066][00667] Num frames 8600...
[2024-10-13 23:22:28,204][00667] Num frames 8700...
[2024-10-13 23:22:28,256][00667] Avg episode rewards: #0: 17.900, true rewards: #0: 8.700
[2024-10-13 23:22:28,259][00667] Avg episode reward: 17.900, avg true_objective: 8.700
[2024-10-13 23:23:20,036][00667] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-10-13 23:24:15,646][00667] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-13 23:24:15,648][00667] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-13 23:24:15,652][00667] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-13 23:24:15,656][00667] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-13 23:24:15,657][00667] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-13 23:24:15,660][00667] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-13 23:24:15,661][00667] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-10-13 23:24:15,663][00667] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-13 23:24:15,666][00667] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-10-13 23:24:15,668][00667] Adding new argument 'hf_repository'='alient12/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-10-13 23:24:15,670][00667] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-13 23:24:15,671][00667] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-13 23:24:15,674][00667] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-13 23:24:15,675][00667] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-13 23:24:15,676][00667] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-13 23:24:15,725][00667] RunningMeanStd input shape: (3, 72, 128)
[2024-10-13 23:24:15,730][00667] RunningMeanStd input shape: (1,)
[2024-10-13 23:24:15,749][00667] ConvEncoder: input_channels=3
[2024-10-13 23:24:15,809][00667] Conv encoder output size: 512
[2024-10-13 23:24:15,814][00667] Policy head output size: 512
[2024-10-13 23:24:15,843][00667] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-13 23:24:16,544][00667] Num frames 100...
[2024-10-13 23:24:16,727][00667] Num frames 200...
[2024-10-13 23:24:16,912][00667] Num frames 300...
[2024-10-13 23:24:17,090][00667] Num frames 400...
[2024-10-13 23:24:17,280][00667] Num frames 500...
[2024-10-13 23:24:17,485][00667] Avg episode rewards: #0: 8.760, true rewards: #0: 5.760
[2024-10-13 23:24:17,487][00667] Avg episode reward: 8.760, avg true_objective: 5.760
[2024-10-13 23:24:17,528][00667] Num frames 600...
[2024-10-13 23:24:17,661][00667] Num frames 700...
[2024-10-13 23:24:17,791][00667] Num frames 800...
[2024-10-13 23:24:17,920][00667] Num frames 900...
[2024-10-13 23:24:18,041][00667] Num frames 1000...
[2024-10-13 23:24:18,175][00667] Num frames 1100...
[2024-10-13 23:24:18,304][00667] Num frames 1200...
[2024-10-13 23:24:18,458][00667] Num frames 1300...
[2024-10-13 23:24:18,587][00667] Num frames 1400...
[2024-10-13 23:24:18,722][00667] Num frames 1500...
[2024-10-13 23:24:18,850][00667] Num frames 1600...
[2024-10-13 23:24:18,974][00667] Num frames 1700...
[2024-10-13 23:24:19,097][00667] Num frames 1800...
[2024-10-13 23:24:19,226][00667] Num frames 1900...
[2024-10-13 23:24:19,362][00667] Num frames 2000...
[2024-10-13 23:24:19,502][00667] Num frames 2100...
[2024-10-13 23:24:19,631][00667] Num frames 2200...
[2024-10-13 23:24:19,766][00667] Num frames 2300...
[2024-10-13 23:24:19,898][00667] Num frames 2400...
[2024-10-13 23:24:20,027][00667] Num frames 2500...
[2024-10-13 23:24:20,158][00667] Num frames 2600...
[2024-10-13 23:24:20,310][00667] Avg episode rewards: #0: 33.879, true rewards: #0: 13.380
[2024-10-13 23:24:20,312][00667] Avg episode reward: 33.879, avg true_objective: 13.380
[2024-10-13 23:24:20,351][00667] Num frames 2700...
[2024-10-13 23:24:20,482][00667] Num frames 2800...
[2024-10-13 23:24:20,613][00667] Num frames 2900...
[2024-10-13 23:24:20,743][00667] Num frames 3000...
[2024-10-13 23:24:20,878][00667] Num frames 3100...
[2024-10-13 23:24:21,004][00667] Num frames 3200...
[2024-10-13 23:24:21,136][00667] Num frames 3300...
[2024-10-13 23:24:21,262][00667] Num frames 3400...
[2024-10-13 23:24:21,400][00667] Num frames 3500...
[2024-10-13 23:24:21,589][00667] Num frames 3600...
[2024-10-13 23:24:21,771][00667] Num frames 3700...
[2024-10-13 23:24:21,947][00667] Num frames 3800...
[2024-10-13 23:24:22,118][00667] Num frames 3900...
[2024-10-13 23:24:22,295][00667] Num frames 4000...
[2024-10-13 23:24:22,491][00667] Num frames 4100...
[2024-10-13 23:24:22,663][00667] Num frames 4200...
[2024-10-13 23:24:22,846][00667] Num frames 4300...
[2024-10-13 23:24:23,034][00667] Num frames 4400...
[2024-10-13 23:24:23,219][00667] Num frames 4500...
[2024-10-13 23:24:23,414][00667] Num frames 4600...
[2024-10-13 23:24:23,609][00667] Num frames 4700...
[2024-10-13 23:24:23,801][00667] Avg episode rewards: #0: 43.252, true rewards: #0: 15.920
[2024-10-13 23:24:23,804][00667] Avg episode reward: 43.252, avg true_objective: 15.920
[2024-10-13 23:24:23,856][00667] Num frames 4800...
[2024-10-13 23:24:24,033][00667] Num frames 4900...
[2024-10-13 23:24:24,162][00667] Num frames 5000...
[2024-10-13 23:24:24,294][00667] Num frames 5100...
[2024-10-13 23:24:24,429][00667] Num frames 5200...
[2024-10-13 23:24:24,557][00667] Num frames 5300...
[2024-10-13 23:24:24,692][00667] Num frames 5400...
[2024-10-13 23:24:24,818][00667] Num frames 5500...
[2024-10-13 23:24:24,946][00667] Num frames 5600...
[2024-10-13 23:24:25,078][00667] Num frames 5700...
[2024-10-13 23:24:25,205][00667] Num frames 5800...
[2024-10-13 23:24:25,340][00667] Num frames 5900...
[2024-10-13 23:24:25,468][00667] Num frames 6000...
[2024-10-13 23:24:25,637][00667] Avg episode rewards: #0: 39.219, true rewards: #0: 15.220
[2024-10-13 23:24:25,639][00667] Avg episode reward: 39.219, avg true_objective: 15.220
[2024-10-13 23:24:25,658][00667] Num frames 6100...
[2024-10-13 23:24:25,782][00667] Num frames 6200...
[2024-10-13 23:24:25,909][00667] Num frames 6300...
[2024-10-13 23:24:26,041][00667] Num frames 6400...
[2024-10-13 23:24:26,171][00667] Num frames 6500...
[2024-10-13 23:24:26,306][00667] Num frames 6600...
[2024-10-13 23:24:26,488][00667] Avg episode rewards: #0: 33.192, true rewards: #0: 13.392
[2024-10-13 23:24:26,490][00667] Avg episode reward: 33.192, avg true_objective: 13.392
[2024-10-13 23:24:26,500][00667] Num frames 6700...
[2024-10-13 23:24:26,624][00667] Num frames 6800...
[2024-10-13 23:24:26,759][00667] Num frames 6900...
[2024-10-13 23:24:26,883][00667] Num frames 7000...
[2024-10-13 23:24:27,011][00667] Num frames 7100...
[2024-10-13 23:24:27,135][00667] Num frames 7200...
[2024-10-13 23:24:27,264][00667] Num frames 7300...
[2024-10-13 23:24:27,397][00667] Num frames 7400...
[2024-10-13 23:24:27,524][00667] Num frames 7500...
[2024-10-13 23:24:27,655][00667] Num frames 7600...
[2024-10-13 23:24:27,799][00667] Num frames 7700...
[2024-10-13 23:24:27,924][00667] Num frames 7800...
[2024-10-13 23:24:28,051][00667] Num frames 7900...
[2024-10-13 23:24:28,183][00667] Num frames 8000...
[2024-10-13 23:24:28,364][00667] Avg episode rewards: #0: 33.988, true rewards: #0: 13.488
[2024-10-13 23:24:28,366][00667] Avg episode reward: 33.988, avg true_objective: 13.488
[2024-10-13 23:24:28,379][00667] Num frames 8100...
[2024-10-13 23:24:28,502][00667] Num frames 8200...
[2024-10-13 23:24:28,634][00667] Num frames 8300...
[2024-10-13 23:24:28,777][00667] Num frames 8400...
[2024-10-13 23:24:28,909][00667] Num frames 8500...
[2024-10-13 23:24:29,039][00667] Num frames 8600...
[2024-10-13 23:24:29,170][00667] Num frames 8700...
[2024-10-13 23:24:29,431][00667] Num frames 8800...
[2024-10-13 23:24:29,563][00667] Avg episode rewards: #0: 31.608, true rewards: #0: 12.609
[2024-10-13 23:24:29,565][00667] Avg episode reward: 31.608, avg true_objective: 12.609
[2024-10-13 23:24:29,664][00667] Num frames 8900...
[2024-10-13 23:24:29,799][00667] Num frames 9000...
[2024-10-13 23:24:29,926][00667] Num frames 9100...
[2024-10-13 23:24:30,053][00667] Num frames 9200...
[2024-10-13 23:24:30,198][00667] Num frames 9300...
[2024-10-13 23:24:30,467][00667] Num frames 9400...
[2024-10-13 23:24:30,593][00667] Num frames 9500...
[2024-10-13 23:24:30,722][00667] Num frames 9600...
[2024-10-13 23:24:30,864][00667] Num frames 9700...
[2024-10-13 23:24:30,995][00667] Num frames 9800...
[2024-10-13 23:24:31,139][00667] Avg episode rewards: #0: 30.460, true rewards: #0: 12.335
[2024-10-13 23:24:31,141][00667] Avg episode reward: 30.460, avg true_objective: 12.335
[2024-10-13 23:24:31,186][00667] Num frames 9900...
[2024-10-13 23:24:31,316][00667] Num frames 10000...
[2024-10-13 23:24:31,455][00667] Num frames 10100...
[2024-10-13 23:24:31,591][00667] Num frames 10200...
[2024-10-13 23:24:31,720][00667] Num frames 10300...
[2024-10-13 23:24:31,856][00667] Num frames 10400...
[2024-10-13 23:24:31,982][00667] Num frames 10500...
[2024-10-13 23:24:32,108][00667] Num frames 10600...
[2024-10-13 23:24:32,237][00667] Num frames 10700...
[2024-10-13 23:24:32,373][00667] Num frames 10800...
[2024-10-13 23:24:32,502][00667] Num frames 10900...
[2024-10-13 23:24:32,634][00667] Num frames 11000...
[2024-10-13 23:24:32,764][00667] Num frames 11100...
[2024-10-13 23:24:32,862][00667] Avg episode rewards: #0: 30.472, true rewards: #0: 12.361
[2024-10-13 23:24:32,864][00667] Avg episode reward: 30.472, avg true_objective: 12.361
[2024-10-13 23:24:32,962][00667] Num frames 11200...
[2024-10-13 23:24:33,089][00667] Num frames 11300...
[2024-10-13 23:24:33,221][00667] Num frames 11400...
[2024-10-13 23:24:33,362][00667] Num frames 11500...
[2024-10-13 23:24:33,486][00667] Num frames 11600...
[2024-10-13 23:24:33,616][00667] Num frames 11700...
[2024-10-13 23:24:33,743][00667] Num frames 11800...
[2024-10-13 23:24:33,883][00667] Avg episode rewards: #0: 28.561, true rewards: #0: 11.861
[2024-10-13 23:24:33,885][00667] Avg episode reward: 28.561, avg true_objective: 11.861
[2024-10-13 23:25:49,882][00667] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
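The per-episode summary lines above follow a fixed pattern, so the reward numbers can be pulled out of the log programmatically. A minimal sketch: `parse_rewards` and its regex are my own, inferred from the "Avg episode rewards" lines shown in this log, not a Sample Factory utility.

```python
import re

# Matches evaluation summary lines such as:
# "[2024-10-13 23:22:16,357][00667] Avg episode rewards: #0: 14.150, true rewards: #0: 7.150"
PATTERN = re.compile(r"Avg episode rewards: #0: ([\d.]+), true rewards: #0: ([\d.]+)")

def parse_rewards(log_text: str) -> list[tuple[float, float]]:
    """Return (avg_reward, true_reward) pairs, one per evaluated episode."""
    return [(float(avg), float(true)) for avg, true in PATTERN.findall(log_text)]

sample = (
    "[2024-10-13 23:22:16,357][00667] "
    "Avg episode rewards: #0: 14.150, true rewards: #0: 7.150"
)
print(parse_rewards(sample))  # [(14.15, 7.15)]
```

Feeding the whole log through `parse_rewards` gives the running averages per episode, which is a quick way to compare the two evaluation runs without re-reading every line.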