[2024-11-06 04:22:32,482][00514] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-06 04:22:32,484][00514] Rollout worker 0 uses device cpu
[2024-11-06 04:22:32,485][00514] Rollout worker 1 uses device cpu
[2024-11-06 04:22:32,488][00514] Rollout worker 2 uses device cpu
[2024-11-06 04:22:32,489][00514] Rollout worker 3 uses device cpu
[2024-11-06 04:22:32,491][00514] Rollout worker 4 uses device cpu
[2024-11-06 04:22:32,492][00514] Rollout worker 5 uses device cpu
[2024-11-06 04:22:32,493][00514] Rollout worker 6 uses device cpu
[2024-11-06 04:22:32,494][00514] Rollout worker 7 uses device cpu
[2024-11-06 04:22:32,649][00514] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 04:22:32,651][00514] InferenceWorker_p0-w0: min num requests: 2
[2024-11-06 04:22:32,684][00514] Starting all processes...
[2024-11-06 04:22:32,685][00514] Starting process learner_proc0
[2024-11-06 04:22:32,731][00514] Starting all processes...
[2024-11-06 04:22:32,739][00514] Starting process inference_proc0-0
[2024-11-06 04:22:32,739][00514] Starting process rollout_proc0
[2024-11-06 04:22:32,741][00514] Starting process rollout_proc1
[2024-11-06 04:22:32,741][00514] Starting process rollout_proc2
[2024-11-06 04:22:32,741][00514] Starting process rollout_proc3
[2024-11-06 04:22:32,741][00514] Starting process rollout_proc4
[2024-11-06 04:22:32,741][00514] Starting process rollout_proc5
[2024-11-06 04:22:32,741][00514] Starting process rollout_proc6
[2024-11-06 04:22:32,741][00514] Starting process rollout_proc7
[2024-11-06 04:22:50,136][02522] Worker 5 uses CPU cores [1]
[2024-11-06 04:22:50,293][02516] Worker 1 uses CPU cores [1]
[2024-11-06 04:22:50,487][02518] Worker 3 uses CPU cores [1]
[2024-11-06 04:22:50,685][02511] Worker 0 uses CPU cores [0]
[2024-11-06 04:22:50,735][02517] Worker 2 uses CPU cores [0]
[2024-11-06 04:22:50,760][02519] Worker 6 uses CPU cores [0]
[2024-11-06 04:22:50,774][02497] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 04:22:50,774][02497] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-06 04:22:50,778][02520] Worker 7 uses CPU cores [1]
[2024-11-06 04:22:50,817][02497] Num visible devices: 1
[2024-11-06 04:22:50,835][02497] Starting seed is not provided
[2024-11-06 04:22:50,836][02497] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 04:22:50,837][02497] Initializing actor-critic model on device cuda:0
[2024-11-06 04:22:50,838][02497] RunningMeanStd input shape: (3, 72, 128)
[2024-11-06 04:22:50,841][02497] RunningMeanStd input shape: (1,)
[2024-11-06 04:22:50,840][02521] Worker 4 uses CPU cores [0]
[2024-11-06 04:22:50,879][02510] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 04:22:50,879][02510] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-06 04:22:50,885][02497] ConvEncoder: input_channels=3
[2024-11-06 04:22:50,908][02510] Num visible devices: 1
[2024-11-06 04:22:51,210][02497] Conv encoder output size: 512
[2024-11-06 04:22:51,210][02497] Policy head output size: 512
[2024-11-06 04:22:51,271][02497] Created Actor Critic model with architecture:
[2024-11-06 04:22:51,272][02497] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-06 04:22:51,637][02497] Using optimizer
[2024-11-06 04:22:52,641][00514] Heartbeat connected on Batcher_0
[2024-11-06 04:22:52,650][00514] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-06 04:22:52,659][00514] Heartbeat connected on RolloutWorker_w0
[2024-11-06 04:22:52,663][00514] Heartbeat connected on RolloutWorker_w1
[2024-11-06 04:22:52,667][00514] Heartbeat connected on RolloutWorker_w2
[2024-11-06 04:22:52,669][00514] Heartbeat connected on RolloutWorker_w3
[2024-11-06 04:22:52,676][00514] Heartbeat connected on RolloutWorker_w5
[2024-11-06 04:22:52,679][00514] Heartbeat connected on RolloutWorker_w4
[2024-11-06 04:22:52,682][00514] Heartbeat connected on RolloutWorker_w6
[2024-11-06 04:22:52,685][00514] Heartbeat connected on RolloutWorker_w7
[2024-11-06 04:22:54,881][02497] No checkpoints found
[2024-11-06 04:22:54,881][02497] Did not load from checkpoint, starting from scratch!
[2024-11-06 04:22:54,881][02497] Initialized policy 0 weights for model version 0
[2024-11-06 04:22:54,885][02497] LearnerWorker_p0 finished initialization!
[2024-11-06 04:22:54,888][02497] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 04:22:54,886][00514] Heartbeat connected on LearnerWorker_p0
[2024-11-06 04:22:54,985][02510] RunningMeanStd input shape: (3, 72, 128)
[2024-11-06 04:22:54,986][02510] RunningMeanStd input shape: (1,)
[2024-11-06 04:22:54,998][02510] ConvEncoder: input_channels=3
[2024-11-06 04:22:55,102][02510] Conv encoder output size: 512
[2024-11-06 04:22:55,103][02510] Policy head output size: 512
[2024-11-06 04:22:55,155][00514] Inference worker 0-0 is ready!
[2024-11-06 04:22:55,159][00514] All inference workers are ready! Signal rollout workers to start!
[2024-11-06 04:22:55,360][02522] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:22:55,363][02520] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:22:55,364][02518] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:22:55,365][02516] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:22:55,371][02519] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:22:55,377][02521] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:22:55,377][02517] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:22:55,383][02511] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:22:56,781][02516] Decorrelating experience for 0 frames...
[2024-11-06 04:22:56,780][02520] Decorrelating experience for 0 frames...
[2024-11-06 04:22:56,783][02522] Decorrelating experience for 0 frames...
[2024-11-06 04:22:57,040][02521] Decorrelating experience for 0 frames...
[2024-11-06 04:22:57,036][02511] Decorrelating experience for 0 frames...
[2024-11-06 04:22:57,038][02519] Decorrelating experience for 0 frames...
[2024-11-06 04:22:57,040][02517] Decorrelating experience for 0 frames...
[2024-11-06 04:22:57,401][02511] Decorrelating experience for 32 frames...
[2024-11-06 04:22:57,959][02518] Decorrelating experience for 0 frames...
[2024-11-06 04:22:57,980][02520] Decorrelating experience for 32 frames...
[2024-11-06 04:22:57,982][02516] Decorrelating experience for 32 frames...
[2024-11-06 04:22:58,388][02522] Decorrelating experience for 32 frames...
[2024-11-06 04:22:58,469][00514] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-06 04:22:58,843][02518] Decorrelating experience for 32 frames...
[2024-11-06 04:22:58,884][02511] Decorrelating experience for 64 frames...
[2024-11-06 04:22:59,124][02521] Decorrelating experience for 32 frames...
[2024-11-06 04:22:59,616][02519] Decorrelating experience for 32 frames...
[2024-11-06 04:22:59,868][02522] Decorrelating experience for 64 frames...
[2024-11-06 04:23:00,170][02518] Decorrelating experience for 64 frames...
[2024-11-06 04:23:00,186][02511] Decorrelating experience for 96 frames...
[2024-11-06 04:23:00,329][02516] Decorrelating experience for 64 frames...
[2024-11-06 04:23:00,979][02517] Decorrelating experience for 32 frames...
[2024-11-06 04:23:01,352][02521] Decorrelating experience for 64 frames...
[2024-11-06 04:23:01,528][02522] Decorrelating experience for 96 frames...
[2024-11-06 04:23:01,760][02519] Decorrelating experience for 64 frames...
[2024-11-06 04:23:02,051][02518] Decorrelating experience for 96 frames...
[2024-11-06 04:23:02,319][02516] Decorrelating experience for 96 frames...
[2024-11-06 04:23:03,279][02520] Decorrelating experience for 64 frames...
[2024-11-06 04:23:03,469][00514] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 3.2. Samples: 16. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-06 04:23:03,471][00514] Avg episode reward: [(0, '0.640')]
[2024-11-06 04:23:04,248][02517] Decorrelating experience for 64 frames...
[2024-11-06 04:23:04,863][02519] Decorrelating experience for 96 frames...
[2024-11-06 04:23:04,950][02521] Decorrelating experience for 96 frames...
[2024-11-06 04:23:07,408][02520] Decorrelating experience for 96 frames...
[2024-11-06 04:23:08,472][00514] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 179.2. Samples: 1792. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-06 04:23:08,477][00514] Avg episode reward: [(0, '2.905')]
[2024-11-06 04:23:08,630][02497] Signal inference workers to stop experience collection...
[2024-11-06 04:23:08,667][02510] InferenceWorker_p0-w0: stopping experience collection
[2024-11-06 04:23:09,126][02517] Decorrelating experience for 96 frames...
[2024-11-06 04:23:11,707][02497] Signal inference workers to resume experience collection...
[2024-11-06 04:23:11,709][02510] InferenceWorker_p0-w0: resuming experience collection
[2024-11-06 04:23:13,469][00514] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 241.2. Samples: 3618. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-11-06 04:23:13,473][00514] Avg episode reward: [(0, '3.081')]
[2024-11-06 04:23:18,469][00514] Fps is (10 sec: 3687.4, 60 sec: 1843.2, 300 sec: 1843.2). Total num frames: 36864. Throughput: 0: 356.8. Samples: 7136. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 04:23:18,476][00514] Avg episode reward: [(0, '3.775')]
[2024-11-06 04:23:19,258][02510] Updated weights for policy 0, policy_version 10 (0.0151)
[2024-11-06 04:23:23,469][00514] Fps is (10 sec: 3686.4, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 511.2. Samples: 12780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:23:23,473][00514] Avg episode reward: [(0, '4.411')]
[2024-11-06 04:23:28,469][00514] Fps is (10 sec: 3276.8, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 69632. Throughput: 0: 589.4. Samples: 17682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:23:28,476][00514] Avg episode reward: [(0, '4.503')]
[2024-11-06 04:23:30,978][02510] Updated weights for policy 0, policy_version 20 (0.0031)
[2024-11-06 04:23:33,469][00514] Fps is (10 sec: 4095.9, 60 sec: 2574.6, 300 sec: 2574.6). Total num frames: 90112. Throughput: 0: 605.0. Samples: 21176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:23:33,476][00514] Avg episode reward: [(0, '4.327')]
[2024-11-06 04:23:38,469][00514] Fps is (10 sec: 4096.0, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592. Throughput: 0: 701.2. Samples: 28048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:23:38,473][00514] Avg episode reward: [(0, '4.474')]
[2024-11-06 04:23:38,478][02497] Saving new best policy, reward=4.474!
[2024-11-06 04:23:41,945][02510] Updated weights for policy 0, policy_version 30 (0.0036)
[2024-11-06 04:23:43,469][00514] Fps is (10 sec: 3686.5, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 717.7. Samples: 32296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:23:43,472][00514] Avg episode reward: [(0, '4.502')]
[2024-11-06 04:23:43,483][02497] Saving new best policy, reward=4.502!
[2024-11-06 04:23:48,469][00514] Fps is (10 sec: 3686.4, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 147456. Throughput: 0: 789.7. Samples: 35554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:23:48,471][00514] Avg episode reward: [(0, '4.594')]
[2024-11-06 04:23:48,477][02497] Saving new best policy, reward=4.594!
[2024-11-06 04:23:51,321][02510] Updated weights for policy 0, policy_version 40 (0.0023)
[2024-11-06 04:23:53,469][00514] Fps is (10 sec: 4505.6, 60 sec: 3127.9, 300 sec: 3127.9). Total num frames: 172032. Throughput: 0: 908.5. Samples: 42670. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:23:53,477][00514] Avg episode reward: [(0, '4.551')]
[2024-11-06 04:23:58,476][00514] Fps is (10 sec: 4093.2, 60 sec: 3139.9, 300 sec: 3139.9). Total num frames: 188416. Throughput: 0: 981.7. Samples: 47802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:23:58,481][00514] Avg episode reward: [(0, '4.552')]
[2024-11-06 04:24:02,643][02510] Updated weights for policy 0, policy_version 50 (0.0030)
[2024-11-06 04:24:03,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3150.8). Total num frames: 204800. Throughput: 0: 956.3. Samples: 50168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:24:03,476][00514] Avg episode reward: [(0, '4.635')]
[2024-11-06 04:24:03,518][02497] Saving new best policy, reward=4.635!
[2024-11-06 04:24:08,469][00514] Fps is (10 sec: 4098.7, 60 sec: 3823.1, 300 sec: 3276.8). Total num frames: 229376. Throughput: 0: 986.8. Samples: 57184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:24:08,472][00514] Avg episode reward: [(0, '4.413')]
[2024-11-06 04:24:11,803][02510] Updated weights for policy 0, policy_version 60 (0.0021)
[2024-11-06 04:24:13,469][00514] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3331.4). Total num frames: 249856. Throughput: 0: 1009.6. Samples: 63112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:24:13,474][00514] Avg episode reward: [(0, '4.765')]
[2024-11-06 04:24:13,482][02497] Saving new best policy, reward=4.765!
[2024-11-06 04:24:18,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 262144. Throughput: 0: 978.5. Samples: 65208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:24:18,471][00514] Avg episode reward: [(0, '4.847')]
[2024-11-06 04:24:18,477][02497] Saving new best policy, reward=4.847!
[2024-11-06 04:24:22,788][02510] Updated weights for policy 0, policy_version 70 (0.0029)
[2024-11-06 04:24:23,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3373.2). Total num frames: 286720. Throughput: 0: 966.2. Samples: 71526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:24:23,471][00514] Avg episode reward: [(0, '4.745')]
[2024-11-06 04:24:23,529][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth...
[2024-11-06 04:24:28,469][00514] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3458.9). Total num frames: 311296. Throughput: 0: 1030.5. Samples: 78668. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:24:28,475][00514] Avg episode reward: [(0, '4.900')]
[2024-11-06 04:24:28,477][02497] Saving new best policy, reward=4.900!
[2024-11-06 04:24:33,435][02510] Updated weights for policy 0, policy_version 80 (0.0025)
[2024-11-06 04:24:33,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3449.3). Total num frames: 327680. Throughput: 0: 1004.1. Samples: 80740. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 04:24:33,473][00514] Avg episode reward: [(0, '4.799')]
[2024-11-06 04:24:38,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3440.6). Total num frames: 344064. Throughput: 0: 961.0. Samples: 85916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 04:24:38,476][00514] Avg episode reward: [(0, '4.590')]
[2024-11-06 04:24:43,091][02510] Updated weights for policy 0, policy_version 90 (0.0028)
[2024-11-06 04:24:43,469][00514] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3510.9). Total num frames: 368640. Throughput: 0: 996.9. Samples: 92658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 04:24:43,471][00514] Avg episode reward: [(0, '4.659')]
[2024-11-06 04:24:48,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3500.2). Total num frames: 385024. Throughput: 0: 1012.0. Samples: 95710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 04:24:48,473][00514] Avg episode reward: [(0, '4.971')]
[2024-11-06 04:24:48,483][02497] Saving new best policy, reward=4.971!
[2024-11-06 04:24:53,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3490.5). Total num frames: 401408. Throughput: 0: 952.2. Samples: 100032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:24:53,471][00514] Avg episode reward: [(0, '5.068')]
[2024-11-06 04:24:53,493][02497] Saving new best policy, reward=5.068!
[2024-11-06 04:24:54,648][02510] Updated weights for policy 0, policy_version 100 (0.0039)
[2024-11-06 04:24:58,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 976.5. Samples: 107054. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:24:58,471][00514] Avg episode reward: [(0, '4.820')]
[2024-11-06 04:25:03,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3571.7). Total num frames: 446464. Throughput: 0: 1008.6. Samples: 110596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:25:03,472][00514] Avg episode reward: [(0, '4.808')]
[2024-11-06 04:25:04,336][02510] Updated weights for policy 0, policy_version 110 (0.0018)
[2024-11-06 04:25:08,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3528.9). Total num frames: 458752. Throughput: 0: 972.7. Samples: 115296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:25:08,471][00514] Avg episode reward: [(0, '4.938')]
[2024-11-06 04:25:13,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 950.0. Samples: 121416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:25:13,472][00514] Avg episode reward: [(0, '4.839')]
[2024-11-06 04:25:14,823][02510] Updated weights for policy 0, policy_version 120 (0.0029)
[2024-11-06 04:25:18,473][00514] Fps is (10 sec: 4913.4, 60 sec: 4095.8, 300 sec: 3627.8). Total num frames: 507904. Throughput: 0: 983.2. Samples: 124988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:25:18,479][00514] Avg episode reward: [(0, '4.733')]
[2024-11-06 04:25:23,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3587.5). Total num frames: 520192. Throughput: 0: 993.0. Samples: 130600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:25:23,478][00514] Avg episode reward: [(0, '4.841')]
[2024-11-06 04:25:26,383][02510] Updated weights for policy 0, policy_version 130 (0.0022)
[2024-11-06 04:25:28,469][00514] Fps is (10 sec: 3278.0, 60 sec: 3822.9, 300 sec: 3604.5). Total num frames: 540672. Throughput: 0: 964.2. Samples: 136046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:25:28,475][00514] Avg episode reward: [(0, '5.249')]
[2024-11-06 04:25:28,477][02497] Saving new best policy, reward=5.249!
[2024-11-06 04:25:33,469][00514] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3646.8). Total num frames: 565248. Throughput: 0: 975.2. Samples: 139592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:25:33,475][00514] Avg episode reward: [(0, '5.580')]
[2024-11-06 04:25:33,484][02497] Saving new best policy, reward=5.580!
[2024-11-06 04:25:35,029][02510] Updated weights for policy 0, policy_version 140 (0.0018)
[2024-11-06 04:25:38,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3660.8). Total num frames: 585728. Throughput: 0: 1021.3. Samples: 145992. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 04:25:38,473][00514] Avg episode reward: [(0, '5.641')]
[2024-11-06 04:25:38,478][02497] Saving new best policy, reward=5.641!
[2024-11-06 04:25:43,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3624.3). Total num frames: 598016. Throughput: 0: 959.6. Samples: 150236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:25:43,475][00514] Avg episode reward: [(0, '5.728')]
[2024-11-06 04:25:43,484][02497] Saving new best policy, reward=5.728!
[2024-11-06 04:25:46,412][02510] Updated weights for policy 0, policy_version 150 (0.0021)
[2024-11-06 04:25:48,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3662.3). Total num frames: 622592. Throughput: 0: 960.5. Samples: 153818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:25:48,475][00514] Avg episode reward: [(0, '5.611')]
[2024-11-06 04:25:53,470][00514] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 3651.3). Total num frames: 638976. Throughput: 0: 993.8. Samples: 160020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:25:53,474][00514] Avg episode reward: [(0, '5.459')]
[2024-11-06 04:25:58,473][00514] Fps is (10 sec: 2866.1, 60 sec: 3754.4, 300 sec: 3618.1). Total num frames: 651264. Throughput: 0: 937.7. Samples: 163616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 04:25:58,475][00514] Avg episode reward: [(0, '5.437')]
[2024-11-06 04:25:59,409][02510] Updated weights for policy 0, policy_version 160 (0.0014)
[2024-11-06 04:26:03,469][00514] Fps is (10 sec: 2867.5, 60 sec: 3686.4, 300 sec: 3608.9). Total num frames: 667648. Throughput: 0: 903.1. Samples: 165626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:26:03,472][00514] Avg episode reward: [(0, '5.287')]
[2024-11-06 04:26:08,469][00514] Fps is (10 sec: 4097.6, 60 sec: 3891.2, 300 sec: 3643.3). Total num frames: 692224. Throughput: 0: 926.2. Samples: 172280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:26:08,471][00514] Avg episode reward: [(0, '5.183')]
[2024-11-06 04:26:09,264][02510] Updated weights for policy 0, policy_version 170 (0.0025)
[2024-11-06 04:26:13,474][00514] Fps is (10 sec: 4503.5, 60 sec: 3822.6, 300 sec: 3654.8). Total num frames: 712704. Throughput: 0: 948.4. Samples: 178730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:26:13,476][00514] Avg episode reward: [(0, '5.250')]
[2024-11-06 04:26:18,470][00514] Fps is (10 sec: 3276.6, 60 sec: 3618.3, 300 sec: 3624.9). Total num frames: 724992. Throughput: 0: 916.7. Samples: 180846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:26:18,474][00514] Avg episode reward: [(0, '5.291')]
[2024-11-06 04:26:20,924][02510] Updated weights for policy 0, policy_version 180 (0.0014)
[2024-11-06 04:26:23,469][00514] Fps is (10 sec: 3688.1, 60 sec: 3822.9, 300 sec: 3656.4). Total num frames: 749568. Throughput: 0: 900.1. Samples: 186498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:26:23,474][00514] Avg episode reward: [(0, '5.494')]
[2024-11-06 04:26:23,487][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000183_749568.pth...
[2024-11-06 04:26:28,469][00514] Fps is (10 sec: 4505.9, 60 sec: 3822.9, 300 sec: 3666.9). Total num frames: 770048. Throughput: 0: 963.3. Samples: 193584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:26:28,471][00514] Avg episode reward: [(0, '5.713')]
[2024-11-06 04:26:29,676][02510] Updated weights for policy 0, policy_version 190 (0.0024)
[2024-11-06 04:26:33,471][00514] Fps is (10 sec: 3685.5, 60 sec: 3686.3, 300 sec: 3657.8). Total num frames: 786432. Throughput: 0: 944.7. Samples: 196334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:26:33,474][00514] Avg episode reward: [(0, '6.098')]
[2024-11-06 04:26:33,481][02497] Saving new best policy, reward=6.098!
[2024-11-06 04:26:38,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3667.8). Total num frames: 806912. Throughput: 0: 909.9. Samples: 200964. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:26:38,471][00514] Avg episode reward: [(0, '6.477')]
[2024-11-06 04:26:38,473][02497] Saving new best policy, reward=6.477!
[2024-11-06 04:26:40,957][02510] Updated weights for policy 0, policy_version 200 (0.0020)
[2024-11-06 04:26:43,469][00514] Fps is (10 sec: 4096.9, 60 sec: 3822.9, 300 sec: 3677.3). Total num frames: 827392. Throughput: 0: 984.7. Samples: 207926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:26:43,471][00514] Avg episode reward: [(0, '6.825')]
[2024-11-06 04:26:43,479][02497] Saving new best policy, reward=6.825!
[2024-11-06 04:26:48,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3686.4). Total num frames: 847872. Throughput: 0: 1020.4. Samples: 211544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:26:48,471][00514] Avg episode reward: [(0, '7.235')]
[2024-11-06 04:26:48,477][02497] Saving new best policy, reward=7.235!
[2024-11-06 04:26:51,849][02510] Updated weights for policy 0, policy_version 210 (0.0022)
[2024-11-06 04:26:53,469][00514] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3677.7). Total num frames: 864256. Throughput: 0: 967.6. Samples: 215820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:26:53,472][00514] Avg episode reward: [(0, '7.479')]
[2024-11-06 04:26:53,481][02497] Saving new best policy, reward=7.479!
[2024-11-06 04:26:58,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3703.5). Total num frames: 888832. Throughput: 0: 972.7. Samples: 222496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:26:58,471][00514] Avg episode reward: [(0, '7.331')]
[2024-11-06 04:27:00,939][02510] Updated weights for policy 0, policy_version 220 (0.0031)
[2024-11-06 04:27:03,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3711.5). Total num frames: 909312. Throughput: 0: 1003.2. Samples: 225990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 04:27:03,474][00514] Avg episode reward: [(0, '7.107')]
[2024-11-06 04:27:08,474][00514] Fps is (10 sec: 3684.5, 60 sec: 3890.9, 300 sec: 3702.7). Total num frames: 925696. Throughput: 0: 997.2. Samples: 231378. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:27:08,477][00514] Avg episode reward: [(0, '7.336')]
[2024-11-06 04:27:12,489][02510] Updated weights for policy 0, policy_version 230 (0.0034)
[2024-11-06 04:27:13,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3710.5). Total num frames: 946176. Throughput: 0: 964.4. Samples: 236980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:27:13,475][00514] Avg episode reward: [(0, '6.818')]
[2024-11-06 04:27:18,469][00514] Fps is (10 sec: 4098.1, 60 sec: 4027.8, 300 sec: 3717.9). Total num frames: 966656. Throughput: 0: 980.8. Samples: 240468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:27:18,471][00514] Avg episode reward: [(0, '6.393')]
[2024-11-06 04:27:21,160][02510] Updated weights for policy 0, policy_version 240 (0.0013)
[2024-11-06 04:27:23,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3725.0). Total num frames: 987136. Throughput: 0: 1019.6. Samples: 246844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:27:23,471][00514] Avg episode reward: [(0, '6.608')]
[2024-11-06 04:27:28,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3716.7). Total num frames: 1003520. Throughput: 0: 968.1. Samples: 251488. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:27:28,475][00514] Avg episode reward: [(0, '6.990')]
[2024-11-06 04:27:32,521][02510] Updated weights for policy 0, policy_version 250 (0.0026)
[2024-11-06 04:27:33,469][00514] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3738.5). Total num frames: 1028096. Throughput: 0: 967.6. Samples: 255088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:27:33,473][00514] Avg episode reward: [(0, '7.210')]
[2024-11-06 04:27:38,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3744.9). Total num frames: 1048576. Throughput: 0: 1028.3. Samples: 262094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:27:38,475][00514] Avg episode reward: [(0, '6.697')]
[2024-11-06 04:27:43,316][02510] Updated weights for policy 0, policy_version 260 (0.0014)
[2024-11-06 04:27:43,471][00514] Fps is (10 sec: 3685.8, 60 sec: 3959.4, 300 sec: 3736.7). Total num frames: 1064960. Throughput: 0: 979.1. Samples: 266558. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:27:43,473][00514] Avg episode reward: [(0, '7.287')]
[2024-11-06 04:27:48,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3742.9). Total num frames: 1085440. Throughput: 0: 964.8. Samples: 269408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:27:48,476][00514] Avg episode reward: [(0, '7.639')]
[2024-11-06 04:27:48,479][02497] Saving new best policy, reward=7.639!
[2024-11-06 04:27:52,445][02510] Updated weights for policy 0, policy_version 270 (0.0021)
[2024-11-06 04:27:53,469][00514] Fps is (10 sec: 4506.3, 60 sec: 4096.0, 300 sec: 3762.8). Total num frames: 1110016. Throughput: 0: 1003.7. Samples: 276540. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:27:53,474][00514] Avg episode reward: [(0, '8.098')]
[2024-11-06 04:27:53,488][02497] Saving new best policy, reward=8.098!
[2024-11-06 04:27:58,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1126400. Throughput: 0: 1002.4. Samples: 282086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:27:58,474][00514] Avg episode reward: [(0, '8.059')]
[2024-11-06 04:28:03,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 1142784. Throughput: 0: 971.9. Samples: 284206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:28:03,476][00514] Avg episode reward: [(0, '8.159')]
[2024-11-06 04:28:03,488][02497] Saving new best policy, reward=8.159!
[2024-11-06 04:28:04,034][02510] Updated weights for policy 0, policy_version 280 (0.0024)
[2024-11-06 04:28:08,469][00514] Fps is (10 sec: 4096.0, 60 sec: 4028.1, 300 sec: 3915.5). Total num frames: 1167360. Throughput: 0: 980.4. Samples: 290964. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:28:08,476][00514] Avg episode reward: [(0, '8.899')]
[2024-11-06 04:28:08,480][02497] Saving new best policy, reward=8.899!
[2024-11-06 04:28:13,263][02510] Updated weights for policy 0, policy_version 290 (0.0024)
[2024-11-06 04:28:13,470][00514] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 1187840. Throughput: 0: 1016.3. Samples: 297220. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:28:13,472][00514] Avg episode reward: [(0, '9.219')]
[2024-11-06 04:28:13,490][02497] Saving new best policy, reward=9.219!
[2024-11-06 04:28:18,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1200128. Throughput: 0: 982.6. Samples: 299306. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:28:18,475][00514] Avg episode reward: [(0, '9.290')]
[2024-11-06 04:28:18,477][02497] Saving new best policy, reward=9.290!
[2024-11-06 04:28:23,469][00514] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1224704. Throughput: 0: 959.6. Samples: 305274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:28:23,474][00514] Avg episode reward: [(0, '9.798')]
[2024-11-06 04:28:23,485][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000299_1224704.pth...
[2024-11-06 04:28:23,617][02497] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth
[2024-11-06 04:28:23,628][02497] Saving new best policy, reward=9.798!
[2024-11-06 04:28:24,517][02510] Updated weights for policy 0, policy_version 300 (0.0041)
[2024-11-06 04:28:28,469][00514] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1245184. Throughput: 0: 1016.4. Samples: 312296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:28:28,471][00514] Avg episode reward: [(0, '10.124')]
[2024-11-06 04:28:28,483][02497] Saving new best policy, reward=10.124!
[2024-11-06 04:28:33,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1261568. Throughput: 0: 1007.4. Samples: 314742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:28:33,476][00514] Avg episode reward: [(0, '10.419')]
[2024-11-06 04:28:33,488][02497] Saving new best policy, reward=10.419!
[2024-11-06 04:28:35,871][02510] Updated weights for policy 0, policy_version 310 (0.0016)
[2024-11-06 04:28:38,469][00514] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1282048. Throughput: 0: 957.6. Samples: 319632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:28:38,476][00514] Avg episode reward: [(0, '10.945')]
[2024-11-06 04:28:38,479][02497] Saving new best policy, reward=10.945!
[2024-11-06 04:28:43,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3915.5). Total num frames: 1302528. Throughput: 0: 986.0. Samples: 326458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:28:43,471][00514] Avg episode reward: [(0, '11.368')]
[2024-11-06 04:28:43,484][02497] Saving new best policy, reward=11.368!
[2024-11-06 04:28:44,718][02510] Updated weights for policy 0, policy_version 320 (0.0034)
[2024-11-06 04:28:48,470][00514] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 1323008. Throughput: 0: 1014.3. Samples: 329848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:28:48,472][00514] Avg episode reward: [(0, '11.046')]
[2024-11-06 04:28:53,469][00514] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3887.8). Total num frames: 1335296. Throughput: 0: 961.0. Samples: 334210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:28:53,475][00514] Avg episode reward: [(0, '11.606')]
[2024-11-06 04:28:53,509][02497] Saving new best policy, reward=11.606!
[2024-11-06 04:28:56,143][02510] Updated weights for policy 0, policy_version 330 (0.0034)
[2024-11-06 04:28:58,469][00514] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1359872. Throughput: 0: 971.2. Samples: 340924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:28:58,474][00514] Avg episode reward: [(0, '12.622')]
[2024-11-06 04:28:58,476][02497] Saving new best policy, reward=12.622!
[2024-11-06 04:29:03,469][00514] Fps is (10 sec: 4915.3, 60 sec: 4027.8, 300 sec: 3915.5). Total num frames: 1384448. Throughput: 0: 1001.1. Samples: 344354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:29:03,472][00514] Avg episode reward: [(0, '13.184')]
[2024-11-06 04:29:03,480][02497] Saving new best policy, reward=13.184!
[2024-11-06 04:29:06,052][02510] Updated weights for policy 0, policy_version 340 (0.0033)
[2024-11-06 04:29:08,469][00514] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1396736. Throughput: 0: 983.0. Samples: 349510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:29:08,475][00514] Avg episode reward: [(0, '13.825')]
[2024-11-06 04:29:08,481][02497] Saving new best policy, reward=13.825!
[2024-11-06 04:29:13,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3915.5). Total num frames: 1417216. Throughput: 0: 954.0. Samples: 355224. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:29:13,475][00514] Avg episode reward: [(0, '15.136')]
[2024-11-06 04:29:13,484][02497] Saving new best policy, reward=15.136!
[2024-11-06 04:29:16,265][02510] Updated weights for policy 0, policy_version 350 (0.0023)
[2024-11-06 04:29:18,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1441792. Throughput: 0: 978.5. Samples: 358776.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:29:18,475][00514] Avg episode reward: [(0, '13.768')] [2024-11-06 04:29:23,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1458176. Throughput: 0: 1006.0. Samples: 364904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:29:23,474][00514] Avg episode reward: [(0, '13.981')] [2024-11-06 04:29:27,514][02510] Updated weights for policy 0, policy_version 360 (0.0028) [2024-11-06 04:29:28,469][00514] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1478656. Throughput: 0: 964.6. Samples: 369866. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:29:28,476][00514] Avg episode reward: [(0, '14.782')] [2024-11-06 04:29:33,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1499136. Throughput: 0: 967.8. Samples: 373398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:29:33,475][00514] Avg episode reward: [(0, '14.413')] [2024-11-06 04:29:36,139][02510] Updated weights for policy 0, policy_version 370 (0.0014) [2024-11-06 04:29:38,469][00514] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1523712. Throughput: 0: 1029.9. Samples: 380554. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:29:38,478][00514] Avg episode reward: [(0, '15.103')] [2024-11-06 04:29:43,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1536000. Throughput: 0: 978.0. Samples: 384934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:29:43,475][00514] Avg episode reward: [(0, '14.319')] [2024-11-06 04:29:47,492][02510] Updated weights for policy 0, policy_version 380 (0.0028) [2024-11-06 04:29:48,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1556480. Throughput: 0: 970.2. Samples: 388012. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:29:48,471][00514] Avg episode reward: [(0, '13.701')] [2024-11-06 04:29:53,469][00514] Fps is (10 sec: 4915.1, 60 sec: 4164.3, 300 sec: 3929.4). Total num frames: 1585152. Throughput: 0: 1016.8. Samples: 395266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:29:53,472][00514] Avg episode reward: [(0, '12.711')] [2024-11-06 04:29:57,186][02510] Updated weights for policy 0, policy_version 390 (0.0034) [2024-11-06 04:29:58,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1597440. Throughput: 0: 1008.5. Samples: 400608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:29:58,472][00514] Avg episode reward: [(0, '13.373')] [2024-11-06 04:30:03,469][00514] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1617920. Throughput: 0: 978.5. Samples: 402810. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:30:03,471][00514] Avg episode reward: [(0, '13.853')] [2024-11-06 04:30:07,427][02510] Updated weights for policy 0, policy_version 400 (0.0025) [2024-11-06 04:30:08,469][00514] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 1642496. Throughput: 0: 996.2. Samples: 409732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:30:08,477][00514] Avg episode reward: [(0, '15.465')] [2024-11-06 04:30:08,482][02497] Saving new best policy, reward=15.465! [2024-11-06 04:30:13,469][00514] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3901.7). Total num frames: 1658880. Throughput: 0: 1023.4. Samples: 415920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:30:13,473][00514] Avg episode reward: [(0, '15.542')] [2024-11-06 04:30:13,490][02497] Saving new best policy, reward=15.542! [2024-11-06 04:30:18,469][00514] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1675264. Throughput: 0: 990.4. Samples: 417968. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:30:18,478][00514] Avg episode reward: [(0, '15.957')] [2024-11-06 04:30:18,483][02497] Saving new best policy, reward=15.957! [2024-11-06 04:30:19,274][02510] Updated weights for policy 0, policy_version 410 (0.0034) [2024-11-06 04:30:23,470][00514] Fps is (10 sec: 3686.1, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 1695744. Throughput: 0: 962.8. Samples: 423880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:30:23,472][00514] Avg episode reward: [(0, '15.937')] [2024-11-06 04:30:23,491][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_1695744.pth... [2024-11-06 04:30:23,680][02497] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000183_749568.pth [2024-11-06 04:30:28,469][00514] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1712128. Throughput: 0: 967.7. Samples: 428480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:30:28,473][00514] Avg episode reward: [(0, '15.086')] [2024-11-06 04:30:31,807][02510] Updated weights for policy 0, policy_version 420 (0.0033) [2024-11-06 04:30:33,470][00514] Fps is (10 sec: 2867.3, 60 sec: 3754.6, 300 sec: 3860.0). Total num frames: 1724416. Throughput: 0: 941.1. Samples: 430362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:30:33,477][00514] Avg episode reward: [(0, '15.396')] [2024-11-06 04:30:38,469][00514] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 1740800. Throughput: 0: 879.9. Samples: 434860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:30:38,471][00514] Avg episode reward: [(0, '15.961')] [2024-11-06 04:30:38,476][02497] Saving new best policy, reward=15.961! [2024-11-06 04:30:42,448][02510] Updated weights for policy 0, policy_version 430 (0.0022) [2024-11-06 04:30:43,469][00514] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1765376. 
Throughput: 0: 913.3. Samples: 441706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:30:43,477][00514] Avg episode reward: [(0, '16.140')] [2024-11-06 04:30:43,487][02497] Saving new best policy, reward=16.140! [2024-11-06 04:30:48,471][00514] Fps is (10 sec: 4504.9, 60 sec: 3822.8, 300 sec: 3887.7). Total num frames: 1785856. Throughput: 0: 941.5. Samples: 445180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:30:48,473][00514] Avg episode reward: [(0, '17.088')] [2024-11-06 04:30:48,480][02497] Saving new best policy, reward=17.088! [2024-11-06 04:30:53,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3887.8). Total num frames: 1798144. Throughput: 0: 886.6. Samples: 449630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:30:53,472][00514] Avg episode reward: [(0, '18.036')] [2024-11-06 04:30:53,486][02497] Saving new best policy, reward=18.036! [2024-11-06 04:30:54,088][02510] Updated weights for policy 0, policy_version 440 (0.0018) [2024-11-06 04:30:58,469][00514] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 1822720. Throughput: 0: 891.1. Samples: 456020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:30:58,471][00514] Avg episode reward: [(0, '19.653')] [2024-11-06 04:30:58,474][02497] Saving new best policy, reward=19.653! [2024-11-06 04:31:02,909][02510] Updated weights for policy 0, policy_version 450 (0.0035) [2024-11-06 04:31:03,469][00514] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 1843200. Throughput: 0: 920.1. Samples: 459372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:31:03,471][00514] Avg episode reward: [(0, '20.213')] [2024-11-06 04:31:03,483][02497] Saving new best policy, reward=20.213! [2024-11-06 04:31:08,472][00514] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3873.9). Total num frames: 1855488. Throughput: 0: 904.4. Samples: 464582. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:31:08,474][00514] Avg episode reward: [(0, '20.087')] [2024-11-06 04:31:13,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3901.6). Total num frames: 1875968. Throughput: 0: 907.2. Samples: 469302. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:31:13,475][00514] Avg episode reward: [(0, '21.223')] [2024-11-06 04:31:13,483][02497] Saving new best policy, reward=21.223! [2024-11-06 04:31:15,462][02510] Updated weights for policy 0, policy_version 460 (0.0013) [2024-11-06 04:31:18,469][00514] Fps is (10 sec: 4097.0, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 1896448. Throughput: 0: 934.5. Samples: 472416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:31:18,476][00514] Avg episode reward: [(0, '19.370')] [2024-11-06 04:31:23,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3873.8). Total num frames: 1912832. Throughput: 0: 963.4. Samples: 478212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:31:23,471][00514] Avg episode reward: [(0, '19.566')] [2024-11-06 04:31:27,890][02510] Updated weights for policy 0, policy_version 470 (0.0013) [2024-11-06 04:31:28,469][00514] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3860.0). Total num frames: 1925120. Throughput: 0: 899.8. Samples: 482198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:31:28,472][00514] Avg episode reward: [(0, '19.696')] [2024-11-06 04:31:33,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 1945600. Throughput: 0: 891.8. Samples: 485310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:31:33,476][00514] Avg episode reward: [(0, '19.683')] [2024-11-06 04:31:37,581][02510] Updated weights for policy 0, policy_version 480 (0.0029) [2024-11-06 04:31:38,469][00514] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 1966080. Throughput: 0: 934.6. Samples: 491686. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:31:38,472][00514] Avg episode reward: [(0, '18.169')] [2024-11-06 04:31:43,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3832.2). Total num frames: 1978368. Throughput: 0: 880.6. Samples: 495648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:31:43,471][00514] Avg episode reward: [(0, '17.714')] [2024-11-06 04:31:48,469][00514] Fps is (10 sec: 3276.9, 60 sec: 3550.0, 300 sec: 3846.1). Total num frames: 1998848. Throughput: 0: 859.4. Samples: 498044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:31:48,475][00514] Avg episode reward: [(0, '16.563')] [2024-11-06 04:31:50,143][02510] Updated weights for policy 0, policy_version 490 (0.0036) [2024-11-06 04:31:53,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2019328. Throughput: 0: 886.6. Samples: 504478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:31:53,472][00514] Avg episode reward: [(0, '16.165')] [2024-11-06 04:31:58,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3818.3). Total num frames: 2035712. Throughput: 0: 895.2. Samples: 509586. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:31:58,476][00514] Avg episode reward: [(0, '15.156')] [2024-11-06 04:32:02,212][02510] Updated weights for policy 0, policy_version 500 (0.0028) [2024-11-06 04:32:03,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3818.4). Total num frames: 2052096. Throughput: 0: 874.4. Samples: 511764. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:32:03,475][00514] Avg episode reward: [(0, '15.528')] [2024-11-06 04:32:08,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3832.2). Total num frames: 2076672. Throughput: 0: 895.2. Samples: 518498. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:32:08,476][00514] Avg episode reward: [(0, '16.006')] [2024-11-06 04:32:10,811][02510] Updated weights for policy 0, policy_version 510 (0.0024) [2024-11-06 04:32:13,472][00514] Fps is (10 sec: 4504.4, 60 sec: 3686.2, 300 sec: 3832.2). Total num frames: 2097152. Throughput: 0: 953.9. Samples: 525124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:32:13,474][00514] Avg episode reward: [(0, '16.741')] [2024-11-06 04:32:18,472][00514] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3804.4). Total num frames: 2109440. Throughput: 0: 925.5. Samples: 526962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:32:18,476][00514] Avg episode reward: [(0, '18.166')] [2024-11-06 04:32:23,030][02510] Updated weights for policy 0, policy_version 520 (0.0039) [2024-11-06 04:32:23,469][00514] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 2129920. Throughput: 0: 899.4. Samples: 532160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:32:23,474][00514] Avg episode reward: [(0, '18.546')] [2024-11-06 04:32:23,491][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000520_2129920.pth... [2024-11-06 04:32:23,628][02497] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000299_1224704.pth [2024-11-06 04:32:28,469][00514] Fps is (10 sec: 4506.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2154496. Throughput: 0: 964.6. Samples: 539054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:32:28,477][00514] Avg episode reward: [(0, '19.355')] [2024-11-06 04:32:32,838][02510] Updated weights for policy 0, policy_version 530 (0.0022) [2024-11-06 04:32:33,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2170880. Throughput: 0: 980.8. Samples: 542180. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:32:33,475][00514] Avg episode reward: [(0, '18.121')] [2024-11-06 04:32:38,472][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2187264. Throughput: 0: 930.6. Samples: 546354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:32:38,478][00514] Avg episode reward: [(0, '18.088')] [2024-11-06 04:32:43,315][02510] Updated weights for policy 0, policy_version 540 (0.0020) [2024-11-06 04:32:43,471][00514] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 2211840. Throughput: 0: 971.3. Samples: 553294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:32:43,478][00514] Avg episode reward: [(0, '18.337')] [2024-11-06 04:32:48,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2228224. Throughput: 0: 994.4. Samples: 556510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:32:48,475][00514] Avg episode reward: [(0, '19.309')] [2024-11-06 04:32:53,469][00514] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2244608. Throughput: 0: 945.6. Samples: 561050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:32:53,474][00514] Avg episode reward: [(0, '21.056')] [2024-11-06 04:32:55,489][02510] Updated weights for policy 0, policy_version 550 (0.0019) [2024-11-06 04:32:58,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2265088. Throughput: 0: 929.2. Samples: 566934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:32:58,476][00514] Avg episode reward: [(0, '21.594')] [2024-11-06 04:32:58,479][02497] Saving new best policy, reward=21.594! [2024-11-06 04:33:03,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2285568. Throughput: 0: 961.8. Samples: 570240. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:33:03,471][00514] Avg episode reward: [(0, '23.647')] [2024-11-06 04:33:03,479][02497] Saving new best policy, reward=23.647! [2024-11-06 04:33:05,005][02510] Updated weights for policy 0, policy_version 560 (0.0018) [2024-11-06 04:33:08,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2301952. Throughput: 0: 966.0. Samples: 575628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 04:33:08,478][00514] Avg episode reward: [(0, '22.756')] [2024-11-06 04:33:13,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3790.5). Total num frames: 2318336. Throughput: 0: 926.6. Samples: 580750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:33:13,477][00514] Avg episode reward: [(0, '23.403')] [2024-11-06 04:33:16,337][02510] Updated weights for policy 0, policy_version 570 (0.0032) [2024-11-06 04:33:18,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3790.5). Total num frames: 2342912. Throughput: 0: 933.9. Samples: 584206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 04:33:18,474][00514] Avg episode reward: [(0, '23.436')] [2024-11-06 04:33:23,469][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2363392. Throughput: 0: 994.0. Samples: 591082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 04:33:23,473][00514] Avg episode reward: [(0, '22.868')] [2024-11-06 04:33:26,960][02510] Updated weights for policy 0, policy_version 580 (0.0019) [2024-11-06 04:33:28,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2379776. Throughput: 0: 935.9. Samples: 595410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:33:28,477][00514] Avg episode reward: [(0, '22.750')] [2024-11-06 04:33:33,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2400256. Throughput: 0: 941.8. Samples: 598892. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:33:33,477][00514] Avg episode reward: [(0, '22.315')] [2024-11-06 04:33:36,340][02510] Updated weights for policy 0, policy_version 590 (0.0017) [2024-11-06 04:33:38,469][00514] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2424832. Throughput: 0: 995.7. Samples: 605856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:33:38,477][00514] Avg episode reward: [(0, '22.150')] [2024-11-06 04:33:43,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 2441216. Throughput: 0: 971.7. Samples: 610660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:33:43,476][00514] Avg episode reward: [(0, '21.435')] [2024-11-06 04:33:47,785][02510] Updated weights for policy 0, policy_version 600 (0.0031) [2024-11-06 04:33:48,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2457600. Throughput: 0: 950.6. Samples: 613016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:33:48,475][00514] Avg episode reward: [(0, '21.519')] [2024-11-06 04:33:53,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2482176. Throughput: 0: 987.0. Samples: 620044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:33:53,471][00514] Avg episode reward: [(0, '21.139')] [2024-11-06 04:33:57,396][02510] Updated weights for policy 0, policy_version 610 (0.0018) [2024-11-06 04:33:58,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2498560. Throughput: 0: 1000.4. Samples: 625766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:33:58,471][00514] Avg episode reward: [(0, '19.934')] [2024-11-06 04:34:03,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2514944. Throughput: 0: 970.0. Samples: 627854. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:34:03,472][00514] Avg episode reward: [(0, '19.237')] [2024-11-06 04:34:08,460][02510] Updated weights for policy 0, policy_version 620 (0.0028) [2024-11-06 04:34:08,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2539520. Throughput: 0: 952.7. Samples: 633952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:34:08,471][00514] Avg episode reward: [(0, '18.907')] [2024-11-06 04:34:13,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 2560000. Throughput: 0: 1000.6. Samples: 640436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:34:13,471][00514] Avg episode reward: [(0, '19.160')] [2024-11-06 04:34:18,479][00514] Fps is (10 sec: 3273.5, 60 sec: 3822.3, 300 sec: 3776.5). Total num frames: 2572288. Throughput: 0: 965.4. Samples: 642346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:34:18,486][00514] Avg episode reward: [(0, '19.123')] [2024-11-06 04:34:20,766][02510] Updated weights for policy 0, policy_version 630 (0.0036) [2024-11-06 04:34:23,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2592768. Throughput: 0: 923.9. Samples: 647432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:34:23,476][00514] Avg episode reward: [(0, '19.748')] [2024-11-06 04:34:23,489][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000633_2592768.pth... [2024-11-06 04:34:23,619][02497] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_1695744.pth [2024-11-06 04:34:28,469][00514] Fps is (10 sec: 4100.1, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2613248. Throughput: 0: 965.2. Samples: 654094. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:34:28,474][00514] Avg episode reward: [(0, '20.195')] [2024-11-06 04:34:29,725][02510] Updated weights for policy 0, policy_version 640 (0.0017) [2024-11-06 04:34:33,470][00514] Fps is (10 sec: 3685.9, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2629632. Throughput: 0: 978.5. Samples: 657048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:34:33,474][00514] Avg episode reward: [(0, '20.818')] [2024-11-06 04:34:38,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2646016. Throughput: 0: 920.0. Samples: 661444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:34:38,471][00514] Avg episode reward: [(0, '22.078')] [2024-11-06 04:34:41,134][02510] Updated weights for policy 0, policy_version 650 (0.0015) [2024-11-06 04:34:43,469][00514] Fps is (10 sec: 4096.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2670592. Throughput: 0: 949.0. Samples: 668470. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:34:43,472][00514] Avg episode reward: [(0, '21.778')] [2024-11-06 04:34:48,472][00514] Fps is (10 sec: 4504.4, 60 sec: 3891.0, 300 sec: 3748.9). Total num frames: 2691072. Throughput: 0: 977.9. Samples: 671860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:34:48,474][00514] Avg episode reward: [(0, '22.575')] [2024-11-06 04:34:52,412][02510] Updated weights for policy 0, policy_version 660 (0.0024) [2024-11-06 04:34:53,470][00514] Fps is (10 sec: 3276.4, 60 sec: 3686.3, 300 sec: 3748.9). Total num frames: 2703360. Throughput: 0: 941.4. Samples: 676316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:34:53,478][00514] Avg episode reward: [(0, '23.423')] [2024-11-06 04:34:58,471][00514] Fps is (10 sec: 2867.4, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 2719744. Throughput: 0: 907.0. Samples: 681254. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:34:58,473][00514] Avg episode reward: [(0, '23.435')] [2024-11-06 04:35:03,469][00514] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2736128. Throughput: 0: 909.5. Samples: 683266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:35:03,477][00514] Avg episode reward: [(0, '22.207')] [2024-11-06 04:35:05,223][02510] Updated weights for policy 0, policy_version 670 (0.0016) [2024-11-06 04:35:08,469][00514] Fps is (10 sec: 3277.3, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 2752512. Throughput: 0: 902.0. Samples: 688024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:35:08,477][00514] Avg episode reward: [(0, '22.251')] [2024-11-06 04:35:13,469][00514] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 2768896. Throughput: 0: 870.4. Samples: 693260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:35:13,479][00514] Avg episode reward: [(0, '22.117')] [2024-11-06 04:35:16,505][02510] Updated weights for policy 0, policy_version 680 (0.0023) [2024-11-06 04:35:18,469][00514] Fps is (10 sec: 4096.1, 60 sec: 3687.0, 300 sec: 3721.1). Total num frames: 2793472. Throughput: 0: 879.9. Samples: 696644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:35:18,471][00514] Avg episode reward: [(0, '21.082')] [2024-11-06 04:35:23,469][00514] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2813952. Throughput: 0: 928.1. Samples: 703210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:35:23,471][00514] Avg episode reward: [(0, '20.850')] [2024-11-06 04:35:27,782][02510] Updated weights for policy 0, policy_version 690 (0.0021) [2024-11-06 04:35:28,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 2826240. Throughput: 0: 866.4. Samples: 707456. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:35:28,472][00514] Avg episode reward: [(0, '20.969')] [2024-11-06 04:35:33,475][00514] Fps is (10 sec: 3684.2, 60 sec: 3686.1, 300 sec: 3762.7). Total num frames: 2850816. Throughput: 0: 870.7. Samples: 711046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:35:33,478][00514] Avg episode reward: [(0, '21.618')] [2024-11-06 04:35:36,657][02510] Updated weights for policy 0, policy_version 700 (0.0018) [2024-11-06 04:35:38,469][00514] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2871296. Throughput: 0: 927.6. Samples: 718056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:35:38,475][00514] Avg episode reward: [(0, '22.664')] [2024-11-06 04:35:43,470][00514] Fps is (10 sec: 3688.2, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 2887680. Throughput: 0: 923.0. Samples: 722790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:35:43,473][00514] Avg episode reward: [(0, '22.510')] [2024-11-06 04:35:48,228][02510] Updated weights for policy 0, policy_version 710 (0.0020) [2024-11-06 04:35:48,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3762.8). Total num frames: 2908160. Throughput: 0: 933.2. Samples: 725258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:35:48,473][00514] Avg episode reward: [(0, '22.950')] [2024-11-06 04:35:53,473][00514] Fps is (10 sec: 4504.2, 60 sec: 3822.8, 300 sec: 3762.7). Total num frames: 2932736. Throughput: 0: 983.7. Samples: 732294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:35:53,475][00514] Avg episode reward: [(0, '21.696')] [2024-11-06 04:35:57,847][02510] Updated weights for policy 0, policy_version 720 (0.0014) [2024-11-06 04:35:58,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3748.9). Total num frames: 2949120. Throughput: 0: 996.4. Samples: 738098. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:35:58,474][00514] Avg episode reward: [(0, '22.148')] [2024-11-06 04:36:03,469][00514] Fps is (10 sec: 3278.1, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 2965504. Throughput: 0: 969.9. Samples: 740288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:36:03,471][00514] Avg episode reward: [(0, '21.418')] [2024-11-06 04:36:08,225][02510] Updated weights for policy 0, policy_version 730 (0.0033) [2024-11-06 04:36:08,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2990080. Throughput: 0: 968.5. Samples: 746794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:36:08,472][00514] Avg episode reward: [(0, '21.582')] [2024-11-06 04:36:13,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3776.7). Total num frames: 3010560. Throughput: 0: 1024.8. Samples: 753572. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:36:13,472][00514] Avg episode reward: [(0, '22.253')] [2024-11-06 04:36:18,472][00514] Fps is (10 sec: 3685.4, 60 sec: 3891.0, 300 sec: 3776.6). Total num frames: 3026944. Throughput: 0: 991.0. Samples: 755640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:36:18,475][00514] Avg episode reward: [(0, '24.154')] [2024-11-06 04:36:18,484][02497] Saving new best policy, reward=24.154! [2024-11-06 04:36:19,863][02510] Updated weights for policy 0, policy_version 740 (0.0013) [2024-11-06 04:36:23,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3047424. Throughput: 0: 953.7. Samples: 760972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:36:23,472][00514] Avg episode reward: [(0, '22.542')] [2024-11-06 04:36:23,484][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000744_3047424.pth... 
[2024-11-06 04:36:23,609][02497] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000520_2129920.pth [2024-11-06 04:36:28,469][00514] Fps is (10 sec: 4097.1, 60 sec: 4027.7, 300 sec: 3804.4). Total num frames: 3067904. Throughput: 0: 1005.6. Samples: 768040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:36:28,471][00514] Avg episode reward: [(0, '22.690')] [2024-11-06 04:36:28,695][02510] Updated weights for policy 0, policy_version 750 (0.0030) [2024-11-06 04:36:33,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.6, 300 sec: 3790.5). Total num frames: 3084288. Throughput: 0: 1014.8. Samples: 770922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:36:33,476][00514] Avg episode reward: [(0, '22.936')] [2024-11-06 04:36:38,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3100672. Throughput: 0: 954.7. Samples: 775254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:36:38,475][00514] Avg episode reward: [(0, '22.528')] [2024-11-06 04:36:40,185][02510] Updated weights for policy 0, policy_version 760 (0.0041) [2024-11-06 04:36:43,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3125248. Throughput: 0: 986.0. Samples: 782470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:36:43,477][00514] Avg episode reward: [(0, '22.223')] [2024-11-06 04:36:48,469][00514] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3145728. Throughput: 0: 1016.3. Samples: 786022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:36:48,472][00514] Avg episode reward: [(0, '22.407')] [2024-11-06 04:36:50,181][02510] Updated weights for policy 0, policy_version 770 (0.0029) [2024-11-06 04:36:53,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3818.3). Total num frames: 3162112. Throughput: 0: 970.6. Samples: 790470. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:36:53,475][00514] Avg episode reward: [(0, '23.804')] [2024-11-06 04:36:58,472][00514] Fps is (10 sec: 4094.8, 60 sec: 3959.3, 300 sec: 3846.0). Total num frames: 3186688. Throughput: 0: 961.5. Samples: 796840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:36:58,474][00514] Avg episode reward: [(0, '24.275')] [2024-11-06 04:36:58,481][02497] Saving new best policy, reward=24.275! [2024-11-06 04:37:00,321][02510] Updated weights for policy 0, policy_version 780 (0.0026) [2024-11-06 04:37:03,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 3207168. Throughput: 0: 992.7. Samples: 800308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:37:03,474][00514] Avg episode reward: [(0, '24.676')] [2024-11-06 04:37:03,483][02497] Saving new best policy, reward=24.676! [2024-11-06 04:37:08,469][00514] Fps is (10 sec: 3687.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3223552. Throughput: 0: 993.3. Samples: 805670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:37:08,478][00514] Avg episode reward: [(0, '26.226')] [2024-11-06 04:37:08,479][02497] Saving new best policy, reward=26.226! [2024-11-06 04:37:12,064][02510] Updated weights for policy 0, policy_version 790 (0.0043) [2024-11-06 04:37:13,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3239936. Throughput: 0: 951.6. Samples: 810864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:37:13,474][00514] Avg episode reward: [(0, '25.453')] [2024-11-06 04:37:18,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3846.1). Total num frames: 3264512. Throughput: 0: 969.2. Samples: 814538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:37:18,474][00514] Avg episode reward: [(0, '26.734')] [2024-11-06 04:37:18,477][02497] Saving new best policy, reward=26.734! 
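Each periodic status entry above reports throughput averaged over 10 s, 60 s and 300 s windows plus the cumulative frame count, followed by a separate "Avg episode reward" entry for policy 0. A minimal sketch for extracting those numbers from the raw text, for example to plot a training curve; the regular expressions are written against the entries shown in this log and are an assumption about the format, not a documented Sample Factory interface.

```python
import re

# Patterns written against the status entries in this log (assumed format).
FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.]+), 60 sec: ([\d.]+), 300 sec: ([\d.]+)\)\. "
    r"Total num frames: (\d+)\."
)
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '(-?[\d.]+)'\)\]")


def parse_status(text: str) -> list[dict]:
    """Return one dict per 'Fps is ...' entry, in order of appearance."""
    return [
        {
            "fps_10s": float(m.group(1)),
            "fps_60s": float(m.group(2)),
            "fps_300s": float(m.group(3)),
            "total_frames": int(m.group(4)),
        }
        for m in FPS_RE.finditer(text)
    ]


def parse_rewards(text: str) -> list[float]:
    """Return every training-time 'Avg episode reward' value for policy 0."""
    return [float(m.group(1)) for m in REWARD_RE.finditer(text)]


if __name__ == "__main__":
    sample = (
        "[2024-11-06 04:36:13,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4027.7, "
        "300 sec: 3776.7). Total num frames: 3010560. Throughput: 0: 1024.8. "
        "Samples: 753572. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) "
        "[2024-11-06 04:36:13,472][00514] Avg episode reward: [(0, '22.253')]"
    )
    print(parse_status(sample))
    print(parse_rewards(sample))
```

The reward pattern only matches the bracketed training-time form, so the evaluation summaries later in the log ("Avg episode reward: 24.830, avg true_objective: ...") are not picked up by it.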
[2024-11-06 04:37:20,732][02510] Updated weights for policy 0, policy_version 800 (0.0017) [2024-11-06 04:37:23,469][00514] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 3284992. Throughput: 0: 1016.2. Samples: 820984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:37:23,472][00514] Avg episode reward: [(0, '25.740')] [2024-11-06 04:37:28,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3297280. Throughput: 0: 954.1. Samples: 825406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:37:28,471][00514] Avg episode reward: [(0, '25.742')] [2024-11-06 04:37:32,107][02510] Updated weights for policy 0, policy_version 810 (0.0023) [2024-11-06 04:37:33,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3321856. Throughput: 0: 955.7. Samples: 829028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:37:33,471][00514] Avg episode reward: [(0, '24.604')] [2024-11-06 04:37:38,469][00514] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3846.1). Total num frames: 3346432. Throughput: 0: 1013.6. Samples: 836082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:37:38,472][00514] Avg episode reward: [(0, '24.377')] [2024-11-06 04:37:42,219][02510] Updated weights for policy 0, policy_version 820 (0.0020) [2024-11-06 04:37:43,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3358720. Throughput: 0: 978.4. Samples: 840864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:37:43,472][00514] Avg episode reward: [(0, '24.488')] [2024-11-06 04:37:48,469][00514] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3379200. Throughput: 0: 962.1. Samples: 843602. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:37:48,473][00514] Avg episode reward: [(0, '22.931')] [2024-11-06 04:37:52,085][02510] Updated weights for policy 0, policy_version 830 (0.0019) [2024-11-06 04:37:53,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3403776. Throughput: 0: 996.9. Samples: 850532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:37:53,473][00514] Avg episode reward: [(0, '22.110')] [2024-11-06 04:37:58,469][00514] Fps is (10 sec: 4096.1, 60 sec: 3891.4, 300 sec: 3846.1). Total num frames: 3420160. Throughput: 0: 1009.3. Samples: 856284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:37:58,474][00514] Avg episode reward: [(0, '19.735')] [2024-11-06 04:38:03,434][02510] Updated weights for policy 0, policy_version 840 (0.0015) [2024-11-06 04:38:03,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3440640. Throughput: 0: 976.3. Samples: 858472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:38:03,471][00514] Avg episode reward: [(0, '19.716')] [2024-11-06 04:38:08,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3461120. Throughput: 0: 983.2. Samples: 865228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:38:08,471][00514] Avg episode reward: [(0, '19.623')] [2024-11-06 04:38:12,059][02510] Updated weights for policy 0, policy_version 850 (0.0015) [2024-11-06 04:38:13,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3873.8). Total num frames: 3485696. Throughput: 0: 1030.0. Samples: 871758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:38:13,471][00514] Avg episode reward: [(0, '19.849')] [2024-11-06 04:38:18,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3497984. Throughput: 0: 996.7. Samples: 873880. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:38:18,472][00514] Avg episode reward: [(0, '20.380')] [2024-11-06 04:38:23,469][00514] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3518464. Throughput: 0: 966.0. Samples: 879552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:38:23,472][00514] Avg episode reward: [(0, '20.477')] [2024-11-06 04:38:23,485][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000859_3518464.pth... [2024-11-06 04:38:23,617][02497] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000633_2592768.pth [2024-11-06 04:38:23,733][02510] Updated weights for policy 0, policy_version 860 (0.0014) [2024-11-06 04:38:28,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3873.8). Total num frames: 3543040. Throughput: 0: 1017.8. Samples: 886666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:38:28,472][00514] Avg episode reward: [(0, '20.678')] [2024-11-06 04:38:33,469][00514] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3559424. Throughput: 0: 1018.4. Samples: 889430. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:38:33,473][00514] Avg episode reward: [(0, '20.886')] [2024-11-06 04:38:33,905][02510] Updated weights for policy 0, policy_version 870 (0.0015) [2024-11-06 04:38:38,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3579904. Throughput: 0: 967.6. Samples: 894074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:38:38,472][00514] Avg episode reward: [(0, '21.549')] [2024-11-06 04:38:43,340][02510] Updated weights for policy 0, policy_version 880 (0.0018) [2024-11-06 04:38:43,469][00514] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 3604480. Throughput: 0: 1001.6. Samples: 901358. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:38:43,471][00514] Avg episode reward: [(0, '22.136')] [2024-11-06 04:38:48,471][00514] Fps is (10 sec: 4504.8, 60 sec: 4095.9, 300 sec: 3873.8). Total num frames: 3624960. Throughput: 0: 1033.5. Samples: 904982. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:38:48,473][00514] Avg episode reward: [(0, '22.647')] [2024-11-06 04:38:53,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3637248. Throughput: 0: 985.0. Samples: 909554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:38:53,471][00514] Avg episode reward: [(0, '22.018')] [2024-11-06 04:38:54,810][02510] Updated weights for policy 0, policy_version 890 (0.0024) [2024-11-06 04:38:58,469][00514] Fps is (10 sec: 3686.9, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3661824. Throughput: 0: 984.7. Samples: 916072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:38:58,472][00514] Avg episode reward: [(0, '22.776')] [2024-11-06 04:39:03,413][02510] Updated weights for policy 0, policy_version 900 (0.0015) [2024-11-06 04:39:03,469][00514] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 3686400. Throughput: 0: 1015.6. Samples: 919582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:39:03,477][00514] Avg episode reward: [(0, '22.683')] [2024-11-06 04:39:08,469][00514] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3698688. Throughput: 0: 1012.4. Samples: 925108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:39:08,473][00514] Avg episode reward: [(0, '23.830')] [2024-11-06 04:39:13,469][00514] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.9). Total num frames: 3719168. Throughput: 0: 973.3. Samples: 930466. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:39:13,475][00514] Avg episode reward: [(0, '23.636')] [2024-11-06 04:39:14,945][02510] Updated weights for policy 0, policy_version 910 (0.0027) [2024-11-06 04:39:18,469][00514] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3739648. Throughput: 0: 991.5. Samples: 934046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-06 04:39:18,471][00514] Avg episode reward: [(0, '24.409')] [2024-11-06 04:39:23,469][00514] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3760128. Throughput: 0: 1031.9. Samples: 940508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:39:23,472][00514] Avg episode reward: [(0, '23.872')] [2024-11-06 04:39:25,243][02510] Updated weights for policy 0, policy_version 920 (0.0025) [2024-11-06 04:39:28,469][00514] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3776512. Throughput: 0: 967.2. Samples: 944884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:39:28,472][00514] Avg episode reward: [(0, '24.121')] [2024-11-06 04:39:33,471][00514] Fps is (10 sec: 3276.1, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 3792896. Throughput: 0: 941.7. Samples: 947358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:39:33,474][00514] Avg episode reward: [(0, '24.385')] [2024-11-06 04:39:37,904][02510] Updated weights for policy 0, policy_version 930 (0.0019) [2024-11-06 04:39:38,469][00514] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3809280. Throughput: 0: 945.2. Samples: 952086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:39:38,472][00514] Avg episode reward: [(0, '23.499')] [2024-11-06 04:39:43,469][00514] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 3825664. Throughput: 0: 904.3. Samples: 956764. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:39:43,475][00514] Avg episode reward: [(0, '24.375')] [2024-11-06 04:39:48,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3873.9). Total num frames: 3846144. Throughput: 0: 888.5. Samples: 959566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:39:48,476][00514] Avg episode reward: [(0, '24.421')] [2024-11-06 04:39:49,211][02510] Updated weights for policy 0, policy_version 940 (0.0038) [2024-11-06 04:39:53,469][00514] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3887.8). Total num frames: 3866624. Throughput: 0: 920.9. Samples: 966548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:39:53,476][00514] Avg episode reward: [(0, '25.311')] [2024-11-06 04:39:58,473][00514] Fps is (10 sec: 4094.4, 60 sec: 3754.4, 300 sec: 3901.6). Total num frames: 3887104. Throughput: 0: 927.0. Samples: 972186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:39:58,479][00514] Avg episode reward: [(0, '25.815')] [2024-11-06 04:39:59,995][02510] Updated weights for policy 0, policy_version 950 (0.0015) [2024-11-06 04:40:03,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3901.6). Total num frames: 3903488. Throughput: 0: 895.6. Samples: 974350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:40:03,471][00514] Avg episode reward: [(0, '25.611')] [2024-11-06 04:40:08,469][00514] Fps is (10 sec: 3687.9, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 3923968. Throughput: 0: 901.8. Samples: 981090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:40:08,471][00514] Avg episode reward: [(0, '26.817')] [2024-11-06 04:40:08,514][02497] Saving new best policy, reward=26.817! [2024-11-06 04:40:09,417][02510] Updated weights for policy 0, policy_version 960 (0.0039) [2024-11-06 04:40:13,470][00514] Fps is (10 sec: 4505.3, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3948544. Throughput: 0: 949.7. Samples: 987620. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:40:13,473][00514] Avg episode reward: [(0, '25.371')] [2024-11-06 04:40:18,469][00514] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 3960832. Throughput: 0: 942.0. Samples: 989746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:40:18,478][00514] Avg episode reward: [(0, '26.280')] [2024-11-06 04:40:20,846][02510] Updated weights for policy 0, policy_version 970 (0.0022) [2024-11-06 04:40:23,469][00514] Fps is (10 sec: 3686.7, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 3985408. Throughput: 0: 963.8. Samples: 995456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:40:23,471][00514] Avg episode reward: [(0, '24.647')] [2024-11-06 04:40:23,487][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000973_3985408.pth... [2024-11-06 04:40:23,589][02497] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000744_3047424.pth [2024-11-06 04:40:27,739][02497] Stopping Batcher_0... [2024-11-06 04:40:27,739][02497] Loop batcher_evt_loop terminating... [2024-11-06 04:40:27,740][00514] Component Batcher_0 stopped! [2024-11-06 04:40:27,755][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-06 04:40:27,792][02510] Weights refcount: 2 0 [2024-11-06 04:40:27,794][02510] Stopping InferenceWorker_p0-w0... [2024-11-06 04:40:27,795][02510] Loop inference_proc0-0_evt_loop terminating... [2024-11-06 04:40:27,795][00514] Component InferenceWorker_p0-w0 stopped! [2024-11-06 04:40:27,879][02497] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000859_3518464.pth [2024-11-06 04:40:27,900][02497] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-06 04:40:28,153][00514] Component RolloutWorker_w4 stopped! [2024-11-06 04:40:28,155][02521] Stopping RolloutWorker_w4... 
[2024-11-06 04:40:28,162][02497] Stopping LearnerWorker_p0... [2024-11-06 04:40:28,163][02497] Loop learner_proc0_evt_loop terminating... [2024-11-06 04:40:28,162][00514] Component LearnerWorker_p0 stopped! [2024-11-06 04:40:28,156][02521] Loop rollout_proc4_evt_loop terminating... [2024-11-06 04:40:28,206][00514] Component RolloutWorker_w6 stopped! [2024-11-06 04:40:28,209][02519] Stopping RolloutWorker_w6... [2024-11-06 04:40:28,213][02519] Loop rollout_proc6_evt_loop terminating... [2024-11-06 04:40:28,222][00514] Component RolloutWorker_w0 stopped! [2024-11-06 04:40:28,225][02511] Stopping RolloutWorker_w0... [2024-11-06 04:40:28,232][02511] Loop rollout_proc0_evt_loop terminating... [2024-11-06 04:40:28,250][00514] Component RolloutWorker_w2 stopped! [2024-11-06 04:40:28,252][02517] Stopping RolloutWorker_w2... [2024-11-06 04:40:28,259][02517] Loop rollout_proc2_evt_loop terminating... [2024-11-06 04:40:28,280][02518] Stopping RolloutWorker_w3... [2024-11-06 04:40:28,280][00514] Component RolloutWorker_w3 stopped! [2024-11-06 04:40:28,284][02518] Loop rollout_proc3_evt_loop terminating... [2024-11-06 04:40:28,290][02520] Stopping RolloutWorker_w7... [2024-11-06 04:40:28,290][00514] Component RolloutWorker_w7 stopped! [2024-11-06 04:40:28,295][02520] Loop rollout_proc7_evt_loop terminating... [2024-11-06 04:40:28,308][02516] Stopping RolloutWorker_w1... [2024-11-06 04:40:28,308][00514] Component RolloutWorker_w1 stopped! [2024-11-06 04:40:28,310][02516] Loop rollout_proc1_evt_loop terminating... [2024-11-06 04:40:28,315][02522] Stopping RolloutWorker_w5... [2024-11-06 04:40:28,315][00514] Component RolloutWorker_w5 stopped! [2024-11-06 04:40:28,319][02522] Loop rollout_proc5_evt_loop terminating... [2024-11-06 04:40:28,318][00514] Waiting for process learner_proc0 to stop... [2024-11-06 04:40:29,952][00514] Waiting for process inference_proc0-0 to join... [2024-11-06 04:40:29,959][00514] Waiting for process rollout_proc0 to join... 
[2024-11-06 04:40:32,816][00514] Waiting for process rollout_proc1 to join...
[2024-11-06 04:40:32,821][00514] Waiting for process rollout_proc2 to join...
[2024-11-06 04:40:32,824][00514] Waiting for process rollout_proc3 to join...
[2024-11-06 04:40:32,827][00514] Waiting for process rollout_proc4 to join...
[2024-11-06 04:40:32,831][00514] Waiting for process rollout_proc5 to join...
[2024-11-06 04:40:32,832][00514] Waiting for process rollout_proc6 to join...
[2024-11-06 04:40:32,834][00514] Waiting for process rollout_proc7 to join...
[2024-11-06 04:40:32,837][00514] Batcher 0 profile tree view:
batching: 27.1279, releasing_batches: 0.0294
[2024-11-06 04:40:32,838][00514] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0079
  wait_policy_total: 419.1196
update_model: 8.9095
  weight_update: 0.0018
one_step: 0.0029
  handle_policy_step: 578.7178
    deserialize: 14.3568, stack: 3.1937, obs_to_device_normalize: 122.6336, forward: 290.9441, send_messages: 28.8873
    prepare_outputs: 88.9240
      to_cpu: 53.4494
[2024-11-06 04:40:32,839][00514] Learner 0 profile tree view:
misc: 0.0057, prepare_batch: 14.5589
train: 73.9831
  epoch_init: 0.0090, minibatch_init: 0.0064, losses_postprocess: 0.5891, kl_divergence: 0.6397, after_optimizer: 33.5418
  calculate_losses: 26.4705
    losses_init: 0.0035, forward_head: 1.3277, bptt_initial: 17.5275, tail: 1.1194, advantages_returns: 0.2943, losses: 3.7906
    bptt: 2.0680
      bptt_forward_core: 1.9842
  update: 12.0634
    clip: 0.8810
[2024-11-06 04:40:32,842][00514] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3496, enqueue_policy_requests: 103.6578, env_step: 821.7983, overhead: 13.1057, complete_rollouts: 6.9834
save_policy_outputs: 20.9652
  split_output_tensors: 8.3602
[2024-11-06 04:40:32,844][00514] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3800, enqueue_policy_requests: 101.8186, env_step: 817.1827, overhead: 13.3204, complete_rollouts: 7.1962
save_policy_outputs: 20.8048
  split_output_tensors: 8.0459
[2024-11-06 04:40:32,845][00514] Loop Runner_EvtLoop terminating...
[2024-11-06 04:40:32,846][00514] Runner profile tree view:
main_loop: 1080.1631
[2024-11-06 04:40:32,849][00514] Collected {0: 4005888}, FPS: 3708.6
[2024-11-06 04:40:33,293][00514] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-06 04:40:33,295][00514] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-06 04:40:33,297][00514] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-06 04:40:33,299][00514] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-06 04:40:33,301][00514] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-06 04:40:33,303][00514] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-06 04:40:33,304][00514] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-06 04:40:33,306][00514] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-06 04:40:33,307][00514] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-11-06 04:40:33,308][00514] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-06 04:40:33,310][00514] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-06 04:40:33,311][00514] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-06 04:40:33,312][00514] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-06 04:40:33,313][00514] Adding new argument 'enjoy_script'=None that is not in the saved config file!
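The end-of-run summary is internally consistent: dividing the collected frame count by the runner's `main_loop` time reproduces the reported FPS (assuming that is how the summary figure is computed; the log itself does not say), and 1080 s matches the 18 minutes between process start-up at 04:22:32 and shutdown at 04:40:32.

$$
\mathrm{FPS} \approx \frac{4\,005\,888\ \text{frames}}{1080.1631\ \text{s}} \approx 3708.6\ \text{frames/s}
$$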
[2024-11-06 04:40:33,314][00514] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-06 04:40:33,358][00514] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-06 04:40:33,363][00514] RunningMeanStd input shape: (3, 72, 128) [2024-11-06 04:40:33,365][00514] RunningMeanStd input shape: (1,) [2024-11-06 04:40:33,384][00514] ConvEncoder: input_channels=3 [2024-11-06 04:40:33,496][00514] Conv encoder output size: 512 [2024-11-06 04:40:33,497][00514] Policy head output size: 512 [2024-11-06 04:40:33,672][00514] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-06 04:40:34,480][00514] Num frames 100... [2024-11-06 04:40:34,602][00514] Num frames 200... [2024-11-06 04:40:34,731][00514] Num frames 300... [2024-11-06 04:40:34,856][00514] Num frames 400... [2024-11-06 04:40:34,979][00514] Num frames 500... [2024-11-06 04:40:35,112][00514] Num frames 600... [2024-11-06 04:40:35,234][00514] Num frames 700... [2024-11-06 04:40:35,364][00514] Num frames 800... [2024-11-06 04:40:35,486][00514] Num frames 900... [2024-11-06 04:40:35,613][00514] Num frames 1000... [2024-11-06 04:40:35,771][00514] Avg episode rewards: #0: 24.830, true rewards: #0: 10.830 [2024-11-06 04:40:35,773][00514] Avg episode reward: 24.830, avg true_objective: 10.830 [2024-11-06 04:40:35,797][00514] Num frames 1100... [2024-11-06 04:40:35,926][00514] Num frames 1200... [2024-11-06 04:40:36,059][00514] Num frames 1300... [2024-11-06 04:40:36,180][00514] Num frames 1400... [2024-11-06 04:40:36,309][00514] Num frames 1500... [2024-11-06 04:40:36,431][00514] Num frames 1600... [2024-11-06 04:40:36,552][00514] Num frames 1700... [2024-11-06 04:40:36,672][00514] Num frames 1800... [2024-11-06 04:40:36,797][00514] Num frames 1900... [2024-11-06 04:40:36,926][00514] Num frames 2000... [2024-11-06 04:40:37,052][00514] Num frames 2100... [2024-11-06 04:40:37,182][00514] Num frames 2200... 
[2024-11-06 04:40:37,250][00514] Avg episode rewards: #0: 26.545, true rewards: #0: 11.045 [2024-11-06 04:40:37,253][00514] Avg episode reward: 26.545, avg true_objective: 11.045 [2024-11-06 04:40:37,368][00514] Num frames 2300... [2024-11-06 04:40:37,485][00514] Num frames 2400... [2024-11-06 04:40:37,604][00514] Num frames 2500... [2024-11-06 04:40:37,726][00514] Num frames 2600... [2024-11-06 04:40:37,851][00514] Num frames 2700... [2024-11-06 04:40:37,971][00514] Num frames 2800... [2024-11-06 04:40:38,103][00514] Num frames 2900... [2024-11-06 04:40:38,231][00514] Num frames 3000... [2024-11-06 04:40:38,362][00514] Num frames 3100... [2024-11-06 04:40:38,487][00514] Num frames 3200... [2024-11-06 04:40:38,609][00514] Num frames 3300... [2024-11-06 04:40:38,734][00514] Num frames 3400... [2024-11-06 04:40:38,855][00514] Num frames 3500... [2024-11-06 04:40:38,979][00514] Num frames 3600... [2024-11-06 04:40:39,102][00514] Num frames 3700... [2024-11-06 04:40:39,235][00514] Num frames 3800... [2024-11-06 04:40:39,343][00514] Avg episode rewards: #0: 30.793, true rewards: #0: 12.793 [2024-11-06 04:40:39,344][00514] Avg episode reward: 30.793, avg true_objective: 12.793 [2024-11-06 04:40:39,422][00514] Num frames 3900... [2024-11-06 04:40:39,545][00514] Num frames 4000... [2024-11-06 04:40:39,670][00514] Num frames 4100... [2024-11-06 04:40:39,792][00514] Num frames 4200... [2024-11-06 04:40:39,923][00514] Num frames 4300... [2024-11-06 04:40:40,048][00514] Num frames 4400... [2024-11-06 04:40:40,183][00514] Num frames 4500... [2024-11-06 04:40:40,313][00514] Num frames 4600... [2024-11-06 04:40:40,437][00514] Num frames 4700... [2024-11-06 04:40:40,558][00514] Num frames 4800... [2024-11-06 04:40:40,677][00514] Num frames 4900... [2024-11-06 04:40:40,801][00514] Num frames 5000... [2024-11-06 04:40:40,922][00514] Num frames 5100... [2024-11-06 04:40:41,046][00514] Num frames 5200... [2024-11-06 04:40:41,177][00514] Num frames 5300... 
[2024-11-06 04:40:41,301][00514] Num frames 5400... [2024-11-06 04:40:41,423][00514] Num frames 5500... [2024-11-06 04:40:41,547][00514] Num frames 5600... [2024-11-06 04:40:41,669][00514] Num frames 5700... [2024-11-06 04:40:41,791][00514] Num frames 5800... [2024-11-06 04:40:41,922][00514] Num frames 5900... [2024-11-06 04:40:42,031][00514] Avg episode rewards: #0: 38.345, true rewards: #0: 14.845 [2024-11-06 04:40:42,033][00514] Avg episode reward: 38.345, avg true_objective: 14.845 [2024-11-06 04:40:42,110][00514] Num frames 6000... [2024-11-06 04:40:42,241][00514] Num frames 6100... [2024-11-06 04:40:42,368][00514] Num frames 6200... [2024-11-06 04:40:42,498][00514] Num frames 6300... [2024-11-06 04:40:42,619][00514] Num frames 6400... [2024-11-06 04:40:42,740][00514] Num frames 6500... [2024-11-06 04:40:42,872][00514] Num frames 6600... [2024-11-06 04:40:43,004][00514] Num frames 6700... [2024-11-06 04:40:43,177][00514] Num frames 6800... [2024-11-06 04:40:43,373][00514] Num frames 6900... [2024-11-06 04:40:43,550][00514] Num frames 7000... [2024-11-06 04:40:43,611][00514] Avg episode rewards: #0: 35.402, true rewards: #0: 14.002 [2024-11-06 04:40:43,613][00514] Avg episode reward: 35.402, avg true_objective: 14.002 [2024-11-06 04:40:43,788][00514] Num frames 7100... [2024-11-06 04:40:43,956][00514] Num frames 7200... [2024-11-06 04:40:44,117][00514] Num frames 7300... [2024-11-06 04:40:44,293][00514] Num frames 7400... [2024-11-06 04:40:44,464][00514] Num frames 7500... [2024-11-06 04:40:44,635][00514] Num frames 7600... [2024-11-06 04:40:44,815][00514] Num frames 7700... [2024-11-06 04:40:44,996][00514] Num frames 7800... [2024-11-06 04:40:45,227][00514] Avg episode rewards: #0: 32.328, true rewards: #0: 13.162 [2024-11-06 04:40:45,229][00514] Avg episode reward: 32.328, avg true_objective: 13.162 [2024-11-06 04:40:45,235][00514] Num frames 7900... [2024-11-06 04:40:45,429][00514] Num frames 8000... [2024-11-06 04:40:45,547][00514] Num frames 8100... 
[2024-11-06 04:40:45,669][00514] Num frames 8200... [2024-11-06 04:40:45,798][00514] Num frames 8300... [2024-11-06 04:40:45,921][00514] Num frames 8400... [2024-11-06 04:40:46,047][00514] Num frames 8500... [2024-11-06 04:40:46,172][00514] Num frames 8600... [2024-11-06 04:40:46,299][00514] Num frames 8700... [2024-11-06 04:40:46,429][00514] Num frames 8800... [2024-11-06 04:40:46,552][00514] Num frames 8900... [2024-11-06 04:40:46,679][00514] Num frames 9000... [2024-11-06 04:40:46,805][00514] Num frames 9100... [2024-11-06 04:40:46,926][00514] Num frames 9200... [2024-11-06 04:40:47,050][00514] Num frames 9300... [2024-11-06 04:40:47,197][00514] Avg episode rewards: #0: 32.391, true rewards: #0: 13.391 [2024-11-06 04:40:47,198][00514] Avg episode reward: 32.391, avg true_objective: 13.391 [2024-11-06 04:40:47,232][00514] Num frames 9400... [2024-11-06 04:40:47,362][00514] Num frames 9500... [2024-11-06 04:40:47,486][00514] Num frames 9600... [2024-11-06 04:40:47,610][00514] Num frames 9700... [2024-11-06 04:40:47,735][00514] Num frames 9800... [2024-11-06 04:40:47,858][00514] Num frames 9900... [2024-11-06 04:40:47,978][00514] Num frames 10000... [2024-11-06 04:40:48,102][00514] Num frames 10100... [2024-11-06 04:40:48,228][00514] Num frames 10200... [2024-11-06 04:40:48,362][00514] Num frames 10300... [2024-11-06 04:40:48,491][00514] Num frames 10400... [2024-11-06 04:40:48,610][00514] Num frames 10500... [2024-11-06 04:40:48,730][00514] Num frames 10600... [2024-11-06 04:40:48,887][00514] Num frames 10700... [2024-11-06 04:40:48,956][00514] Avg episode rewards: #0: 32.637, true rewards: #0: 13.387 [2024-11-06 04:40:48,957][00514] Avg episode reward: 32.637, avg true_objective: 13.387 [2024-11-06 04:40:49,066][00514] Num frames 10800... [2024-11-06 04:40:49,202][00514] Num frames 10900... [2024-11-06 04:40:49,338][00514] Num frames 11000... [2024-11-06 04:40:49,471][00514] Num frames 11100... [2024-11-06 04:40:49,594][00514] Num frames 11200... 
[2024-11-06 04:40:49,716][00514] Num frames 11300... [2024-11-06 04:40:49,839][00514] Num frames 11400... [2024-11-06 04:40:49,967][00514] Num frames 11500... [2024-11-06 04:40:50,098][00514] Num frames 11600... [2024-11-06 04:40:50,221][00514] Num frames 11700... [2024-11-06 04:40:50,348][00514] Num frames 11800... [2024-11-06 04:40:50,477][00514] Num frames 11900... [2024-11-06 04:40:50,596][00514] Num frames 12000... [2024-11-06 04:40:50,715][00514] Num frames 12100... [2024-11-06 04:40:50,832][00514] Num frames 12200... [2024-11-06 04:40:50,981][00514] Avg episode rewards: #0: 33.198, true rewards: #0: 13.642 [2024-11-06 04:40:50,982][00514] Avg episode reward: 33.198, avg true_objective: 13.642 [2024-11-06 04:40:51,011][00514] Num frames 12300... [2024-11-06 04:40:51,132][00514] Num frames 12400... [2024-11-06 04:40:51,258][00514] Num frames 12500... [2024-11-06 04:40:51,384][00514] Num frames 12600... [2024-11-06 04:40:51,516][00514] Num frames 12700... [2024-11-06 04:40:51,637][00514] Num frames 12800... [2024-11-06 04:40:51,758][00514] Num frames 12900... [2024-11-06 04:40:51,880][00514] Num frames 13000... [2024-11-06 04:40:52,008][00514] Num frames 13100... [2024-11-06 04:40:52,132][00514] Num frames 13200... [2024-11-06 04:40:52,259][00514] Num frames 13300... [2024-11-06 04:40:52,383][00514] Num frames 13400... [2024-11-06 04:40:52,516][00514] Num frames 13500... [2024-11-06 04:40:52,640][00514] Num frames 13600... [2024-11-06 04:40:52,764][00514] Num frames 13700... [2024-11-06 04:40:52,886][00514] Num frames 13800... [2024-11-06 04:40:52,960][00514] Avg episode rewards: #0: 33.814, true rewards: #0: 13.814 [2024-11-06 04:40:52,961][00514] Avg episode reward: 33.814, avg true_objective: 13.814 [2024-11-06 04:42:18,459][00514] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
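During this evaluation the log prints a running average after each of the 10 episodes rather than each episode's own reward. Assuming the printed value is the mean over all episodes finished so far (the numbers are consistent with this, but the log does not state it), the reward of episode $k$ can be recovered as $r_k = k\,\bar{A}_k - (k-1)\,\bar{A}_{k-1}$. A minimal sketch; `per_episode_from_running_means` is a hypothetical helper, and because the printed means are rounded to three decimals, the recovered values inherit small rounding errors that grow with $k$.

```python
def per_episode_from_running_means(means: list[float]) -> list[float]:
    """Invert a sequence of cumulative means: r_k = k*A_k - (k-1)*A_{k-1}."""
    rewards = []
    prev_total = 0.0
    for k, mean in enumerate(means, start=1):
        total = k * mean  # sum of the first k episode rewards
        rewards.append(total - prev_total)
        prev_total = total
    return rewards


if __name__ == "__main__":
    # "Avg episode rewards" printed after each of the 10 evaluation episodes above.
    running = [24.830, 26.545, 30.793, 38.345, 35.402,
               32.328, 32.391, 32.637, 33.198, 33.814]
    for k, r in enumerate(per_episode_from_running_means(running), start=1):
        print(f"episode {k}: reward ~ {r:.2f}")
```

The same function applies unchanged to the "true rewards" column, which is printed as a running average in the same way.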
[2024-11-06 04:42:19,236][00514] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-06 04:42:19,240][00514] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-06 04:42:19,243][00514] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-06 04:42:19,245][00514] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-06 04:42:19,248][00514] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-06 04:42:19,250][00514] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-06 04:42:19,251][00514] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-11-06 04:42:19,252][00514] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-06 04:42:19,255][00514] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-11-06 04:42:19,257][00514] Adding new argument 'hf_repository'='InMDev/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-11-06 04:42:19,259][00514] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-06 04:42:19,260][00514] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-06 04:42:19,262][00514] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-06 04:42:19,263][00514] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-06 04:42:19,265][00514] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-06 04:42:19,321][00514] RunningMeanStd input shape: (3, 72, 128)
[2024-11-06 04:42:19,325][00514] RunningMeanStd input shape: (1,)
[2024-11-06 04:42:19,345][00514] ConvEncoder: input_channels=3
[2024-11-06 04:42:19,403][00514] Conv encoder output size: 512
[2024-11-06 04:42:19,405][00514] Policy head output size: 512
[2024-11-06 04:42:19,431][00514] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-06 04:42:20,095][00514] Num frames 100...
[2024-11-06 04:42:20,255][00514] Num frames 200...
[2024-11-06 04:42:20,424][00514] Num frames 300...
[2024-11-06 04:42:20,588][00514] Num frames 400...
[2024-11-06 04:42:20,747][00514] Num frames 500...
[2024-11-06 04:42:20,900][00514] Num frames 600...
[2024-11-06 04:42:21,076][00514] Num frames 700...
[2024-11-06 04:42:21,239][00514] Num frames 800...
[2024-11-06 04:42:21,397][00514] Num frames 900...
[2024-11-06 04:42:21,556][00514] Num frames 1000...
[2024-11-06 04:42:21,712][00514] Num frames 1100...
[2024-11-06 04:42:21,871][00514] Num frames 1200...
[2024-11-06 04:42:22,042][00514] Num frames 1300...
[2024-11-06 04:42:22,200][00514] Num frames 1400...
[2024-11-06 04:42:22,357][00514] Num frames 1500...
[2024-11-06 04:42:22,514][00514] Num frames 1600...
[2024-11-06 04:42:22,682][00514] Num frames 1700...
[2024-11-06 04:42:22,836][00514] Num frames 1800...
[2024-11-06 04:42:22,999][00514] Num frames 1900...
[2024-11-06 04:42:23,185][00514] Num frames 2000...
[2024-11-06 04:42:23,346][00514] Num frames 2100...
[2024-11-06 04:42:23,399][00514] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000
[2024-11-06 04:42:23,401][00514] Avg episode reward: 58.999, avg true_objective: 21.000
[2024-11-06 04:42:23,568][00514] Num frames 2200...
[2024-11-06 04:42:23,767][00514] Num frames 2300...
[2024-11-06 04:42:23,935][00514] Num frames 2400...
[2024-11-06 04:42:24,103][00514] Num frames 2500...
[2024-11-06 04:42:24,278][00514] Num frames 2600...
[2024-11-06 04:42:24,360][00514] Avg episode rewards: #0: 34.559, true rewards: #0: 13.060
[2024-11-06 04:42:24,363][00514] Avg episode reward: 34.559, avg true_objective: 13.060
[2024-11-06 04:42:24,517][00514] Num frames 2700...
[2024-11-06 04:42:24,715][00514] Num frames 2800...
[2024-11-06 04:42:24,914][00514] Num frames 2900...
[2024-11-06 04:42:25,081][00514] Num frames 3000...
[2024-11-06 04:42:25,273][00514] Num frames 3100...
[2024-11-06 04:42:25,458][00514] Num frames 3200...
[2024-11-06 04:42:25,642][00514] Num frames 3300...
[2024-11-06 04:42:25,823][00514] Num frames 3400...
[2024-11-06 04:42:26,001][00514] Num frames 3500...
[2024-11-06 04:42:26,190][00514] Num frames 3600...
[2024-11-06 04:42:26,377][00514] Num frames 3700...
[2024-11-06 04:42:26,564][00514] Num frames 3800...
[2024-11-06 04:42:26,745][00514] Num frames 3900...
[2024-11-06 04:42:26,930][00514] Num frames 4000...
[2024-11-06 04:42:27,025][00514] Avg episode rewards: #0: 34.733, true rewards: #0: 13.400
[2024-11-06 04:42:27,027][00514] Avg episode reward: 34.733, avg true_objective: 13.400
[2024-11-06 04:42:27,164][00514] Num frames 4100...
[2024-11-06 04:42:27,334][00514] Num frames 4200...
[2024-11-06 04:42:27,508][00514] Num frames 4300...
[2024-11-06 04:42:27,694][00514] Num frames 4400...
[2024-11-06 04:42:27,890][00514] Num frames 4500...
[2024-11-06 04:42:28,064][00514] Num frames 4600...
[2024-11-06 04:42:28,226][00514] Avg episode rewards: #0: 29.400, true rewards: #0: 11.650
[2024-11-06 04:42:28,229][00514] Avg episode reward: 29.400, avg true_objective: 11.650
[2024-11-06 04:42:28,325][00514] Num frames 4700...
[2024-11-06 04:42:28,492][00514] Num frames 4800...
[2024-11-06 04:42:28,655][00514] Num frames 4900...
[2024-11-06 04:42:28,823][00514] Num frames 5000...
[2024-11-06 04:42:29,008][00514] Num frames 5100...
[2024-11-06 04:42:29,177][00514] Num frames 5200...
[2024-11-06 04:42:29,351][00514] Num frames 5300...
[2024-11-06 04:42:29,521][00514] Num frames 5400...
[2024-11-06 04:42:29,698][00514] Num frames 5500...
[2024-11-06 04:42:29,875][00514] Num frames 5600...
[2024-11-06 04:42:30,046][00514] Num frames 5700...
[2024-11-06 04:42:30,221][00514] Num frames 5800...
[2024-11-06 04:42:30,377][00514] Num frames 5900...
[2024-11-06 04:42:30,508][00514] Num frames 6000...
[2024-11-06 04:42:30,632][00514] Num frames 6100...
[2024-11-06 04:42:30,754][00514] Num frames 6200...
[2024-11-06 04:42:30,878][00514] Num frames 6300...
[2024-11-06 04:42:31,003][00514] Num frames 6400...
[2024-11-06 04:42:31,130][00514] Num frames 6500...
[2024-11-06 04:42:31,258][00514] Num frames 6600...
[2024-11-06 04:42:31,381][00514] Num frames 6700...
[2024-11-06 04:42:31,515][00514] Avg episode rewards: #0: 34.919, true rewards: #0: 13.520
[2024-11-06 04:42:31,517][00514] Avg episode reward: 34.919, avg true_objective: 13.520
[2024-11-06 04:42:31,568][00514] Num frames 6800...
[2024-11-06 04:42:31,697][00514] Num frames 6900...
[2024-11-06 04:42:31,831][00514] Num frames 7000...
[2024-11-06 04:42:31,951][00514] Num frames 7100...
[2024-11-06 04:42:32,072][00514] Num frames 7200...
[2024-11-06 04:42:32,193][00514] Num frames 7300...
[2024-11-06 04:42:32,323][00514] Num frames 7400...
[2024-11-06 04:42:32,418][00514] Avg episode rewards: #0: 31.220, true rewards: #0: 12.387
[2024-11-06 04:42:32,420][00514] Avg episode reward: 31.220, avg true_objective: 12.387
[2024-11-06 04:42:32,517][00514] Num frames 7500...
[2024-11-06 04:42:32,639][00514] Num frames 7600...
[2024-11-06 04:42:32,761][00514] Num frames 7700...
[2024-11-06 04:42:32,886][00514] Num frames 7800...
[2024-11-06 04:42:33,010][00514] Num frames 7900...
[2024-11-06 04:42:33,132][00514] Num frames 8000...
[2024-11-06 04:42:33,255][00514] Num frames 8100...
[2024-11-06 04:42:33,380][00514] Num frames 8200...
[2024-11-06 04:42:33,507][00514] Num frames 8300...
[2024-11-06 04:42:33,626][00514] Num frames 8400...
[2024-11-06 04:42:33,745][00514] Num frames 8500...
[2024-11-06 04:42:33,867][00514] Num frames 8600...
[2024-11-06 04:42:33,985][00514] Num frames 8700...
[2024-11-06 04:42:34,105][00514] Num frames 8800...
[2024-11-06 04:42:34,225][00514] Num frames 8900...
[2024-11-06 04:42:34,354][00514] Num frames 9000...
[2024-11-06 04:42:34,482][00514] Num frames 9100...
[2024-11-06 04:42:34,612][00514] Num frames 9200...
[2024-11-06 04:42:34,734][00514] Num frames 9300...
[2024-11-06 04:42:34,859][00514] Num frames 9400...
[2024-11-06 04:42:34,978][00514] Num frames 9500...
[2024-11-06 04:42:35,050][00514] Avg episode rewards: #0: 34.588, true rewards: #0: 13.589
[2024-11-06 04:42:35,051][00514] Avg episode reward: 34.588, avg true_objective: 13.589
[2024-11-06 04:42:35,159][00514] Num frames 9600...
[2024-11-06 04:42:35,285][00514] Num frames 9700...
[2024-11-06 04:42:35,405][00514] Num frames 9800...
[2024-11-06 04:42:35,528][00514] Num frames 9900...
[2024-11-06 04:42:35,654][00514] Num frames 10000...
[2024-11-06 04:42:35,773][00514] Num frames 10100...
[2024-11-06 04:42:35,894][00514] Num frames 10200...
[2024-11-06 04:42:36,019][00514] Num frames 10300...
[2024-11-06 04:42:36,140][00514] Num frames 10400...
[2024-11-06 04:42:36,266][00514] Num frames 10500...
[2024-11-06 04:42:36,389][00514] Num frames 10600...
[2024-11-06 04:42:36,525][00514] Num frames 10700...
[2024-11-06 04:42:36,656][00514] Num frames 10800...
[2024-11-06 04:42:36,777][00514] Num frames 10900...
[2024-11-06 04:42:36,902][00514] Num frames 11000...
[2024-11-06 04:42:37,037][00514] Num frames 11100...
[2024-11-06 04:42:37,161][00514] Num frames 11200...
[2024-11-06 04:42:37,288][00514] Num frames 11300...
[2024-11-06 04:42:37,411][00514] Num frames 11400...
[2024-11-06 04:42:37,528][00514] Num frames 11500...
[2024-11-06 04:42:37,657][00514] Num frames 11600...
[2024-11-06 04:42:37,728][00514] Avg episode rewards: #0: 37.889, true rewards: #0: 14.515
[2024-11-06 04:42:37,729][00514] Avg episode reward: 37.889, avg true_objective: 14.515
[2024-11-06 04:42:37,835][00514] Num frames 11700...
[2024-11-06 04:42:37,951][00514] Num frames 11800...
[2024-11-06 04:42:38,068][00514] Num frames 11900...
[2024-11-06 04:42:38,188][00514] Num frames 12000...
[2024-11-06 04:42:38,313][00514] Num frames 12100...
[2024-11-06 04:42:38,434][00514] Num frames 12200...
[2024-11-06 04:42:38,577][00514] Avg episode rewards: #0: 35.079, true rewards: #0: 13.636
[2024-11-06 04:42:38,579][00514] Avg episode reward: 35.079, avg true_objective: 13.636
[2024-11-06 04:42:38,624][00514] Num frames 12300...
[2024-11-06 04:42:38,746][00514] Num frames 12400...
[2024-11-06 04:42:38,866][00514] Num frames 12500...
[2024-11-06 04:42:38,988][00514] Num frames 12600...
[2024-11-06 04:42:39,110][00514] Num frames 12700...
[2024-11-06 04:42:39,238][00514] Num frames 12800...
[2024-11-06 04:42:39,371][00514] Num frames 12900...
[2024-11-06 04:42:39,493][00514] Num frames 13000...
[2024-11-06 04:42:39,616][00514] Num frames 13100...
[2024-11-06 04:42:39,745][00514] Num frames 13200...
[2024-11-06 04:42:39,873][00514] Num frames 13300...
[2024-11-06 04:42:39,998][00514] Num frames 13400...
[2024-11-06 04:42:40,166][00514] Avg episode rewards: #0: 34.388, true rewards: #0: 13.488
[2024-11-06 04:42:40,167][00514] Avg episode reward: 34.388, avg true_objective: 13.488
[2024-11-06 04:43:23,138][00514] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-11-06 04:43:32,284][00514] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-06 04:43:32,286][00514] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-06 04:43:32,288][00514] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-06 04:43:32,290][00514] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-06 04:43:32,292][00514] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-06 04:43:32,294][00514] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-06 04:43:32,296][00514] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-11-06 04:43:32,297][00514] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-06 04:43:32,298][00514] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-11-06 04:43:32,299][00514] Adding new argument 'hf_repository'='InMDev/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-11-06 04:43:32,300][00514] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-06 04:43:32,301][00514] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-06 04:43:32,302][00514] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-06 04:43:32,304][00514] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-06 04:43:32,305][00514] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-06 04:43:32,342][00514] RunningMeanStd input shape: (3, 72, 128)
[2024-11-06 04:43:32,343][00514] RunningMeanStd input shape: (1,)
[2024-11-06 04:43:32,362][00514] ConvEncoder: input_channels=3
[2024-11-06 04:43:32,403][00514] Conv encoder output size: 512
[2024-11-06 04:43:32,404][00514] Policy head output size: 512
[2024-11-06 04:43:32,423][00514] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-06 04:43:32,936][00514] Num frames 100...
[2024-11-06 04:43:33,098][00514] Num frames 200...
[2024-11-06 04:43:33,264][00514] Num frames 300...
[2024-11-06 04:43:33,440][00514] Num frames 400...
[2024-11-06 04:43:33,608][00514] Num frames 500...
[2024-11-06 04:43:33,773][00514] Num frames 600...
[2024-11-06 04:43:33,933][00514] Num frames 700...
[2024-11-06 04:43:34,105][00514] Avg episode rewards: #0: 17.680, true rewards: #0: 7.680
[2024-11-06 04:43:34,108][00514] Avg episode reward: 17.680, avg true_objective: 7.680
[2024-11-06 04:43:34,169][00514] Num frames 800...
[2024-11-06 04:43:34,350][00514] Num frames 900...
[2024-11-06 04:43:34,541][00514] Num frames 1000...
[2024-11-06 04:43:34,712][00514] Num frames 1100...
[2024-11-06 04:43:34,888][00514] Num frames 1200...
[2024-11-06 04:43:35,067][00514] Num frames 1300...
[2024-11-06 04:43:35,233][00514] Num frames 1400...
[2024-11-06 04:43:35,356][00514] Num frames 1500...
[2024-11-06 04:43:35,484][00514] Num frames 1600...
[2024-11-06 04:43:35,612][00514] Num frames 1700...
[2024-11-06 04:43:35,737][00514] Num frames 1800...
[2024-11-06 04:43:35,862][00514] Num frames 1900...
[2024-11-06 04:43:35,986][00514] Num frames 2000...
[2024-11-06 04:43:36,111][00514] Num frames 2100...
[2024-11-06 04:43:36,231][00514] Num frames 2200...
[2024-11-06 04:43:36,356][00514] Num frames 2300...
[2024-11-06 04:43:36,476][00514] Num frames 2400...
[2024-11-06 04:43:36,609][00514] Num frames 2500...
[2024-11-06 04:43:36,736][00514] Avg episode rewards: #0: 31.300, true rewards: #0: 12.800
[2024-11-06 04:43:36,737][00514] Avg episode reward: 31.300, avg true_objective: 12.800
[2024-11-06 04:43:36,791][00514] Num frames 2600...
[2024-11-06 04:43:36,908][00514] Num frames 2700...
[2024-11-06 04:43:37,027][00514] Num frames 2800...
[2024-11-06 04:43:37,155][00514] Num frames 2900...
[2024-11-06 04:43:37,283][00514] Num frames 3000...
[2024-11-06 04:43:37,404][00514] Num frames 3100...
[2024-11-06 04:43:37,528][00514] Num frames 3200...
[2024-11-06 04:43:37,657][00514] Num frames 3300...
[2024-11-06 04:43:37,785][00514] Num frames 3400...
[2024-11-06 04:43:37,911][00514] Num frames 3500...
[2024-11-06 04:43:38,032][00514] Num frames 3600...
[2024-11-06 04:43:38,159][00514] Num frames 3700...
[2024-11-06 04:43:38,299][00514] Num frames 3800...
[2024-11-06 04:43:38,422][00514] Num frames 3900...
[2024-11-06 04:43:38,545][00514] Num frames 4000...
[2024-11-06 04:43:38,680][00514] Num frames 4100...
[2024-11-06 04:43:38,802][00514] Num frames 4200...
[2024-11-06 04:43:38,924][00514] Num frames 4300...
[2024-11-06 04:43:39,050][00514] Num frames 4400...
[2024-11-06 04:43:39,186][00514] Num frames 4500...
[2024-11-06 04:43:39,340][00514] Avg episode rewards: #0: 37.253, true rewards: #0: 15.253
[2024-11-06 04:43:39,342][00514] Avg episode reward: 37.253, avg true_objective: 15.253
[2024-11-06 04:43:39,374][00514] Num frames 4600...
[2024-11-06 04:43:39,493][00514] Num frames 4700...
[2024-11-06 04:43:39,620][00514] Num frames 4800...
[2024-11-06 04:43:39,745][00514] Num frames 4900...
[2024-11-06 04:43:39,873][00514] Num frames 5000...
[2024-11-06 04:43:39,998][00514] Num frames 5100...
[2024-11-06 04:43:40,120][00514] Num frames 5200...
[2024-11-06 04:43:40,265][00514] Num frames 5300...
[2024-11-06 04:43:40,386][00514] Num frames 5400...
[2024-11-06 04:43:40,508][00514] Num frames 5500...
[2024-11-06 04:43:40,641][00514] Num frames 5600...
[2024-11-06 04:43:40,766][00514] Num frames 5700...
[2024-11-06 04:43:40,890][00514] Num frames 5800...
[2024-11-06 04:43:41,022][00514] Num frames 5900...
[2024-11-06 04:43:41,152][00514] Num frames 6000...
[2024-11-06 04:43:41,283][00514] Num frames 6100...
[2024-11-06 04:43:41,413][00514] Avg episode rewards: #0: 36.650, true rewards: #0: 15.400
[2024-11-06 04:43:41,414][00514] Avg episode reward: 36.650, avg true_objective: 15.400
[2024-11-06 04:43:41,465][00514] Num frames 6200...
[2024-11-06 04:43:41,592][00514] Num frames 6300...
[2024-11-06 04:43:41,731][00514] Num frames 6400...
[2024-11-06 04:43:41,861][00514] Num frames 6500...
[2024-11-06 04:43:41,987][00514] Num frames 6600...
[2024-11-06 04:43:42,109][00514] Num frames 6700...
[2024-11-06 04:43:42,228][00514] Num frames 6800...
[2024-11-06 04:43:42,357][00514] Num frames 6900...
[2024-11-06 04:43:42,481][00514] Num frames 7000...
[2024-11-06 04:43:42,605][00514] Num frames 7100...
[2024-11-06 04:43:42,735][00514] Num frames 7200...
[2024-11-06 04:43:42,862][00514] Num frames 7300...
[2024-11-06 04:43:42,992][00514] Num frames 7400...
[2024-11-06 04:43:43,115][00514] Num frames 7500...
[2024-11-06 04:43:43,235][00514] Num frames 7600...
[2024-11-06 04:43:43,364][00514] Num frames 7700...
[2024-11-06 04:43:43,487][00514] Num frames 7800...
[2024-11-06 04:43:43,609][00514] Num frames 7900...
[2024-11-06 04:43:43,739][00514] Num frames 8000...
[2024-11-06 04:43:43,862][00514] Num frames 8100...
[2024-11-06 04:43:43,985][00514] Num frames 8200...
[2024-11-06 04:43:44,112][00514] Avg episode rewards: #0: 39.719, true rewards: #0: 16.520
[2024-11-06 04:43:44,114][00514] Avg episode reward: 39.719, avg true_objective: 16.520
[2024-11-06 04:43:44,166][00514] Num frames 8300...
[2024-11-06 04:43:44,291][00514] Num frames 8400...
[2024-11-06 04:43:44,412][00514] Num frames 8500...
[2024-11-06 04:43:44,530][00514] Num frames 8600...
[2024-11-06 04:43:44,650][00514] Num frames 8700...
[2024-11-06 04:43:44,823][00514] Avg episode rewards: #0: 35.326, true rewards: #0: 14.660
[2024-11-06 04:43:44,825][00514] Avg episode reward: 35.326, avg true_objective: 14.660
[2024-11-06 04:43:44,834][00514] Num frames 8800...
[2024-11-06 04:43:44,955][00514] Num frames 8900...
[2024-11-06 04:43:45,074][00514] Num frames 9000...
[2024-11-06 04:43:45,211][00514] Num frames 9100...
[2024-11-06 04:43:45,387][00514] Num frames 9200...
[2024-11-06 04:43:45,563][00514] Num frames 9300...
[2024-11-06 04:43:45,740][00514] Num frames 9400...
[2024-11-06 04:43:45,915][00514] Num frames 9500...
[2024-11-06 04:43:46,082][00514] Num frames 9600...
[2024-11-06 04:43:46,248][00514] Num frames 9700...
[2024-11-06 04:43:46,415][00514] Num frames 9800...
[2024-11-06 04:43:46,509][00514] Avg episode rewards: #0: 33.743, true rewards: #0: 14.029
[2024-11-06 04:43:46,511][00514] Avg episode reward: 33.743, avg true_objective: 14.029
[2024-11-06 04:43:46,642][00514] Num frames 9900...
[2024-11-06 04:43:46,818][00514] Num frames 10000...
[2024-11-06 04:43:46,992][00514] Num frames 10100...
[2024-11-06 04:43:47,167][00514] Num frames 10200...
[2024-11-06 04:43:47,285][00514] Avg episode rewards: #0: 30.420, true rewards: #0: 12.795
[2024-11-06 04:43:47,287][00514] Avg episode reward: 30.420, avg true_objective: 12.795
[2024-11-06 04:43:47,403][00514] Num frames 10300...
[2024-11-06 04:43:47,577][00514] Num frames 10400...
[2024-11-06 04:43:47,704][00514] Num frames 10500...
[2024-11-06 04:43:47,831][00514] Num frames 10600...
[2024-11-06 04:43:47,964][00514] Num frames 10700...
[2024-11-06 04:43:48,086][00514] Num frames 10800...
[2024-11-06 04:43:48,212][00514] Num frames 10900...
[2024-11-06 04:43:48,341][00514] Num frames 11000...
[2024-11-06 04:43:48,459][00514] Num frames 11100...
[2024-11-06 04:43:48,582][00514] Num frames 11200...
[2024-11-06 04:43:48,707][00514] Num frames 11300...
[2024-11-06 04:43:48,838][00514] Num frames 11400...
[2024-11-06 04:43:48,967][00514] Num frames 11500...
[2024-11-06 04:43:49,115][00514] Avg episode rewards: #0: 30.533, true rewards: #0: 12.867
[2024-11-06 04:43:49,117][00514] Avg episode reward: 30.533, avg true_objective: 12.867
[2024-11-06 04:43:49,148][00514] Num frames 11600...
[2024-11-06 04:43:49,273][00514] Num frames 11700...
[2024-11-06 04:43:49,395][00514] Num frames 11800...
[2024-11-06 04:43:49,521][00514] Num frames 11900...
[2024-11-06 04:43:49,650][00514] Num frames 12000...
[2024-11-06 04:43:49,774][00514] Num frames 12100...
[2024-11-06 04:43:49,899][00514] Num frames 12200...
[2024-11-06 04:43:50,037][00514] Num frames 12300...
[2024-11-06 04:43:50,163][00514] Num frames 12400...
[2024-11-06 04:43:50,297][00514] Num frames 12500...
[2024-11-06 04:43:50,362][00514] Avg episode rewards: #0: 30.008, true rewards: #0: 12.508
[2024-11-06 04:43:50,364][00514] Avg episode reward: 30.008, avg true_objective: 12.508
[2024-11-06 04:45:08,961][00514] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-11-06 04:45:14,434][00514] The model has been pushed to https://huggingface.co./InMDev/rl_course_vizdoom_health_gathering_supreme
[2024-11-06 04:45:55,266][00514] Environment doom_basic already registered, overwriting...
[2024-11-06 04:45:55,269][00514] Environment doom_two_colors_easy already registered, overwriting...
[2024-11-06 04:45:55,270][00514] Environment doom_two_colors_hard already registered, overwriting...
[2024-11-06 04:45:55,272][00514] Environment doom_dm already registered, overwriting...
[2024-11-06 04:45:55,273][00514] Environment doom_dwango5 already registered, overwriting...
[2024-11-06 04:45:55,275][00514] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-11-06 04:45:55,276][00514] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-11-06 04:45:55,277][00514] Environment doom_my_way_home already registered, overwriting...
[2024-11-06 04:45:55,278][00514] Environment doom_deadly_corridor already registered, overwriting...
[2024-11-06 04:45:55,280][00514] Environment doom_defend_the_center already registered, overwriting...
[2024-11-06 04:45:55,281][00514] Environment doom_defend_the_line already registered, overwriting...
[2024-11-06 04:45:55,282][00514] Environment doom_health_gathering already registered, overwriting...
[2024-11-06 04:45:55,283][00514] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-11-06 04:45:55,284][00514] Environment doom_battle already registered, overwriting...
[2024-11-06 04:45:55,286][00514] Environment doom_battle2 already registered, overwriting...
[2024-11-06 04:45:55,287][00514] Environment doom_duel_bots already registered, overwriting...
[2024-11-06 04:45:55,288][00514] Environment doom_deathmatch_bots already registered, overwriting...
[2024-11-06 04:45:55,289][00514] Environment doom_duel already registered, overwriting...
[2024-11-06 04:45:55,290][00514] Environment doom_deathmatch_full already registered, overwriting...
[2024-11-06 04:45:55,292][00514] Environment doom_benchmark already registered, overwriting...
[2024-11-06 04:45:55,293][00514] register_encoder_factory: 
[2024-11-06 04:45:55,319][00514] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-06 04:45:55,321][00514] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line
[2024-11-06 04:45:55,327][00514] Experiment dir /content/train_dir/default_experiment already exists!
[2024-11-06 04:45:55,329][00514] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-11-06 04:45:55,331][00514] Weights and Biases integration disabled
[2024-11-06 04:45:55,335][00514] Environment var CUDA_VISIBLE_DEVICES is 0

[2024-11-06 04:45:57,616][00514] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-11-06 04:45:57,617][00514] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-06 04:45:57,622][00514] Rollout worker 0 uses device cpu
[2024-11-06 04:45:57,625][00514] Rollout worker 1 uses device cpu
[2024-11-06 04:45:57,627][00514] Rollout worker 2 uses device cpu
[2024-11-06 04:45:57,628][00514] Rollout worker 3 uses device cpu
[2024-11-06 04:45:57,629][00514] Rollout worker 4 uses device cpu
[2024-11-06 04:45:57,630][00514] Rollout worker 5 uses device cpu
[2024-11-06 04:45:57,631][00514] Rollout worker 6 uses device cpu
[2024-11-06 04:45:57,632][00514] Rollout worker 7 uses device cpu
[2024-11-06 04:45:57,706][00514] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 04:45:57,707][00514] InferenceWorker_p0-w0: min num requests: 2
[2024-11-06 04:45:57,741][00514] Starting all processes...
[2024-11-06 04:45:57,742][00514] Starting process learner_proc0
[2024-11-06 04:45:57,791][00514] Starting all processes...
[2024-11-06 04:45:57,799][00514] Starting process inference_proc0-0
[2024-11-06 04:45:57,799][00514] Starting process rollout_proc0
[2024-11-06 04:45:57,802][00514] Starting process rollout_proc1
[2024-11-06 04:45:57,802][00514] Starting process rollout_proc2
[2024-11-06 04:45:57,804][00514] Starting process rollout_proc3
[2024-11-06 04:45:57,804][00514] Starting process rollout_proc4
[2024-11-06 04:45:57,804][00514] Starting process rollout_proc5
[2024-11-06 04:45:57,804][00514] Starting process rollout_proc6
[2024-11-06 04:45:57,804][00514] Starting process rollout_proc7
[2024-11-06 04:46:14,604][11969] Worker 2 uses CPU cores [0]
[2024-11-06 04:46:14,912][11973] Worker 5 uses CPU cores [1]
[2024-11-06 04:46:14,945][11950] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 04:46:14,946][11950] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-06 04:46:15,025][11950] Num visible devices: 1
[2024-11-06 04:46:15,026][11972] Worker 4 uses CPU cores [0]
[2024-11-06 04:46:15,034][11968] Worker 0 uses CPU cores [0]
[2024-11-06 04:46:15,047][11950] Starting seed is not provided
[2024-11-06 04:46:15,048][11950] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 04:46:15,048][11950] Initializing actor-critic model on device cuda:0
[2024-11-06 04:46:15,049][11950] RunningMeanStd input shape: (3, 72, 128)
[2024-11-06 04:46:15,051][11950] RunningMeanStd input shape: (1,)
[2024-11-06 04:46:15,110][11974] Worker 7 uses CPU cores [1]
[2024-11-06 04:46:15,109][11971] Worker 3 uses CPU cores [1]
[2024-11-06 04:46:15,117][11975] Worker 6 uses CPU cores [0]
[2024-11-06 04:46:15,123][11950] ConvEncoder: input_channels=3
[2024-11-06 04:46:15,246][11967] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 04:46:15,247][11967] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-06 04:46:15,277][11970] Worker 1 uses CPU cores [1]
[2024-11-06 04:46:15,335][11967] Num visible devices: 1
[2024-11-06 04:46:15,470][11950] Conv encoder output size: 512
[2024-11-06 04:46:15,470][11950] Policy head output size: 512
[2024-11-06 04:46:15,484][11950] Created Actor Critic model with architecture:
[2024-11-06 04:46:15,485][11950] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-06 04:46:15,599][11950] Using optimizer 
[2024-11-06 04:46:16,417][11950] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-06 04:46:16,450][11950] Loading model from checkpoint
[2024-11-06 04:46:16,451][11950] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2024-11-06 04:46:16,452][11950] Initialized policy 0 weights for model version 978
[2024-11-06 04:46:16,455][11950] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 04:46:16,462][11950] LearnerWorker_p0 finished initialization!
[2024-11-06 04:46:16,576][11967] RunningMeanStd input shape: (3, 72, 128)
[2024-11-06 04:46:16,579][11967] RunningMeanStd input shape: (1,)
[2024-11-06 04:46:16,600][11967] ConvEncoder: input_channels=3
[2024-11-06 04:46:16,704][11967] Conv encoder output size: 512
[2024-11-06 04:46:16,704][11967] Policy head output size: 512
[2024-11-06 04:46:16,757][00514] Inference worker 0-0 is ready!
[2024-11-06 04:46:16,759][00514] All inference workers are ready! Signal rollout workers to start!
[2024-11-06 04:46:17,043][11970] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:46:17,043][11968] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:46:17,088][11973] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:46:17,095][11971] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:46:17,145][11972] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:46:17,134][11975] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:46:17,171][11969] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:46:17,183][11974] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 04:46:17,698][00514] Heartbeat connected on Batcher_0
[2024-11-06 04:46:17,702][00514] Heartbeat connected on LearnerWorker_p0
[2024-11-06 04:46:17,756][00514] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-06 04:46:18,273][11968] Decorrelating experience for 0 frames...
[2024-11-06 04:46:18,274][11972] Decorrelating experience for 0 frames...
[2024-11-06 04:46:18,981][11973] Decorrelating experience for 0 frames...
[2024-11-06 04:46:18,982][11970] Decorrelating experience for 0 frames...
[2024-11-06 04:46:18,997][11971] Decorrelating experience for 0 frames...
[2024-11-06 04:46:19,049][11974] Decorrelating experience for 0 frames...
[2024-11-06 04:46:19,651][11972] Decorrelating experience for 32 frames...
[2024-11-06 04:46:19,655][11968] Decorrelating experience for 32 frames...
[2024-11-06 04:46:20,327][11975] Decorrelating experience for 0 frames...
[2024-11-06 04:46:20,337][00514] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-06 04:46:20,623][11970] Decorrelating experience for 32 frames...
[2024-11-06 04:46:20,632][11973] Decorrelating experience for 32 frames...
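Each rollout worker reports a native Doom resolution of 160x120 and a resize resolution of (128, 72). The normalizer expects input of shape (3, 72, 128). Read together, these entries suggest that (128, 72) is (width, height) and that frames are stored channels-first after resizing. The pure-Python sketch below only illustrates that shape bookkeeping. Nearest-neighbour sampling is an assumption chosen for simplicity, and the actual environment wrapper may interpolate differently. `nearest_indices` and `preprocess` are hypothetical names.

```python
from typing import List, Sequence

Pixel = Sequence[int]  # one (r, g, b) pixel


def nearest_indices(in_size: int, out_size: int) -> List[int]:
    """Source index sampled for each output position (nearest-neighbour, assumed)."""
    return [(i * in_size) // out_size for i in range(out_size)]


def preprocess(frame_hwc: Sequence[Sequence[Pixel]], out_w: int = 128, out_h: int = 72):
    """Resize an HWC frame to out_h x out_w and return it channels-first (CHW)."""
    in_h, in_w = len(frame_hwc), len(frame_hwc[0])
    rows = nearest_indices(in_h, out_h)
    cols = nearest_indices(in_w, out_w)
    channels = len(frame_hwc[0][0])
    # chw[channel][y][x], matching the logged observation shape (3, 72, 128).
    return [[[frame_hwc[r][c][ch] for c in cols] for r in rows] for ch in range(channels)]


# A synthetic 160x120 RGB frame (height 120, width 160): red encodes the row, green the column.
frame = [[(y, x, 0) for x in range(160)] for y in range(120)]
chw = preprocess(frame)
shape = (len(chw), len(chw[0]), len(chw[0][0]))
print(shape)  # (3, 72, 128)
```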
[2024-11-06 04:46:20,744][11974] Decorrelating experience for 32 frames...
[2024-11-06 04:46:22,461][11975] Decorrelating experience for 32 frames...
[2024-11-06 04:46:22,473][11969] Decorrelating experience for 0 frames...
[2024-11-06 04:46:22,518][11971] Decorrelating experience for 32 frames...
[2024-11-06 04:46:23,242][11973] Decorrelating experience for 64 frames...
[2024-11-06 04:46:23,248][11968] Decorrelating experience for 64 frames...
[2024-11-06 04:46:23,310][11974] Decorrelating experience for 64 frames...
[2024-11-06 04:46:24,746][11970] Decorrelating experience for 64 frames...
[2024-11-06 04:46:25,144][11969] Decorrelating experience for 32 frames...
[2024-11-06 04:46:25,190][11972] Decorrelating experience for 64 frames...
[2024-11-06 04:46:25,303][11973] Decorrelating experience for 96 frames...
[2024-11-06 04:46:25,335][00514] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-06 04:46:25,382][11974] Decorrelating experience for 96 frames...
[2024-11-06 04:46:25,689][00514] Heartbeat connected on RolloutWorker_w5
[2024-11-06 04:46:25,730][00514] Heartbeat connected on RolloutWorker_w7
[2024-11-06 04:46:25,929][11968] Decorrelating experience for 96 frames...
[2024-11-06 04:46:26,057][00514] Heartbeat connected on RolloutWorker_w0
[2024-11-06 04:46:26,430][11975] Decorrelating experience for 64 frames...
[2024-11-06 04:46:26,509][11970] Decorrelating experience for 96 frames...
[2024-11-06 04:46:26,636][00514] Heartbeat connected on RolloutWorker_w1
[2024-11-06 04:46:26,671][11971] Decorrelating experience for 64 frames...
[2024-11-06 04:46:29,152][11971] Decorrelating experience for 96 frames...
[2024-11-06 04:46:29,244][11972] Decorrelating experience for 96 frames...
[2024-11-06 04:46:29,358][11969] Decorrelating experience for 64 frames...
[2024-11-06 04:46:29,457][11975] Decorrelating experience for 96 frames...
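The `Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)` reports arrive roughly every 5 seconds. The values in this log fit a simple model: each window spans a fixed number of past reports (2, 12 and 60, i.e. 10 s, 60 s and 300 s), and a window falls back to the oldest report available when fewer exist. Under that model the first report prints `nan`. At 04:46:35 the 10-second value, 1638.4, is 16384 frames over 10 s, and the 60-second value, 1092.4, is the same 16384 frames over the roughly 15 s since the first report. The sketch below reproduces that behaviour. The class name `FpsWindows` and the exact fallback rule are assumptions inferred from the log, not the training framework's own code.

```python
import math
from collections import deque
from typing import Deque, List, Tuple

# Assumed window sizes, in number of ~5-second report periods: 10 s, 60 s, 300 s.
WINDOWS = (2, 12, 60)


class FpsWindows:
    """Sliding-window frames-per-second over the last N progress reports."""

    def __init__(self, windows=WINDOWS):
        self.windows = windows
        # Keep one extra sample so the largest window can span max(windows) intervals.
        self.samples: Deque[Tuple[float, int]] = deque(maxlen=max(windows) + 1)

    def report(self, t: float, total_frames: int) -> List[float]:
        self.samples.append((t, total_frames))
        fps = []
        for w in self.windows:
            if len(self.samples) < 2:
                fps.append(math.nan)  # no earlier report to difference against
                continue
            # The report w periods back, or the oldest one kept if there are fewer.
            t0, f0 = self.samples[max(0, len(self.samples) - 1 - w)]
            t1, f1 = self.samples[-1]
            fps.append((f1 - f0) / (t1 - t0))
        return fps


# Report times (seconds after 04:46:20.337) and frame totals copied from this log.
meter = FpsWindows()
history = [meter.report(t, f) for t, f in [
    (0.000, 4005888), (4.998, 4005888), (9.998, 4005888), (14.998, 4022272),
]]
print([round(x, 1) for x in history[-1]])  # [1638.4, 1092.4, 1092.4]
```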
[2024-11-06 04:46:29,709][00514] Heartbeat connected on RolloutWorker_w4
[2024-11-06 04:46:29,781][00514] Heartbeat connected on RolloutWorker_w3
[2024-11-06 04:46:30,029][00514] Heartbeat connected on RolloutWorker_w6
[2024-11-06 04:46:30,335][00514] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 139.8. Samples: 1398. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-06 04:46:30,342][00514] Avg episode reward: [(0, '5.010')]
[2024-11-06 04:46:31,812][11950] Signal inference workers to stop experience collection...
[2024-11-06 04:46:31,873][11967] InferenceWorker_p0-w0: stopping experience collection
[2024-11-06 04:46:31,977][11969] Decorrelating experience for 96 frames...
[2024-11-06 04:46:32,081][00514] Heartbeat connected on RolloutWorker_w2
[2024-11-06 04:46:33,407][11950] Signal inference workers to resume experience collection...
[2024-11-06 04:46:33,408][11967] InferenceWorker_p0-w0: resuming experience collection
[2024-11-06 04:46:35,335][00514] Fps is (10 sec: 1638.4, 60 sec: 1092.4, 300 sec: 1092.4). Total num frames: 4022272. Throughput: 0: 281.1. Samples: 4216. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-11-06 04:46:35,338][00514] Avg episode reward: [(0, '9.732')]
[2024-11-06 04:46:40,339][00514] Fps is (10 sec: 3275.7, 60 sec: 1638.2, 300 sec: 1638.2). Total num frames: 4038656. Throughput: 0: 335.2. Samples: 6704. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-11-06 04:46:40,343][00514] Avg episode reward: [(0, '13.063')]
[2024-11-06 04:46:43,152][11967] Updated weights for policy 0, policy_version 988 (0.0156)
[2024-11-06 04:46:45,335][00514] Fps is (10 sec: 3276.8, 60 sec: 1966.2, 300 sec: 1966.2). Total num frames: 4055040. Throughput: 0: 451.3. Samples: 11282. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:46:45,343][00514] Avg episode reward: [(0, '16.635')]
[2024-11-06 04:46:50,335][00514] Fps is (10 sec: 3687.7, 60 sec: 2321.2, 300 sec: 2321.2). Total num frames: 4075520. Throughput: 0: 608.8. Samples: 18262. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:46:50,343][00514] Avg episode reward: [(0, '20.191')]
[2024-11-06 04:46:52,101][11967] Updated weights for policy 0, policy_version 998 (0.0021)
[2024-11-06 04:46:55,335][00514] Fps is (10 sec: 4096.0, 60 sec: 2574.7, 300 sec: 2574.7). Total num frames: 4096000. Throughput: 0: 622.0. Samples: 21770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:46:55,337][00514] Avg episode reward: [(0, '21.731')]
[2024-11-06 04:47:00,336][00514] Fps is (10 sec: 3276.8, 60 sec: 2560.1, 300 sec: 2560.1). Total num frames: 4108288. Throughput: 0: 643.3. Samples: 25732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:47:00,342][00514] Avg episode reward: [(0, '22.081')]
[2024-11-06 04:47:04,305][11967] Updated weights for policy 0, policy_version 1008 (0.0025)
[2024-11-06 04:47:05,336][00514] Fps is (10 sec: 3686.3, 60 sec: 2821.8, 300 sec: 2821.8). Total num frames: 4132864. Throughput: 0: 709.3. Samples: 31918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:47:05,340][00514] Avg episode reward: [(0, '24.704')]
[2024-11-06 04:47:10,335][00514] Fps is (10 sec: 4505.6, 60 sec: 2949.2, 300 sec: 2949.2). Total num frames: 4153344. Throughput: 0: 787.0. Samples: 35416. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:47:10,343][00514] Avg episode reward: [(0, '25.783')]
[2024-11-06 04:47:15,012][11967] Updated weights for policy 0, policy_version 1018 (0.0016)
[2024-11-06 04:47:15,336][00514] Fps is (10 sec: 3686.4, 60 sec: 2979.0, 300 sec: 2979.0). Total num frames: 4169728. Throughput: 0: 874.0. Samples: 40730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:47:15,340][00514] Avg episode reward: [(0, '24.886')]
[2024-11-06 04:47:20,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3072.1, 300 sec: 3072.1). Total num frames: 4190208. Throughput: 0: 928.1. Samples: 45980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:47:20,340][00514] Avg episode reward: [(0, '23.001')]
[2024-11-06 04:47:24,702][11967] Updated weights for policy 0, policy_version 1028 (0.0020)
[2024-11-06 04:47:25,336][00514] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3150.8). Total num frames: 4210688. Throughput: 0: 951.4. Samples: 49514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:47:25,342][00514] Avg episode reward: [(0, '22.612')]
[2024-11-06 04:47:30,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3218.4). Total num frames: 4231168. Throughput: 0: 993.1. Samples: 55970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:47:30,338][00514] Avg episode reward: [(0, '21.560')]
[2024-11-06 04:47:35,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3167.6). Total num frames: 4243456. Throughput: 0: 930.8. Samples: 60146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-11-06 04:47:35,338][00514] Avg episode reward: [(0, '21.109')]
[2024-11-06 04:47:36,454][11967] Updated weights for policy 0, policy_version 1038 (0.0027)
[2024-11-06 04:47:40,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3276.9). Total num frames: 4268032. Throughput: 0: 928.4. Samples: 63546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:47:40,343][00514] Avg episode reward: [(0, '23.258')]
[2024-11-06 04:47:45,264][11967] Updated weights for policy 0, policy_version 1048 (0.0023)
[2024-11-06 04:47:45,335][00514] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3373.2). Total num frames: 4292608. Throughput: 0: 996.5. Samples: 70574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:47:45,340][00514] Avg episode reward: [(0, '25.902')]
[2024-11-06 04:47:50,341][00514] Fps is (10 sec: 3684.4, 60 sec: 3822.6, 300 sec: 3322.2). Total num frames: 4304896. Throughput: 0: 966.6. Samples: 75418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:47:50,343][00514] Avg episode reward: [(0, '26.223')]
[2024-11-06 04:47:55,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3363.1). Total num frames: 4325376. Throughput: 0: 940.3. Samples: 77730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:47:55,337][00514] Avg episode reward: [(0, '26.446')]
[2024-11-06 04:47:55,348][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001056_4325376.pth...
[2024-11-06 04:47:55,515][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000973_3985408.pth
[2024-11-06 04:47:57,012][11967] Updated weights for policy 0, policy_version 1058 (0.0017)
[2024-11-06 04:48:00,336][00514] Fps is (10 sec: 4098.1, 60 sec: 3959.5, 300 sec: 3399.7). Total num frames: 4345856. Throughput: 0: 973.2. Samples: 84522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:48:00,341][00514] Avg episode reward: [(0, '25.338')]
[2024-11-06 04:48:05,336][00514] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3432.9). Total num frames: 4366336. Throughput: 0: 985.5. Samples: 90330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:48:05,339][00514] Avg episode reward: [(0, '24.241')]
[2024-11-06 04:48:08,444][11967] Updated weights for policy 0, policy_version 1068 (0.0046)
[2024-11-06 04:48:10,336][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3388.5). Total num frames: 4378624. Throughput: 0: 952.9. Samples: 92394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:48:10,341][00514] Avg episode reward: [(0, '25.143')]
[2024-11-06 04:48:15,336][00514] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3454.9). Total num frames: 4403200. Throughput: 0: 943.9. Samples: 98444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-06 04:48:15,341][00514] Avg episode reward: [(0, '24.103')]
[2024-11-06 04:48:17,751][11967] Updated weights for policy 0, policy_version 1078 (0.0016)
[2024-11-06 04:48:20,335][00514] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3481.6). Total num frames: 4423680. Throughput: 0: 1008.2. Samples: 105514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:48:20,338][00514] Avg episode reward: [(0, '22.649')]
[2024-11-06 04:48:25,336][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3473.4). Total num frames: 4440064. Throughput: 0: 979.2. Samples: 107612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:48:25,340][00514] Avg episode reward: [(0, '23.625')]
[2024-11-06 04:48:29,504][11967] Updated weights for policy 0, policy_version 1088 (0.0025)
[2024-11-06 04:48:30,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3497.4). Total num frames: 4460544. Throughput: 0: 932.9. Samples: 112556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:48:30,342][00514] Avg episode reward: [(0, '23.917')]
[2024-11-06 04:48:35,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3519.6). Total num frames: 4481024. Throughput: 0: 980.0. Samples: 119512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:48:35,340][00514] Avg episode reward: [(0, '22.851')]
[2024-11-06 04:48:38,626][11967] Updated weights for policy 0, policy_version 1098 (0.0014)
[2024-11-06 04:48:40,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3540.2). Total num frames: 4501504. Throughput: 0: 999.7. Samples: 122718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:48:40,337][00514] Avg episode reward: [(0, '23.967')]
[2024-11-06 04:48:45,343][00514] Fps is (10 sec: 3274.4, 60 sec: 3685.9, 300 sec: 3502.6). Total num frames: 4513792. Throughput: 0: 941.4. Samples: 126890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:48:45,348][00514] Avg episode reward: [(0, '24.212')]
[2024-11-06 04:48:50,341][00514] Fps is (10 sec: 2456.3, 60 sec: 3686.4, 300 sec: 3467.9). Total num frames: 4526080. Throughput: 0: 901.7. Samples: 130910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:48:50,345][00514] Avg episode reward: [(0, '24.639')]
[2024-11-06 04:48:52,963][11967] Updated weights for policy 0, policy_version 1108 (0.0019)
[2024-11-06 04:48:55,336][00514] Fps is (10 sec: 3279.2, 60 sec: 3686.4, 300 sec: 3488.2). Total num frames: 4546560. Throughput: 0: 915.5. Samples: 133592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:48:55,339][00514] Avg episode reward: [(0, '25.973')]
[2024-11-06 04:49:00,335][00514] Fps is (10 sec: 3688.4, 60 sec: 3618.1, 300 sec: 3481.6). Total num frames: 4562944. Throughput: 0: 895.8. Samples: 138754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:49:00,342][00514] Avg episode reward: [(0, '26.639')]
[2024-11-06 04:49:04,955][11967] Updated weights for policy 0, policy_version 1118 (0.0024)
[2024-11-06 04:49:05,336][00514] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3475.4). Total num frames: 4579328. Throughput: 0: 850.4. Samples: 143782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:49:05,342][00514] Avg episode reward: [(0, '27.015')]
[2024-11-06 04:49:05,352][11950] Saving new best policy, reward=27.015!
[2024-11-06 04:49:10,336][00514] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3493.7). Total num frames: 4599808. Throughput: 0: 876.9. Samples: 147074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:49:10,341][00514] Avg episode reward: [(0, '27.291')]
[2024-11-06 04:49:10,402][11950] Saving new best policy, reward=27.291!
[2024-11-06 04:49:15,104][11967] Updated weights for policy 0, policy_version 1128 (0.0024)
[2024-11-06 04:49:15,337][00514] Fps is (10 sec: 4095.5, 60 sec: 3618.1, 300 sec: 3510.9). Total num frames: 4620288. Throughput: 0: 903.2. Samples: 153200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:49:15,339][00514] Avg episode reward: [(0, '28.192')]
[2024-11-06 04:49:15,354][11950] Saving new best policy, reward=28.192!
[2024-11-06 04:49:20,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3481.6). Total num frames: 4632576. Throughput: 0: 841.1. Samples: 157362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:49:20,338][00514] Avg episode reward: [(0, '28.437')]
[2024-11-06 04:49:20,343][11950] Saving new best policy, reward=28.437!
[2024-11-06 04:49:25,335][00514] Fps is (10 sec: 3686.9, 60 sec: 3618.1, 300 sec: 3520.4). Total num frames: 4657152. Throughput: 0: 845.1. Samples: 160748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:49:25,341][00514] Avg episode reward: [(0, '28.540')]
[2024-11-06 04:49:25,350][11950] Saving new best policy, reward=28.540!
[2024-11-06 04:49:26,152][11967] Updated weights for policy 0, policy_version 1138 (0.0035)
[2024-11-06 04:49:30,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3535.5). Total num frames: 4677632. Throughput: 0: 895.1. Samples: 167162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:49:30,341][00514] Avg episode reward: [(0, '27.447')]
[2024-11-06 04:49:35,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3507.9). Total num frames: 4689920. Throughput: 0: 903.1. Samples: 171546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:49:35,340][00514] Avg episode reward: [(0, '27.395')]
[2024-11-06 04:49:38,345][11967] Updated weights for policy 0, policy_version 1148 (0.0027)
[2024-11-06 04:49:40,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3522.6). Total num frames: 4710400. Throughput: 0: 896.2. Samples: 173920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:49:40,339][00514] Avg episode reward: [(0, '26.729')]
[2024-11-06 04:49:45,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3618.6, 300 sec: 3536.6). Total num frames: 4730880. Throughput: 0: 927.6. Samples: 180494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:49:45,341][00514] Avg episode reward: [(0, '24.710')]
[2024-11-06 04:49:47,984][11967] Updated weights for policy 0, policy_version 1158 (0.0021)
[2024-11-06 04:49:50,337][00514] Fps is (10 sec: 3685.9, 60 sec: 3686.6, 300 sec: 3530.4). Total num frames: 4747264. Throughput: 0: 936.1. Samples: 185908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:49:50,340][00514] Avg episode reward: [(0, '24.056')]
[2024-11-06 04:49:55,336][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3524.5). Total num frames: 4763648. Throughput: 0: 909.2. Samples: 187986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:49:55,338][00514] Avg episode reward: [(0, '23.937')]
[2024-11-06 04:49:55,351][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001163_4763648.pth...
[2024-11-06 04:49:55,474][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
[2024-11-06 04:49:59,924][11967] Updated weights for policy 0, policy_version 1168 (0.0033)
[2024-11-06 04:50:00,335][00514] Fps is (10 sec: 3686.9, 60 sec: 3686.4, 300 sec: 3537.5). Total num frames: 4784128. Throughput: 0: 904.8. Samples: 193916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:50:00,338][00514] Avg episode reward: [(0, '22.979')]
[2024-11-06 04:50:05,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 4804608. Throughput: 0: 956.0. Samples: 200380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:50:05,338][00514] Avg episode reward: [(0, '22.917')]
[2024-11-06 04:50:10,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3526.1). Total num frames: 4816896. Throughput: 0: 923.8. Samples: 202318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:50:10,340][00514] Avg episode reward: [(0, '23.958')]
[2024-11-06 04:50:11,875][11967] Updated weights for policy 0, policy_version 1178 (0.0025)
[2024-11-06 04:50:15,336][00514] Fps is (10 sec: 3276.7, 60 sec: 3618.2, 300 sec: 3538.3). Total num frames: 4837376. Throughput: 0: 896.6. Samples: 207510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:50:15,338][00514] Avg episode reward: [(0, '24.504')]
[2024-11-06 04:50:20,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3567.0). Total num frames: 4861952. Throughput: 0: 946.0. Samples: 214116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-06 04:50:20,337][00514] Avg episode reward: [(0, '24.796')]
[2024-11-06 04:50:21,084][11967] Updated weights for policy 0, policy_version 1188 (0.0023)
[2024-11-06 04:50:25,335][00514] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3561.0). Total num frames: 4878336. Throughput: 0: 953.2. Samples: 216812. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:50:25,338][00514] Avg episode reward: [(0, '25.458')]
[2024-11-06 04:50:30,335][00514] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3539.0). Total num frames: 4890624. Throughput: 0: 895.5. Samples: 220790. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-06 04:50:30,338][00514] Avg episode reward: [(0, '25.415')]
[2024-11-06 04:50:33,393][11967] Updated weights for policy 0, policy_version 1198 (0.0027)
[2024-11-06 04:50:35,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3565.9). Total num frames: 4915200. Throughput: 0: 922.6. Samples: 227424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:50:35,338][00514] Avg episode reward: [(0, '24.517')]
[2024-11-06 04:50:40,338][00514] Fps is (10 sec: 4094.8, 60 sec: 3686.2, 300 sec: 3560.3). Total num frames: 4931584. Throughput: 0: 947.9. Samples: 230644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:50:40,347][00514] Avg episode reward: [(0, '24.152')]
[2024-11-06 04:50:45,150][11967] Updated weights for policy 0, policy_version 1208 (0.0019)
[2024-11-06 04:50:45,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3555.0). Total num frames: 4947968. Throughput: 0: 913.4. Samples: 235018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:50:45,338][00514] Avg episode reward: [(0, '23.536')]
[2024-11-06 04:50:50,335][00514] Fps is (10 sec: 3687.5, 60 sec: 3686.5, 300 sec: 3565.1). Total num frames: 4968448. Throughput: 0: 896.6. Samples: 240726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:50:50,338][00514] Avg episode reward: [(0, '23.804')]
[2024-11-06 04:50:54,914][11967] Updated weights for policy 0, policy_version 1218 (0.0016)
[2024-11-06 04:50:55,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3574.7). Total num frames: 4988928. Throughput: 0: 926.4. Samples: 244004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 04:50:55,341][00514] Avg episode reward: [(0, '23.522')]
[2024-11-06 04:51:00,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3569.4). Total num frames: 5005312. Throughput: 0: 930.5. Samples: 249384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:51:00,344][00514] Avg episode reward: [(0, '23.297')]
[2024-11-06 04:51:05,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3564.3). Total num frames: 5021696. Throughput: 0: 887.2. Samples: 254042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:51:05,342][00514] Avg episode reward: [(0, '24.538')]
[2024-11-06 04:51:07,111][11967] Updated weights for policy 0, policy_version 1228 (0.0024)
[2024-11-06 04:51:10,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3573.4). Total num frames: 5042176. Throughput: 0: 904.0. Samples: 257494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:51:10,341][00514] Avg episode reward: [(0, '25.555')]
[2024-11-06 04:51:15,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 5062656. Throughput: 0: 965.5. Samples: 264238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:51:15,342][00514] Avg episode reward: [(0, '23.777')]
[2024-11-06 04:51:17,676][11967] Updated weights for policy 0, policy_version 1238 (0.0015)
[2024-11-06 04:51:20,336][00514] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3623.9). Total num frames: 5074944. Throughput: 0: 910.7. Samples: 268404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:51:20,345][00514] Avg episode reward: [(0, '23.595')]
[2024-11-06 04:51:25,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 5099520. Throughput: 0: 907.4. Samples: 271474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:51:25,341][00514] Avg episode reward: [(0, '24.125')]
[2024-11-06 04:51:27,613][11967] Updated weights for policy 0, policy_version 1248 (0.0017)
[2024-11-06 04:51:30,335][00514] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 5120000. Throughput: 0: 958.2. Samples: 278138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:51:30,338][00514] Avg episode reward: [(0, '23.536')]
[2024-11-06 04:51:35,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.2). Total num frames: 5136384. Throughput: 0: 945.1. Samples: 283254. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:51:35,338][00514] Avg episode reward: [(0, '24.387')]
[2024-11-06 04:51:39,491][11967] Updated weights for policy 0, policy_version 1258 (0.0021)
[2024-11-06 04:51:40,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3721.1). Total num frames: 5152768. Throughput: 0: 917.5. Samples: 285290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:51:40,342][00514] Avg episode reward: [(0, '23.341')]
[2024-11-06 04:51:45,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 5177344. Throughput: 0: 948.1. Samples: 292050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:51:45,340][00514] Avg episode reward: [(0, '24.153')]
[2024-11-06 04:51:49,007][11967] Updated weights for policy 0, policy_version 1268 (0.0014)
[2024-11-06 04:51:50,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5193728. Throughput: 0: 981.9. Samples: 298226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:51:50,340][00514] Avg episode reward: [(0, '24.492')]
[2024-11-06 04:51:55,336][00514] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 5210112. Throughput: 0: 950.2. Samples: 300254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:51:55,342][00514] Avg episode reward: [(0, '24.257')]
[2024-11-06 04:51:55,353][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001272_5210112.pth...
[2024-11-06 04:51:55,487][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001056_4325376.pth
[2024-11-06 04:52:00,340][00514] Fps is (10 sec: 3684.8, 60 sec: 3754.4, 300 sec: 3721.1). Total num frames: 5230592. Throughput: 0: 919.6. Samples: 305624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:52:00,344][00514] Avg episode reward: [(0, '23.273')]
[2024-11-06 04:52:00,811][11967] Updated weights for policy 0, policy_version 1278 (0.0026)
[2024-11-06 04:52:05,336][00514] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 5251072. Throughput: 0: 972.8. Samples: 312178. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 04:52:05,338][00514] Avg episode reward: [(0, '24.034')]
[2024-11-06 04:52:10,337][00514] Fps is (10 sec: 3687.6, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 5267456. Throughput: 0: 953.5. Samples: 314384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:52:10,347][00514] Avg episode reward: [(0, '24.478')]
[2024-11-06 04:52:12,903][11967] Updated weights for policy 0, policy_version 1288 (0.0022)
[2024-11-06 04:52:15,336][00514] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 5283840. Throughput: 0: 908.8. Samples: 319036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:52:15,342][00514] Avg episode reward: [(0, '23.360')]
[2024-11-06 04:52:20,335][00514] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 5308416. Throughput: 0: 942.7. Samples: 325674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:52:20,340][00514] Avg episode reward: [(0, '23.758')]
[2024-11-06 04:52:22,041][11967] Updated weights for policy 0, policy_version 1298 (0.0023)
[2024-11-06 04:52:25,336][00514] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 5324800. Throughput: 0: 968.2. Samples: 328860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:52:25,341][00514] Avg episode reward: [(0, '24.632')]
[2024-11-06 04:52:30,335][00514] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 5337088. Throughput: 0: 903.6. Samples: 332714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:52:30,338][00514] Avg episode reward: [(0, '24.746')]
[2024-11-06 04:52:34,385][11967] Updated weights for policy 0, policy_version 1308 (0.0013)
[2024-11-06 04:52:35,335][00514] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 5361664. Throughput: 0: 905.3. Samples: 338966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 04:52:35,338][00514] Avg episode reward: [(0, '24.997')]
[2024-11-06 04:52:40,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 5382144. Throughput: 0: 936.2. Samples: 342384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:52:40,342][00514] Avg episode reward: [(0, '24.547')]
[2024-11-06 04:52:45,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.4). Total num frames: 5394432. Throughput: 0: 930.5. Samples: 347494. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:52:45,342][00514] Avg episode reward: [(0, '24.767')]
[2024-11-06 04:52:45,359][11967] Updated weights for policy 0, policy_version 1318 (0.0031)
[2024-11-06 04:52:50,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 5414912. Throughput: 0: 908.9. Samples: 353078. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:52:50,339][00514] Avg episode reward: [(0, '25.862')]
[2024-11-06 04:52:55,142][11967] Updated weights for policy 0, policy_version 1328 (0.0029)
[2024-11-06 04:52:55,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 5439488. Throughput: 0: 930.0. Samples: 356232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:52:55,338][00514] Avg episode reward: [(0, '24.275')]
[2024-11-06 04:53:00,336][00514] Fps is (10 sec: 3686.3, 60 sec: 3686.6, 300 sec: 3679.5). Total num frames: 5451776. Throughput: 0: 955.2. Samples: 362018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:53:00,341][00514] Avg episode reward: [(0, '23.946')]
[2024-11-06 04:53:05,335][00514] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3693.3). Total num frames: 5468160. Throughput: 0: 893.1. Samples: 365864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:53:05,338][00514] Avg episode reward: [(0, '24.140')]
[2024-11-06 04:53:07,821][11967] Updated weights for policy 0, policy_version 1338 (0.0017)
[2024-11-06 04:53:10,335][00514] Fps is (10 sec: 3686.5, 60 sec: 3686.5, 300 sec: 3679.5). Total num frames: 5488640. Throughput: 0: 894.6. Samples: 369118. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 04:53:10,340][00514] Avg episode reward: [(0, '24.258')]
[2024-11-06 04:53:15,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 5509120. Throughput: 0: 956.5. Samples: 375756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:53:15,338][00514] Avg episode reward: [(0, '25.136')]
[2024-11-06 04:53:18,714][11967] Updated weights for policy 0, policy_version 1348 (0.0026)
[2024-11-06 04:53:20,339][00514] Fps is (10 sec: 3275.7, 60 sec: 3549.7, 300 sec: 3665.5). Total num frames: 5521408. Throughput: 0: 910.4. Samples: 379936. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:53:20,341][00514] Avg episode reward: [(0, '24.111')]
[2024-11-06 04:53:25,340][00514] Fps is (10 sec: 2456.5, 60 sec: 3481.4, 300 sec: 3637.7). Total num frames: 5533696. Throughput: 0: 871.7. Samples: 381616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 04:53:25,342][00514] Avg episode reward: [(0, '24.105')]
[2024-11-06 04:53:30,335][00514] Fps is (10 sec: 3277.9, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 5554176. Throughput: 0: 861.3. Samples: 386254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:53:30,343][00514] Avg episode reward: [(0, '25.530')]
[2024-11-06 04:53:31,990][11967] Updated weights for policy 0, policy_version 1358 (0.0022)
[2024-11-06 04:53:35,346][00514] Fps is (10 sec: 3684.3, 60 sec: 3481.0, 300 sec: 3623.8). Total num frames: 5570560. Throughput: 0: 863.1. Samples: 391924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:53:35,352][00514] Avg episode reward: [(0, '24.043')]
[2024-11-06 04:53:40,335][00514] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3624.0). Total num frames: 5582848. Throughput: 0: 837.7. Samples: 393930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:53:40,343][00514] Avg episode reward: [(0, '24.898')]
[2024-11-06 04:53:44,016][11967] Updated weights for policy 0, policy_version 1368 (0.0025)
[2024-11-06 04:53:45,335][00514] Fps is (10 sec: 3690.1, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 5607424. Throughput: 0: 835.2. Samples: 399602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 04:53:45,343][00514] Avg episode reward: [(0, '23.750')]
[2024-11-06 04:53:50,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 5627904. Throughput: 0: 899.4. Samples: 406338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 04:53:50,344][00514] Avg episode reward: [(0, '24.223')]
[2024-11-06 04:53:55,341][00514] Fps is (10 sec: 3275.1, 60 sec: 3344.8, 300 sec: 3651.6). Total num frames: 5640192. Throughput: 0: 871.7. Samples: 408348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 04:53:55,348][00514] Avg episode reward: [(0, '25.008')]
[2024-11-06 04:53:55,372][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001378_5644288.pth...
[2024-11-06 04:53:55,377][11967] Updated weights for policy 0, policy_version 1378 (0.0014) [2024-11-06 04:53:55,594][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001163_4763648.pth [2024-11-06 04:54:00,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3665.6). Total num frames: 5660672. Throughput: 0: 831.1. Samples: 413154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:54:00,340][00514] Avg episode reward: [(0, '23.771')] [2024-11-06 04:54:05,137][11967] Updated weights for policy 0, policy_version 1388 (0.0030) [2024-11-06 04:54:05,336][00514] Fps is (10 sec: 4507.9, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 5685248. Throughput: 0: 892.0. Samples: 420072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:54:05,338][00514] Avg episode reward: [(0, '25.047')] [2024-11-06 04:54:10,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 5701632. Throughput: 0: 925.0. Samples: 423238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:54:10,341][00514] Avg episode reward: [(0, '26.281')] [2024-11-06 04:54:15,335][00514] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 5718016. Throughput: 0: 915.0. Samples: 427428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:54:15,339][00514] Avg episode reward: [(0, '26.967')] [2024-11-06 04:54:16,679][11967] Updated weights for policy 0, policy_version 1398 (0.0020) [2024-11-06 04:54:20,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3679.5). Total num frames: 5742592. Throughput: 0: 941.1. Samples: 434262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:54:20,342][00514] Avg episode reward: [(0, '26.076')] [2024-11-06 04:54:25,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3823.2, 300 sec: 3679.5). Total num frames: 5763072. Throughput: 0: 974.7. Samples: 437790. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:54:25,340][00514] Avg episode reward: [(0, '26.304')] [2024-11-06 04:54:26,267][11967] Updated weights for policy 0, policy_version 1408 (0.0014) [2024-11-06 04:54:30,336][00514] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 5775360. Throughput: 0: 954.4. Samples: 442550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:54:30,341][00514] Avg episode reward: [(0, '25.651')] [2024-11-06 04:54:35,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3755.3, 300 sec: 3679.5). Total num frames: 5795840. Throughput: 0: 928.8. Samples: 448136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:54:35,339][00514] Avg episode reward: [(0, '25.433')] [2024-11-06 04:54:37,369][11967] Updated weights for policy 0, policy_version 1418 (0.0023) [2024-11-06 04:54:40,335][00514] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3693.3). Total num frames: 5820416. Throughput: 0: 960.2. Samples: 451550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:54:40,340][00514] Avg episode reward: [(0, '25.702')] [2024-11-06 04:54:45,336][00514] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 5832704. Throughput: 0: 979.0. Samples: 457208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:54:45,339][00514] Avg episode reward: [(0, '25.656')] [2024-11-06 04:54:49,592][11967] Updated weights for policy 0, policy_version 1428 (0.0018) [2024-11-06 04:54:50,335][00514] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 5849088. Throughput: 0: 923.4. Samples: 461624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:54:50,343][00514] Avg episode reward: [(0, '25.032')] [2024-11-06 04:54:55,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.5, 300 sec: 3693.3). Total num frames: 5873664. Throughput: 0: 926.3. Samples: 464922. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 04:54:55,338][00514] Avg episode reward: [(0, '25.744')] [2024-11-06 04:54:58,662][11967] Updated weights for policy 0, policy_version 1438 (0.0030) [2024-11-06 04:55:00,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 5894144. Throughput: 0: 986.6. Samples: 471824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 04:55:00,341][00514] Avg episode reward: [(0, '26.024')] [2024-11-06 04:55:05,338][00514] Fps is (10 sec: 3276.0, 60 sec: 3686.3, 300 sec: 3693.3). Total num frames: 5906432. Throughput: 0: 925.0. Samples: 475890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:55:05,343][00514] Avg episode reward: [(0, '26.351')] [2024-11-06 04:55:10,336][00514] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3693.3). Total num frames: 5926912. Throughput: 0: 907.4. Samples: 478624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 04:55:10,341][00514] Avg episode reward: [(0, '25.759')] [2024-11-06 04:55:10,734][11967] Updated weights for policy 0, policy_version 1448 (0.0025) [2024-11-06 04:55:15,335][00514] Fps is (10 sec: 4506.6, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 5951488. Throughput: 0: 953.7. Samples: 485468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:55:15,343][00514] Avg episode reward: [(0, '26.393')] [2024-11-06 04:55:20,335][00514] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 5963776. Throughput: 0: 943.0. Samples: 490570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 04:55:20,343][00514] Avg episode reward: [(0, '26.416')] [2024-11-06 04:55:22,299][11967] Updated weights for policy 0, policy_version 1458 (0.0027) [2024-11-06 04:55:25,335][00514] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 5980160. Throughput: 0: 911.6. Samples: 492570. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2024-11-06 04:55:25,341][00514] Avg episode reward: [(0, '25.156')] [2024-11-06 04:55:30,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3693.3). Total num frames: 6004736. Throughput: 0: 934.6. Samples: 499264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:55:30,340][00514] Avg episode reward: [(0, '25.108')] [2024-11-06 04:55:31,579][11967] Updated weights for policy 0, policy_version 1468 (0.0028) [2024-11-06 04:55:35,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3707.3). Total num frames: 6025216. Throughput: 0: 970.9. Samples: 505316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:55:35,341][00514] Avg episode reward: [(0, '25.372')] [2024-11-06 04:55:40,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 6037504. Throughput: 0: 943.1. Samples: 507360. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:55:40,341][00514] Avg episode reward: [(0, '25.445')] [2024-11-06 04:55:43,431][11967] Updated weights for policy 0, policy_version 1478 (0.0017) [2024-11-06 04:55:45,336][00514] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 6062080. Throughput: 0: 915.9. Samples: 513040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:55:45,338][00514] Avg episode reward: [(0, '25.824')] [2024-11-06 04:55:50,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 6082560. Throughput: 0: 979.7. Samples: 519974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:55:50,338][00514] Avg episode reward: [(0, '27.584')] [2024-11-06 04:55:53,368][11967] Updated weights for policy 0, policy_version 1488 (0.0028) [2024-11-06 04:55:55,337][00514] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 6098944. Throughput: 0: 971.7. Samples: 522352. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:55:55,339][00514] Avg episode reward: [(0, '27.013')] [2024-11-06 04:55:55,352][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001489_6098944.pth... [2024-11-06 04:55:55,527][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001272_5210112.pth [2024-11-06 04:56:00,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 6115328. Throughput: 0: 918.7. Samples: 526808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:56:00,343][00514] Avg episode reward: [(0, '27.050')] [2024-11-06 04:56:04,406][11967] Updated weights for policy 0, policy_version 1498 (0.0024) [2024-11-06 04:56:05,335][00514] Fps is (10 sec: 4096.6, 60 sec: 3891.4, 300 sec: 3721.1). Total num frames: 6139904. Throughput: 0: 953.6. Samples: 533480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:56:05,340][00514] Avg episode reward: [(0, '28.661')] [2024-11-06 04:56:05,356][11950] Saving new best policy, reward=28.661! [2024-11-06 04:56:10,338][00514] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3707.2). Total num frames: 6156288. Throughput: 0: 982.2. Samples: 536772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:56:10,340][00514] Avg episode reward: [(0, '28.684')] [2024-11-06 04:56:10,345][11950] Saving new best policy, reward=28.684! [2024-11-06 04:56:15,335][00514] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 6168576. Throughput: 0: 924.7. Samples: 540876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:56:15,343][00514] Avg episode reward: [(0, '27.858')] [2024-11-06 04:56:16,411][11967] Updated weights for policy 0, policy_version 1508 (0.0031) [2024-11-06 04:56:20,336][00514] Fps is (10 sec: 3687.1, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 6193152. Throughput: 0: 930.3. Samples: 547180. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:56:20,338][00514] Avg episode reward: [(0, '25.967')] [2024-11-06 04:56:25,129][11967] Updated weights for policy 0, policy_version 1518 (0.0030) [2024-11-06 04:56:25,335][00514] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3721.1). Total num frames: 6217728. Throughput: 0: 962.8. Samples: 550686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:56:25,337][00514] Avg episode reward: [(0, '26.651')] [2024-11-06 04:56:30,335][00514] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 6230016. Throughput: 0: 954.8. Samples: 556004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:56:30,345][00514] Avg episode reward: [(0, '27.580')] [2024-11-06 04:56:35,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6250496. Throughput: 0: 914.0. Samples: 561102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:56:35,338][00514] Avg episode reward: [(0, '25.649')] [2024-11-06 04:56:37,018][11967] Updated weights for policy 0, policy_version 1528 (0.0027) [2024-11-06 04:56:40,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 6270976. Throughput: 0: 940.0. Samples: 564652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:56:40,338][00514] Avg episode reward: [(0, '25.478')] [2024-11-06 04:56:45,338][00514] Fps is (10 sec: 3685.5, 60 sec: 3754.5, 300 sec: 3707.2). Total num frames: 6287360. Throughput: 0: 976.9. Samples: 570772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:56:45,340][00514] Avg episode reward: [(0, '26.276')] [2024-11-06 04:56:48,596][11967] Updated weights for policy 0, policy_version 1538 (0.0017) [2024-11-06 04:56:50,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 6303744. Throughput: 0: 919.7. Samples: 574868. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:56:50,342][00514] Avg episode reward: [(0, '26.173')] [2024-11-06 04:56:55,335][00514] Fps is (10 sec: 3687.2, 60 sec: 3754.8, 300 sec: 3707.3). Total num frames: 6324224. Throughput: 0: 919.4. Samples: 578144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:56:55,340][00514] Avg episode reward: [(0, '25.976')] [2024-11-06 04:56:58,148][11967] Updated weights for policy 0, policy_version 1548 (0.0028) [2024-11-06 04:57:00,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 6348800. Throughput: 0: 980.9. Samples: 585018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:57:00,345][00514] Avg episode reward: [(0, '24.956')] [2024-11-06 04:57:05,336][00514] Fps is (10 sec: 3686.1, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 6361088. Throughput: 0: 940.9. Samples: 589520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:57:05,339][00514] Avg episode reward: [(0, '26.519')] [2024-11-06 04:57:10,091][11967] Updated weights for policy 0, policy_version 1558 (0.0020) [2024-11-06 04:57:10,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 6381568. Throughput: 0: 916.9. Samples: 591946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:57:10,340][00514] Avg episode reward: [(0, '26.769')] [2024-11-06 04:57:15,336][00514] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 6402048. Throughput: 0: 947.9. Samples: 598660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:57:15,343][00514] Avg episode reward: [(0, '26.179')] [2024-11-06 04:57:20,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 6418432. Throughput: 0: 956.7. Samples: 604154. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:57:20,340][00514] Avg episode reward: [(0, '26.481')] [2024-11-06 04:57:20,719][11967] Updated weights for policy 0, policy_version 1568 (0.0015) [2024-11-06 04:57:25,336][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 6434816. Throughput: 0: 922.7. Samples: 606174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:57:25,338][00514] Avg episode reward: [(0, '26.218')] [2024-11-06 04:57:30,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 6459392. Throughput: 0: 927.6. Samples: 612514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:57:30,337][00514] Avg episode reward: [(0, '26.183')] [2024-11-06 04:57:30,890][11967] Updated weights for policy 0, policy_version 1578 (0.0034) [2024-11-06 04:57:35,342][00514] Fps is (10 sec: 4502.9, 60 sec: 3822.5, 300 sec: 3721.0). Total num frames: 6479872. Throughput: 0: 980.8. Samples: 619012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:57:35,348][00514] Avg episode reward: [(0, '26.165')] [2024-11-06 04:57:40,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6492160. Throughput: 0: 953.0. Samples: 621028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:57:40,342][00514] Avg episode reward: [(0, '25.089')] [2024-11-06 04:57:42,865][11967] Updated weights for policy 0, policy_version 1588 (0.0015) [2024-11-06 04:57:45,335][00514] Fps is (10 sec: 3278.9, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 6512640. Throughput: 0: 919.1. Samples: 626378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:57:45,338][00514] Avg episode reward: [(0, '25.434')] [2024-11-06 04:57:50,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 6537216. Throughput: 0: 970.7. Samples: 633202. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:57:50,342][00514] Avg episode reward: [(0, '26.349')] [2024-11-06 04:57:51,857][11967] Updated weights for policy 0, policy_version 1598 (0.0025) [2024-11-06 04:57:55,339][00514] Fps is (10 sec: 4094.6, 60 sec: 3822.7, 300 sec: 3735.0). Total num frames: 6553600. Throughput: 0: 981.4. Samples: 636112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:57:55,343][00514] Avg episode reward: [(0, '27.470')] [2024-11-06 04:57:55,351][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001600_6553600.pth... [2024-11-06 04:57:55,500][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001378_5644288.pth [2024-11-06 04:58:00,335][00514] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 6565888. Throughput: 0: 912.7. Samples: 639732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:58:00,346][00514] Avg episode reward: [(0, '27.765')] [2024-11-06 04:58:05,336][00514] Fps is (10 sec: 2458.4, 60 sec: 3618.2, 300 sec: 3693.3). Total num frames: 6578176. Throughput: 0: 876.3. Samples: 643586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:58:05,338][00514] Avg episode reward: [(0, '26.581')] [2024-11-06 04:58:06,771][11967] Updated weights for policy 0, policy_version 1608 (0.0026) [2024-11-06 04:58:10,338][00514] Fps is (10 sec: 3276.0, 60 sec: 3618.0, 300 sec: 3693.3). Total num frames: 6598656. Throughput: 0: 907.2. Samples: 647002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 04:58:10,344][00514] Avg episode reward: [(0, '27.284')] [2024-11-06 04:58:15,335][00514] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3707.3). Total num frames: 6615040. Throughput: 0: 878.2. Samples: 652034. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:58:15,341][00514] Avg episode reward: [(0, '26.693')] [2024-11-06 04:58:18,404][11967] Updated weights for policy 0, policy_version 1618 (0.0032) [2024-11-06 04:58:20,335][00514] Fps is (10 sec: 3277.6, 60 sec: 3549.9, 300 sec: 3721.2). Total num frames: 6631424. Throughput: 0: 854.6. Samples: 657462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:58:20,343][00514] Avg episode reward: [(0, '25.872')] [2024-11-06 04:58:25,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 6656000. Throughput: 0: 883.8. Samples: 660798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:58:25,337][00514] Avg episode reward: [(0, '24.041')] [2024-11-06 04:58:27,951][11967] Updated weights for policy 0, policy_version 1628 (0.0029) [2024-11-06 04:58:30,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3735.1). Total num frames: 6672384. Throughput: 0: 898.4. Samples: 666806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:58:30,342][00514] Avg episode reward: [(0, '24.167')] [2024-11-06 04:58:35,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3482.0, 300 sec: 3748.9). Total num frames: 6688768. Throughput: 0: 835.4. Samples: 670794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:58:35,337][00514] Avg episode reward: [(0, '24.279')] [2024-11-06 04:58:39,803][11967] Updated weights for policy 0, policy_version 1638 (0.0019) [2024-11-06 04:58:40,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 6709248. Throughput: 0: 844.7. Samples: 674122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 04:58:40,338][00514] Avg episode reward: [(0, '25.652')] [2024-11-06 04:58:45,337][00514] Fps is (10 sec: 4095.2, 60 sec: 3618.0, 300 sec: 3735.0). Total num frames: 6729728. Throughput: 0: 914.8. Samples: 680900. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:58:45,347][00514] Avg episode reward: [(0, '24.671')] [2024-11-06 04:58:50,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3748.9). Total num frames: 6746112. Throughput: 0: 930.9. Samples: 685478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:58:50,346][00514] Avg episode reward: [(0, '25.799')] [2024-11-06 04:58:51,444][11967] Updated weights for policy 0, policy_version 1648 (0.0024) [2024-11-06 04:58:55,335][00514] Fps is (10 sec: 3687.1, 60 sec: 3550.1, 300 sec: 3748.9). Total num frames: 6766592. Throughput: 0: 913.0. Samples: 688086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:58:55,341][00514] Avg episode reward: [(0, '27.634')] [2024-11-06 04:59:00,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 6787072. Throughput: 0: 955.2. Samples: 695018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:59:00,338][00514] Avg episode reward: [(0, '28.709')] [2024-11-06 04:59:00,341][11950] Saving new best policy, reward=28.709! [2024-11-06 04:59:00,630][11967] Updated weights for policy 0, policy_version 1658 (0.0015) [2024-11-06 04:59:05,339][00514] Fps is (10 sec: 3685.2, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 6803456. Throughput: 0: 951.3. Samples: 700272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 04:59:05,343][00514] Avg episode reward: [(0, '28.418')] [2024-11-06 04:59:10,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 6819840. Throughput: 0: 924.5. Samples: 702400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:59:10,342][00514] Avg episode reward: [(0, '28.680')] [2024-11-06 04:59:12,382][11967] Updated weights for policy 0, policy_version 1668 (0.0015) [2024-11-06 04:59:15,335][00514] Fps is (10 sec: 4097.3, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 6844416. Throughput: 0: 928.6. Samples: 708594. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-11-06 04:59:15,342][00514] Avg episode reward: [(0, '28.891')] [2024-11-06 04:59:15,354][11950] Saving new best policy, reward=28.891! [2024-11-06 04:59:20,336][00514] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 6864896. Throughput: 0: 986.1. Samples: 715170. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 04:59:20,339][00514] Avg episode reward: [(0, '28.329')] [2024-11-06 04:59:22,818][11967] Updated weights for policy 0, policy_version 1678 (0.0013) [2024-11-06 04:59:25,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 6877184. Throughput: 0: 957.1. Samples: 717190. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:59:25,340][00514] Avg episode reward: [(0, '28.074')] [2024-11-06 04:59:30,335][00514] Fps is (10 sec: 3277.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6897664. Throughput: 0: 919.0. Samples: 722252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 04:59:30,337][00514] Avg episode reward: [(0, '27.800')] [2024-11-06 04:59:33,632][11967] Updated weights for policy 0, policy_version 1688 (0.0018) [2024-11-06 04:59:35,336][00514] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 6918144. Throughput: 0: 959.8. Samples: 728670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 04:59:35,341][00514] Avg episode reward: [(0, '27.082')] [2024-11-06 04:59:40,338][00514] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 6934528. Throughput: 0: 964.1. Samples: 731474. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 04:59:40,342][00514] Avg episode reward: [(0, '26.113')] [2024-11-06 04:59:45,336][00514] Fps is (10 sec: 3276.9, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 6950912. Throughput: 0: 896.2. Samples: 735346. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:59:45,343][00514] Avg episode reward: [(0, '26.003')] [2024-11-06 04:59:46,101][11967] Updated weights for policy 0, policy_version 1698 (0.0027) [2024-11-06 04:59:50,335][00514] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6971392. Throughput: 0: 924.4. Samples: 741868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:59:50,342][00514] Avg episode reward: [(0, '26.436')] [2024-11-06 04:59:55,341][00514] Fps is (10 sec: 4093.9, 60 sec: 3754.3, 300 sec: 3721.0). Total num frames: 6991872. Throughput: 0: 951.3. Samples: 745212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 04:59:55,343][00514] Avg episode reward: [(0, '25.649')] [2024-11-06 04:59:55,351][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001707_6991872.pth... [2024-11-06 04:59:55,536][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001489_6098944.pth [2024-11-06 04:59:56,192][11967] Updated weights for policy 0, policy_version 1708 (0.0017) [2024-11-06 05:00:00,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 7004160. Throughput: 0: 918.2. Samples: 749912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:00:00,342][00514] Avg episode reward: [(0, '26.027')] [2024-11-06 05:00:05,335][00514] Fps is (10 sec: 3278.5, 60 sec: 3686.6, 300 sec: 3721.1). Total num frames: 7024640. Throughput: 0: 896.7. Samples: 755522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:00:05,343][00514] Avg episode reward: [(0, '27.891')] [2024-11-06 05:00:07,263][11967] Updated weights for policy 0, policy_version 1718 (0.0026) [2024-11-06 05:00:10,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 7049216. Throughput: 0: 929.7. Samples: 759028. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 05:00:10,342][00514] Avg episode reward: [(0, '28.095')] [2024-11-06 05:00:15,338][00514] Fps is (10 sec: 4095.0, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 7065600. Throughput: 0: 945.5. Samples: 764804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:00:15,340][00514] Avg episode reward: [(0, '28.138')] [2024-11-06 05:00:18,923][11967] Updated weights for policy 0, policy_version 1728 (0.0017) [2024-11-06 05:00:20,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3735.0). Total num frames: 7081984. Throughput: 0: 909.2. Samples: 769582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 05:00:20,338][00514] Avg episode reward: [(0, '27.414')] [2024-11-06 05:00:25,335][00514] Fps is (10 sec: 4097.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 7106560. Throughput: 0: 923.8. Samples: 773044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 05:00:25,338][00514] Avg episode reward: [(0, '28.532')] [2024-11-06 05:00:27,813][11967] Updated weights for policy 0, policy_version 1738 (0.0027) [2024-11-06 05:00:30,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 7127040. Throughput: 0: 989.5. Samples: 779874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 05:00:30,340][00514] Avg episode reward: [(0, '26.814')] [2024-11-06 05:00:35,338][00514] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 7139328. Throughput: 0: 939.2. Samples: 784132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 05:00:35,344][00514] Avg episode reward: [(0, '26.310')] [2024-11-06 05:00:39,501][11967] Updated weights for policy 0, policy_version 1748 (0.0021) [2024-11-06 05:00:40,336][00514] Fps is (10 sec: 3276.7, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 7159808. Throughput: 0: 930.8. Samples: 787094. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:00:40,341][00514] Avg episode reward: [(0, '25.668')]
[2024-11-06 05:00:45,335][00514] Fps is (10 sec: 4506.6, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 7184384. Throughput: 0: 984.2. Samples: 794200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:00:45,342][00514] Avg episode reward: [(0, '26.568')]
[2024-11-06 05:00:49,672][11967] Updated weights for policy 0, policy_version 1758 (0.0027)
[2024-11-06 05:00:50,336][00514] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 7200768. Throughput: 0: 976.0. Samples: 799444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:00:50,343][00514] Avg episode reward: [(0, '27.767')]
[2024-11-06 05:00:55,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3735.0). Total num frames: 7217152. Throughput: 0: 943.5. Samples: 801486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:00:55,339][00514] Avg episode reward: [(0, '27.573')]
[2024-11-06 05:00:59,731][11967] Updated weights for policy 0, policy_version 1768 (0.0015)
[2024-11-06 05:01:00,335][00514] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 7241728. Throughput: 0: 971.7. Samples: 808526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:01:00,338][00514] Avg episode reward: [(0, '26.492')]
[2024-11-06 05:01:05,337][00514] Fps is (10 sec: 4505.0, 60 sec: 3959.4, 300 sec: 3748.9). Total num frames: 7262208. Throughput: 0: 1003.2. Samples: 814728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:01:05,342][00514] Avg episode reward: [(0, '25.821')]
[2024-11-06 05:01:10,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7278592. Throughput: 0: 972.0. Samples: 816784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 05:01:10,338][00514] Avg episode reward: [(0, '26.387')]
[2024-11-06 05:01:11,231][11967] Updated weights for policy 0, policy_version 1778 (0.0030)
[2024-11-06 05:01:15,335][00514] Fps is (10 sec: 3686.9, 60 sec: 3891.4, 300 sec: 3748.9). Total num frames: 7299072. Throughput: 0: 950.7. Samples: 822654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:01:15,338][00514] Avg episode reward: [(0, '26.062')]
[2024-11-06 05:01:20,117][11967] Updated weights for policy 0, policy_version 1788 (0.0027)
[2024-11-06 05:01:20,340][00514] Fps is (10 sec: 4503.7, 60 sec: 4027.4, 300 sec: 3748.8). Total num frames: 7323648. Throughput: 0: 1012.5. Samples: 829696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 05:01:20,342][00514] Avg episode reward: [(0, '25.600')]
[2024-11-06 05:01:25,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 7335936. Throughput: 0: 999.2. Samples: 832056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:01:25,341][00514] Avg episode reward: [(0, '24.918')]
[2024-11-06 05:01:30,335][00514] Fps is (10 sec: 3278.2, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 7356416. Throughput: 0: 945.6. Samples: 836754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:01:30,341][00514] Avg episode reward: [(0, '25.786')]
[2024-11-06 05:01:31,920][11967] Updated weights for policy 0, policy_version 1798 (0.0022)
[2024-11-06 05:01:35,336][00514] Fps is (10 sec: 4095.6, 60 sec: 3959.5, 300 sec: 3748.9). Total num frames: 7376896. Throughput: 0: 980.2. Samples: 843554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 05:01:35,349][00514] Avg episode reward: [(0, '24.915')]
[2024-11-06 05:01:40,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 7393280. Throughput: 0: 1005.0. Samples: 846710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:01:40,342][00514] Avg episode reward: [(0, '23.358')]
[2024-11-06 05:01:43,503][11967] Updated weights for policy 0, policy_version 1808 (0.0019)
[2024-11-06 05:01:45,335][00514] Fps is (10 sec: 3277.1, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 7409664. Throughput: 0: 942.1. Samples: 850920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 05:01:45,337][00514] Avg episode reward: [(0, '23.354')]
[2024-11-06 05:01:50,336][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 7434240. Throughput: 0: 952.8. Samples: 857604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:01:50,337][00514] Avg episode reward: [(0, '24.260')]
[2024-11-06 05:01:52,554][11967] Updated weights for policy 0, policy_version 1818 (0.0025)
[2024-11-06 05:01:55,336][00514] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3748.9). Total num frames: 7454720. Throughput: 0: 983.5. Samples: 861042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 05:01:55,343][00514] Avg episode reward: [(0, '23.835')]
[2024-11-06 05:01:55,359][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001820_7454720.pth...
[2024-11-06 05:01:55,526][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001600_6553600.pth
[2024-11-06 05:02:00,340][00514] Fps is (10 sec: 3684.7, 60 sec: 3822.6, 300 sec: 3762.7). Total num frames: 7471104. Throughput: 0: 963.3. Samples: 866008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:02:00,349][00514] Avg episode reward: [(0, '23.930')]
[2024-11-06 05:02:04,516][11967] Updated weights for policy 0, policy_version 1828 (0.0017)
[2024-11-06 05:02:05,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3748.9). Total num frames: 7487488. Throughput: 0: 927.9. Samples: 871446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:02:05,343][00514] Avg episode reward: [(0, '25.563')]
[2024-11-06 05:02:10,335][00514] Fps is (10 sec: 4097.9, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 7512064. Throughput: 0: 950.3. Samples: 874818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:02:10,338][00514] Avg episode reward: [(0, '26.132')]
[2024-11-06 05:02:13,891][11967] Updated weights for policy 0, policy_version 1838 (0.0024)
[2024-11-06 05:02:15,339][00514] Fps is (10 sec: 4094.7, 60 sec: 3822.7, 300 sec: 3762.7). Total num frames: 7528448. Throughput: 0: 983.8. Samples: 881026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:02:15,346][00514] Avg episode reward: [(0, '26.966')]
[2024-11-06 05:02:20,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3762.8). Total num frames: 7544832. Throughput: 0: 929.0. Samples: 885358. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 05:02:20,338][00514] Avg episode reward: [(0, '27.854')]
[2024-11-06 05:02:25,183][11967] Updated weights for policy 0, policy_version 1848 (0.0018)
[2024-11-06 05:02:25,336][00514] Fps is (10 sec: 4097.3, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 7569408. Throughput: 0: 933.3. Samples: 888708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:02:25,339][00514] Avg episode reward: [(0, '28.770')]
[2024-11-06 05:02:30,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 7589888. Throughput: 0: 990.8. Samples: 895506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:02:30,340][00514] Avg episode reward: [(0, '27.690')]
[2024-11-06 05:02:35,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 7602176. Throughput: 0: 943.1. Samples: 900044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:02:35,341][00514] Avg episode reward: [(0, '27.595')]
[2024-11-06 05:02:37,769][11967] Updated weights for policy 0, policy_version 1858 (0.0032)
[2024-11-06 05:02:40,335][00514] Fps is (10 sec: 2457.6, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 7614464. Throughput: 0: 905.6. Samples: 901792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:02:40,342][00514] Avg episode reward: [(0, '26.626')]
[2024-11-06 05:02:45,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7634944. Throughput: 0: 889.1. Samples: 906012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:02:45,342][00514] Avg episode reward: [(0, '27.268')]
[2024-11-06 05:02:49,336][11967] Updated weights for policy 0, policy_version 1868 (0.0022)
[2024-11-06 05:02:50,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3721.2). Total num frames: 7651328. Throughput: 0: 905.0. Samples: 912172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:02:50,341][00514] Avg episode reward: [(0, '25.533')]
[2024-11-06 05:02:55,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 7667712. Throughput: 0: 877.1. Samples: 914288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 05:02:55,343][00514] Avg episode reward: [(0, '24.190')]
[2024-11-06 05:03:00,336][00514] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3762.8). Total num frames: 7688192. Throughput: 0: 870.2. Samples: 920184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:03:00,338][00514] Avg episode reward: [(0, '24.002')]
[2024-11-06 05:03:00,474][11967] Updated weights for policy 0, policy_version 1878 (0.0022)
[2024-11-06 05:03:05,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 7712768. Throughput: 0: 928.9. Samples: 927160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:03:05,343][00514] Avg episode reward: [(0, '24.765')]
[2024-11-06 05:03:10,338][00514] Fps is (10 sec: 3685.5, 60 sec: 3549.7, 300 sec: 3762.7). Total num frames: 7725056. Throughput: 0: 903.8. Samples: 929380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 05:03:10,340][00514] Avg episode reward: [(0, '24.447')]
[2024-11-06 05:03:12,314][11967] Updated weights for policy 0, policy_version 1888 (0.0025)
[2024-11-06 05:03:15,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3776.7). Total num frames: 7745536. Throughput: 0: 852.1. Samples: 933852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:03:15,339][00514] Avg episode reward: [(0, '24.206')]
[2024-11-06 05:03:20,335][00514] Fps is (10 sec: 4097.0, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 7766016. Throughput: 0: 908.8. Samples: 940938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:03:20,337][00514] Avg episode reward: [(0, '24.911')]
[2024-11-06 05:03:21,365][11967] Updated weights for policy 0, policy_version 1898 (0.0034)
[2024-11-06 05:03:25,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 7786496. Throughput: 0: 944.6. Samples: 944300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 05:03:25,338][00514] Avg episode reward: [(0, '26.126')]
[2024-11-06 05:03:30,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3776.7). Total num frames: 7802880. Throughput: 0: 946.6. Samples: 948608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:03:30,342][00514] Avg episode reward: [(0, '25.744')]
[2024-11-06 05:03:33,015][11967] Updated weights for policy 0, policy_version 1908 (0.0044)
[2024-11-06 05:03:35,336][00514] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 7823360. Throughput: 0: 951.9. Samples: 955008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:03:35,343][00514] Avg episode reward: [(0, '26.970')]
[2024-11-06 05:03:40,336][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.6). Total num frames: 7847936. Throughput: 0: 977.7. Samples: 958284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:03:40,340][00514] Avg episode reward: [(0, '28.269')]
[2024-11-06 05:03:43,319][11967] Updated weights for policy 0, policy_version 1918 (0.0025)
[2024-11-06 05:03:45,338][00514] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3776.6). Total num frames: 7860224. Throughput: 0: 961.8. Samples: 963468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:03:45,343][00514] Avg episode reward: [(0, '27.158')]
[2024-11-06 05:03:50,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 7880704. Throughput: 0: 925.2. Samples: 968792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 05:03:50,337][00514] Avg episode reward: [(0, '27.348')]
[2024-11-06 05:03:53,590][11967] Updated weights for policy 0, policy_version 1928 (0.0019)
[2024-11-06 05:03:55,336][00514] Fps is (10 sec: 4096.8, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 7901184. Throughput: 0: 954.4. Samples: 972326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:03:55,338][00514] Avg episode reward: [(0, '28.520')]
[2024-11-06 05:03:55,348][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001929_7901184.pth...
[2024-11-06 05:03:55,505][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001707_6991872.pth
[2024-11-06 05:04:00,336][00514] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3790.6). Total num frames: 7921664. Throughput: 0: 992.1. Samples: 978496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:04:00,342][00514] Avg episode reward: [(0, '29.124')]
[2024-11-06 05:04:00,347][11950] Saving new best policy, reward=29.124!
[2024-11-06 05:04:05,335][00514] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 7933952. Throughput: 0: 933.2. Samples: 982934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:04:05,343][00514] Avg episode reward: [(0, '28.211')]
[2024-11-06 05:04:05,360][11967] Updated weights for policy 0, policy_version 1938 (0.0022)
[2024-11-06 05:04:10,335][00514] Fps is (10 sec: 3686.5, 60 sec: 3891.4, 300 sec: 3776.7). Total num frames: 7958528. Throughput: 0: 930.1. Samples: 986154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:04:10,338][00514] Avg episode reward: [(0, '27.521')]
[2024-11-06 05:04:14,554][11967] Updated weights for policy 0, policy_version 1948 (0.0017)
[2024-11-06 05:04:15,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 7979008. Throughput: 0: 987.6. Samples: 993048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:04:15,344][00514] Avg episode reward: [(0, '27.119')]
[2024-11-06 05:04:20,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 7991296. Throughput: 0: 943.3. Samples: 997458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:04:20,345][00514] Avg episode reward: [(0, '25.635')]
[2024-11-06 05:04:25,338][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 8015872. Throughput: 0: 932.6. Samples: 1000252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:04:25,343][00514] Avg episode reward: [(0, '24.835')]
[2024-11-06 05:04:26,129][11967] Updated weights for policy 0, policy_version 1958 (0.0014)
[2024-11-06 05:04:30,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 8036352. Throughput: 0: 973.2. Samples: 1007260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:04:30,340][00514] Avg episode reward: [(0, '26.007')]
[2024-11-06 05:04:35,339][00514] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3790.5). Total num frames: 8052736. Throughput: 0: 973.7. Samples: 1012610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:04:35,341][00514] Avg episode reward: [(0, '26.609')]
[2024-11-06 05:04:37,428][11967] Updated weights for policy 0, policy_version 1968 (0.0016)
[2024-11-06 05:04:40,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 8069120. Throughput: 0: 939.6. Samples: 1014610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:04:40,342][00514] Avg episode reward: [(0, '26.243')]
[2024-11-06 05:04:45,335][00514] Fps is (10 sec: 3687.6, 60 sec: 3823.1, 300 sec: 3790.5). Total num frames: 8089600. Throughput: 0: 934.8. Samples: 1020562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 05:04:45,343][00514] Avg episode reward: [(0, '27.907')]
[2024-11-06 05:04:47,330][11967] Updated weights for policy 0, policy_version 1978 (0.0023)
[2024-11-06 05:04:50,338][00514] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3790.6). Total num frames: 8110080. Throughput: 0: 979.1. Samples: 1026994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:04:50,340][00514] Avg episode reward: [(0, '28.466')]
[2024-11-06 05:04:55,337][00514] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 8126464. Throughput: 0: 952.8. Samples: 1029032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:04:55,346][00514] Avg episode reward: [(0, '28.864')]
[2024-11-06 05:04:59,094][11967] Updated weights for policy 0, policy_version 1988 (0.0043)
[2024-11-06 05:05:00,339][00514] Fps is (10 sec: 3686.0, 60 sec: 3754.5, 300 sec: 3804.4). Total num frames: 8146944. Throughput: 0: 920.6. Samples: 1034476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:05:00,347][00514] Avg episode reward: [(0, '28.213')]
[2024-11-06 05:05:05,335][00514] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 8167424. Throughput: 0: 971.6. Samples: 1041178. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 05:05:05,338][00514] Avg episode reward: [(0, '27.654')]
[2024-11-06 05:05:09,491][11967] Updated weights for policy 0, policy_version 1998 (0.0025)
[2024-11-06 05:05:10,335][00514] Fps is (10 sec: 3687.6, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 8183808. Throughput: 0: 971.9. Samples: 1043986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-06 05:05:10,341][00514] Avg episode reward: [(0, '28.959')]
[2024-11-06 05:05:15,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 8200192. Throughput: 0: 905.2. Samples: 1047996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:05:15,342][00514] Avg episode reward: [(0, '27.522')]
[2024-11-06 05:05:20,198][11967] Updated weights for policy 0, policy_version 2008 (0.0017)
[2024-11-06 05:05:20,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 8224768. Throughput: 0: 938.0. Samples: 1054818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:05:20,342][00514] Avg episode reward: [(0, '26.815')]
[2024-11-06 05:05:25,336][00514] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 8245248. Throughput: 0: 971.6. Samples: 1058330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:05:25,338][00514] Avg episode reward: [(0, '27.910')]
[2024-11-06 05:05:30,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 8257536. Throughput: 0: 943.5. Samples: 1063018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:05:30,341][00514] Avg episode reward: [(0, '28.206')]
[2024-11-06 05:05:31,869][11967] Updated weights for policy 0, policy_version 2018 (0.0042)
[2024-11-06 05:05:35,338][00514] Fps is (10 sec: 3276.1, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 8278016. Throughput: 0: 931.5. Samples: 1068910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 05:05:35,346][00514] Avg episode reward: [(0, '27.899')]
[2024-11-06 05:05:40,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 8302592. Throughput: 0: 962.9. Samples: 1072362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:05:40,340][00514] Avg episode reward: [(0, '27.468')]
[2024-11-06 05:05:40,893][11967] Updated weights for policy 0, policy_version 2028 (0.0017)
[2024-11-06 05:05:45,335][00514] Fps is (10 sec: 4096.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 8318976. Throughput: 0: 965.2. Samples: 1077908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:05:45,342][00514] Avg episode reward: [(0, '27.929')]
[2024-11-06 05:05:50,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3790.5). Total num frames: 8335360. Throughput: 0: 924.5. Samples: 1082782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:05:50,340][00514] Avg episode reward: [(0, '28.815')]
[2024-11-06 05:05:52,585][11967] Updated weights for policy 0, policy_version 2038 (0.0025)
[2024-11-06 05:05:55,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3790.5). Total num frames: 8359936. Throughput: 0: 939.5. Samples: 1086262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-06 05:05:55,340][00514] Avg episode reward: [(0, '28.546')]
[2024-11-06 05:05:55,350][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002041_8359936.pth...
[2024-11-06 05:05:55,474][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001820_7454720.pth
[2024-11-06 05:06:00,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3776.7). Total num frames: 8376320. Throughput: 0: 1001.9. Samples: 1093082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:06:00,340][00514] Avg episode reward: [(0, '29.261')]
[2024-11-06 05:06:00,402][11950] Saving new best policy, reward=29.261!
[2024-11-06 05:06:03,865][11967] Updated weights for policy 0, policy_version 2048 (0.0016)
[2024-11-06 05:06:05,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 8392704. Throughput: 0: 939.2. Samples: 1097080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:06:05,339][00514] Avg episode reward: [(0, '28.646')]
[2024-11-06 05:06:10,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 8413184. Throughput: 0: 930.9. Samples: 1100222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 05:06:10,345][00514] Avg episode reward: [(0, '28.150')]
[2024-11-06 05:06:13,570][11967] Updated weights for policy 0, policy_version 2058 (0.0025)
[2024-11-06 05:06:15,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 8433664. Throughput: 0: 973.1. Samples: 1106808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 05:06:15,341][00514] Avg episode reward: [(0, '27.954')]
[2024-11-06 05:06:20,344][00514] Fps is (10 sec: 3683.3, 60 sec: 3754.1, 300 sec: 3776.5). Total num frames: 8450048. Throughput: 0: 952.4. Samples: 1111774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 05:06:20,346][00514] Avg episode reward: [(0, '27.748')]
[2024-11-06 05:06:25,336][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 8466432. Throughput: 0: 922.8. Samples: 1113886. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 05:06:25,343][00514] Avg episode reward: [(0, '27.407')]
[2024-11-06 05:06:25,430][11967] Updated weights for policy 0, policy_version 2068 (0.0022)
[2024-11-06 05:06:30,335][00514] Fps is (10 sec: 4099.5, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 8491008. Throughput: 0: 947.8. Samples: 1120560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:06:30,338][00514] Avg episode reward: [(0, '27.571')]
[2024-11-06 05:06:35,340][00514] Fps is (10 sec: 4094.3, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 8507392. Throughput: 0: 971.1. Samples: 1126486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-06 05:06:35,344][00514] Avg episode reward: [(0, '26.685')]
[2024-11-06 05:06:35,507][11967] Updated weights for policy 0, policy_version 2078 (0.0044)
[2024-11-06 05:06:40,339][00514] Fps is (10 sec: 3275.7, 60 sec: 3686.2, 300 sec: 3776.6). Total num frames: 8523776. Throughput: 0: 938.0. Samples: 1128474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:06:40,347][00514] Avg episode reward: [(0, '27.080')]
[2024-11-06 05:06:45,335][00514] Fps is (10 sec: 3688.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 8544256. Throughput: 0: 910.5. Samples: 1134054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-06 05:06:45,344][00514] Avg episode reward: [(0, '28.012')]
[2024-11-06 05:06:46,754][11967] Updated weights for policy 0, policy_version 2088 (0.0024)
[2024-11-06 05:06:50,335][00514] Fps is (10 sec: 4097.3, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 8564736. Throughput: 0: 967.5. Samples: 1140616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:06:50,344][00514] Avg episode reward: [(0, '26.225')]
[2024-11-06 05:06:55,341][00514] Fps is (10 sec: 3684.4, 60 sec: 3686.1, 300 sec: 3762.8). Total num frames: 8581120. Throughput: 0: 948.6. Samples: 1142916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:06:55,344][00514] Avg episode reward: [(0, '26.370')]
[2024-11-06 05:06:58,874][11967] Updated weights for policy 0, policy_version 2098 (0.0024)
[2024-11-06 05:07:00,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 8597504. Throughput: 0: 900.4. Samples: 1147326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:07:00,341][00514] Avg episode reward: [(0, '24.989')]
[2024-11-06 05:07:05,335][00514] Fps is (10 sec: 3688.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 8617984. Throughput: 0: 935.4. Samples: 1153858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:07:05,342][00514] Avg episode reward: [(0, '25.688')]
[2024-11-06 05:07:08,465][11967] Updated weights for policy 0, policy_version 2108 (0.0017)
[2024-11-06 05:07:10,338][00514] Fps is (10 sec: 4095.1, 60 sec: 3754.5, 300 sec: 3762.8). Total num frames: 8638464. Throughput: 0: 960.2. Samples: 1157098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 05:07:10,343][00514] Avg episode reward: [(0, '24.454')]
[2024-11-06 05:07:15,336][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 8650752. Throughput: 0: 898.9. Samples: 1161010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:07:15,343][00514] Avg episode reward: [(0, '23.835')]
[2024-11-06 05:07:20,339][00514] Fps is (10 sec: 2867.0, 60 sec: 3618.5, 300 sec: 3721.1). Total num frames: 8667136. Throughput: 0: 883.0. Samples: 1166218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:07:20,342][00514] Avg episode reward: [(0, '24.553')]
[2024-11-06 05:07:22,482][11967] Updated weights for policy 0, policy_version 2118 (0.0026)
[2024-11-06 05:07:25,336][00514] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 8679424. Throughput: 0: 880.6. Samples: 1168096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:07:25,342][00514] Avg episode reward: [(0, '25.738')]
[2024-11-06 05:07:30,335][00514] Fps is (10 sec: 2868.1, 60 sec: 3413.3, 300 sec: 3707.2). Total num frames: 8695808. Throughput: 0: 847.6. Samples: 1172194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:07:30,340][00514] Avg episode reward: [(0, '26.052')]
[2024-11-06 05:07:35,336][00514] Fps is (10 sec: 3276.8, 60 sec: 3413.6, 300 sec: 3721.1). Total num frames: 8712192. Throughput: 0: 810.4. Samples: 1177082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:07:35,337][00514] Avg episode reward: [(0, '26.905')]
[2024-11-06 05:07:35,697][11967] Updated weights for policy 0, policy_version 2128 (0.0031)
[2024-11-06 05:07:40,338][00514] Fps is (10 sec: 3685.4, 60 sec: 3481.6, 300 sec: 3721.1). Total num frames: 8732672. Throughput: 0: 831.3. Samples: 1180322. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-06 05:07:40,340][00514] Avg episode reward: [(0, '27.796')]
[2024-11-06 05:07:45,337][00514] Fps is (10 sec: 3686.0, 60 sec: 3413.3, 300 sec: 3721.1). Total num frames: 8749056. Throughput: 0: 866.9. Samples: 1186336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 05:07:45,340][00514] Avg episode reward: [(0, '28.476')]
[2024-11-06 05:07:47,081][11967] Updated weights for policy 0, policy_version 2138 (0.0019)
[2024-11-06 05:07:50,335][00514] Fps is (10 sec: 3277.7, 60 sec: 3345.1, 300 sec: 3721.1). Total num frames: 8765440. Throughput: 0: 810.5. Samples: 1190332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:07:50,341][00514] Avg episode reward: [(0, '28.721')]
[2024-11-06 05:07:55,336][00514] Fps is (10 sec: 3686.8, 60 sec: 3413.6, 300 sec: 3721.1). Total num frames: 8785920. Throughput: 0: 811.1. Samples: 1193594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:07:55,343][00514] Avg episode reward: [(0, '29.479')]
[2024-11-06 05:07:55,393][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002146_8790016.pth...
[2024-11-06 05:07:55,524][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001929_7901184.pth
[2024-11-06 05:07:55,542][11950] Saving new best policy, reward=29.479!
[2024-11-06 05:07:57,259][11967] Updated weights for policy 0, policy_version 2148 (0.0018)
[2024-11-06 05:08:00,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 8810496. Throughput: 0: 873.8. Samples: 1200332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:08:00,343][00514] Avg episode reward: [(0, '27.328')]
[2024-11-06 05:08:05,339][00514] Fps is (10 sec: 3685.2, 60 sec: 3413.1, 300 sec: 3721.1). Total num frames: 8822784. Throughput: 0: 857.9. Samples: 1204822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:08:05,341][00514] Avg episode reward: [(0, '26.680')]
[2024-11-06 05:08:09,238][11967] Updated weights for policy 0, policy_version 2158 (0.0044)
[2024-11-06 05:08:10,336][00514] Fps is (10 sec: 3276.7, 60 sec: 3413.5, 300 sec: 3721.1). Total num frames: 8843264. Throughput: 0: 868.1. Samples: 1207162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:08:10,342][00514] Avg episode reward: [(0, '27.228')]
[2024-11-06 05:08:15,335][00514] Fps is (10 sec: 4097.3, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 8863744. Throughput: 0: 925.0. Samples: 1213818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:08:15,338][00514] Avg episode reward: [(0, '27.076')]
[2024-11-06 05:08:19,126][11967] Updated weights for policy 0, policy_version 2168 (0.0028)
[2024-11-06 05:08:20,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3707.2). Total num frames: 8880128. Throughput: 0: 942.3. Samples: 1219484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:08:20,341][00514] Avg episode reward: [(0, '28.378')]
[2024-11-06 05:08:25,336][00514] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 8896512. Throughput: 0: 912.6. Samples: 1221386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:08:25,343][00514] Avg episode reward: [(0, '27.367')]
[2024-11-06 05:08:30,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 8916992. Throughput: 0: 910.3. Samples: 1227298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:08:30,337][00514] Avg episode reward: [(0, '28.197')]
[2024-11-06 05:08:30,660][11967] Updated weights for policy 0, policy_version 2178 (0.0020)
[2024-11-06 05:08:35,335][00514] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 8937472. Throughput: 0: 971.4. Samples: 1234044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:08:35,338][00514] Avg episode reward: [(0, '27.283')]
[2024-11-06 05:08:40,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3707.3). Total num frames: 8953856. Throughput: 0: 944.0. Samples: 1236076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:08:40,342][00514] Avg episode reward: [(0, '26.207')]
[2024-11-06 05:08:42,618][11967] Updated weights for policy 0, policy_version 2188 (0.0018)
[2024-11-06 05:08:45,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3693.3). Total num frames: 8970240. Throughput: 0: 902.6. Samples: 1240948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:08:45,338][00514] Avg episode reward: [(0, '27.598')]
[2024-11-06 05:08:50,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 8994816. Throughput: 0: 955.4. Samples: 1247814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:08:50,338][00514] Avg episode reward: [(0, '28.819')]
[2024-11-06 05:08:51,452][11967] Updated weights for policy 0, policy_version 2198 (0.0018)
[2024-11-06 05:08:55,338][00514] Fps is (10 sec: 4095.1, 60 sec: 3754.5, 300 sec: 3693.3). Total num frames: 9011200. Throughput: 0: 974.7. Samples: 1251026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:08:55,347][00514] Avg episode reward: [(0, '28.833')]
[2024-11-06 05:09:00,336][00514] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 9027584. Throughput: 0: 919.5. Samples: 1255196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:09:00,338][00514] Avg episode reward: [(0, '28.833')]
[2024-11-06 05:09:03,214][11967] Updated weights for policy 0, policy_version 2208 (0.0017)
[2024-11-06 05:09:05,335][00514] Fps is (10 sec: 4096.9, 60 sec: 3823.1, 300 sec: 3707.2). Total num frames: 9052160. Throughput: 0: 945.3. Samples: 1262022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:09:05,338][00514] Avg episode reward: [(0, '29.409')]
[2024-11-06 05:09:10,335][00514] Fps is (10 sec: 4505.8, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 9072640. Throughput: 0: 978.1. Samples: 1265400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 05:09:10,342][00514] Avg episode reward: [(0, '29.726')]
[2024-11-06 05:09:10,350][11950] Saving new best policy, reward=29.726!
[2024-11-06 05:09:14,312][11967] Updated weights for policy 0, policy_version 2218 (0.0022)
[2024-11-06 05:09:15,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 9084928. Throughput: 0: 949.6. Samples: 1270030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:09:15,339][00514] Avg episode reward: [(0, '28.312')]
[2024-11-06 05:09:20,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 9105408. Throughput: 0: 918.8. Samples: 1275390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 05:09:20,343][00514] Avg episode reward: [(0, '26.717')]
[2024-11-06 05:09:24,328][11967] Updated weights for policy 0, policy_version 2228 (0.0023)
[2024-11-06 05:09:25,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 9129984. Throughput: 0: 948.6. Samples: 1278762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:09:25,338][00514] Avg episode reward: [(0, '27.105')]
[2024-11-06 05:09:30,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3707.3). Total num frames: 9146368. Throughput: 0: 975.1. Samples: 1284828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:09:30,343][00514] Avg episode reward: [(0, '27.996')]
[2024-11-06 05:09:35,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 9162752. Throughput: 0: 918.7. Samples: 1289154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:09:35,338][00514] Avg episode reward: [(0, '27.036')]
[2024-11-06 05:09:36,174][11967] Updated weights for policy 0, policy_version 2238 (0.0018)
[2024-11-06 05:09:40,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 9183232. Throughput: 0: 924.3. Samples: 1292616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:09:40,340][00514] Avg episode reward: [(0, '27.416')]
[2024-11-06 05:09:45,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3707.3). Total num frames: 9203712. Throughput: 0: 981.9. Samples: 1299380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:09:45,339][00514] Avg episode reward: [(0, '27.696')]
[2024-11-06 05:09:45,813][11967] Updated weights for policy 0, policy_version 2248 (0.0025)
[2024-11-06 05:09:50,338][00514] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3707.2). Total num frames: 9220096. Throughput: 0: 929.1. Samples: 1303834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-06 05:09:50,341][00514] Avg episode reward: [(0, '28.147')]
[2024-11-06 05:09:55,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3707.3). Total num frames: 9240576. Throughput: 0: 912.8. Samples: 1306476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:09:55,338][00514] Avg episode reward: [(0, '28.711')]
[2024-11-06 05:09:55,347][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002256_9240576.pth...
[2024-11-06 05:09:55,477][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002041_8359936.pth
[2024-11-06 05:09:57,174][11967] Updated weights for policy 0, policy_version 2258 (0.0032)
[2024-11-06 05:10:00,335][00514] Fps is (10 sec: 4096.9, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 9261056. Throughput: 0: 959.3. Samples: 1313200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:10:00,342][00514] Avg episode reward: [(0, '28.774')]
[2024-11-06 05:10:05,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 9277440. Throughput: 0: 960.0. Samples: 1318588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:10:05,339][00514] Avg episode reward: [(0, '28.314')]
[2024-11-06 05:10:09,116][11967] Updated weights for policy 0, policy_version 2268 (0.0015)
[2024-11-06 05:10:10,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 9293824. Throughput: 0: 929.8. Samples: 1320602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:10:10,339][00514] Avg episode reward: [(0, '28.368')]
[2024-11-06 05:10:15,336][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 9314304. Throughput: 0: 929.2. Samples: 1326644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:10:15,341][00514] Avg episode reward: [(0, '29.284')]
[2024-11-06 05:10:18,202][11967] Updated weights for policy 0, policy_version 2278 (0.0016)
[2024-11-06 05:10:20,336][00514] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 9334784. Throughput: 0: 979.2. Samples: 1333220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-06 05:10:20,339][00514] Avg episode reward: [(0, '28.280')]
[2024-11-06 05:10:25,337][00514] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3707.2). Total num frames: 9351168. Throughput: 0: 949.0. Samples: 1335324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:10:25,344][00514] Avg episode reward: [(0, '28.581')]
[2024-11-06 05:10:30,071][11967] Updated weights for policy 0, policy_version 2288 (0.0020)
[2024-11-06 05:10:30,335][00514] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 9371648. Throughput: 0: 917.2. Samples: 1340652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:10:30,338][00514] Avg episode reward: [(0, '27.261')]
[2024-11-06 05:10:35,336][00514] Fps is (10 sec: 4506.1, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 9396224. Throughput: 0: 973.5. Samples: 1347640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-06 05:10:35,338][00514] Avg episode reward: [(0, '26.094')]
[2024-11-06 05:10:40,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 9408512. Throughput: 0: 976.6. Samples: 1350422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-06 05:10:40,344][00514] Avg episode reward: [(0, '25.996')]
[2024-11-06 05:10:40,865][11967] Updated weights for policy 0, policy_version 2298 (0.0025)
[2024-11-06 05:10:45,335][00514] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 9424896. Throughput: 0: 914.6. Samples: 1354356.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 05:10:45,337][00514] Avg episode reward: [(0, '24.861')] [2024-11-06 05:10:50,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3693.3). Total num frames: 9449472. Throughput: 0: 941.6. Samples: 1360958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 05:10:50,344][00514] Avg episode reward: [(0, '24.694')] [2024-11-06 05:10:51,220][11967] Updated weights for policy 0, policy_version 2308 (0.0026) [2024-11-06 05:10:55,335][00514] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 9469952. Throughput: 0: 971.6. Samples: 1364326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 05:10:55,338][00514] Avg episode reward: [(0, '23.733')] [2024-11-06 05:11:00,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 9482240. Throughput: 0: 946.4. Samples: 1369234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 05:11:00,338][00514] Avg episode reward: [(0, '25.096')] [2024-11-06 05:11:03,140][11967] Updated weights for policy 0, policy_version 2318 (0.0039) [2024-11-06 05:11:05,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 9502720. Throughput: 0: 924.2. Samples: 1374808. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-06 05:11:05,338][00514] Avg episode reward: [(0, '24.624')] [2024-11-06 05:11:10,336][00514] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 9527296. Throughput: 0: 954.8. Samples: 1378288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 05:11:10,337][00514] Avg episode reward: [(0, '25.122')] [2024-11-06 05:11:11,920][11967] Updated weights for policy 0, policy_version 2328 (0.0017) [2024-11-06 05:11:15,341][00514] Fps is (10 sec: 4093.8, 60 sec: 3822.6, 300 sec: 3707.3). Total num frames: 9543680. Throughput: 0: 967.1. Samples: 1384176. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-06 05:11:15,349][00514] Avg episode reward: [(0, '25.140')] [2024-11-06 05:11:20,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 9560064. Throughput: 0: 910.3. Samples: 1388604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 05:11:20,338][00514] Avg episode reward: [(0, '26.730')] [2024-11-06 05:11:23,951][11967] Updated weights for policy 0, policy_version 2338 (0.0036) [2024-11-06 05:11:25,335][00514] Fps is (10 sec: 3688.4, 60 sec: 3823.0, 300 sec: 3693.3). Total num frames: 9580544. Throughput: 0: 921.8. Samples: 1391904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:11:25,342][00514] Avg episode reward: [(0, '27.082')] [2024-11-06 05:11:30,336][00514] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3707.3). Total num frames: 9601024. Throughput: 0: 985.3. Samples: 1398694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:11:30,340][00514] Avg episode reward: [(0, '26.990')] [2024-11-06 05:11:35,114][11967] Updated weights for policy 0, policy_version 2348 (0.0035) [2024-11-06 05:11:35,341][00514] Fps is (10 sec: 3684.4, 60 sec: 3686.1, 300 sec: 3707.2). Total num frames: 9617408. Throughput: 0: 936.0. Samples: 1403082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 05:11:35,345][00514] Avg episode reward: [(0, '27.725')] [2024-11-06 05:11:40,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 9633792. Throughput: 0: 919.9. Samples: 1405720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 05:11:40,338][00514] Avg episode reward: [(0, '27.092')] [2024-11-06 05:11:44,817][11967] Updated weights for policy 0, policy_version 2358 (0.0019) [2024-11-06 05:11:45,335][00514] Fps is (10 sec: 4098.2, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 9658368. Throughput: 0: 961.6. Samples: 1412506. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:11:45,338][00514] Avg episode reward: [(0, '27.679')] [2024-11-06 05:11:50,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 9674752. Throughput: 0: 955.6. Samples: 1417810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 05:11:50,338][00514] Avg episode reward: [(0, '27.064')] [2024-11-06 05:11:55,336][00514] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 9691136. Throughput: 0: 923.2. Samples: 1419834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:11:55,344][00514] Avg episode reward: [(0, '28.262')] [2024-11-06 05:11:55,357][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002366_9691136.pth... [2024-11-06 05:11:55,503][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002146_8790016.pth [2024-11-06 05:11:56,818][11967] Updated weights for policy 0, policy_version 2368 (0.0018) [2024-11-06 05:12:00,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 9711616. Throughput: 0: 932.8. Samples: 1426146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:12:00,343][00514] Avg episode reward: [(0, '28.508')] [2024-11-06 05:12:05,340][00514] Fps is (10 sec: 3684.7, 60 sec: 3754.4, 300 sec: 3693.3). Total num frames: 9728000. Throughput: 0: 952.0. Samples: 1431448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 05:12:05,343][00514] Avg episode reward: [(0, '27.638')] [2024-11-06 05:12:09,487][11967] Updated weights for policy 0, policy_version 2378 (0.0035) [2024-11-06 05:12:10,335][00514] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 9740288. Throughput: 0: 916.5. Samples: 1433148. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:12:10,338][00514] Avg episode reward: [(0, '27.009')] [2024-11-06 05:12:15,335][00514] Fps is (10 sec: 2458.7, 60 sec: 3481.9, 300 sec: 3679.5). Total num frames: 9752576. Throughput: 0: 842.5. Samples: 1436606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 05:12:15,337][00514] Avg episode reward: [(0, '26.638')] [2024-11-06 05:12:20,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 9777152. Throughput: 0: 894.5. Samples: 1443328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:12:20,343][00514] Avg episode reward: [(0, '25.641')] [2024-11-06 05:12:20,613][11967] Updated weights for policy 0, policy_version 2388 (0.0022) [2024-11-06 05:12:25,341][00514] Fps is (10 sec: 4503.2, 60 sec: 3617.8, 300 sec: 3734.9). Total num frames: 9797632. Throughput: 0: 915.0. Samples: 1446898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:12:25,345][00514] Avg episode reward: [(0, '24.529')] [2024-11-06 05:12:30,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 9814016. Throughput: 0: 871.0. Samples: 1451700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:12:30,338][00514] Avg episode reward: [(0, '24.764')] [2024-11-06 05:12:32,402][11967] Updated weights for policy 0, policy_version 2398 (0.0016) [2024-11-06 05:12:35,335][00514] Fps is (10 sec: 3688.4, 60 sec: 3618.5, 300 sec: 3735.0). Total num frames: 9834496. Throughput: 0: 878.5. Samples: 1457342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:12:35,342][00514] Avg episode reward: [(0, '24.230')] [2024-11-06 05:12:40,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 9854976. Throughput: 0: 908.6. Samples: 1460722. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-06 05:12:40,343][00514] Avg episode reward: [(0, '23.706')] [2024-11-06 05:12:41,283][11967] Updated weights for policy 0, policy_version 2408 (0.0021) [2024-11-06 05:12:45,337][00514] Fps is (10 sec: 3685.7, 60 sec: 3549.8, 300 sec: 3748.9). Total num frames: 9871360. Throughput: 0: 902.9. Samples: 1466778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 05:12:45,344][00514] Avg episode reward: [(0, '24.732')] [2024-11-06 05:12:50,335][00514] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 9887744. Throughput: 0: 876.7. Samples: 1470896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-06 05:12:50,343][00514] Avg episode reward: [(0, '24.105')] [2024-11-06 05:12:53,428][11967] Updated weights for policy 0, policy_version 2418 (0.0023) [2024-11-06 05:12:55,335][00514] Fps is (10 sec: 4096.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 9912320. Throughput: 0: 914.4. Samples: 1474296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:12:55,343][00514] Avg episode reward: [(0, '24.286')] [2024-11-06 05:13:00,341][00514] Fps is (10 sec: 4503.2, 60 sec: 3686.1, 300 sec: 3762.7). Total num frames: 9932800. Throughput: 0: 989.4. Samples: 1481134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-06 05:13:00,345][00514] Avg episode reward: [(0, '25.727')] [2024-11-06 05:13:04,702][11967] Updated weights for policy 0, policy_version 2428 (0.0030) [2024-11-06 05:13:05,336][00514] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3735.0). Total num frames: 9945088. Throughput: 0: 935.3. Samples: 1485416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 05:13:05,338][00514] Avg episode reward: [(0, '26.242')] [2024-11-06 05:13:10,336][00514] Fps is (10 sec: 3278.5, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 9965568. Throughput: 0: 914.4. Samples: 1488042. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-06 05:13:10,339][00514] Avg episode reward: [(0, '26.292')] [2024-11-06 05:13:14,605][11967] Updated weights for policy 0, policy_version 2438 (0.0034) [2024-11-06 05:13:15,335][00514] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 9986048. Throughput: 0: 956.3. Samples: 1494732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-06 05:13:15,338][00514] Avg episode reward: [(0, '26.119')] [2024-11-06 05:13:20,335][00514] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 10002432. Throughput: 0: 949.2. Samples: 1500056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-06 05:13:20,342][00514] Avg episode reward: [(0, '26.882')] [2024-11-06 05:13:20,773][11950] Stopping Batcher_0... [2024-11-06 05:13:20,774][11950] Loop batcher_evt_loop terminating... [2024-11-06 05:13:20,775][00514] Component Batcher_0 stopped! [2024-11-06 05:13:20,783][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-11-06 05:13:20,885][11967] Weights refcount: 2 0 [2024-11-06 05:13:20,897][00514] Component InferenceWorker_p0-w0 stopped! [2024-11-06 05:13:20,904][11967] Stopping InferenceWorker_p0-w0... [2024-11-06 05:13:20,904][11967] Loop inference_proc0-0_evt_loop terminating... [2024-11-06 05:13:20,962][11950] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002256_9240576.pth [2024-11-06 05:13:20,979][11950] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-11-06 05:13:21,245][00514] Component LearnerWorker_p0 stopped! [2024-11-06 05:13:21,248][11950] Stopping LearnerWorker_p0... [2024-11-06 05:13:21,248][11950] Loop learner_proc0_evt_loop terminating... [2024-11-06 05:13:21,450][11973] Stopping RolloutWorker_w5... [2024-11-06 05:13:21,451][11973] Loop rollout_proc5_evt_loop terminating... 
[2024-11-06 05:13:21,452][00514] Component RolloutWorker_w5 stopped! [2024-11-06 05:13:21,488][00514] Component RolloutWorker_w1 stopped! [2024-11-06 05:13:21,487][11970] Stopping RolloutWorker_w1... [2024-11-06 05:13:21,493][11970] Loop rollout_proc1_evt_loop terminating... [2024-11-06 05:13:21,519][11974] Stopping RolloutWorker_w7... [2024-11-06 05:13:21,519][11974] Loop rollout_proc7_evt_loop terminating... [2024-11-06 05:13:21,519][00514] Component RolloutWorker_w7 stopped! [2024-11-06 05:13:21,553][11971] Stopping RolloutWorker_w3... [2024-11-06 05:13:21,554][11971] Loop rollout_proc3_evt_loop terminating... [2024-11-06 05:13:21,553][00514] Component RolloutWorker_w3 stopped! [2024-11-06 05:13:21,845][00514] Component RolloutWorker_w6 stopped! [2024-11-06 05:13:21,851][11975] Stopping RolloutWorker_w6... [2024-11-06 05:13:21,857][11975] Loop rollout_proc6_evt_loop terminating... [2024-11-06 05:13:21,873][00514] Component RolloutWorker_w2 stopped! [2024-11-06 05:13:21,879][11969] Stopping RolloutWorker_w2... [2024-11-06 05:13:21,879][11969] Loop rollout_proc2_evt_loop terminating... [2024-11-06 05:13:21,904][00514] Component RolloutWorker_w4 stopped! [2024-11-06 05:13:21,908][11972] Stopping RolloutWorker_w4... [2024-11-06 05:13:21,909][11972] Loop rollout_proc4_evt_loop terminating... [2024-11-06 05:13:21,991][00514] Component RolloutWorker_w0 stopped! [2024-11-06 05:13:22,000][11968] Stopping RolloutWorker_w0... [2024-11-06 05:13:22,001][11968] Loop rollout_proc0_evt_loop terminating... [2024-11-06 05:13:22,000][00514] Waiting for process learner_proc0 to stop... [2024-11-06 05:13:23,602][00514] Waiting for process inference_proc0-0 to join... [2024-11-06 05:13:23,612][00514] Waiting for process rollout_proc0 to join... [2024-11-06 05:13:25,901][00514] Waiting for process rollout_proc1 to join... [2024-11-06 05:13:25,923][00514] Waiting for process rollout_proc2 to join... [2024-11-06 05:13:25,926][00514] Waiting for process rollout_proc3 to join... 
[2024-11-06 05:13:25,932][00514] Waiting for process rollout_proc4 to join...
[2024-11-06 05:13:25,934][00514] Waiting for process rollout_proc5 to join...
[2024-11-06 05:13:25,939][00514] Waiting for process rollout_proc6 to join...
[2024-11-06 05:13:25,942][00514] Waiting for process rollout_proc7 to join...
[2024-11-06 05:13:25,946][00514] Batcher 0 profile tree view:
batching: 40.1664, releasing_batches: 0.0427
[2024-11-06 05:13:25,948][00514] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 639.3410
update_model: 13.4913
  weight_update: 0.0025
one_step: 0.0246
  handle_policy_step: 902.9147
    deserialize: 22.5858, stack: 5.3155, obs_to_device_normalize: 190.9271, forward: 452.0291, send_messages: 46.4044
    prepare_outputs: 138.5905
      to_cpu: 83.5894
[2024-11-06 05:13:25,950][00514] Learner 0 profile tree view:
misc: 0.0080, prepare_batch: 18.1958
train: 111.3424
  epoch_init: 0.0095, minibatch_init: 0.0143, losses_postprocess: 0.9088, kl_divergence: 0.9750, after_optimizer: 4.5749
  calculate_losses: 40.1415
    losses_init: 0.0059, forward_head: 1.9314, bptt_initial: 27.2443, tail: 1.7145, advantages_returns: 0.4076, losses: 5.3796
    bptt: 2.9004
      bptt_forward_core: 2.7744
  update: 63.6509
    clip: 1.2805
[2024-11-06 05:13:25,952][00514] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.5291, enqueue_policy_requests: 158.1178, env_step: 1273.0818, overhead: 21.3322, complete_rollouts: 11.4781
save_policy_outputs: 32.6364
  split_output_tensors: 13.0709
[2024-11-06 05:13:25,953][00514] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.5480, enqueue_policy_requests: 159.1357, env_step: 1271.7355, overhead: 20.9869, complete_rollouts: 10.0939
save_policy_outputs: 31.8862
  split_output_tensors: 12.9548
[2024-11-06 05:13:25,956][00514] Loop Runner_EvtLoop terminating...
[2024-11-06 05:13:25,958][00514] Runner profile tree view: main_loop: 1648.2174 [2024-11-06 05:13:25,960][00514] Collected {0: 10006528}, FPS: 3640.7 [2024-11-06 05:13:25,990][00514] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-06 05:13:25,992][00514] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-06 05:13:25,995][00514] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-06 05:13:25,998][00514] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-06 05:13:26,000][00514] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-06 05:13:26,003][00514] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-06 05:13:26,005][00514] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-06 05:13:26,006][00514] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-06 05:13:26,007][00514] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-11-06 05:13:26,009][00514] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-06 05:13:26,010][00514] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-06 05:13:26,011][00514] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-06 05:13:26,013][00514] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-06 05:13:26,014][00514] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-11-06 05:13:26,016][00514] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-06 05:13:26,046][00514] RunningMeanStd input shape: (3, 72, 128) [2024-11-06 05:13:26,048][00514] RunningMeanStd input shape: (1,) [2024-11-06 05:13:26,063][00514] ConvEncoder: input_channels=3 [2024-11-06 05:13:26,102][00514] Conv encoder output size: 512 [2024-11-06 05:13:26,103][00514] Policy head output size: 512 [2024-11-06 05:13:26,128][00514] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-11-06 05:13:26,590][00514] Num frames 100... [2024-11-06 05:13:26,698][00514] Avg episode rewards: #0: 1.430, true rewards: #0: 1.430 [2024-11-06 05:13:26,700][00514] Avg episode reward: 1.430, avg true_objective: 1.430 [2024-11-06 05:13:26,776][00514] Num frames 200... [2024-11-06 05:13:26,902][00514] Num frames 300... [2024-11-06 05:13:27,035][00514] Num frames 400... [2024-11-06 05:13:27,171][00514] Num frames 500... [2024-11-06 05:13:27,304][00514] Num frames 600... [2024-11-06 05:13:27,433][00514] Num frames 700... [2024-11-06 05:13:27,565][00514] Num frames 800... [2024-11-06 05:13:27,695][00514] Num frames 900... [2024-11-06 05:13:27,827][00514] Num frames 1000... [2024-11-06 05:13:27,950][00514] Num frames 1100... [2024-11-06 05:13:28,073][00514] Num frames 1200... [2024-11-06 05:13:28,196][00514] Num frames 1300... [2024-11-06 05:13:28,382][00514] Avg episode rewards: #0: 16.495, true rewards: #0: 6.995 [2024-11-06 05:13:28,383][00514] Avg episode reward: 16.495, avg true_objective: 6.995 [2024-11-06 05:13:28,390][00514] Num frames 1400... [2024-11-06 05:13:28,519][00514] Num frames 1500... [2024-11-06 05:13:28,641][00514] Num frames 1600... [2024-11-06 05:13:28,760][00514] Num frames 1700... [2024-11-06 05:13:28,903][00514] Num frames 1800... [2024-11-06 05:13:29,033][00514] Num frames 1900... [2024-11-06 05:13:29,157][00514] Num frames 2000... 
[2024-11-06 05:13:29,296][00514] Num frames 2100... [2024-11-06 05:13:29,418][00514] Num frames 2200... [2024-11-06 05:13:29,548][00514] Num frames 2300... [2024-11-06 05:13:29,671][00514] Num frames 2400... [2024-11-06 05:13:29,794][00514] Num frames 2500... [2024-11-06 05:13:29,921][00514] Num frames 2600... [2024-11-06 05:13:30,046][00514] Num frames 2700... [2024-11-06 05:13:30,171][00514] Num frames 2800... [2024-11-06 05:13:30,299][00514] Num frames 2900... [2024-11-06 05:13:30,424][00514] Num frames 3000... [2024-11-06 05:13:30,557][00514] Num frames 3100... [2024-11-06 05:13:30,679][00514] Num frames 3200... [2024-11-06 05:13:30,810][00514] Num frames 3300... [2024-11-06 05:13:30,951][00514] Num frames 3400... [2024-11-06 05:13:31,134][00514] Avg episode rewards: #0: 32.996, true rewards: #0: 11.663 [2024-11-06 05:13:31,136][00514] Avg episode reward: 32.996, avg true_objective: 11.663 [2024-11-06 05:13:31,140][00514] Num frames 3500... [2024-11-06 05:13:31,274][00514] Num frames 3600... [2024-11-06 05:13:31,399][00514] Num frames 3700... [2024-11-06 05:13:31,519][00514] Num frames 3800... [2024-11-06 05:13:31,654][00514] Num frames 3900... [2024-11-06 05:13:31,779][00514] Num frames 4000... [2024-11-06 05:13:31,907][00514] Num frames 4100... [2024-11-06 05:13:32,032][00514] Num frames 4200... [2024-11-06 05:13:32,163][00514] Num frames 4300... [2024-11-06 05:13:32,298][00514] Num frames 4400... [2024-11-06 05:13:32,429][00514] Num frames 4500... [2024-11-06 05:13:32,569][00514] Num frames 4600... [2024-11-06 05:13:32,698][00514] Num frames 4700... [2024-11-06 05:13:32,823][00514] Num frames 4800... [2024-11-06 05:13:32,951][00514] Num frames 4900... [2024-11-06 05:13:33,099][00514] Num frames 5000... [2024-11-06 05:13:33,281][00514] Num frames 5100... [2024-11-06 05:13:33,452][00514] Num frames 5200... 
[2024-11-06 05:13:33,515][00514] Avg episode rewards: #0: 36.507, true rewards: #0: 13.008 [2024-11-06 05:13:33,517][00514] Avg episode reward: 36.507, avg true_objective: 13.008 [2024-11-06 05:13:33,682][00514] Num frames 5300... [2024-11-06 05:13:33,852][00514] Num frames 5400... [2024-11-06 05:13:34,023][00514] Num frames 5500... [2024-11-06 05:13:34,198][00514] Num frames 5600... [2024-11-06 05:13:34,380][00514] Num frames 5700... [2024-11-06 05:13:34,559][00514] Num frames 5800... [2024-11-06 05:13:34,744][00514] Num frames 5900... [2024-11-06 05:13:34,919][00514] Num frames 6000... [2024-11-06 05:13:35,099][00514] Num frames 6100... [2024-11-06 05:13:35,283][00514] Num frames 6200... [2024-11-06 05:13:35,467][00514] Num frames 6300... [2024-11-06 05:13:35,629][00514] Num frames 6400... [2024-11-06 05:13:35,768][00514] Num frames 6500... [2024-11-06 05:13:35,890][00514] Num frames 6600... [2024-11-06 05:13:36,013][00514] Num frames 6700... [2024-11-06 05:13:36,185][00514] Avg episode rewards: #0: 37.779, true rewards: #0: 13.580 [2024-11-06 05:13:36,187][00514] Avg episode reward: 37.779, avg true_objective: 13.580 [2024-11-06 05:13:36,202][00514] Num frames 6800... [2024-11-06 05:13:36,331][00514] Num frames 6900... [2024-11-06 05:13:36,456][00514] Num frames 7000... [2024-11-06 05:13:36,578][00514] Num frames 7100... [2024-11-06 05:13:36,698][00514] Num frames 7200... [2024-11-06 05:13:36,830][00514] Num frames 7300... [2024-11-06 05:13:36,967][00514] Num frames 7400... [2024-11-06 05:13:37,093][00514] Num frames 7500... [2024-11-06 05:13:37,218][00514] Num frames 7600... [2024-11-06 05:13:37,348][00514] Num frames 7700... [2024-11-06 05:13:37,479][00514] Num frames 7800... [2024-11-06 05:13:37,601][00514] Num frames 7900... [2024-11-06 05:13:37,735][00514] Num frames 8000... 
[2024-11-06 05:13:37,868][00514] Avg episode rewards: #0: 36.245, true rewards: #0: 13.412 [2024-11-06 05:13:37,870][00514] Avg episode reward: 36.245, avg true_objective: 13.412 [2024-11-06 05:13:37,945][00514] Num frames 8100... [2024-11-06 05:13:38,074][00514] Num frames 8200... [2024-11-06 05:13:38,199][00514] Num frames 8300... [2024-11-06 05:13:38,330][00514] Num frames 8400... [2024-11-06 05:13:38,453][00514] Num frames 8500... [2024-11-06 05:13:38,581][00514] Num frames 8600... [2024-11-06 05:13:38,714][00514] Num frames 8700... [2024-11-06 05:13:38,851][00514] Num frames 8800... [2024-11-06 05:13:38,985][00514] Num frames 8900... [2024-11-06 05:13:39,119][00514] Num frames 9000... [2024-11-06 05:13:39,245][00514] Num frames 9100... [2024-11-06 05:13:39,310][00514] Avg episode rewards: #0: 34.721, true rewards: #0: 13.007 [2024-11-06 05:13:39,312][00514] Avg episode reward: 34.721, avg true_objective: 13.007 [2024-11-06 05:13:39,428][00514] Num frames 9200... [2024-11-06 05:13:39,555][00514] Num frames 9300... [2024-11-06 05:13:39,682][00514] Num frames 9400... [2024-11-06 05:13:39,818][00514] Num frames 9500... [2024-11-06 05:13:39,947][00514] Num frames 9600... [2024-11-06 05:13:40,070][00514] Num frames 9700... [2024-11-06 05:13:40,191][00514] Num frames 9800... [2024-11-06 05:13:40,367][00514] Avg episode rewards: #0: 32.866, true rewards: #0: 12.366 [2024-11-06 05:13:40,369][00514] Avg episode reward: 32.866, avg true_objective: 12.366 [2024-11-06 05:13:40,380][00514] Num frames 9900... [2024-11-06 05:13:40,505][00514] Num frames 10000... [2024-11-06 05:13:40,627][00514] Num frames 10100... [2024-11-06 05:13:40,754][00514] Num frames 10200... [2024-11-06 05:13:40,882][00514] Num frames 10300... [2024-11-06 05:13:41,010][00514] Num frames 10400... [2024-11-06 05:13:41,132][00514] Num frames 10500... 
[2024-11-06 05:13:41,191][00514] Avg episode rewards: #0: 30.779, true rewards: #0: 11.668 [2024-11-06 05:13:41,193][00514] Avg episode reward: 30.779, avg true_objective: 11.668 [2024-11-06 05:13:41,324][00514] Num frames 10600... [2024-11-06 05:13:41,451][00514] Num frames 10700... [2024-11-06 05:13:41,573][00514] Num frames 10800... [2024-11-06 05:13:41,693][00514] Num frames 10900... [2024-11-06 05:13:41,814][00514] Num frames 11000... [2024-11-06 05:13:41,955][00514] Num frames 11100... [2024-11-06 05:13:42,087][00514] Num frames 11200... [2024-11-06 05:13:42,206][00514] Num frames 11300... [2024-11-06 05:13:42,334][00514] Num frames 11400... [2024-11-06 05:13:42,464][00514] Num frames 11500... [2024-11-06 05:13:42,582][00514] Num frames 11600... [2024-11-06 05:13:42,706][00514] Avg episode rewards: #0: 30.253, true rewards: #0: 11.653 [2024-11-06 05:13:42,707][00514] Avg episode reward: 30.253, avg true_objective: 11.653 [2024-11-06 05:14:59,449][00514] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-11-06 05:15:39,705][00514] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-06 05:15:39,707][00514] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-06 05:15:39,709][00514] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-06 05:15:39,711][00514] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-06 05:15:39,712][00514] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-06 05:15:39,714][00514] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-06 05:15:39,716][00514] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-06 05:15:39,720][00514] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! 
[2024-11-06 05:15:39,721][00514] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-06 05:15:39,723][00514] Adding new argument 'hf_repository'='InMDev/vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-06 05:15:39,724][00514] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-06 05:15:39,725][00514] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-06 05:15:39,726][00514] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-06 05:15:39,727][00514] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-06 05:15:39,728][00514] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-06 05:15:39,759][00514] RunningMeanStd input shape: (3, 72, 128) [2024-11-06 05:15:39,763][00514] RunningMeanStd input shape: (1,) [2024-11-06 05:15:39,783][00514] ConvEncoder: input_channels=3 [2024-11-06 05:15:39,820][00514] Conv encoder output size: 512 [2024-11-06 05:15:39,822][00514] Policy head output size: 512 [2024-11-06 05:15:39,841][00514] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-11-06 05:15:40,290][00514] Num frames 100... [2024-11-06 05:15:40,421][00514] Num frames 200... [2024-11-06 05:15:40,570][00514] Num frames 300... [2024-11-06 05:15:40,744][00514] Num frames 400... [2024-11-06 05:15:40,878][00514] Avg episode rewards: #0: 7.410, true rewards: #0: 4.410 [2024-11-06 05:15:40,880][00514] Avg episode reward: 7.410, avg true_objective: 4.410 [2024-11-06 05:15:40,978][00514] Num frames 500... [2024-11-06 05:15:41,144][00514] Num frames 600... [2024-11-06 05:15:41,316][00514] Num frames 700... [2024-11-06 05:15:41,481][00514] Num frames 800... [2024-11-06 05:15:41,641][00514] Num frames 900... [2024-11-06 05:15:41,831][00514] Num frames 1000... 
[2024-11-06 05:15:42,008][00514] Num frames 1100... [2024-11-06 05:15:42,115][00514] Avg episode rewards: #0: 10.130, true rewards: #0: 5.630 [2024-11-06 05:15:42,117][00514] Avg episode reward: 10.130, avg true_objective: 5.630 [2024-11-06 05:15:42,241][00514] Num frames 1200... [2024-11-06 05:15:42,423][00514] Num frames 1300... [2024-11-06 05:15:42,596][00514] Num frames 1400... [2024-11-06 05:15:42,780][00514] Num frames 1500... [2024-11-06 05:15:42,976][00514] Avg episode rewards: #0: 8.913, true rewards: #0: 5.247 [2024-11-06 05:15:42,978][00514] Avg episode reward: 8.913, avg true_objective: 5.247 [2024-11-06 05:15:43,028][00514] Num frames 1600... [2024-11-06 05:15:43,172][00514] Num frames 1700... [2024-11-06 05:15:43,301][00514] Num frames 1800... [2024-11-06 05:15:43,423][00514] Num frames 1900... [2024-11-06 05:15:43,547][00514] Num frames 2000... [2024-11-06 05:15:43,670][00514] Num frames 2100... [2024-11-06 05:15:43,798][00514] Num frames 2200... [2024-11-06 05:15:43,923][00514] Num frames 2300... [2024-11-06 05:15:44,053][00514] Num frames 2400... [2024-11-06 05:15:44,179][00514] Num frames 2500... [2024-11-06 05:15:44,310][00514] Num frames 2600... [2024-11-06 05:15:44,435][00514] Num frames 2700... [2024-11-06 05:15:44,558][00514] Num frames 2800... [2024-11-06 05:15:44,680][00514] Num frames 2900... [2024-11-06 05:15:44,820][00514] Num frames 3000... [2024-11-06 05:15:44,969][00514] Num frames 3100... [2024-11-06 05:15:45,096][00514] Num frames 3200... [2024-11-06 05:15:45,220][00514] Num frames 3300... [2024-11-06 05:15:45,359][00514] Num frames 3400... [2024-11-06 05:15:45,483][00514] Num frames 3500... [2024-11-06 05:15:45,607][00514] Num frames 3600... [2024-11-06 05:15:45,756][00514] Avg episode rewards: #0: 21.935, true rewards: #0: 9.185 [2024-11-06 05:15:45,758][00514] Avg episode reward: 21.935, avg true_objective: 9.185 [2024-11-06 05:15:45,793][00514] Num frames 3700... [2024-11-06 05:15:45,916][00514] Num frames 3800... 
[2024-11-06 05:15:46,050][00514] Num frames 3900...
[2024-11-06 05:15:46,183][00514] Num frames 4000...
[2024-11-06 05:15:46,317][00514] Num frames 4100...
[2024-11-06 05:15:46,445][00514] Num frames 4200...
[2024-11-06 05:15:46,565][00514] Num frames 4300...
[2024-11-06 05:15:46,691][00514] Num frames 4400...
[2024-11-06 05:15:46,814][00514] Num frames 4500...
[2024-11-06 05:15:46,936][00514] Num frames 4600...
[2024-11-06 05:15:47,079][00514] Avg episode rewards: #0: 21.332, true rewards: #0: 9.332
[2024-11-06 05:15:47,082][00514] Avg episode reward: 21.332, avg true_objective: 9.332
[2024-11-06 05:15:47,129][00514] Num frames 4700...
[2024-11-06 05:15:47,260][00514] Num frames 4800...
[2024-11-06 05:15:47,381][00514] Num frames 4900...
[2024-11-06 05:15:47,503][00514] Num frames 5000...
[2024-11-06 05:15:47,626][00514] Num frames 5100...
[2024-11-06 05:15:47,747][00514] Num frames 5200...
[2024-11-06 05:15:47,871][00514] Num frames 5300...
[2024-11-06 05:15:47,935][00514] Avg episode rewards: #0: 19.843, true rewards: #0: 8.843
[2024-11-06 05:15:47,938][00514] Avg episode reward: 19.843, avg true_objective: 8.843
[2024-11-06 05:15:48,064][00514] Num frames 5400...
[2024-11-06 05:15:48,188][00514] Num frames 5500...
[2024-11-06 05:15:48,320][00514] Num frames 5600...
[2024-11-06 05:15:48,446][00514] Num frames 5700...
[2024-11-06 05:15:48,564][00514] Num frames 5800...
[2024-11-06 05:15:48,686][00514] Num frames 5900...
[2024-11-06 05:15:48,811][00514] Num frames 6000...
[2024-11-06 05:15:48,959][00514] Avg episode rewards: #0: 19.391, true rewards: #0: 8.677
[2024-11-06 05:15:48,961][00514] Avg episode reward: 19.391, avg true_objective: 8.677
[2024-11-06 05:15:48,994][00514] Num frames 6100...
[2024-11-06 05:15:49,122][00514] Num frames 6200...
[2024-11-06 05:15:49,252][00514] Num frames 6300...
[2024-11-06 05:15:49,384][00514] Num frames 6400...
[2024-11-06 05:15:49,504][00514] Num frames 6500...
[2024-11-06 05:15:49,632][00514] Num frames 6600...
[2024-11-06 05:15:49,761][00514] Num frames 6700...
[2024-11-06 05:15:49,883][00514] Num frames 6800...
[2024-11-06 05:15:49,952][00514] Avg episode rewards: #0: 19.387, true rewards: #0: 8.512
[2024-11-06 05:15:49,954][00514] Avg episode reward: 19.387, avg true_objective: 8.512
[2024-11-06 05:15:50,067][00514] Num frames 6900...
[2024-11-06 05:15:50,198][00514] Num frames 7000...
[2024-11-06 05:15:50,329][00514] Num frames 7100...
[2024-11-06 05:15:50,453][00514] Num frames 7200...
[2024-11-06 05:15:50,577][00514] Num frames 7300...
[2024-11-06 05:15:50,702][00514] Num frames 7400...
[2024-11-06 05:15:50,829][00514] Num frames 7500...
[2024-11-06 05:15:50,955][00514] Num frames 7600...
[2024-11-06 05:15:51,082][00514] Num frames 7700...
[2024-11-06 05:15:51,212][00514] Avg episode rewards: #0: 19.719, true rewards: #0: 8.608
[2024-11-06 05:15:51,214][00514] Avg episode reward: 19.719, avg true_objective: 8.608
[2024-11-06 05:15:51,288][00514] Num frames 7800...
[2024-11-06 05:15:51,414][00514] Num frames 7900...
[2024-11-06 05:15:51,538][00514] Num frames 8000...
[2024-11-06 05:15:51,677][00514] Num frames 8100...
[2024-11-06 05:15:51,814][00514] Avg episode rewards: #0: 18.363, true rewards: #0: 8.163
[2024-11-06 05:15:51,817][00514] Avg episode reward: 18.363, avg true_objective: 8.163
[2024-11-06 05:16:44,667][00514] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-11-06 05:16:49,473][00514] The model has been pushed to https://huggingface.co./InMDev/vizdoom_health_gathering_supreme
[2024-11-06 05:19:07,461][20138] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-06 05:19:07,467][20138] Rollout worker 0 uses device cpu
[2024-11-06 05:19:07,468][20138] Rollout worker 1 uses device cpu
[2024-11-06 05:19:07,470][20138] Rollout worker 2 uses device cpu
[2024-11-06 05:19:07,471][20138] Rollout worker 3 uses device cpu
[2024-11-06 05:19:07,473][20138] Rollout worker 4 uses device cpu
[2024-11-06 05:19:07,474][20138] Rollout worker 5 uses device cpu
[2024-11-06 05:19:07,475][20138] Rollout worker 6 uses device cpu
[2024-11-06 05:19:07,476][20138] Rollout worker 7 uses device cpu
[2024-11-06 05:19:07,587][20138] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 05:19:07,589][20138] InferenceWorker_p0-w0: min num requests: 2
[2024-11-06 05:19:07,623][20138] Starting all processes...
[2024-11-06 05:19:07,626][20138] Starting process learner_proc0
[2024-11-06 05:19:07,675][20138] Starting all processes...
[2024-11-06 05:19:07,682][20138] Starting process inference_proc0-0
[2024-11-06 05:19:07,683][20138] Starting process rollout_proc0
[2024-11-06 05:19:07,684][20138] Starting process rollout_proc1
[2024-11-06 05:19:07,685][20138] Starting process rollout_proc2
[2024-11-06 05:19:07,685][20138] Starting process rollout_proc3
[2024-11-06 05:19:07,685][20138] Starting process rollout_proc4
[2024-11-06 05:19:07,685][20138] Starting process rollout_proc5
[2024-11-06 05:19:07,685][20138] Starting process rollout_proc6
[2024-11-06 05:19:07,685][20138] Starting process rollout_proc7
[2024-11-06 05:19:26,754][20529] Worker 6 uses CPU cores [0]
[2024-11-06 05:19:26,755][20530] Worker 4 uses CPU cores [0]
[2024-11-06 05:19:26,823][20511] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 05:19:26,824][20511] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-06 05:19:26,855][20511] Num visible devices: 1
[2024-11-06 05:19:26,856][20524] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 05:19:26,866][20524] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-06 05:19:26,873][20528] Worker 3 uses CPU cores [1]
[2024-11-06 05:19:26,882][20511] Starting seed is not provided
[2024-11-06 05:19:26,883][20511] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 05:19:26,884][20511] Initializing actor-critic model on device cuda:0
[2024-11-06 05:19:26,885][20511] RunningMeanStd input shape: (3, 72, 128)
[2024-11-06 05:19:26,886][20511] RunningMeanStd input shape: (1,)
[2024-11-06 05:19:26,923][20511] ConvEncoder: input_channels=3
[2024-11-06 05:19:26,942][20524] Num visible devices: 1
[2024-11-06 05:19:27,022][20525] Worker 0 uses CPU cores [0]
[2024-11-06 05:19:27,094][20527] Worker 1 uses CPU cores [1]
[2024-11-06 05:19:27,107][20526] Worker 2 uses CPU cores [0]
[2024-11-06 05:19:27,201][20511] Conv encoder output size: 512
[2024-11-06 05:19:27,202][20511] Policy head output size: 512
[2024-11-06 05:19:27,218][20531] Worker 5 uses CPU cores [1]
[2024-11-06 05:19:27,231][20511] Created Actor Critic model with architecture:
[2024-11-06 05:19:27,232][20511] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-06 05:19:27,249][20532] Worker 7 uses CPU cores [1]
[2024-11-06 05:19:27,441][20511] Using optimizer
[2024-11-06 05:19:27,584][20138] Heartbeat connected on Batcher_0
[2024-11-06 05:19:27,590][20138] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-06 05:19:27,600][20138] Heartbeat connected on RolloutWorker_w1
[2024-11-06 05:19:27,601][20138] Heartbeat connected on RolloutWorker_w0
[2024-11-06 05:19:27,605][20138] Heartbeat connected on RolloutWorker_w2
[2024-11-06 05:19:27,609][20138] Heartbeat connected on RolloutWorker_w3
[2024-11-06 05:19:27,613][20138] Heartbeat connected on RolloutWorker_w4
[2024-11-06 05:19:27,616][20138] Heartbeat connected on RolloutWorker_w5
[2024-11-06 05:19:27,623][20138] Heartbeat connected on RolloutWorker_w7
[2024-11-06 05:19:27,625][20138] Heartbeat connected on RolloutWorker_w6
[2024-11-06 05:19:28,291][20511] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-11-06 05:19:28,327][20511] Loading model from checkpoint
[2024-11-06 05:19:28,329][20511] Loaded experiment state at self.train_step=2443, self.env_steps=10006528
[2024-11-06 05:19:28,330][20511] Initialized policy 0 weights for model version 2443
[2024-11-06 05:19:28,333][20511] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-06 05:19:28,340][20511] LearnerWorker_p0 finished initialization!
[2024-11-06 05:19:28,341][20138] Heartbeat connected on LearnerWorker_p0
[2024-11-06 05:19:28,429][20524] RunningMeanStd input shape: (3, 72, 128)
[2024-11-06 05:19:28,430][20524] RunningMeanStd input shape: (1,)
[2024-11-06 05:19:28,443][20524] ConvEncoder: input_channels=3
[2024-11-06 05:19:28,552][20524] Conv encoder output size: 512
[2024-11-06 05:19:28,552][20524] Policy head output size: 512
[2024-11-06 05:19:28,606][20138] Inference worker 0-0 is ready!
[2024-11-06 05:19:28,607][20138] All inference workers are ready! Signal rollout workers to start!
[2024-11-06 05:19:28,825][20527] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 05:19:28,825][20530] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 05:19:28,824][20529] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 05:19:28,836][20532] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 05:19:28,833][20525] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 05:19:28,841][20526] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 05:19:28,846][20528] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 05:19:28,838][20531] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 05:19:30,197][20530] Decorrelating experience for 0 frames...
[2024-11-06 05:19:30,199][20525] Decorrelating experience for 0 frames...
[2024-11-06 05:19:30,201][20529] Decorrelating experience for 0 frames...
[2024-11-06 05:19:30,253][20532] Decorrelating experience for 0 frames...
[2024-11-06 05:19:30,255][20527] Decorrelating experience for 0 frames...
[2024-11-06 05:19:30,263][20531] Decorrelating experience for 0 frames...
[2024-11-06 05:19:31,208][20529] Decorrelating experience for 32 frames...
[2024-11-06 05:19:31,341][20138] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10006528. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-06 05:19:31,379][20526] Decorrelating experience for 0 frames...
[2024-11-06 05:19:31,416][20532] Decorrelating experience for 32 frames...
[2024-11-06 05:19:31,419][20528] Decorrelating experience for 0 frames...
[2024-11-06 05:19:31,422][20531] Decorrelating experience for 32 frames...
[2024-11-06 05:19:31,467][20530] Decorrelating experience for 32 frames...
[2024-11-06 05:19:32,415][20526] Decorrelating experience for 32 frames...
[2024-11-06 05:19:32,418][20525] Decorrelating experience for 32 frames...
[2024-11-06 05:19:32,423][20528] Decorrelating experience for 32 frames...
[2024-11-06 05:19:32,764][20532] Decorrelating experience for 64 frames...
[2024-11-06 05:19:33,381][20525] Decorrelating experience for 64 frames...
[2024-11-06 05:19:33,482][20527] Decorrelating experience for 32 frames...
[2024-11-06 05:19:33,501][20531] Decorrelating experience for 64 frames...
[2024-11-06 05:19:33,725][20530] Decorrelating experience for 64 frames...
[2024-11-06 05:19:34,660][20525] Decorrelating experience for 96 frames...
[2024-11-06 05:19:34,714][20526] Decorrelating experience for 64 frames...
[2024-11-06 05:19:35,084][20530] Decorrelating experience for 96 frames...
[2024-11-06 05:19:35,094][20532] Decorrelating experience for 96 frames...
[2024-11-06 05:19:35,228][20531] Decorrelating experience for 96 frames...
[2024-11-06 05:19:35,442][20528] Decorrelating experience for 64 frames...
[2024-11-06 05:19:35,547][20527] Decorrelating experience for 64 frames...
[2024-11-06 05:19:36,186][20526] Decorrelating experience for 96 frames...
[2024-11-06 05:19:36,341][20138] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10006528. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-06 05:19:36,776][20529] Decorrelating experience for 64 frames...
[2024-11-06 05:19:38,786][20528] Decorrelating experience for 96 frames...
[2024-11-06 05:19:39,134][20527] Decorrelating experience for 96 frames...
[2024-11-06 05:19:41,341][20138] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10006528. Throughput: 0: 186.0. Samples: 1860. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-06 05:19:41,346][20138] Avg episode reward: [(0, '2.160')]
[2024-11-06 05:19:42,481][20511] Signal inference workers to stop experience collection...
[2024-11-06 05:19:42,519][20524] InferenceWorker_p0-w0: stopping experience collection
[2024-11-06 05:19:42,607][20529] Decorrelating experience for 96 frames...
[2024-11-06 05:19:44,301][20511] Signal inference workers to resume experience collection...
[2024-11-06 05:19:44,312][20511] Stopping Batcher_0...
[2024-11-06 05:19:44,312][20511] Loop batcher_evt_loop terminating...
[2024-11-06 05:19:44,313][20138] Component Batcher_0 stopped!
[2024-11-06 05:19:44,349][20524] Weights refcount: 2 0
[2024-11-06 05:19:44,351][20138] Component InferenceWorker_p0-w0 stopped!
[2024-11-06 05:19:44,355][20524] Stopping InferenceWorker_p0-w0...
[2024-11-06 05:19:44,356][20524] Loop inference_proc0-0_evt_loop terminating...
[2024-11-06 05:19:44,639][20138] Component RolloutWorker_w7 stopped!
[2024-11-06 05:19:44,645][20532] Stopping RolloutWorker_w7...
[2024-11-06 05:19:44,665][20138] Component RolloutWorker_w5 stopped!
[2024-11-06 05:19:44,654][20532] Loop rollout_proc7_evt_loop terminating...
[2024-11-06 05:19:44,670][20531] Stopping RolloutWorker_w5...
[2024-11-06 05:19:44,672][20531] Loop rollout_proc5_evt_loop terminating...
[2024-11-06 05:19:44,677][20526] Stopping RolloutWorker_w2...
[2024-11-06 05:19:44,681][20526] Loop rollout_proc2_evt_loop terminating...
[2024-11-06 05:19:44,677][20138] Component RolloutWorker_w2 stopped!
[2024-11-06 05:19:44,696][20138] Component RolloutWorker_w3 stopped!
[2024-11-06 05:19:44,705][20528] Stopping RolloutWorker_w3...
[2024-11-06 05:19:44,711][20529] Stopping RolloutWorker_w6...
[2024-11-06 05:19:44,711][20138] Component RolloutWorker_w6 stopped!
[2024-11-06 05:19:44,707][20528] Loop rollout_proc3_evt_loop terminating...
[2024-11-06 05:19:44,715][20529] Loop rollout_proc6_evt_loop terminating...
[2024-11-06 05:19:44,766][20138] Component RolloutWorker_w1 stopped!
[2024-11-06 05:19:44,769][20527] Stopping RolloutWorker_w1...
[2024-11-06 05:19:44,774][20527] Loop rollout_proc1_evt_loop terminating...
[2024-11-06 05:19:44,832][20530] Stopping RolloutWorker_w4...
[2024-11-06 05:19:44,836][20525] Stopping RolloutWorker_w0...
[2024-11-06 05:19:44,837][20525] Loop rollout_proc0_evt_loop terminating...
[2024-11-06 05:19:44,841][20530] Loop rollout_proc4_evt_loop terminating...
[2024-11-06 05:19:44,832][20138] Component RolloutWorker_w4 stopped!
[2024-11-06 05:19:44,843][20138] Component RolloutWorker_w0 stopped!
[2024-11-06 05:19:45,180][20511] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
[2024-11-06 05:19:45,313][20511] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002366_9691136.pth
[2024-11-06 05:19:45,327][20511] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
[2024-11-06 05:19:45,520][20511] Stopping LearnerWorker_p0...
[2024-11-06 05:19:45,521][20138] Component LearnerWorker_p0 stopped!
[2024-11-06 05:19:45,524][20511] Loop learner_proc0_evt_loop terminating...
[2024-11-06 05:19:45,525][20138] Waiting for process learner_proc0 to stop...
[2024-11-06 05:19:47,034][20138] Waiting for process inference_proc0-0 to join...
[2024-11-06 05:19:47,041][20138] Waiting for process rollout_proc0 to join...
[2024-11-06 05:19:48,558][20138] Waiting for process rollout_proc1 to join...
[2024-11-06 05:19:48,720][20138] Waiting for process rollout_proc2 to join...
[2024-11-06 05:19:48,725][20138] Waiting for process rollout_proc3 to join...
[2024-11-06 05:19:48,729][20138] Waiting for process rollout_proc4 to join...
[2024-11-06 05:19:48,732][20138] Waiting for process rollout_proc5 to join...
[2024-11-06 05:19:48,736][20138] Waiting for process rollout_proc6 to join...
[2024-11-06 05:19:48,739][20138] Waiting for process rollout_proc7 to join...
[2024-11-06 05:19:48,743][20138] Batcher 0 profile tree view: batching: 0.0990, releasing_batches: 0.0020
[2024-11-06 05:19:48,744][20138] InferenceWorker_p0-w0 profile tree view: update_model: 0.0210 wait_policy: 0.0123 wait_policy_total: 9.4983 one_step: 0.0048 handle_policy_step: 4.0362 deserialize: 0.0619, stack: 0.0140, obs_to_device_normalize: 0.6994, forward: 2.5820, send_messages: 0.1661 prepare_outputs: 0.3641 to_cpu: 0.1995
[2024-11-06 05:19:48,746][20138] Learner 0 profile tree view: misc: 0.0000, prepare_batch: 2.6073 train: 4.1122 epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0227, after_optimizer: 0.0703 calculate_losses: 2.2671 losses_init: 0.0000, forward_head: 0.4234, bptt_initial: 1.6088, tail: 0.0734, advantages_returns: 0.0020, losses: 0.1522 bptt: 0.0068 bptt_forward_core: 0.0067 update: 1.7506 clip: 0.0383
[2024-11-06 05:19:48,748][20138] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0138, enqueue_policy_requests: 1.2427, env_step: 4.2976, overhead: 0.1271, complete_rollouts: 0.0335 save_policy_outputs: 0.1316 split_output_tensors: 0.0640
[2024-11-06 05:19:48,753][20138] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0012, enqueue_policy_requests: 1.3873, env_step: 4.2567, overhead: 0.0947, complete_rollouts: 0.1022 save_policy_outputs: 0.1197 split_output_tensors: 0.0320
[2024-11-06 05:19:48,754][20138] Loop Runner_EvtLoop terminating...
[2024-11-06 05:19:48,756][20138] Runner profile tree view: main_loop: 41.1334
[2024-11-06 05:19:48,758][20138] Collected {0: 10014720}, FPS: 199.2
[2024-11-06 05:19:48,991][20138] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-06 05:19:48,993][20138] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-06 05:19:48,995][20138] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-06 05:19:48,997][20138] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-06 05:19:48,999][20138] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-06 05:19:49,001][20138] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-06 05:19:49,005][20138] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-11-06 05:19:49,006][20138] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-06 05:19:49,008][20138] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-11-06 05:19:49,011][20138] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-11-06 05:19:49,014][20138] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-06 05:19:49,015][20138] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-06 05:19:49,016][20138] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-06 05:19:49,017][20138] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-06 05:19:49,018][20138] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-06 05:19:49,049][20138] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-06 05:19:49,053][20138] RunningMeanStd input shape: (3, 72, 128)
[2024-11-06 05:19:49,056][20138] RunningMeanStd input shape: (1,)
[2024-11-06 05:19:49,075][20138] ConvEncoder: input_channels=3
[2024-11-06 05:19:49,182][20138] Conv encoder output size: 512
[2024-11-06 05:19:49,183][20138] Policy head output size: 512
[2024-11-06 05:19:49,357][20138] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
[2024-11-06 05:19:50,179][20138] Num frames 100...
[2024-11-06 05:19:50,303][20138] Num frames 200...
[2024-11-06 05:19:50,428][20138] Num frames 300...
[2024-11-06 05:19:50,553][20138] Num frames 400...
[2024-11-06 05:19:50,676][20138] Num frames 500...
[2024-11-06 05:19:50,798][20138] Num frames 600...
[2024-11-06 05:19:50,868][20138] Avg episode rewards: #0: 10.080, true rewards: #0: 6.080
[2024-11-06 05:19:50,869][20138] Avg episode reward: 10.080, avg true_objective: 6.080
[2024-11-06 05:19:50,989][20138] Num frames 700...
[2024-11-06 05:19:51,116][20138] Num frames 800...
[2024-11-06 05:19:51,241][20138] Num frames 900...
[2024-11-06 05:19:51,372][20138] Num frames 1000...
[2024-11-06 05:19:51,498][20138] Num frames 1100...
[2024-11-06 05:19:51,621][20138] Num frames 1200...
[2024-11-06 05:19:51,752][20138] Num frames 1300...
[2024-11-06 05:19:51,878][20138] Num frames 1400...
[2024-11-06 05:19:52,010][20138] Num frames 1500...
[2024-11-06 05:19:52,112][20138] Avg episode rewards: #0: 15.180, true rewards: #0: 7.680
[2024-11-06 05:19:52,113][20138] Avg episode reward: 15.180, avg true_objective: 7.680
[2024-11-06 05:19:52,193][20138] Num frames 1600...
[2024-11-06 05:19:52,324][20138] Num frames 1700...
[2024-11-06 05:19:52,451][20138] Num frames 1800...
[2024-11-06 05:19:52,575][20138] Num frames 1900...
[2024-11-06 05:19:52,700][20138] Num frames 2000...
[2024-11-06 05:19:52,823][20138] Num frames 2100...
[2024-11-06 05:19:52,944][20138] Num frames 2200...
[2024-11-06 05:19:53,079][20138] Num frames 2300...
[2024-11-06 05:19:53,208][20138] Num frames 2400...
[2024-11-06 05:19:53,341][20138] Num frames 2500...
[2024-11-06 05:19:53,517][20138] Num frames 2600...
[2024-11-06 05:19:53,689][20138] Num frames 2700...
[2024-11-06 05:19:53,861][20138] Num frames 2800...
[2024-11-06 05:19:54,061][20138] Num frames 2900...
[2024-11-06 05:19:54,259][20138] Num frames 3000...
[2024-11-06 05:19:54,435][20138] Num frames 3100...
[2024-11-06 05:19:54,612][20138] Num frames 3200...
[2024-11-06 05:19:54,794][20138] Num frames 3300...
[2024-11-06 05:19:54,974][20138] Num frames 3400...
[2024-11-06 05:19:55,187][20138] Avg episode rewards: #0: 27.626, true rewards: #0: 11.627
[2024-11-06 05:19:55,189][20138] Avg episode reward: 27.626, avg true_objective: 11.627
[2024-11-06 05:19:55,215][20138] Num frames 3500...
[2024-11-06 05:19:55,415][20138] Num frames 3600...
[2024-11-06 05:19:55,600][20138] Num frames 3700...
[2024-11-06 05:19:55,774][20138] Num frames 3800...
[2024-11-06 05:19:55,959][20138] Num frames 3900...
[2024-11-06 05:19:56,092][20138] Num frames 4000...
[2024-11-06 05:19:56,232][20138] Num frames 4100...
[2024-11-06 05:19:56,367][20138] Num frames 4200...
[2024-11-06 05:19:56,493][20138] Num frames 4300...
[2024-11-06 05:19:56,620][20138] Num frames 4400...
[2024-11-06 05:19:56,747][20138] Num frames 4500...
[2024-11-06 05:19:56,874][20138] Num frames 4600...
[2024-11-06 05:19:57,000][20138] Num frames 4700...
[2024-11-06 05:19:57,124][20138] Num frames 4800...
[2024-11-06 05:19:57,264][20138] Num frames 4900...
[2024-11-06 05:19:57,410][20138] Avg episode rewards: #0: 29.430, true rewards: #0: 12.430
[2024-11-06 05:19:57,412][20138] Avg episode reward: 29.430, avg true_objective: 12.430
[2024-11-06 05:19:57,449][20138] Num frames 5000...
[2024-11-06 05:19:57,578][20138] Num frames 5100...
[2024-11-06 05:19:57,705][20138] Num frames 5200...
[2024-11-06 05:19:57,827][20138] Num frames 5300...
[2024-11-06 05:19:57,956][20138] Num frames 5400...
[2024-11-06 05:19:58,082][20138] Num frames 5500...
[2024-11-06 05:19:58,220][20138] Num frames 5600...
[2024-11-06 05:19:58,355][20138] Num frames 5700...
[2024-11-06 05:19:58,482][20138] Num frames 5800...
[2024-11-06 05:19:58,612][20138] Num frames 5900...
[2024-11-06 05:19:58,736][20138] Num frames 6000...
[2024-11-06 05:19:58,872][20138] Num frames 6100...
[2024-11-06 05:19:59,009][20138] Num frames 6200...
[2024-11-06 05:19:59,142][20138] Num frames 6300...
[2024-11-06 05:19:59,281][20138] Num frames 6400...
[2024-11-06 05:19:59,406][20138] Num frames 6500...
[2024-11-06 05:19:59,536][20138] Num frames 6600...
[2024-11-06 05:19:59,663][20138] Num frames 6700...
[2024-11-06 05:19:59,796][20138] Num frames 6800...
[2024-11-06 05:19:59,926][20138] Num frames 6900...
[2024-11-06 05:20:00,051][20138] Num frames 7000...
[2024-11-06 05:20:00,199][20138] Avg episode rewards: #0: 36.544, true rewards: #0: 14.144
[2024-11-06 05:20:00,201][20138] Avg episode reward: 36.544, avg true_objective: 14.144
[2024-11-06 05:20:00,252][20138] Num frames 7100...
[2024-11-06 05:20:00,390][20138] Num frames 7200...
[2024-11-06 05:20:00,515][20138] Num frames 7300...
[2024-11-06 05:20:00,637][20138] Num frames 7400...
[2024-11-06 05:20:00,765][20138] Num frames 7500...
[2024-11-06 05:20:00,892][20138] Num frames 7600...
[2024-11-06 05:20:01,017][20138] Num frames 7700...
[2024-11-06 05:20:01,142][20138] Num frames 7800...
[2024-11-06 05:20:01,278][20138] Num frames 7900...
[2024-11-06 05:20:01,407][20138] Num frames 8000...
[2024-11-06 05:20:01,532][20138] Num frames 8100...
[2024-11-06 05:20:01,666][20138] Num frames 8200...
[2024-11-06 05:20:01,801][20138] Num frames 8300...
[2024-11-06 05:20:01,930][20138] Num frames 8400...
[2024-11-06 05:20:02,000][20138] Avg episode rewards: #0: 36.350, true rewards: #0: 14.017
[2024-11-06 05:20:02,002][20138] Avg episode reward: 36.350, avg true_objective: 14.017
[2024-11-06 05:20:02,115][20138] Num frames 8500...
[2024-11-06 05:20:02,242][20138] Num frames 8600...
[2024-11-06 05:20:02,382][20138] Num frames 8700...
[2024-11-06 05:20:02,511][20138] Num frames 8800...
[2024-11-06 05:20:02,641][20138] Num frames 8900...
[2024-11-06 05:20:02,765][20138] Num frames 9000...
[2024-11-06 05:20:02,884][20138] Num frames 9100...
[2024-11-06 05:20:03,009][20138] Num frames 9200...
[2024-11-06 05:20:03,133][20138] Num frames 9300...
[2024-11-06 05:20:03,263][20138] Num frames 9400...
[2024-11-06 05:20:03,399][20138] Num frames 9500...
[2024-11-06 05:20:03,522][20138] Num frames 9600...
[2024-11-06 05:20:03,649][20138] Num frames 9700...
[2024-11-06 05:20:03,776][20138] Num frames 9800...
[2024-11-06 05:20:03,900][20138] Num frames 9900...
[2024-11-06 05:20:04,032][20138] Num frames 10000...
[2024-11-06 05:20:04,160][20138] Num frames 10100...
[2024-11-06 05:20:04,305][20138] Num frames 10200...
[2024-11-06 05:20:04,437][20138] Num frames 10300...
[2024-11-06 05:20:04,562][20138] Num frames 10400...
[2024-11-06 05:20:04,686][20138] Num frames 10500...
[2024-11-06 05:20:04,755][20138] Avg episode rewards: #0: 40.157, true rewards: #0: 15.014
[2024-11-06 05:20:04,757][20138] Avg episode reward: 40.157, avg true_objective: 15.014
[2024-11-06 05:20:04,863][20138] Num frames 10600...
[2024-11-06 05:20:04,995][20138] Num frames 10700...
[2024-11-06 05:20:05,115][20138] Num frames 10800...
[2024-11-06 05:20:05,240][20138] Num frames 10900...
[2024-11-06 05:20:05,371][20138] Num frames 11000...
[2024-11-06 05:20:05,501][20138] Num frames 11100...
[2024-11-06 05:20:05,627][20138] Num frames 11200...
[2024-11-06 05:20:05,752][20138] Num frames 11300...
[2024-11-06 05:20:05,875][20138] Num frames 11400...
[2024-11-06 05:20:06,006][20138] Num frames 11500...
[2024-11-06 05:20:06,186][20138] Num frames 11600...
[2024-11-06 05:20:06,362][20138] Num frames 11700...
[2024-11-06 05:20:06,540][20138] Num frames 11800...
[2024-11-06 05:20:06,714][20138] Num frames 11900...
[2024-11-06 05:20:06,886][20138] Num frames 12000...
[2024-11-06 05:20:07,083][20138] Avg episode rewards: #0: 40.601, true rewards: #0: 15.101
[2024-11-06 05:20:07,086][20138] Avg episode reward: 40.601, avg true_objective: 15.101
[2024-11-06 05:20:07,121][20138] Num frames 12100...
[2024-11-06 05:20:07,291][20138] Num frames 12200...
[2024-11-06 05:20:07,488][20138] Num frames 12300...
[2024-11-06 05:20:07,668][20138] Num frames 12400...
[2024-11-06 05:20:07,845][20138] Num frames 12500...
[2024-11-06 05:20:08,028][20138] Num frames 12600...
[2024-11-06 05:20:08,253][20138] Avg episode rewards: #0: 37.432, true rewards: #0: 14.099
[2024-11-06 05:20:08,255][20138] Avg episode reward: 37.432, avg true_objective: 14.099
[2024-11-06 05:20:08,277][20138] Num frames 12700...
[2024-11-06 05:20:08,461][20138] Num frames 12800...
[2024-11-06 05:20:08,625][20138] Num frames 12900...
[2024-11-06 05:20:08,751][20138] Num frames 13000...
[2024-11-06 05:20:08,880][20138] Num frames 13100...
[2024-11-06 05:20:09,122][20138] Num frames 13200...
[2024-11-06 05:20:09,331][20138] Num frames 13300...
[2024-11-06 05:20:09,453][20138] Num frames 13400...
[2024-11-06 05:20:09,587][20138] Num frames 13500...
[2024-11-06 05:20:09,713][20138] Num frames 13600...
[2024-11-06 05:20:09,856][20138] Num frames 13700...
[2024-11-06 05:20:10,116][20138] Num frames 13800...
[2024-11-06 05:20:10,238][20138] Num frames 13900...
[2024-11-06 05:20:10,364][20138] Num frames 14000...
[2024-11-06 05:20:10,498][20138] Avg episode rewards: #0: 36.862, true rewards: #0: 14.062
[2024-11-06 05:20:10,500][20138] Avg episode reward: 36.862, avg true_objective: 14.062
[2024-11-06 05:21:42,824][20138] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-11-06 05:22:33,686][20138] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-06 05:22:33,688][20138] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-06 05:22:33,690][20138] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-06 05:22:33,692][20138] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-06 05:22:33,694][20138] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-06 05:22:33,695][20138] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-06 05:22:33,697][20138] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-11-06 05:22:33,698][20138] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-06 05:22:33,699][20138] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-11-06 05:22:33,700][20138] Adding new argument 'hf_repository'='InMDev/vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-11-06 05:22:33,701][20138] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-06 05:22:33,702][20138] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-06 05:22:33,703][20138] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-06 05:22:33,704][20138] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-06 05:22:33,706][20138] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-06 05:22:33,738][20138] RunningMeanStd input shape: (3, 72, 128)
[2024-11-06 05:22:33,739][20138] RunningMeanStd input shape: (1,)
[2024-11-06 05:22:33,756][20138] ConvEncoder: input_channels=3
[2024-11-06 05:22:33,794][20138] Conv encoder output size: 512
[2024-11-06 05:22:33,796][20138] Policy head output size: 512
[2024-11-06 05:22:33,814][20138] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
[2024-11-06 05:22:34,266][20138] Num frames 100...
[2024-11-06 05:22:34,399][20138] Num frames 200...
[2024-11-06 05:22:34,526][20138] Num frames 300...
[2024-11-06 05:22:34,661][20138] Num frames 400...
[2024-11-06 05:22:34,790][20138] Num frames 500...
[2024-11-06 05:22:34,917][20138] Num frames 600...
[2024-11-06 05:22:35,038][20138] Num frames 700...
[2024-11-06 05:22:35,166][20138] Num frames 800...
[2024-11-06 05:22:35,299][20138] Num frames 900...
[2024-11-06 05:22:35,421][20138] Num frames 1000...
[2024-11-06 05:22:35,547][20138] Num frames 1100...
[2024-11-06 05:22:35,681][20138] Num frames 1200...
[2024-11-06 05:22:35,811][20138] Num frames 1300...
[2024-11-06 05:22:35,887][20138] Avg episode rewards: #0: 36.150, true rewards: #0: 13.150
[2024-11-06 05:22:35,888][20138] Avg episode reward: 36.150, avg true_objective: 13.150
[2024-11-06 05:22:36,008][20138] Num frames 1400...
[2024-11-06 05:22:36,142][20138] Num frames 1500...
[2024-11-06 05:22:36,281][20138] Num frames 1600...
[2024-11-06 05:22:36,409][20138] Num frames 1700...
[2024-11-06 05:22:36,534][20138] Num frames 1800...
[2024-11-06 05:22:36,677][20138] Num frames 1900...
[2024-11-06 05:22:36,800][20138] Num frames 2000...
[2024-11-06 05:22:36,923][20138] Num frames 2100...
[2024-11-06 05:22:37,046][20138] Num frames 2200...
[2024-11-06 05:22:37,167][20138] Avg episode rewards: #0: 27.760, true rewards: #0: 11.260
[2024-11-06 05:22:37,170][20138] Avg episode reward: 27.760, avg true_objective: 11.260
[2024-11-06 05:22:37,230][20138] Num frames 2300...
[2024-11-06 05:22:37,364][20138] Num frames 2400...
[2024-11-06 05:22:37,492][20138] Num frames 2500...
[2024-11-06 05:22:37,613][20138] Num frames 2600...
[2024-11-06 05:22:37,745][20138] Num frames 2700...
[2024-11-06 05:22:37,871][20138] Num frames 2800...
[2024-11-06 05:22:37,992][20138] Num frames 2900...
[2024-11-06 05:22:38,119][20138] Num frames 3000...
[2024-11-06 05:22:38,249][20138] Num frames 3100...
[2024-11-06 05:22:38,379][20138] Num frames 3200...
[2024-11-06 05:22:38,501][20138] Num frames 3300...
[2024-11-06 05:22:38,621][20138] Num frames 3400...
[2024-11-06 05:22:38,753][20138] Num frames 3500...
[2024-11-06 05:22:38,931][20138] Num frames 3600...
[2024-11-06 05:22:39,103][20138] Num frames 3700...
[2024-11-06 05:22:39,293][20138] Num frames 3800...
[2024-11-06 05:22:39,463][20138] Num frames 3900...
[2024-11-06 05:22:39,638][20138] Num frames 4000...
[2024-11-06 05:22:39,816][20138] Num frames 4100...
[2024-11-06 05:22:39,993][20138] Avg episode rewards: #0: 34.573, true rewards: #0: 13.907
[2024-11-06 05:22:39,995][20138] Avg episode reward: 34.573, avg true_objective: 13.907
[2024-11-06 05:22:40,054][20138] Num frames 4200...
[2024-11-06 05:22:40,256][20138] Num frames 4300...
[2024-11-06 05:22:40,430][20138] Num frames 4400...
[2024-11-06 05:22:40,615][20138] Num frames 4500...
[2024-11-06 05:22:40,811][20138] Num frames 4600...
[2024-11-06 05:22:41,001][20138] Num frames 4700...
[2024-11-06 05:22:41,188][20138] Num frames 4800...
[2024-11-06 05:22:41,378][20138] Num frames 4900...
[2024-11-06 05:22:41,510][20138] Num frames 5000...
[2024-11-06 05:22:41,635][20138] Num frames 5100...
[2024-11-06 05:22:41,760][20138] Num frames 5200...
[2024-11-06 05:22:41,900][20138] Num frames 5300...
[2024-11-06 05:22:42,035][20138] Num frames 5400...
[2024-11-06 05:22:42,171][20138] Num frames 5500...
[2024-11-06 05:22:42,308][20138] Num frames 5600...
[2024-11-06 05:22:42,432][20138] Num frames 5700...
[2024-11-06 05:22:42,557][20138] Num frames 5800...
[2024-11-06 05:22:42,682][20138] Num frames 5900...
[2024-11-06 05:22:42,803][20138] Num frames 6000...
[2024-11-06 05:22:42,936][20138] Num frames 6100...
[2024-11-06 05:22:43,070][20138] Num frames 6200...
[2024-11-06 05:22:43,222][20138] Avg episode rewards: #0: 40.422, true rewards: #0: 15.673
[2024-11-06 05:22:43,224][20138] Avg episode reward: 40.422, avg true_objective: 15.673
[2024-11-06 05:22:43,272][20138] Num frames 6300...
[2024-11-06 05:22:43,400][20138] Num frames 6400...
[2024-11-06 05:22:43,524][20138] Num frames 6500...
[2024-11-06 05:22:43,645][20138] Num frames 6600...
[2024-11-06 05:22:43,769][20138] Num frames 6700...
[2024-11-06 05:22:43,902][20138] Num frames 6800...
[2024-11-06 05:22:44,025][20138] Num frames 6900...
[2024-11-06 05:22:44,151][20138] Num frames 7000...
[2024-11-06 05:22:44,327][20138] Avg episode rewards: #0: 35.978, true rewards: #0: 14.178
[2024-11-06 05:22:44,330][20138] Avg episode reward: 35.978, avg true_objective: 14.178
[2024-11-06 05:22:44,345][20138] Num frames 7100...
[2024-11-06 05:22:44,474][20138] Num frames 7200...
[2024-11-06 05:22:44,596][20138] Num frames 7300...
[2024-11-06 05:22:44,719][20138] Num frames 7400...
[2024-11-06 05:22:44,848][20138] Num frames 7500...
[2024-11-06 05:22:44,988][20138] Num frames 7600...
[2024-11-06 05:22:45,115][20138] Num frames 7700...
[2024-11-06 05:22:45,244][20138] Num frames 7800...
[2024-11-06 05:22:45,371][20138] Num frames 7900...
[2024-11-06 05:22:45,499][20138] Num frames 8000...
[2024-11-06 05:22:45,566][20138] Avg episode rewards: #0: 33.680, true rewards: #0: 13.347
[2024-11-06 05:22:45,568][20138] Avg episode reward: 33.680, avg true_objective: 13.347
[2024-11-06 05:22:45,684][20138] Num frames 8100...
[2024-11-06 05:22:45,817][20138] Num frames 8200...
[2024-11-06 05:22:45,955][20138] Num frames 8300...
[2024-11-06 05:22:46,088][20138] Num frames 8400...
[2024-11-06 05:22:46,215][20138] Num frames 8500...
[2024-11-06 05:22:46,347][20138] Num frames 8600...
[2024-11-06 05:22:46,469][20138] Num frames 8700...
[2024-11-06 05:22:46,597][20138] Num frames 8800...
[2024-11-06 05:22:46,724][20138] Num frames 8900...
[2024-11-06 05:22:46,848][20138] Num frames 9000...
[2024-11-06 05:22:46,907][20138] Avg episode rewards: #0: 31.858, true rewards: #0: 12.859
[2024-11-06 05:22:46,908][20138] Avg episode reward: 31.858, avg true_objective: 12.859
[2024-11-06 05:22:47,040][20138] Num frames 9100...
[2024-11-06 05:22:47,165][20138] Num frames 9200...
[2024-11-06 05:22:47,298][20138] Num frames 9300...
[2024-11-06 05:22:47,429][20138] Num frames 9400...
[2024-11-06 05:22:47,585][20138] Avg episode rewards: #0: 28.850, true rewards: #0: 11.850
[2024-11-06 05:22:47,587][20138] Avg episode reward: 28.850, avg true_objective: 11.850
[2024-11-06 05:22:47,614][20138] Num frames 9500...
[2024-11-06 05:22:47,739][20138] Num frames 9600...
[2024-11-06 05:22:47,867][20138] Num frames 9700...
[2024-11-06 05:22:48,002][20138] Num frames 9800...
[2024-11-06 05:22:48,129][20138] Num frames 9900...
[2024-11-06 05:22:48,261][20138] Num frames 10000...
[2024-11-06 05:22:48,388][20138] Num frames 10100...
[2024-11-06 05:22:48,510][20138] Num frames 10200...
[2024-11-06 05:22:48,624][20138] Avg episode rewards: #0: 27.609, true rewards: #0: 11.387
[2024-11-06 05:22:48,625][20138] Avg episode reward: 27.609, avg true_objective: 11.387
[2024-11-06 05:22:48,693][20138] Num frames 10300...
[2024-11-06 05:22:48,820][20138] Num frames 10400...
[2024-11-06 05:22:48,975][20138] Num frames 10500...
[2024-11-06 05:22:49,105][20138] Num frames 10600...
[2024-11-06 05:22:49,229][20138] Num frames 10700...
[2024-11-06 05:22:49,365][20138] Num frames 10800...
[2024-11-06 05:22:49,486][20138] Num frames 10900...
[2024-11-06 05:22:49,618][20138] Num frames 11000...
[2024-11-06 05:22:49,746][20138] Num frames 11100...
[2024-11-06 05:22:49,869][20138] Num frames 11200...
[2024-11-06 05:22:50,001][20138] Avg episode rewards: #0: 26.958, true rewards: #0: 11.258
[2024-11-06 05:22:50,004][20138] Avg episode reward: 26.958, avg true_objective: 11.258
[2024-11-06 05:24:03,470][20138] Replay video saved to /content/train_dir/default_experiment/replay.mp4!