diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1167 @@
+[2025-01-10 05:51:44,348][00763] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2025-01-10 05:51:44,351][00763] Rollout worker 0 uses device cpu
+[2025-01-10 05:51:44,352][00763] Rollout worker 1 uses device cpu
+[2025-01-10 05:51:44,353][00763] Rollout worker 2 uses device cpu
+[2025-01-10 05:51:44,357][00763] Rollout worker 3 uses device cpu
+[2025-01-10 05:51:44,358][00763] Rollout worker 4 uses device cpu
+[2025-01-10 05:51:44,359][00763] Rollout worker 5 uses device cpu
+[2025-01-10 05:51:44,360][00763] Rollout worker 6 uses device cpu
+[2025-01-10 05:51:44,362][00763] Rollout worker 7 uses device cpu
+[2025-01-10 05:51:44,510][00763] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-01-10 05:51:44,512][00763] InferenceWorker_p0-w0: min num requests: 2
+[2025-01-10 05:51:44,543][00763] Starting all processes...
+[2025-01-10 05:51:44,544][00763] Starting process learner_proc0
+[2025-01-10 05:51:44,587][00763] Starting all processes...
+[2025-01-10 05:51:44,595][00763] Starting process inference_proc0-0
+[2025-01-10 05:51:44,596][00763] Starting process rollout_proc0
+[2025-01-10 05:51:44,597][00763] Starting process rollout_proc1
+[2025-01-10 05:51:44,598][00763] Starting process rollout_proc2
+[2025-01-10 05:51:44,598][00763] Starting process rollout_proc3
+[2025-01-10 05:51:44,599][00763] Starting process rollout_proc4
+[2025-01-10 05:51:44,599][00763] Starting process rollout_proc5
+[2025-01-10 05:51:44,599][00763] Starting process rollout_proc6
+[2025-01-10 05:51:44,599][00763] Starting process rollout_proc7
+[2025-01-10 05:52:00,623][08261] Worker 6 uses CPU cores [0]
+[2025-01-10 05:52:00,743][08241] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-01-10 05:52:00,746][08241] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2025-01-10 05:52:00,807][08241] Num visible devices: 1
+[2025-01-10 05:52:00,818][08257] Worker 2 uses CPU cores [0]
+[2025-01-10 05:52:00,822][08254] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-01-10 05:52:00,825][08254] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2025-01-10 05:52:00,841][08259] Worker 3 uses CPU cores [1]
+[2025-01-10 05:52:00,842][08255] Worker 0 uses CPU cores [0]
+[2025-01-10 05:52:00,847][08262] Worker 7 uses CPU cores [1]
+[2025-01-10 05:52:00,847][08241] Starting seed is not provided
+[2025-01-10 05:52:00,848][08241] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-01-10 05:52:00,848][08241] Initializing actor-critic model on device cuda:0
+[2025-01-10 05:52:00,849][08241] RunningMeanStd input shape: (3, 72, 128)
+[2025-01-10 05:52:00,852][08241] RunningMeanStd input shape: (1,)
+[2025-01-10 05:52:00,883][08254] Num visible devices: 1
+[2025-01-10 05:52:00,882][08241] ConvEncoder: input_channels=3
+[2025-01-10 05:52:00,896][08260] Worker 5 uses CPU cores [1]
+[2025-01-10 05:52:00,939][08256] Worker 1 uses CPU cores [1]
+[2025-01-10 05:52:01,018][08258] Worker 4 uses CPU cores [0]
+[2025-01-10 05:52:01,162][08241] Conv encoder output size: 512
+[2025-01-10 05:52:01,162][08241] Policy head output size: 512
+[2025-01-10 05:52:01,215][08241] Created Actor Critic model with architecture:
+[2025-01-10 05:52:01,215][08241] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2025-01-10 05:52:01,588][08241] Using optimizer
+[2025-01-10 05:52:04,505][00763] Heartbeat connected on Batcher_0
+[2025-01-10 05:52:04,510][00763] Heartbeat connected on InferenceWorker_p0-w0
+[2025-01-10 05:52:04,519][00763] Heartbeat connected on RolloutWorker_w0
+[2025-01-10 05:52:04,525][00763] Heartbeat connected on RolloutWorker_w2
+[2025-01-10 05:52:04,526][00763] Heartbeat connected on RolloutWorker_w1
+[2025-01-10 05:52:04,530][00763] Heartbeat connected on RolloutWorker_w3
+[2025-01-10 05:52:04,544][00763] Heartbeat connected on RolloutWorker_w4
+[2025-01-10 05:52:04,546][00763] Heartbeat connected on RolloutWorker_w5
+[2025-01-10 05:52:04,547][00763] Heartbeat connected on RolloutWorker_w6
+[2025-01-10 05:52:04,548][00763] Heartbeat connected on RolloutWorker_w7
+[2025-01-10 05:52:05,840][08241] No checkpoints found
+[2025-01-10 05:52:05,840][08241] Did not load from checkpoint, starting from scratch!
+[2025-01-10 05:52:05,840][08241] Initialized policy 0 weights for model version 0
+[2025-01-10 05:52:05,845][08241] LearnerWorker_p0 finished initialization!
+[2025-01-10 05:52:05,848][08241] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-01-10 05:52:05,845][00763] Heartbeat connected on LearnerWorker_p0
+[2025-01-10 05:52:06,031][08254] RunningMeanStd input shape: (3, 72, 128)
+[2025-01-10 05:52:06,032][08254] RunningMeanStd input shape: (1,)
+[2025-01-10 05:52:06,044][08254] ConvEncoder: input_channels=3
+[2025-01-10 05:52:06,145][08254] Conv encoder output size: 512
+[2025-01-10 05:52:06,146][08254] Policy head output size: 512
+[2025-01-10 05:52:06,196][00763] Inference worker 0-0 is ready!
+[2025-01-10 05:52:06,198][00763] All inference workers are ready! Signal rollout workers to start!
+[2025-01-10 05:52:06,392][08258] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-01-10 05:52:06,396][08259] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-01-10 05:52:06,399][08255] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-01-10 05:52:06,397][08260] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-01-10 05:52:06,397][08261] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-01-10 05:52:06,398][08256] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-01-10 05:52:06,395][08262] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-01-10 05:52:06,394][08257] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-01-10 05:52:07,444][08259] Decorrelating experience for 0 frames...
+[2025-01-10 05:52:07,443][08260] Decorrelating experience for 0 frames...
+[2025-01-10 05:52:07,824][08255] Decorrelating experience for 0 frames...
+[2025-01-10 05:52:07,828][08258] Decorrelating experience for 0 frames...
+[2025-01-10 05:52:07,834][08257] Decorrelating experience for 0 frames...
+[2025-01-10 05:52:07,836][08261] Decorrelating experience for 0 frames...
+[2025-01-10 05:52:08,927][08261] Decorrelating experience for 32 frames...
+[2025-01-10 05:52:08,928][08258] Decorrelating experience for 32 frames...
+[2025-01-10 05:52:08,934][08257] Decorrelating experience for 32 frames...
+[2025-01-10 05:52:08,959][08256] Decorrelating experience for 0 frames...
+[2025-01-10 05:52:08,962][08262] Decorrelating experience for 0 frames...
+[2025-01-10 05:52:08,982][08260] Decorrelating experience for 32 frames...
+[2025-01-10 05:52:09,421][00763] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-01-10 05:52:09,910][08255] Decorrelating experience for 32 frames...
+[2025-01-10 05:52:10,094][08258] Decorrelating experience for 64 frames...
+[2025-01-10 05:52:10,466][08256] Decorrelating experience for 32 frames...
+[2025-01-10 05:52:10,473][08262] Decorrelating experience for 32 frames...
+[2025-01-10 05:52:10,483][08259] Decorrelating experience for 32 frames...
+[2025-01-10 05:52:10,921][08260] Decorrelating experience for 64 frames...
+[2025-01-10 05:52:11,192][08255] Decorrelating experience for 64 frames...
+[2025-01-10 05:52:11,300][08258] Decorrelating experience for 96 frames...
+[2025-01-10 05:52:11,780][08257] Decorrelating experience for 64 frames...
+[2025-01-10 05:52:11,781][08261] Decorrelating experience for 64 frames...
+[2025-01-10 05:52:12,070][08262] Decorrelating experience for 64 frames...
+[2025-01-10 05:52:12,072][08259] Decorrelating experience for 64 frames...
+[2025-01-10 05:52:12,789][08256] Decorrelating experience for 64 frames...
+[2025-01-10 05:52:13,008][08257] Decorrelating experience for 96 frames...
+[2025-01-10 05:52:13,009][08261] Decorrelating experience for 96 frames...
+[2025-01-10 05:52:13,260][08255] Decorrelating experience for 96 frames...
+[2025-01-10 05:52:13,530][08259] Decorrelating experience for 96 frames...
+[2025-01-10 05:52:14,253][08256] Decorrelating experience for 96 frames...
+[2025-01-10 05:52:14,421][00763] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 4.0. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-01-10 05:52:14,425][00763] Avg episode reward: [(0, '0.853')]
+[2025-01-10 05:52:14,656][08262] Decorrelating experience for 96 frames...
+[2025-01-10 05:52:17,297][08241] Signal inference workers to stop experience collection...
+[2025-01-10 05:52:17,305][08254] InferenceWorker_p0-w0: stopping experience collection
+[2025-01-10 05:52:17,540][08260] Decorrelating experience for 96 frames...
+[2025-01-10 05:52:19,421][00763] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 261.4. Samples: 2614. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-01-10 05:52:19,423][00763] Avg episode reward: [(0, '2.505')]
+[2025-01-10 05:52:21,051][08241] Signal inference workers to resume experience collection...
+[2025-01-10 05:52:21,051][08254] InferenceWorker_p0-w0: resuming experience collection
+[2025-01-10 05:52:24,421][00763] Fps is (10 sec: 2048.0, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 20480. Throughput: 0: 393.1. Samples: 5896. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
+[2025-01-10 05:52:24,424][00763] Avg episode reward: [(0, '3.613')]
+[2025-01-10 05:52:28,196][08254] Updated weights for policy 0, policy_version 10 (0.0024)
+[2025-01-10 05:52:29,421][00763] Fps is (10 sec: 4505.6, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 45056. Throughput: 0: 478.7. Samples: 9574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 05:52:29,424][00763] Avg episode reward: [(0, '4.418')]
+[2025-01-10 05:52:34,421][00763] Fps is (10 sec: 4096.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 61440. Throughput: 0: 632.6. Samples: 15814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:52:34,424][00763] Avg episode reward: [(0, '4.534')]
+[2025-01-10 05:52:39,191][08254] Updated weights for policy 0, policy_version 20 (0.0031)
+[2025-01-10 05:52:39,421][00763] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 81920. Throughput: 0: 704.2. Samples: 21126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 05:52:39,423][00763] Avg episode reward: [(0, '4.344')]
+[2025-01-10 05:52:44,422][00763] Fps is (10 sec: 4505.5, 60 sec: 3042.7, 300 sec: 3042.7). Total num frames: 106496. Throughput: 0: 704.9. Samples: 24672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 05:52:44,427][00763] Avg episode reward: [(0, '4.279')]
+[2025-01-10 05:52:44,431][08241] Saving new best policy, reward=4.279!
+[2025-01-10 05:52:47,946][08254] Updated weights for policy 0, policy_version 30 (0.0022)
+[2025-01-10 05:52:49,432][00763] Fps is (10 sec: 4501.0, 60 sec: 3173.6, 300 sec: 3173.6). Total num frames: 126976. Throughput: 0: 785.9. Samples: 31442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:52:49,436][00763] Avg episode reward: [(0, '4.402')]
+[2025-01-10 05:52:49,449][08241] Saving new best policy, reward=4.402!
+[2025-01-10 05:52:54,421][00763] Fps is (10 sec: 3276.9, 60 sec: 3094.8, 300 sec: 3094.8). Total num frames: 139264. Throughput: 0: 794.2. Samples: 35740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 05:52:54,427][00763] Avg episode reward: [(0, '4.382')]
+[2025-01-10 05:52:58,820][08254] Updated weights for policy 0, policy_version 40 (0.0027)
+[2025-01-10 05:52:59,421][00763] Fps is (10 sec: 3690.1, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 163840. Throughput: 0: 874.1. Samples: 39354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:52:59,429][00763] Avg episode reward: [(0, '4.372')]
+[2025-01-10 05:53:04,421][00763] Fps is (10 sec: 4915.2, 60 sec: 3425.7, 300 sec: 3425.7). Total num frames: 188416. Throughput: 0: 975.8. Samples: 46526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 05:53:04,424][00763] Avg episode reward: [(0, '4.373')]
+[2025-01-10 05:53:09,424][00763] Fps is (10 sec: 3685.3, 60 sec: 3344.9, 300 sec: 3344.9). Total num frames: 200704. Throughput: 0: 1000.2. Samples: 50910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:53:09,428][00763] Avg episode reward: [(0, '4.458')]
+[2025-01-10 05:53:09,438][08241] Saving new best policy, reward=4.458!
+[2025-01-10 05:53:10,679][08254] Updated weights for policy 0, policy_version 50 (0.0020)
+[2025-01-10 05:53:14,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3402.8). Total num frames: 221184. Throughput: 0: 977.9. Samples: 53580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 05:53:14,424][00763] Avg episode reward: [(0, '4.189')]
+[2025-01-10 05:53:19,421][00763] Fps is (10 sec: 4097.2, 60 sec: 4027.7, 300 sec: 3452.3). Total num frames: 241664. Throughput: 0: 988.7. Samples: 60304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 05:53:19,429][00763] Avg episode reward: [(0, '4.354')]
+[2025-01-10 05:53:19,662][08254] Updated weights for policy 0, policy_version 60 (0.0020)
+[2025-01-10 05:53:24,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3440.6). Total num frames: 258048. Throughput: 0: 983.9. Samples: 65402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:53:24,424][00763] Avg episode reward: [(0, '4.491')]
+[2025-01-10 05:53:24,433][08241] Saving new best policy, reward=4.491!
+[2025-01-10 05:53:29,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3430.4). Total num frames: 274432. Throughput: 0: 949.1. Samples: 67382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:53:29,423][00763] Avg episode reward: [(0, '4.648')]
+[2025-01-10 05:53:29,432][08241] Saving new best policy, reward=4.648!
+[2025-01-10 05:53:31,718][08254] Updated weights for policy 0, policy_version 70 (0.0020)
+[2025-01-10 05:53:34,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3469.6). Total num frames: 294912. Throughput: 0: 941.1. Samples: 73784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 05:53:34,424][00763] Avg episode reward: [(0, '4.484')]
+[2025-01-10 05:53:39,422][00763] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3504.3). Total num frames: 315392. Throughput: 0: 976.6. Samples: 79686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:53:39,427][00763] Avg episode reward: [(0, '4.213')]
+[2025-01-10 05:53:39,436][08241] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth...
+[2025-01-10 05:53:43,017][08254] Updated weights for policy 0, policy_version 80 (0.0018)
+[2025-01-10 05:53:44,422][00763] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3492.4). Total num frames: 331776. Throughput: 0: 942.7. Samples: 81778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:53:44,426][00763] Avg episode reward: [(0, '4.341')]
+[2025-01-10 05:53:49,421][00763] Fps is (10 sec: 3686.5, 60 sec: 3755.3, 300 sec: 3522.6). Total num frames: 352256. Throughput: 0: 921.6. Samples: 87996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 05:53:49,426][00763] Avg episode reward: [(0, '4.521')]
+[2025-01-10 05:53:52,174][08254] Updated weights for policy 0, policy_version 90 (0.0017)
+[2025-01-10 05:53:54,421][00763] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3588.9). Total num frames: 376832. Throughput: 0: 979.6. Samples: 94988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:53:54,429][00763] Avg episode reward: [(0, '4.539')]
+[2025-01-10 05:53:59,423][00763] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3537.4). Total num frames: 389120. Throughput: 0: 966.9. Samples: 97092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:53:59,426][00763] Avg episode reward: [(0, '4.519')]
+[2025-01-10 05:54:03,808][08254] Updated weights for policy 0, policy_version 100 (0.0031)
+[2025-01-10 05:54:04,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3561.7). Total num frames: 409600. Throughput: 0: 933.6. Samples: 102314. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 05:54:04,429][00763] Avg episode reward: [(0, '4.516')]
+[2025-01-10 05:54:09,421][00763] Fps is (10 sec: 4506.5, 60 sec: 3891.4, 300 sec: 3618.1). Total num frames: 434176. Throughput: 0: 975.8. Samples: 109312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:54:09,429][00763] Avg episode reward: [(0, '4.671')]
+[2025-01-10 05:54:09,437][08241] Saving new best policy, reward=4.671!
+[2025-01-10 05:54:14,256][08254] Updated weights for policy 0, policy_version 110 (0.0024)
+[2025-01-10 05:54:14,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3604.5). Total num frames: 450560. Throughput: 0: 993.2. Samples: 112078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:54:14,427][00763] Avg episode reward: [(0, '4.510')]
+[2025-01-10 05:54:19,422][00763] Fps is (10 sec: 2457.5, 60 sec: 3618.1, 300 sec: 3528.8). Total num frames: 458752. Throughput: 0: 927.1. Samples: 115504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:54:19,430][00763] Avg episode reward: [(0, '4.500')]
+[2025-01-10 05:54:24,421][00763] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3549.9). Total num frames: 479232. Throughput: 0: 907.1. Samples: 120506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:54:24,424][00763] Avg episode reward: [(0, '4.551')]
+[2025-01-10 05:54:26,801][08254] Updated weights for policy 0, policy_version 120 (0.0019)
+[2025-01-10 05:54:29,422][00763] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3569.4). Total num frames: 499712. Throughput: 0: 937.8. Samples: 123980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:54:29,429][00763] Avg episode reward: [(0, '4.713')]
+[2025-01-10 05:54:29,438][08241] Saving new best policy, reward=4.713!
+[2025-01-10 05:54:34,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3559.3). Total num frames: 516096. Throughput: 0: 910.0. Samples: 128946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:54:34,426][00763] Avg episode reward: [(0, '4.829')]
+[2025-01-10 05:54:34,430][08241] Saving new best policy, reward=4.829!
+[2025-01-10 05:54:38,366][08254] Updated weights for policy 0, policy_version 130 (0.0028)
+[2025-01-10 05:54:39,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3577.2). Total num frames: 536576. Throughput: 0: 881.9. Samples: 134674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:54:39,424][00763] Avg episode reward: [(0, '4.829')]
+[2025-01-10 05:54:44,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3593.9). Total num frames: 557056. Throughput: 0: 910.3. Samples: 138052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:54:44,426][00763] Avg episode reward: [(0, '4.436')]
+[2025-01-10 05:54:48,445][08254] Updated weights for policy 0, policy_version 140 (0.0017)
+[2025-01-10 05:54:49,423][00763] Fps is (10 sec: 3686.0, 60 sec: 3686.3, 300 sec: 3584.0). Total num frames: 573440. Throughput: 0: 919.2. Samples: 143680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:54:49,429][00763] Avg episode reward: [(0, '4.605')]
+[2025-01-10 05:54:54,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3574.7). Total num frames: 589824. Throughput: 0: 875.5. Samples: 148708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:54:54,424][00763] Avg episode reward: [(0, '4.961')]
+[2025-01-10 05:54:54,458][08241] Saving new best policy, reward=4.961!
+[2025-01-10 05:54:58,952][08254] Updated weights for policy 0, policy_version 150 (0.0020)
+[2025-01-10 05:54:59,421][00763] Fps is (10 sec: 4096.5, 60 sec: 3754.8, 300 sec: 3614.1). Total num frames: 614400. Throughput: 0: 889.1. Samples: 152088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:54:59,428][00763] Avg episode reward: [(0, '4.854')]
+[2025-01-10 05:55:04,429][00763] Fps is (10 sec: 4502.3, 60 sec: 3754.2, 300 sec: 3627.7). Total num frames: 634880. Throughput: 0: 962.2. Samples: 158808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:55:04,432][00763] Avg episode reward: [(0, '5.042')]
+[2025-01-10 05:55:04,444][08241] Saving new best policy, reward=5.042!
+[2025-01-10 05:55:09,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3618.1). Total num frames: 651264. Throughput: 0: 958.8. Samples: 163654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:55:09,424][00763] Avg episode reward: [(0, '4.940')]
+[2025-01-10 05:55:09,782][08254] Updated weights for policy 0, policy_version 160 (0.0015)
+[2025-01-10 05:55:14,421][00763] Fps is (10 sec: 4099.0, 60 sec: 3754.7, 300 sec: 3653.2). Total num frames: 675840. Throughput: 0: 965.6. Samples: 167434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:55:14,426][00763] Avg episode reward: [(0, '4.634')]
+[2025-01-10 05:55:17,976][08254] Updated weights for policy 0, policy_version 170 (0.0019)
+[2025-01-10 05:55:19,421][00763] Fps is (10 sec: 4915.2, 60 sec: 4027.8, 300 sec: 3686.4). Total num frames: 700416. Throughput: 0: 1023.4. Samples: 174998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:55:19,424][00763] Avg episode reward: [(0, '4.891')]
+[2025-01-10 05:55:24,425][00763] Fps is (10 sec: 4094.5, 60 sec: 3959.2, 300 sec: 3675.8). Total num frames: 716800. Throughput: 0: 1000.1. Samples: 179684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:55:24,429][00763] Avg episode reward: [(0, '4.975')]
+[2025-01-10 05:55:28,841][08254] Updated weights for policy 0, policy_version 180 (0.0033)
+[2025-01-10 05:55:29,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3686.4). Total num frames: 737280. Throughput: 0: 999.3. Samples: 183020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:55:29,424][00763] Avg episode reward: [(0, '5.144')]
+[2025-01-10 05:55:29,435][08241] Saving new best policy, reward=5.144!
+[2025-01-10 05:55:34,421][00763] Fps is (10 sec: 4507.2, 60 sec: 4096.0, 300 sec: 3716.4). Total num frames: 761856. Throughput: 0: 1037.7. Samples: 190374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:55:34,425][00763] Avg episode reward: [(0, '5.050')]
+[2025-01-10 05:55:38,524][08254] Updated weights for policy 0, policy_version 190 (0.0036)
+[2025-01-10 05:55:39,427][00763] Fps is (10 sec: 4093.7, 60 sec: 4027.4, 300 sec: 3705.8). Total num frames: 778240. Throughput: 0: 1043.6. Samples: 195676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:55:39,431][00763] Avg episode reward: [(0, '5.300')]
+[2025-01-10 05:55:39,441][08241] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000190_778240.pth...
+[2025-01-10 05:55:39,575][08241] Saving new best policy, reward=5.300!
+[2025-01-10 05:55:44,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3715.0). Total num frames: 798720. Throughput: 0: 1020.7. Samples: 198020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:55:44,427][00763] Avg episode reward: [(0, '5.308')]
+[2025-01-10 05:55:44,431][08241] Saving new best policy, reward=5.308!
+[2025-01-10 05:55:48,592][08254] Updated weights for policy 0, policy_version 200 (0.0021)
+[2025-01-10 05:55:49,421][00763] Fps is (10 sec: 4508.1, 60 sec: 4164.4, 300 sec: 3742.3). Total num frames: 823296. Throughput: 0: 1026.8. Samples: 205008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:55:49,424][00763] Avg episode reward: [(0, '5.365')]
+[2025-01-10 05:55:49,437][08241] Saving new best policy, reward=5.365!
+[2025-01-10 05:55:54,426][00763] Fps is (10 sec: 4094.2, 60 sec: 4164.0, 300 sec: 3731.8). Total num frames: 839680. Throughput: 0: 1043.8. Samples: 210630. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 05:55:54,428][00763] Avg episode reward: [(0, '5.074')]
+[2025-01-10 05:55:59,421][00763] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3722.0). Total num frames: 856064. Throughput: 0: 1008.0. Samples: 212792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 05:55:59,429][00763] Avg episode reward: [(0, '4.863')]
+[2025-01-10 05:56:00,145][08254] Updated weights for policy 0, policy_version 210 (0.0019)
+[2025-01-10 05:56:04,422][00763] Fps is (10 sec: 3688.0, 60 sec: 4028.2, 300 sec: 3730.0). Total num frames: 876544. Throughput: 0: 982.1. Samples: 219192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:56:04,425][00763] Avg episode reward: [(0, '4.858')]
+[2025-01-10 05:56:09,221][08254] Updated weights for policy 0, policy_version 220 (0.0018)
+[2025-01-10 05:56:09,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3754.7). Total num frames: 901120. Throughput: 0: 1026.9. Samples: 225890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:56:09,424][00763] Avg episode reward: [(0, '5.011')]
+[2025-01-10 05:56:14,423][00763] Fps is (10 sec: 3685.8, 60 sec: 3959.3, 300 sec: 3728.2). Total num frames: 913408. Throughput: 0: 999.3. Samples: 227990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:56:14,426][00763] Avg episode reward: [(0, '4.824')]
+[2025-01-10 05:56:19,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3735.6). Total num frames: 933888. Throughput: 0: 961.5. Samples: 233642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:56:19,424][00763] Avg episode reward: [(0, '4.800')]
+[2025-01-10 05:56:20,326][08254] Updated weights for policy 0, policy_version 230 (0.0022)
+[2025-01-10 05:56:24,421][00763] Fps is (10 sec: 4506.5, 60 sec: 4028.0, 300 sec: 3758.7). Total num frames: 958464. Throughput: 0: 991.2. Samples: 240276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:56:24,424][00763] Avg episode reward: [(0, '4.965')]
+[2025-01-10 05:56:29,427][00763] Fps is (10 sec: 3684.5, 60 sec: 3890.9, 300 sec: 3733.6). Total num frames: 970752. Throughput: 0: 988.7. Samples: 242518. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 05:56:29,430][00763] Avg episode reward: [(0, '5.096')]
+[2025-01-10 05:56:32,194][08254] Updated weights for policy 0, policy_version 240 (0.0016)
+[2025-01-10 05:56:34,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3740.5). Total num frames: 991232. Throughput: 0: 944.0. Samples: 247490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 05:56:34,426][00763] Avg episode reward: [(0, '5.091')]
+[2025-01-10 05:56:39,421][00763] Fps is (10 sec: 4507.9, 60 sec: 3959.8, 300 sec: 3762.3). Total num frames: 1015808. Throughput: 0: 970.1. Samples: 254282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:56:39,424][00763] Avg episode reward: [(0, '5.398')]
+[2025-01-10 05:56:39,437][08241] Saving new best policy, reward=5.398!
+[2025-01-10 05:56:41,281][08254] Updated weights for policy 0, policy_version 250 (0.0031)
+[2025-01-10 05:56:44,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3753.4). Total num frames: 1032192. Throughput: 0: 986.0. Samples: 257164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:56:44,427][00763] Avg episode reward: [(0, '5.605')]
+[2025-01-10 05:56:44,431][08241] Saving new best policy, reward=5.605!
+[2025-01-10 05:56:49,421][00763] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3730.3). Total num frames: 1044480. Throughput: 0: 935.7. Samples: 261300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-01-10 05:56:49,424][00763] Avg episode reward: [(0, '5.424')]
+[2025-01-10 05:56:53,027][08254] Updated weights for policy 0, policy_version 260 (0.0025)
+[2025-01-10 05:56:54,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3751.1). Total num frames: 1069056. Throughput: 0: 942.8. Samples: 268316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 05:56:54,428][00763] Avg episode reward: [(0, '5.347')]
+[2025-01-10 05:56:59,422][00763] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3757.0). Total num frames: 1089536. Throughput: 0: 970.5. Samples: 271660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:56:59,426][00763] Avg episode reward: [(0, '5.444')]
+[2025-01-10 05:57:04,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 945.2. Samples: 276178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 05:57:04,423][00763] Avg episode reward: [(0, '5.492')]
+[2025-01-10 05:57:04,464][08254] Updated weights for policy 0, policy_version 270 (0.0026)
+[2025-01-10 05:57:09,421][00763] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1126400. Throughput: 0: 936.4. Samples: 282412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 05:57:09,426][00763] Avg episode reward: [(0, '5.460')]
+[2025-01-10 05:57:13,471][08254] Updated weights for policy 0, policy_version 280 (0.0040)
+[2025-01-10 05:57:14,421][00763] Fps is (10 sec: 4915.2, 60 sec: 3959.6, 300 sec: 3901.6). Total num frames: 1150976. Throughput: 0: 963.8. Samples: 285886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:57:14,425][00763] Avg episode reward: [(0, '5.518')]
+[2025-01-10 05:57:19,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1163264. Throughput: 0: 968.4. Samples: 291068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:57:19,425][00763] Avg episode reward: [(0, '5.621')]
+[2025-01-10 05:57:19,439][08241] Saving new best policy, reward=5.621!
+[2025-01-10 05:57:24,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 1183744. Throughput: 0: 938.8. Samples: 296530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:57:24,424][00763] Avg episode reward: [(0, '5.330')]
+[2025-01-10 05:57:25,115][08254] Updated weights for policy 0, policy_version 290 (0.0013)
+[2025-01-10 05:57:29,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3891.5, 300 sec: 3873.8). Total num frames: 1204224. Throughput: 0: 947.1. Samples: 299784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 05:57:29,424][00763] Avg episode reward: [(0, '5.558')]
+[2025-01-10 05:57:34,424][00763] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 1220608. Throughput: 0: 984.0. Samples: 305580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 05:57:34,426][00763] Avg episode reward: [(0, '5.695')]
+[2025-01-10 05:57:34,434][08241] Saving new best policy, reward=5.695!
+[2025-01-10 05:57:36,381][08254] Updated weights for policy 0, policy_version 300 (0.0026)
+[2025-01-10 05:57:39,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1236992. Throughput: 0: 932.2. Samples: 310266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:57:39,425][00763] Avg episode reward: [(0, '5.851')]
+[2025-01-10 05:57:39,435][08241] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth...
+[2025-01-10 05:57:39,561][08241] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth
+[2025-01-10 05:57:39,578][08241] Saving new best policy, reward=5.851!
+[2025-01-10 05:57:44,421][00763] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3832.3). Total num frames: 1257472. Throughput: 0: 927.4. Samples: 313394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:57:44,426][00763] Avg episode reward: [(0, '5.832')]
+[2025-01-10 05:57:46,366][08254] Updated weights for policy 0, policy_version 310 (0.0020)
+[2025-01-10 05:57:49,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1277952. Throughput: 0: 974.2. Samples: 320018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:57:49,431][00763] Avg episode reward: [(0, '6.220')]
+[2025-01-10 05:57:49,449][08241] Saving new best policy, reward=6.220!
+[2025-01-10 05:57:54,425][00763] Fps is (10 sec: 3685.1, 60 sec: 3754.4, 300 sec: 3832.1). Total num frames: 1294336. Throughput: 0: 928.0. Samples: 324174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:57:54,433][00763] Avg episode reward: [(0, '6.326')]
+[2025-01-10 05:57:54,436][08241] Saving new best policy, reward=6.326!
+[2025-01-10 05:57:58,182][08254] Updated weights for policy 0, policy_version 320 (0.0032)
+[2025-01-10 05:57:59,424][00763] Fps is (10 sec: 3685.3, 60 sec: 3754.5, 300 sec: 3818.3). Total num frames: 1314816. Throughput: 0: 917.7. Samples: 327184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:57:59,427][00763] Avg episode reward: [(0, '6.578')]
+[2025-01-10 05:57:59,438][08241] Saving new best policy, reward=6.578!
+[2025-01-10 05:58:04,421][00763] Fps is (10 sec: 3278.0, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1327104. Throughput: 0: 910.0. Samples: 332020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:58:04,427][00763] Avg episode reward: [(0, '6.254')]
+[2025-01-10 05:58:09,426][00763] Fps is (10 sec: 2457.1, 60 sec: 3549.6, 300 sec: 3790.5). Total num frames: 1339392. Throughput: 0: 874.7. Samples: 335896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:58:09,431][00763] Avg episode reward: [(0, '6.222')]
+[2025-01-10 05:58:12,638][08254] Updated weights for policy 0, policy_version 330 (0.0033)
+[2025-01-10 05:58:14,421][00763] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3776.7). Total num frames: 1355776. Throughput: 0: 852.2. Samples: 338132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 05:58:14,425][00763] Avg episode reward: [(0, '6.519')]
+[2025-01-10 05:58:19,421][00763] Fps is (10 sec: 4098.1, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1380352. Throughput: 0: 869.9. Samples: 344722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:58:19,423][00763] Avg episode reward: [(0, '6.712')]
+[2025-01-10 05:58:19,434][08241] Saving new best policy, reward=6.712!
+[2025-01-10 05:58:22,026][08254] Updated weights for policy 0, policy_version 340 (0.0037)
+[2025-01-10 05:58:24,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 1396736. Throughput: 0: 886.7. Samples: 350166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 05:58:24,430][00763] Avg episode reward: [(0, '6.427')]
+[2025-01-10 05:58:29,421][00763] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3776.7). Total num frames: 1409024. Throughput: 0: 858.3. Samples: 352018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:58:29,427][00763] Avg episode reward: [(0, '6.713')]
+[2025-01-10 05:58:29,449][08241] Saving new best policy, reward=6.713!
+[2025-01-10 05:58:34,089][08254] Updated weights for policy 0, policy_version 350 (0.0015)
+[2025-01-10 05:58:34,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3790.5). Total num frames: 1433600. Throughput: 0: 845.6. Samples: 358072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:58:34,427][00763] Avg episode reward: [(0, '6.438')]
+[2025-01-10 05:58:39,423][00763] Fps is (10 sec: 4504.9, 60 sec: 3618.0, 300 sec: 3804.4). Total num frames: 1454080. Throughput: 0: 897.1. Samples: 364542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:58:39,426][00763] Avg episode reward: [(0, '5.957')]
+[2025-01-10 05:58:44,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3776.7). Total num frames: 1466368. Throughput: 0: 874.4. Samples: 366530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 05:58:44,426][00763] Avg episode reward: [(0, '6.075')]
+[2025-01-10 05:58:45,756][08254] Updated weights for policy 0, policy_version 360 (0.0025)
+[2025-01-10 05:58:49,424][00763] Fps is (10 sec: 3685.9, 60 sec: 3549.7, 300 sec: 3776.6). Total num frames: 1490944. Throughput: 0: 890.9. Samples: 372114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:58:49,427][00763] Avg episode reward: [(0, '6.133')]
+[2025-01-10 05:58:54,421][00763] Fps is (10 sec: 4505.6, 60 sec: 3618.4, 300 sec: 3804.4). Total num frames: 1511424. Throughput: 0: 957.7. Samples: 378988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 05:58:54,425][00763] Avg episode reward: [(0, '5.936')]
+[2025-01-10 05:58:54,670][08254] Updated weights for policy 0, policy_version 370 (0.0022)
+[2025-01-10 05:58:59,421][00763] Fps is (10 sec: 3687.5, 60 sec: 3550.0, 300 sec: 3790.5). Total num frames: 1527808. Throughput: 0: 959.4. Samples: 381304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:58:59,425][00763] Avg episode reward: [(0, '6.200')]
+[2025-01-10 05:59:04,422][00763] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 1548288. Throughput: 0: 925.0. Samples: 386346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 05:59:04,427][00763] Avg episode reward: [(0, '6.592')]
+[2025-01-10 05:59:05,903][08254] Updated weights for policy 0, policy_version 380 (0.0027)
+[2025-01-10 05:59:09,421][00763] Fps is (10 sec: 4505.6, 60 sec: 3891.5, 300 sec: 3804.4). Total num frames: 1572864. Throughput: 0: 964.8. Samples: 393580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:59:09,424][00763] Avg episode reward: [(0, '6.905')]
+[2025-01-10 05:59:09,435][08241] Saving new best policy, reward=6.905!
+[2025-01-10 05:59:14,421][00763] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1589248. Throughput: 0: 997.0. Samples: 396884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 05:59:14,428][00763] Avg episode reward: [(0, '6.871')]
+[2025-01-10 05:59:16,755][08254] Updated weights for policy 0, policy_version 390 (0.0019)
+[2025-01-10 05:59:19,422][00763] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1605632. Throughput: 0: 950.0. Samples: 400824. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 05:59:19,428][00763] Avg episode reward: [(0, '6.933')]
+[2025-01-10 05:59:19,438][08241] Saving new best policy, reward=6.933!
+[2025-01-10 05:59:24,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1626112. Throughput: 0: 957.4. Samples: 407622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 05:59:24,424][00763] Avg episode reward: [(0, '7.013')]
+[2025-01-10 05:59:24,426][08241] Saving new best policy, reward=7.013!
+[2025-01-10 05:59:26,661][08254] Updated weights for policy 0, policy_version 400 (0.0019)
+[2025-01-10 05:59:29,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1646592. Throughput: 0: 981.6. Samples: 410702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:59:29,429][00763] Avg episode reward: [(0, '7.196')]
+[2025-01-10 05:59:29,449][08241] Saving new best policy, reward=7.196!
+[2025-01-10 05:59:34,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1658880. Throughput: 0: 952.2. Samples: 414958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 05:59:34,431][00763] Avg episode reward: [(0, '7.361')]
+[2025-01-10 05:59:34,433][08241] Saving new best policy, reward=7.361!
+[2025-01-10 05:59:38,767][08254] Updated weights for policy 0, policy_version 410 (0.0014)
+[2025-01-10 05:59:39,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3804.4). Total num frames: 1679360. Throughput: 0: 930.6. Samples: 420866. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 05:59:39,428][00763] Avg episode reward: [(0, '7.414')]
+[2025-01-10 05:59:39,438][08241] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000410_1679360.pth...
+[2025-01-10 05:59:39,583][08241] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000190_778240.pth
+[2025-01-10 05:59:39,602][08241] Saving new best policy, reward=7.414!
+[2025-01-10 05:59:44,421][00763] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1703936. Throughput: 0: 951.3. Samples: 424112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 05:59:44,424][00763] Avg episode reward: [(0, '7.796')]
+[2025-01-10 05:59:44,427][08241] Saving new best policy, reward=7.796!
+[2025-01-10 05:59:49,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3818.3). Total num frames: 1716224. Throughput: 0: 953.0. Samples: 429232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 05:59:49,424][00763] Avg episode reward: [(0, '7.829')]
+[2025-01-10 05:59:49,437][08241] Saving new best policy, reward=7.829!
+[2025-01-10 05:59:50,338][08254] Updated weights for policy 0, policy_version 420 (0.0024)
+[2025-01-10 05:59:54,421][00763] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1732608. Throughput: 0: 907.1. Samples: 434400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 05:59:54,425][00763] Avg episode reward: [(0, '8.059')]
+[2025-01-10 05:59:54,480][08241] Saving new best policy, reward=8.059!
+[2025-01-10 05:59:59,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.5). Total num frames: 1757184. Throughput: 0: 901.2. Samples: 437436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 05:59:59,428][00763] Avg episode reward: [(0, '8.652')]
+[2025-01-10 05:59:59,438][08241] Saving new best policy, reward=8.652!
+[2025-01-10 06:00:00,405][08254] Updated weights for policy 0, policy_version 430 (0.0020)
+[2025-01-10 06:00:04,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1773568. Throughput: 0: 940.7. Samples: 443156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:00:04,424][00763] Avg episode reward: [(0, '9.189')]
+[2025-01-10 06:00:04,426][08241] Saving new best policy, reward=9.189!
+[2025-01-10 06:00:09,421][00763] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 1785856. Throughput: 0: 890.4. Samples: 447688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:00:09,427][00763] Avg episode reward: [(0, '9.536')]
+[2025-01-10 06:00:09,464][08241] Saving new best policy, reward=9.536!
+[2025-01-10 06:00:12,366][08254] Updated weights for policy 0, policy_version 440 (0.0015)
+[2025-01-10 06:00:14,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1810432. Throughput: 0: 892.6. Samples: 450870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 06:00:14,427][00763] Avg episode reward: [(0, '9.500')]
+[2025-01-10 06:00:19,422][00763] Fps is (10 sec: 4095.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1826816. Throughput: 0: 938.8. Samples: 457204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 06:00:19,425][00763] Avg episode reward: [(0, '9.830')]
+[2025-01-10 06:00:19,437][08241] Saving new best policy, reward=9.830!
+[2025-01-10 06:00:24,188][08254] Updated weights for policy 0, policy_version 450 (0.0023)
+[2025-01-10 06:00:24,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 1843200. Throughput: 0: 897.0. Samples: 461232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 06:00:24,424][00763] Avg episode reward: [(0, '9.739')]
+[2025-01-10 06:00:29,421][00763] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 1863680. Throughput: 0: 896.9. Samples: 464472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:00:29,427][00763] Avg episode reward: [(0, '9.937')]
+[2025-01-10 06:00:29,441][08241] Saving new best policy, reward=9.937!
+[2025-01-10 06:00:33,605][08254] Updated weights for policy 0, policy_version 460 (0.0033)
+[2025-01-10 06:00:34,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3749.0). Total num frames: 1884160. Throughput: 0: 934.7. Samples: 471292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 06:00:34,424][00763] Avg episode reward: [(0, '9.140')]
+[2025-01-10 06:00:39,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1900544. Throughput: 0: 919.4. Samples: 475774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:00:39,427][00763] Avg episode reward: [(0, '9.456')]
+[2025-01-10 06:00:44,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1921024. Throughput: 0: 908.9. Samples: 478338. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 06:00:44,428][00763] Avg episode reward: [(0, '9.984')]
+[2025-01-10 06:00:44,433][08241] Saving new best policy, reward=9.984!
+[2025-01-10 06:00:45,248][08254] Updated weights for policy 0, policy_version 470 (0.0055)
+[2025-01-10 06:00:49,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.1). Total num frames: 1941504. Throughput: 0: 930.1. Samples: 485010. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-01-10 06:00:49,429][00763] Avg episode reward: [(0, '11.747')]
+[2025-01-10 06:00:49,442][08241] Saving new best policy, reward=11.747!
+[2025-01-10 06:00:54,422][00763] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 1957888. Throughput: 0: 943.4. Samples: 490142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 06:00:54,425][00763] Avg episode reward: [(0, '13.431')]
+[2025-01-10 06:00:54,426][08241] Saving new best policy, reward=13.431!
+[2025-01-10 06:00:57,009][08254] Updated weights for policy 0, policy_version 480 (0.0013)
+[2025-01-10 06:00:59,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1974272. Throughput: 0: 914.8. Samples: 492038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 06:00:59,427][00763] Avg episode reward: [(0, '13.469')]
+[2025-01-10 06:00:59,437][08241] Saving new best policy, reward=13.469!
+[2025-01-10 06:01:04,421][00763] Fps is (10 sec: 3686.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1994752. Throughput: 0: 918.3. Samples: 498526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:01:04,424][00763] Avg episode reward: [(0, '13.165')]
+[2025-01-10 06:01:06,568][08254] Updated weights for policy 0, policy_version 490 (0.0027)
+[2025-01-10 06:01:09,428][00763] Fps is (10 sec: 4093.4, 60 sec: 3822.5, 300 sec: 3734.9). Total num frames: 2015232. Throughput: 0: 964.3. Samples: 504632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 06:01:09,430][00763] Avg episode reward: [(0, '11.068')]
+[2025-01-10 06:01:14,422][00763] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 2027520. Throughput: 0: 936.8. Samples: 506630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:01:14,425][00763] Avg episode reward: [(0, '11.454')]
+[2025-01-10 06:01:18,408][08254] Updated weights for policy 0, policy_version 500 (0.0023)
+[2025-01-10 06:01:19,421][00763] Fps is (10 sec: 3688.7, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2052096. Throughput: 0: 913.4. Samples: 512394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:01:19,430][00763] Avg episode reward: [(0, '12.028')]
+[2025-01-10 06:01:24,423][00763] Fps is (10 sec: 4505.0, 60 sec: 3822.8, 300 sec: 3735.0). Total num frames: 2072576. Throughput: 0: 961.3. Samples: 519034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:01:24,426][00763] Avg episode reward: [(0, '12.834')]
+[2025-01-10 06:01:29,425][00763] Fps is (10 sec: 3275.8, 60 sec: 3686.2, 300 sec: 3707.2). Total num frames: 2084864. Throughput: 0: 945.7. Samples: 520898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 06:01:29,430][00763] Avg episode reward: [(0, '12.611')]
+[2025-01-10 06:01:30,236][08254] Updated weights for policy 0, policy_version 510 (0.0039)
+[2025-01-10 06:01:34,421][00763] Fps is (10 sec: 3277.3, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2105344. Throughput: 0: 908.6. Samples: 525896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 06:01:34,430][00763] Avg episode reward: [(0, '13.630')]
+[2025-01-10 06:01:34,435][08241] Saving new best policy, reward=13.630!
+[2025-01-10 06:01:39,421][00763] Fps is (10 sec: 4097.2, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2125824. Throughput: 0: 940.2. Samples: 532452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 06:01:39,428][00763] Avg episode reward: [(0, '12.921')]
+[2025-01-10 06:01:39,439][08241] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000519_2125824.pth...
+[2025-01-10 06:01:39,581][08241] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth
+[2025-01-10 06:01:39,731][08254] Updated weights for policy 0, policy_version 520 (0.0021)
+[2025-01-10 06:01:44,422][00763] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2142208. Throughput: 0: 961.7. Samples: 535314. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 06:01:44,424][00763] Avg episode reward: [(0, '13.411')]
+[2025-01-10 06:01:49,422][00763] Fps is (10 sec: 2866.9, 60 sec: 3549.8, 300 sec: 3679.4). Total num frames: 2154496. Throughput: 0: 902.3. Samples: 539130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 06:01:49,429][00763] Avg episode reward: [(0, '14.267')]
+[2025-01-10 06:01:49,442][08241] Saving new best policy, reward=14.267!
+[2025-01-10 06:01:54,141][08254] Updated weights for policy 0, policy_version 530 (0.0013)
+[2025-01-10 06:01:54,421][00763] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 2170880. Throughput: 0: 858.5. Samples: 543258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:01:54,425][00763] Avg episode reward: [(0, '16.257')]
+[2025-01-10 06:01:54,429][08241] Saving new best policy, reward=16.257!
+[2025-01-10 06:01:59,421][00763] Fps is (10 sec: 3686.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2191360. Throughput: 0: 888.0. Samples: 546592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 06:01:59,433][00763] Avg episode reward: [(0, '15.661')]
+[2025-01-10 06:02:04,422][00763] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 2203648. Throughput: 0: 859.8. Samples: 551086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 06:02:04,428][00763] Avg episode reward: [(0, '15.637')]
+[2025-01-10 06:02:05,944][08254] Updated weights for policy 0, policy_version 540 (0.0020)
+[2025-01-10 06:02:09,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3482.0, 300 sec: 3637.8). Total num frames: 2224128. Throughput: 0: 845.9. Samples: 557098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:02:09,429][00763] Avg episode reward: [(0, '15.923')]
+[2025-01-10 06:02:14,421][00763] Fps is (10 sec: 4505.7, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2248704. Throughput: 0: 880.3. Samples: 560510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:02:14,427][00763] Avg episode reward: [(0, '15.114')]
+[2025-01-10 06:02:15,320][08254] Updated weights for policy 0, policy_version 550 (0.0026)
+[2025-01-10 06:02:19,425][00763] Fps is (10 sec: 3684.9, 60 sec: 3481.4, 300 sec: 3651.6). Total num frames: 2260992. Throughput: 0: 890.1. Samples: 565954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 06:02:19,430][00763] Avg episode reward: [(0, '15.008')]
+[2025-01-10 06:02:24,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3651.7). Total num frames: 2281472. Throughput: 0: 871.1. Samples: 571650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:02:24,427][00763] Avg episode reward: [(0, '14.980')]
+[2025-01-10 06:02:26,330][08254] Updated weights for policy 0, policy_version 560 (0.0027)
+[2025-01-10 06:02:29,421][00763] Fps is (10 sec: 4507.4, 60 sec: 3686.6, 300 sec: 3679.5). Total num frames: 2306048. Throughput: 0: 886.3. Samples: 575196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 06:02:29,424][00763] Avg episode reward: [(0, '15.577')]
+[2025-01-10 06:02:34,426][00763] Fps is (10 sec: 4503.3, 60 sec: 3686.1, 300 sec: 3693.3). Total num frames: 2326528. Throughput: 0: 947.3. Samples: 581764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:02:34,429][00763] Avg episode reward: [(0, '15.381')]
+[2025-01-10 06:02:36,423][08254] Updated weights for policy 0, policy_version 570 (0.0036)
+[2025-01-10 06:02:39,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2342912. Throughput: 0: 972.0. Samples: 587000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:02:39,423][00763] Avg episode reward: [(0, '15.903')]
+[2025-01-10 06:02:44,421][00763] Fps is (10 sec: 4507.9, 60 sec: 3823.0, 300 sec: 3707.2). Total num frames: 2371584. Throughput: 0: 981.8. Samples: 590772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:02:44,426][00763] Avg episode reward: [(0, '17.082')]
+[2025-01-10 06:02:44,431][08241] Saving new best policy, reward=17.082!
+[2025-01-10 06:02:45,282][08254] Updated weights for policy 0, policy_version 580 (0.0025)
+[2025-01-10 06:02:49,421][00763] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3721.2). Total num frames: 2392064. Throughput: 0: 1047.2. Samples: 598208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:02:49,423][00763] Avg episode reward: [(0, '16.837')]
+[2025-01-10 06:02:54,423][00763] Fps is (10 sec: 3685.8, 60 sec: 3959.4, 300 sec: 3707.2). Total num frames: 2408448. Throughput: 0: 1014.6. Samples: 602756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:02:54,426][00763] Avg episode reward: [(0, '17.427')]
+[2025-01-10 06:02:54,428][08241] Saving new best policy, reward=17.427!
+[2025-01-10 06:02:56,040][08254] Updated weights for policy 0, policy_version 590 (0.0028)
+[2025-01-10 06:02:59,421][00763] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3748.9). Total num frames: 2433024. Throughput: 0: 1016.2. Samples: 606240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:02:59,426][00763] Avg episode reward: [(0, '17.527')]
+[2025-01-10 06:02:59,438][08241] Saving new best policy, reward=17.527!
+[2025-01-10 06:03:04,378][08254] Updated weights for policy 0, policy_version 600 (0.0015)
+[2025-01-10 06:03:04,421][00763] Fps is (10 sec: 4915.9, 60 sec: 4232.6, 300 sec: 3790.6). Total num frames: 2457600. Throughput: 0: 1055.9. Samples: 613464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:03:04,427][00763] Avg episode reward: [(0, '16.870')]
+[2025-01-10 06:03:09,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3776.7). Total num frames: 2469888. Throughput: 0: 1046.1. Samples: 618724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:03:09,423][00763] Avg episode reward: [(0, '16.850')]
+[2025-01-10 06:03:14,421][00763] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3762.8). Total num frames: 2490368. Throughput: 0: 1028.0. Samples: 621458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 06:03:14,423][00763] Avg episode reward: [(0, '15.106')]
+[2025-01-10 06:03:15,382][08254] Updated weights for policy 0, policy_version 610 (0.0028)
+[2025-01-10 06:03:19,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4232.8, 300 sec: 3790.5). Total num frames: 2514944. Throughput: 0: 1042.4. Samples: 628666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:03:19,429][00763] Avg episode reward: [(0, '15.524')]
+[2025-01-10 06:03:24,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3818.3). Total num frames: 2535424. Throughput: 0: 1054.6. Samples: 634458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 06:03:24,427][00763] Avg episode reward: [(0, '15.681')]
+[2025-01-10 06:03:25,593][08254] Updated weights for policy 0, policy_version 620 (0.0035)
+[2025-01-10 06:03:29,423][00763] Fps is (10 sec: 3685.7, 60 sec: 4095.9, 300 sec: 3790.5). Total num frames: 2551808. Throughput: 0: 1019.9. Samples: 636670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:03:29,426][00763] Avg episode reward: [(0, '16.080')]
+[2025-01-10 06:03:34,421][00763] Fps is (10 sec: 4096.0, 60 sec: 4164.6, 300 sec: 3804.4). Total num frames: 2576384. Throughput: 0: 1013.9. Samples: 643834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:03:34,428][00763] Avg episode reward: [(0, '17.200')]
+[2025-01-10 06:03:34,673][08254] Updated weights for policy 0, policy_version 630 (0.0037)
+[2025-01-10 06:03:39,421][00763] Fps is (10 sec: 4506.5, 60 sec: 4232.5, 300 sec: 3832.2). Total num frames: 2596864. Throughput: 0: 1062.7. Samples: 650578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:03:39,429][00763] Avg episode reward: [(0, '16.539')]
+[2025-01-10 06:03:39,439][08241] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000634_2596864.pth...
+[2025-01-10 06:03:39,602][08241] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000410_1679360.pth
+[2025-01-10 06:03:44,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3804.5). Total num frames: 2613248. Throughput: 0: 1035.0. Samples: 652816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:03:44,423][00763] Avg episode reward: [(0, '16.489')]
+[2025-01-10 06:03:45,407][08254] Updated weights for policy 0, policy_version 640 (0.0024)
+[2025-01-10 06:03:49,424][00763] Fps is (10 sec: 4504.2, 60 sec: 4164.1, 300 sec: 3832.2). Total num frames: 2641920. Throughput: 0: 1026.0. Samples: 659638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:03:49,433][00763] Avg episode reward: [(0, '16.405')]
+[2025-01-10 06:03:53,338][08254] Updated weights for policy 0, policy_version 650 (0.0013)
+[2025-01-10 06:03:54,425][00763] Fps is (10 sec: 5322.6, 60 sec: 4300.6, 300 sec: 3859.9). Total num frames: 2666496. Throughput: 0: 1073.8. Samples: 667050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:03:54,428][00763] Avg episode reward: [(0, '16.743')]
+[2025-01-10 06:03:59,421][00763] Fps is (10 sec: 3687.5, 60 sec: 4096.0, 300 sec: 3832.2). Total num frames: 2678784. Throughput: 0: 1064.0. Samples: 669336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:03:59,426][00763] Avg episode reward: [(0, '17.149')]
+[2025-01-10 06:04:04,078][08254] Updated weights for policy 0, policy_version 660 (0.0026)
+[2025-01-10 06:04:04,421][00763] Fps is (10 sec: 3687.9, 60 sec: 4096.0, 300 sec: 3832.2). Total num frames: 2703360. Throughput: 0: 1037.4. Samples: 675348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:04:04,427][00763] Avg episode reward: [(0, '18.541')]
+[2025-01-10 06:04:04,430][08241] Saving new best policy, reward=18.541!
+[2025-01-10 06:04:09,421][00763] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 3860.0). Total num frames: 2727936. Throughput: 0: 1073.8. Samples: 682778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 06:04:09,426][00763] Avg episode reward: [(0, '18.207')]
+[2025-01-10 06:04:13,473][08254] Updated weights for policy 0, policy_version 670 (0.0016)
+[2025-01-10 06:04:14,422][00763] Fps is (10 sec: 4095.6, 60 sec: 4232.5, 300 sec: 3859.9). Total num frames: 2744320. Throughput: 0: 1088.7. Samples: 685660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:04:14,425][00763] Avg episode reward: [(0, '17.637')]
+[2025-01-10 06:04:19,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3860.0). Total num frames: 2764800. Throughput: 0: 1046.6. Samples: 690932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:04:19,426][00763] Avg episode reward: [(0, '17.985')]
+[2025-01-10 06:04:23,046][08254] Updated weights for policy 0, policy_version 680 (0.0015)
+[2025-01-10 06:04:24,421][00763] Fps is (10 sec: 4506.0, 60 sec: 4232.5, 300 sec: 3873.8). Total num frames: 2789376. Throughput: 0: 1063.9. Samples: 698454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:04:24,426][00763] Avg episode reward: [(0, '17.217')]
+[2025-01-10 06:04:29,422][00763] Fps is (10 sec: 4505.5, 60 sec: 4300.9, 300 sec: 3901.6). Total num frames: 2809856. Throughput: 0: 1091.2. Samples: 701920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 06:04:29,424][00763] Avg episode reward: [(0, '18.689')]
+[2025-01-10 06:04:29,439][08241] Saving new best policy, reward=18.689!
+[2025-01-10 06:04:34,147][08254] Updated weights for policy 0, policy_version 690 (0.0021)
+[2025-01-10 06:04:34,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3887.7). Total num frames: 2826240. Throughput: 0: 1031.8. Samples: 706064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:04:34,426][00763] Avg episode reward: [(0, '19.186')]
+[2025-01-10 06:04:34,429][08241] Saving new best policy, reward=19.186!
+[2025-01-10 06:04:39,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3873.8). Total num frames: 2846720. Throughput: 0: 1020.8. Samples: 712982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:04:39,428][00763] Avg episode reward: [(0, '20.455')]
+[2025-01-10 06:04:39,442][08241] Saving new best policy, reward=20.455!
+[2025-01-10 06:04:43,111][08254] Updated weights for policy 0, policy_version 700 (0.0016)
+[2025-01-10 06:04:44,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 3915.5). Total num frames: 2871296. Throughput: 0: 1045.2. Samples: 716372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:04:44,424][00763] Avg episode reward: [(0, '20.635')]
+[2025-01-10 06:04:44,425][08241] Saving new best policy, reward=20.635!
+[2025-01-10 06:04:49,424][00763] Fps is (10 sec: 3685.3, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2883584. Throughput: 0: 1015.0. Samples: 721024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:04:49,427][00763] Avg episode reward: [(0, '20.429')]
+[2025-01-10 06:04:54,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3959.7, 300 sec: 3887.7). Total num frames: 2904064. Throughput: 0: 984.1. Samples: 727062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:04:54,427][00763] Avg episode reward: [(0, '19.571')]
+[2025-01-10 06:04:54,753][08254] Updated weights for policy 0, policy_version 710 (0.0023)
+[2025-01-10 06:04:59,422][00763] Fps is (10 sec: 4506.9, 60 sec: 4164.2, 300 sec: 3915.5). Total num frames: 2928640. Throughput: 0: 997.2. Samples: 730532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:04:59,429][00763] Avg episode reward: [(0, '18.668')]
+[2025-01-10 06:05:04,424][00763] Fps is (10 sec: 4095.0, 60 sec: 4027.6, 300 sec: 3929.4). Total num frames: 2945024. Throughput: 0: 999.9. Samples: 735928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:05:04,426][00763] Avg episode reward: [(0, '19.712')]
+[2025-01-10 06:05:05,887][08254] Updated weights for policy 0, policy_version 720 (0.0018)
+[2025-01-10 06:05:09,421][00763] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2961408. Throughput: 0: 958.2. Samples: 741574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:05:09,425][00763] Avg episode reward: [(0, '20.442')]
+[2025-01-10 06:05:14,348][08254] Updated weights for policy 0, policy_version 730 (0.0017)
+[2025-01-10 06:05:14,421][00763] Fps is (10 sec: 4506.7, 60 sec: 4096.1, 300 sec: 3943.3). Total num frames: 2990080. Throughput: 0: 964.5. Samples: 745322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:05:14,424][00763] Avg episode reward: [(0, '20.344')]
+[2025-01-10 06:05:19,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3006464. Throughput: 0: 1024.8. Samples: 752180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:05:19,423][00763] Avg episode reward: [(0, '19.186')]
+[2025-01-10 06:05:24,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3026944. Throughput: 0: 984.1. Samples: 757268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:05:24,424][00763] Avg episode reward: [(0, '19.500')]
+[2025-01-10 06:05:25,099][08254] Updated weights for policy 0, policy_version 740 (0.0021)
+[2025-01-10 06:05:29,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3051520. Throughput: 0: 992.4. Samples: 761032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:05:29,431][00763] Avg episode reward: [(0, '19.584')]
+[2025-01-10 06:05:33,339][08254] Updated weights for policy 0, policy_version 750 (0.0022)
+[2025-01-10 06:05:34,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3072000. Throughput: 0: 1051.5. Samples: 768340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:05:34,426][00763] Avg episode reward: [(0, '21.074')]
+[2025-01-10 06:05:34,434][08241] Saving new best policy, reward=21.074!
+[2025-01-10 06:05:39,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3084288. Throughput: 0: 1001.1. Samples: 772112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-01-10 06:05:39,424][00763] Avg episode reward: [(0, '20.785')]
+[2025-01-10 06:05:39,438][08241] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth...
+[2025-01-10 06:05:39,660][08241] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000519_2125824.pth
+[2025-01-10 06:05:44,421][00763] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 3100672. Throughput: 0: 963.9. Samples: 773908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:05:44,424][00763] Avg episode reward: [(0, '21.492')]
+[2025-01-10 06:05:44,431][08241] Saving new best policy, reward=21.492!
+[2025-01-10 06:05:46,756][08254] Updated weights for policy 0, policy_version 760 (0.0035)
+[2025-01-10 06:05:49,421][00763] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3957.2). Total num frames: 3125248. Throughput: 0: 993.3. Samples: 780624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:05:49,423][00763] Avg episode reward: [(0, '20.613')]
+[2025-01-10 06:05:54,422][00763] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3145728. Throughput: 0: 1024.6. Samples: 787680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:05:54,427][00763] Avg episode reward: [(0, '19.856')]
+[2025-01-10 06:05:56,148][08254] Updated weights for policy 0, policy_version 770 (0.0021)
+[2025-01-10 06:05:59,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3162112. Throughput: 0: 988.6. Samples: 789808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:05:59,424][00763] Avg episode reward: [(0, '18.954')]
+[2025-01-10 06:06:04,421][00763] Fps is (10 sec: 4096.1, 60 sec: 4027.9, 300 sec: 3971.1). Total num frames: 3186688. Throughput: 0: 970.0. Samples: 795830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:06:04,428][00763] Avg episode reward: [(0, '19.339')]
+[2025-01-10 06:06:06,032][08254] Updated weights for policy 0, policy_version 780 (0.0032)
+[2025-01-10 06:06:09,421][00763] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4012.7). Total num frames: 3211264. Throughput: 0: 1022.4. Samples: 803276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:06:09,425][00763] Avg episode reward: [(0, '19.982')]
+[2025-01-10 06:06:14,424][00763] Fps is (10 sec: 4094.7, 60 sec: 3959.3, 300 sec: 3984.9). Total num frames: 3227648. Throughput: 0: 999.1. Samples: 805994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-01-10 06:06:14,427][00763] Avg episode reward: [(0, '20.379')]
+[2025-01-10 06:06:16,715][08254] Updated weights for policy 0, policy_version 790 (0.0029)
+[2025-01-10 06:06:19,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3248128. Throughput: 0: 958.2. Samples: 811458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:06:19,424][00763] Avg episode reward: [(0, '20.942')]
+[2025-01-10 06:06:24,421][00763] Fps is (10 sec: 4507.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3272704. Throughput: 0: 1040.8. Samples: 818946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:06:24,424][00763] Avg episode reward: [(0, '21.571')]
+[2025-01-10 06:06:24,427][08241] Saving new best policy, reward=21.571!
+[2025-01-10 06:06:24,875][08254] Updated weights for policy 0, policy_version 800 (0.0020)
+[2025-01-10 06:06:29,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3293184. Throughput: 0: 1076.5. Samples: 822352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:06:29,427][00763] Avg episode reward: [(0, '22.608')]
+[2025-01-10 06:06:29,436][08241] Saving new best policy, reward=22.608!
+[2025-01-10 06:06:34,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3309568. Throughput: 0: 1026.3. Samples: 826808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:06:34,430][00763] Avg episode reward: [(0, '22.381')]
+[2025-01-10 06:06:35,718][08254] Updated weights for policy 0, policy_version 810 (0.0020)
+[2025-01-10 06:06:39,421][00763] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 3334144. Throughput: 0: 1037.6. Samples: 834372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:06:39,424][00763] Avg episode reward: [(0, '21.764')]
+[2025-01-10 06:06:44,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4068.2). Total num frames: 3354624. Throughput: 0: 1074.0. Samples: 838136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:06:44,427][00763] Avg episode reward: [(0, '22.618')]
+[2025-01-10 06:06:44,514][08241] Saving new best policy, reward=22.618!
+[2025-01-10 06:06:44,515][08254] Updated weights for policy 0, policy_version 820 (0.0025)
+[2025-01-10 06:06:49,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3371008. Throughput: 0: 1049.7. Samples: 843066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:06:49,424][00763] Avg episode reward: [(0, '23.290')]
+[2025-01-10 06:06:49,434][08241] Saving new best policy, reward=23.290!
+[2025-01-10 06:06:54,422][00763] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3395584. Throughput: 0: 1036.2. Samples: 849904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:06:54,427][00763] Avg episode reward: [(0, '21.777')]
+[2025-01-10 06:06:54,647][08254] Updated weights for policy 0, policy_version 830 (0.0013)
+[2025-01-10 06:06:59,421][00763] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4123.8). Total num frames: 3420160. Throughput: 0: 1058.7. Samples: 853634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 06:06:59,424][00763] Avg episode reward: [(0, '22.152')]
+[2025-01-10 06:07:04,421][00763] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3436544. Throughput: 0: 1065.0. Samples: 859384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 06:07:04,431][00763] Avg episode reward: [(0, '23.458')]
+[2025-01-10 06:07:04,433][08241] Saving new best policy, reward=23.458!
+[2025-01-10 06:07:05,077][08254] Updated weights for policy 0, policy_version 840 (0.0025)
+[2025-01-10 06:07:09,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3457024. Throughput: 0: 1031.0. Samples: 865342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:07:09,430][00763] Avg episode reward: [(0, '22.645')]
+[2025-01-10 06:07:14,019][08254] Updated weights for policy 0, policy_version 850 (0.0022)
+[2025-01-10 06:07:14,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4232.7, 300 sec: 4137.7). Total num frames: 3481600. Throughput: 0: 1035.3. Samples: 868940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:07:14,424][00763] Avg episode reward: [(0, '21.904')]
+[2025-01-10 06:07:19,426][00763] Fps is (10 sec: 4093.9, 60 sec: 4163.9, 300 sec: 4123.7). Total num frames: 3497984. Throughput: 0: 1074.2. Samples: 875152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:07:19,431][00763] Avg episode reward: [(0, '22.590')]
+[2025-01-10 06:07:24,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3518464. Throughput: 0: 1020.3. Samples: 880284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:07:24,426][00763] Avg episode reward: [(0, '22.425')]
+[2025-01-10 06:07:25,014][08254] Updated weights for policy 0, policy_version 860 (0.0041)
+[2025-01-10 06:07:29,421][00763] Fps is (10 sec: 4507.9, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3543040. Throughput: 0: 1017.6. Samples: 883928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 06:07:29,425][00763] Avg episode reward: [(0, '22.197')]
+[2025-01-10 06:07:33,860][08254] Updated weights for policy 0, policy_version 870 (0.0015)
+[2025-01-10 06:07:34,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 3563520. Throughput: 0: 1064.0. Samples: 890944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 06:07:34,427][00763] Avg episode reward: [(0, '21.631')]
+[2025-01-10 06:07:39,421][00763] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3575808. Throughput: 0: 1003.6. Samples: 895068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 06:07:39,428][00763] Avg episode reward: [(0, '21.796')]
+[2025-01-10 06:07:39,440][08241] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000873_3575808.pth...
+[2025-01-10 06:07:39,569][08241] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000634_2596864.pth
+[2025-01-10 06:07:44,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3600384. Throughput: 0: 995.6. Samples: 898436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:07:44,434][00763] Avg episode reward: [(0, '21.977')]
+[2025-01-10 06:07:44,942][08254] Updated weights for policy 0, policy_version 880 (0.0013)
+[2025-01-10 06:07:49,424][00763] Fps is (10 sec: 4913.7, 60 sec: 4232.3, 300 sec: 4123.7). Total num frames: 3624960. Throughput: 0: 1027.0. Samples: 905602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:07:49,430][00763] Avg episode reward: [(0, '20.035')]
+[2025-01-10 06:07:54,421][00763] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3637248. Throughput: 0: 994.2. Samples: 910082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:07:54,423][00763] Avg episode reward: [(0, '19.392')]
+[2025-01-10 06:07:56,571][08254] Updated weights for policy 0, policy_version 890 (0.0015)
+[2025-01-10 06:07:59,421][00763] Fps is (10 sec: 3277.8, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 3657728. Throughput: 0: 973.4. Samples: 912744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:07:59,424][00763] Avg episode reward: [(0, '19.695')]
+[2025-01-10 06:08:04,421][00763] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3678208. Throughput: 0: 987.9. Samples: 919602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2025-01-10 06:08:04,423][00763] Avg episode reward: [(0, '18.858')]
+[2025-01-10 06:08:05,518][08254] Updated weights for policy 0, policy_version 900 (0.0014)
+[2025-01-10 06:08:09,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 3694592. Throughput: 0: 995.8. Samples: 925094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 06:08:09,424][00763] Avg episode reward: [(0, '18.423')]
+[2025-01-10 06:08:14,422][00763] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 4068.2). Total num frames: 3715072. Throughput: 0: 960.8. Samples: 927166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:08:14,423][00763] Avg episode reward: [(0, '19.838')]
+[2025-01-10 06:08:16,843][08254] Updated weights for policy 0, policy_version 910 (0.0035)
+[2025-01-10 06:08:19,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 4068.2). Total num frames: 3735552. Throughput: 0: 958.9. Samples: 934094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:08:19,429][00763] Avg episode reward: [(0, '19.352')]
+[2025-01-10 06:08:24,424][00763] Fps is (10 sec: 4094.9, 60 sec: 3959.3, 300 sec: 4082.1). Total num frames: 3756032. Throughput: 0: 1004.9. Samples: 940292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:08:24,427][00763] Avg episode reward: [(0, '20.150')]
+[2025-01-10 06:08:27,657][08254] Updated weights for policy 0, policy_version 920 (0.0017)
+[2025-01-10 06:08:29,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 4054.3). Total num frames: 3772416. Throughput: 0: 977.7. Samples: 942432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-01-10 06:08:29,426][00763] Avg episode reward: [(0, '19.733')]
+[2025-01-10 06:08:34,421][00763] Fps is (10 sec: 4097.3, 60 sec: 3891.2, 300 sec: 4068.2). Total num frames: 3796992. Throughput: 0: 959.6. Samples: 948780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:08:34,424][00763] Avg episode reward: [(0, '19.948')]
+[2025-01-10 06:08:36,633][08254] Updated weights for policy 0, policy_version 930 (0.0025)
+[2025-01-10 06:08:39,421][00763] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3817472. Throughput: 0: 1013.2. Samples: 955676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:08:39,429][00763] Avg episode reward: [(0, '19.551')]
+[2025-01-10 06:08:44,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4040.5). Total num frames: 3833856. Throughput: 0: 1000.4. Samples: 957762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:08:44,427][00763] Avg episode reward: [(0, '20.313')]
+[2025-01-10 06:08:48,094][08254] Updated weights for policy 0, policy_version 940 (0.0015)
+[2025-01-10 06:08:49,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 4026.6). Total num frames: 3854336. Throughput: 0: 969.0. Samples: 963208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:08:49,427][00763] Avg episode reward: [(0, '20.061')]
+[2025-01-10 06:08:54,425][00763] Fps is (10 sec: 4094.3, 60 sec: 3959.2, 300 sec: 4054.3). Total num frames: 3874816. Throughput: 0: 999.1. Samples: 970056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-01-10 06:08:54,432][00763] Avg episode reward: [(0, '21.102')]
+[2025-01-10 06:08:58,391][08254] Updated weights for policy 0, policy_version 950 (0.0025)
+[2025-01-10 06:08:59,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 3891200. Throughput: 0: 1008.9. Samples: 972564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:08:59,427][00763] Avg episode reward: [(0, '21.031')]
+[2025-01-10 06:09:04,421][00763] Fps is (10 sec: 3278.1, 60 sec: 3822.9, 300 sec: 3998.8). Total num frames: 3907584. Throughput: 0: 955.3. Samples: 977082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-01-10 06:09:04,424][00763] Avg episode reward: [(0, '20.977')]
+[2025-01-10 06:09:08,861][08254] Updated weights for policy 0, policy_version 960 (0.0016)
+[2025-01-10 06:09:09,421][00763] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3932160. Throughput: 0: 971.5. Samples: 984006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:09:09,427][00763] Avg episode reward: [(0, '21.311')]
+[2025-01-10 06:09:14,424][00763] Fps is (10 sec: 4504.2, 60 sec: 3959.3, 300 sec: 4026.5). Total num frames: 3952640. Throughput: 0: 1000.7. Samples: 987468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:09:14,427][00763] Avg episode reward: [(0, '21.831')]
+[2025-01-10 06:09:19,421][00763] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3984.9). Total num frames: 3964928. Throughput: 0: 953.3. Samples: 991680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-01-10 06:09:19,430][00763] Avg episode reward: [(0, '21.761')]
+[2025-01-10 06:09:20,404][08254] Updated weights for policy 0, policy_version 970 (0.0013)
+[2025-01-10 06:09:24,421][00763] Fps is (10 sec: 3687.5, 60 sec: 3891.4, 300 sec: 3998.8). Total num frames: 3989504. Throughput: 0: 945.0. Samples: 998200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:09:24,430][00763] Avg episode reward: [(0, '20.944')]
+[2025-01-10 06:09:29,421][00763] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3984.9). Total num frames: 4001792. Throughput: 0: 950.4. Samples: 1000528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-01-10 06:09:29,424][00763] Avg episode reward: [(0, '21.410')]
+[2025-01-10 06:09:29,750][08241] Stopping Batcher_0...
+[2025-01-10 06:09:29,750][08241] Loop batcher_evt_loop terminating...
+[2025-01-10 06:09:29,750][00763] Component Batcher_0 stopped!
+[2025-01-10 06:09:29,758][08241] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-01-10 06:09:29,869][08254] Weights refcount: 2 0
+[2025-01-10 06:09:29,873][08254] Stopping InferenceWorker_p0-w0...
+[2025-01-10 06:09:29,873][08254] Loop inference_proc0-0_evt_loop terminating...
+[2025-01-10 06:09:29,873][00763] Component InferenceWorker_p0-w0 stopped!
+[2025-01-10 06:09:29,988][08241] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth
+[2025-01-10 06:09:30,008][08241] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-01-10 06:09:30,229][08241] Stopping LearnerWorker_p0...
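The shutdown block above also shows the checkpoint bookkeeping that recurs throughout training: checkpoint_000000978_4005888.pth is written (policy version 978; 978 x 4096 = 4005888 environment frames) while the oldest surviving checkpoint is removed, so only a small rotating set stays on disk (the "Saving new best policy" lines track a separate best-reward snapshot). A minimal sketch of that save-and-rotate pattern, with hypothetical helper names rather than Sample Factory's actual code:

import os
from pathlib import Path

def save_and_rotate(train_dir, policy_version, env_frames, state_bytes, keep_last=2):
    # filename encodes the policy version and total environment frames,
    # mirroring names like checkpoint_000000978_4005888.pth in the log
    ckpt_dir = Path(train_dir) / "checkpoint_p0"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    (ckpt_dir / name).write_bytes(state_bytes)
    # drop the oldest checkpoints beyond the retention limit; lexicographic
    # sort works because the version field is zero-padded
    for stale in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_last]:
        os.remove(stale)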
+[2025-01-10 06:09:30,230][08241] Loop learner_proc0_evt_loop terminating...
+[2025-01-10 06:09:30,229][00763] Component LearnerWorker_p0 stopped!
+[2025-01-10 06:09:30,488][08260] Stopping RolloutWorker_w5...
+[2025-01-10 06:09:30,491][08260] Loop rollout_proc5_evt_loop terminating...
+[2025-01-10 06:09:30,491][00763] Component RolloutWorker_w5 stopped!
+[2025-01-10 06:09:30,512][08262] Stopping RolloutWorker_w7...
+[2025-01-10 06:09:30,513][08262] Loop rollout_proc7_evt_loop terminating...
+[2025-01-10 06:09:30,512][00763] Component RolloutWorker_w7 stopped!
+[2025-01-10 06:09:30,523][08256] Stopping RolloutWorker_w1...
+[2025-01-10 06:09:30,524][08256] Loop rollout_proc1_evt_loop terminating...
+[2025-01-10 06:09:30,523][00763] Component RolloutWorker_w1 stopped!
+[2025-01-10 06:09:30,566][00763] Component RolloutWorker_w3 stopped!
+[2025-01-10 06:09:30,570][08259] Stopping RolloutWorker_w3...
+[2025-01-10 06:09:30,571][08259] Loop rollout_proc3_evt_loop terminating...
+[2025-01-10 06:09:30,620][08261] Stopping RolloutWorker_w6...
+[2025-01-10 06:09:30,620][08261] Loop rollout_proc6_evt_loop terminating...
+[2025-01-10 06:09:30,620][00763] Component RolloutWorker_w6 stopped!
+[2025-01-10 06:09:30,655][08257] Stopping RolloutWorker_w2...
+[2025-01-10 06:09:30,655][00763] Component RolloutWorker_w2 stopped!
+[2025-01-10 06:09:30,662][08257] Loop rollout_proc2_evt_loop terminating...
+[2025-01-10 06:09:30,691][08258] Stopping RolloutWorker_w4...
+[2025-01-10 06:09:30,691][08258] Loop rollout_proc4_evt_loop terminating...
+[2025-01-10 06:09:30,691][00763] Component RolloutWorker_w4 stopped!
+[2025-01-10 06:09:30,700][00763] Component RolloutWorker_w0 stopped!
+[2025-01-10 06:09:30,704][00763] Waiting for process learner_proc0 to stop...
+[2025-01-10 06:09:30,704][08255] Stopping RolloutWorker_w0...
+[2025-01-10 06:09:30,710][08255] Loop rollout_proc0_evt_loop terminating...
+[2025-01-10 06:09:32,890][00763] Waiting for process inference_proc0-0 to join...
+[2025-01-10 06:09:33,054][00763] Waiting for process rollout_proc0 to join...
+[2025-01-10 06:09:35,772][00763] Waiting for process rollout_proc1 to join...
+[2025-01-10 06:09:35,782][00763] Waiting for process rollout_proc2 to join...
+[2025-01-10 06:09:35,785][00763] Waiting for process rollout_proc3 to join...
+[2025-01-10 06:09:35,789][00763] Waiting for process rollout_proc4 to join...
+[2025-01-10 06:09:35,793][00763] Waiting for process rollout_proc5 to join...
+[2025-01-10 06:09:35,796][00763] Waiting for process rollout_proc6 to join...
+[2025-01-10 06:09:35,801][00763] Waiting for process rollout_proc7 to join...
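A minimal sketch of the stop-then-join shutdown ordering logged above, where each worker leaves its event loop before the parent waits on the child processes; this is the generic multiprocessing pattern, not Sample Factory's actual component classes:

import multiprocessing as mp

def rollout_worker(stop_event, name):
    while not stop_event.wait(timeout=0.1):
        pass  # rollout/inference/learner work would happen here
    print(f"Loop {name}_evt_loop terminating...")

if __name__ == "__main__":
    stop = mp.Event()
    procs = [
        mp.Process(target=rollout_worker, args=(stop, f"rollout_proc{i}"))
        for i in range(8)
    ]
    for p in procs:
        p.start()
    stop.set()  # analogous to the "Stopping RolloutWorker_w*..." lines
    for p in procs:
        p.join()  # analogous to "Waiting for process ... to join..."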
+[2025-01-10 06:09:35,804][00763] Batcher 0 profile tree view:
+batching: 26.0988, releasing_batches: 0.0293
+[2025-01-10 06:09:35,806][00763] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0163
+ wait_policy_total: 420.5668
+update_model: 8.4490
+ weight_update: 0.0017
+one_step: 0.0113
+ handle_policy_step: 569.2677
+ deserialize: 14.6130, stack: 3.0792, obs_to_device_normalize: 122.1415, forward: 285.8168, send_messages: 28.7464
+ prepare_outputs: 86.2009
+ to_cpu: 51.5246
+[2025-01-10 06:09:35,809][00763] Learner 0 profile tree view:
+misc: 0.0065, prepare_batch: 13.8782
+train: 73.4434
+ epoch_init: 0.0056, minibatch_init: 0.0220, losses_postprocess: 0.6628, kl_divergence: 0.6089, after_optimizer: 33.6844
+ calculate_losses: 26.0860
+ losses_init: 0.0038, forward_head: 1.2761, bptt_initial: 17.5376, tail: 1.0784, advantages_returns: 0.2250, losses: 3.7424
+ bptt: 1.9195
+ bptt_forward_core: 1.8274
+ update: 11.7090
+ clip: 0.9054
+[2025-01-10 06:09:35,810][00763] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.3070, enqueue_policy_requests: 100.9351, env_step: 815.3342, overhead: 14.0398, complete_rollouts: 6.9394
+save_policy_outputs: 21.3297
+ split_output_tensors: 8.6742
+[2025-01-10 06:09:35,811][00763] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.3006, enqueue_policy_requests: 105.5824, env_step: 808.0028, overhead: 14.2088, complete_rollouts: 6.8331
+save_policy_outputs: 21.1441
+ split_output_tensors: 8.7494
+[2025-01-10 06:09:35,813][00763] Loop Runner_EvtLoop terminating...
+[2025-01-10 06:09:35,814][00763] Runner profile tree view:
+main_loop: 1071.2717
+[2025-01-10 06:09:35,817][00763] Collected {0: 4005888}, FPS: 3739.4
+[2025-01-10 06:09:56,599][00763] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-01-10 06:09:56,601][00763] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-01-10 06:09:56,603][00763] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-01-10 06:09:56,605][00763] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-01-10 06:09:56,606][00763] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-01-10 06:09:56,608][00763] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-01-10 06:09:56,609][00763] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-01-10 06:09:56,611][00763] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-01-10 06:09:56,612][00763] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-01-10 06:09:56,613][00763] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-01-10 06:09:56,614][00763] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-01-10 06:09:56,615][00763] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-01-10 06:09:56,616][00763] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-01-10 06:09:56,617][00763] Adding new argument 'enjoy_script'=None that is not in the saved config file!
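The run summary is internally consistent: 4005888 collected frames over the 1071.2717 s main loop works out to 4005888 / 1071.27 = approximately 3739.4 FPS, the figure reported above, and the rollout-worker profiles show env_step (around 815 s) dominating their wall time. The "Overriding arg" / "Adding new argument" lines that follow come from merging the saved config.json with command-line values; a minimal sketch of that merge logic, illustrative rather than Sample Factory's actual loader:

import json

def load_with_overrides(config_path, cli_args):
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_args.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value} that is not in the saved config file!")
        cfg[key] = value
    return cfg

# e.g. load_with_overrides("/content/train_dir/default_experiment/config.json",
#                          {"num_workers": 1, "no_render": True, "save_video": True})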
+[2025-01-10 06:09:56,618][00763] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-01-10 06:09:56,662][00763] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-01-10 06:09:56,667][00763] RunningMeanStd input shape: (3, 72, 128)
+[2025-01-10 06:09:56,669][00763] RunningMeanStd input shape: (1,)
+[2025-01-10 06:09:56,689][00763] ConvEncoder: input_channels=3
+[2025-01-10 06:09:56,851][00763] Conv encoder output size: 512
+[2025-01-10 06:09:56,853][00763] Policy head output size: 512
+[2025-01-10 06:09:57,189][00763] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-01-10 06:09:58,223][00763] Num frames 100...
+[2025-01-10 06:09:58,342][00763] Num frames 200...
+[2025-01-10 06:09:58,461][00763] Num frames 300...
+[2025-01-10 06:09:58,579][00763] Num frames 400...
+[2025-01-10 06:09:58,702][00763] Num frames 500...
+[2025-01-10 06:09:58,823][00763] Num frames 600...
+[2025-01-10 06:09:58,954][00763] Num frames 700...
+[2025-01-10 06:09:59,081][00763] Num frames 800...
+[2025-01-10 06:09:59,202][00763] Num frames 900...
+[2025-01-10 06:09:59,322][00763] Num frames 1000...
+[2025-01-10 06:09:59,448][00763] Num frames 1100...
+[2025-01-10 06:09:59,603][00763] Avg episode rewards: #0: 28.790, true rewards: #0: 11.790
+[2025-01-10 06:09:59,605][00763] Avg episode reward: 28.790, avg true_objective: 11.790
+[2025-01-10 06:09:59,633][00763] Num frames 1200...
+[2025-01-10 06:09:59,753][00763] Num frames 1300...
+[2025-01-10 06:09:59,871][00763] Num frames 1400...
+[2025-01-10 06:10:00,001][00763] Num frames 1500...
+[2025-01-10 06:10:00,135][00763] Avg episode rewards: #0: 17.825, true rewards: #0: 7.825
+[2025-01-10 06:10:00,136][00763] Avg episode reward: 17.825, avg true_objective: 7.825
+[2025-01-10 06:10:00,181][00763] Num frames 1600...
+[2025-01-10 06:10:00,296][00763] Num frames 1700...
+[2025-01-10 06:10:00,416][00763] Num frames 1800...
+[2025-01-10 06:10:00,538][00763] Num frames 1900...
+[2025-01-10 06:10:00,658][00763] Num frames 2000...
+[2025-01-10 06:10:00,779][00763] Num frames 2100...
+[2025-01-10 06:10:00,900][00763] Num frames 2200...
+[2025-01-10 06:10:01,035][00763] Num frames 2300...
+[2025-01-10 06:10:01,154][00763] Num frames 2400...
+[2025-01-10 06:10:01,282][00763] Num frames 2500...
+[2025-01-10 06:10:01,402][00763] Num frames 2600...
+[2025-01-10 06:10:01,525][00763] Num frames 2700...
+[2025-01-10 06:10:01,678][00763] Avg episode rewards: #0: 20.603, true rewards: #0: 9.270
+[2025-01-10 06:10:01,681][00763] Avg episode reward: 20.603, avg true_objective: 9.270
+[2025-01-10 06:10:01,708][00763] Num frames 2800...
+[2025-01-10 06:10:01,826][00763] Num frames 2900...
+[2025-01-10 06:10:01,974][00763] Num frames 3000...
+[2025-01-10 06:10:02,115][00763] Num frames 3100...
+[2025-01-10 06:10:02,239][00763] Num frames 3200...
+[2025-01-10 06:10:02,362][00763] Num frames 3300...
+[2025-01-10 06:10:02,483][00763] Num frames 3400...
+[2025-01-10 06:10:02,605][00763] Num frames 3500...
+[2025-01-10 06:10:02,765][00763] Avg episode rewards: #0: 19.703, true rewards: #0: 8.952
+[2025-01-10 06:10:02,767][00763] Avg episode reward: 19.703, avg true_objective: 8.952
+[2025-01-10 06:10:02,793][00763] Num frames 3600...
+[2025-01-10 06:10:02,915][00763] Num frames 3700...
+[2025-01-10 06:10:03,057][00763] Num frames 3800...
+[2025-01-10 06:10:03,178][00763] Num frames 3900...
+[2025-01-10 06:10:03,299][00763] Num frames 4000...
+[2025-01-10 06:10:03,424][00763] Num frames 4100...
+[2025-01-10 06:10:03,547][00763] Num frames 4200...
+[2025-01-10 06:10:03,668][00763] Num frames 4300...
+[2025-01-10 06:10:03,792][00763] Num frames 4400...
+[2025-01-10 06:10:03,913][00763] Num frames 4500...
+[2025-01-10 06:10:04,044][00763] Num frames 4600...
+[2025-01-10 06:10:04,182][00763] Num frames 4700...
+[2025-01-10 06:10:04,268][00763] Avg episode rewards: #0: 21.246, true rewards: #0: 9.446
+[2025-01-10 06:10:04,271][00763] Avg episode reward: 21.246, avg true_objective: 9.446
+[2025-01-10 06:10:04,367][00763] Num frames 4800...
+[2025-01-10 06:10:04,501][00763] Num frames 4900...
+[2025-01-10 06:10:04,623][00763] Num frames 5000...
+[2025-01-10 06:10:04,746][00763] Num frames 5100...
+[2025-01-10 06:10:04,873][00763] Num frames 5200...
+[2025-01-10 06:10:05,000][00763] Num frames 5300...
+[2025-01-10 06:10:05,136][00763] Num frames 5400...
+[2025-01-10 06:10:05,260][00763] Num frames 5500...
+[2025-01-10 06:10:05,380][00763] Num frames 5600...
+[2025-01-10 06:10:05,498][00763] Num frames 5700...
+[2025-01-10 06:10:05,623][00763] Num frames 5800...
+[2025-01-10 06:10:05,743][00763] Num frames 5900...
+[2025-01-10 06:10:05,860][00763] Num frames 6000...
+[2025-01-10 06:10:05,984][00763] Num frames 6100...
+[2025-01-10 06:10:06,108][00763] Num frames 6200...
+[2025-01-10 06:10:06,216][00763] Avg episode rewards: #0: 23.728, true rewards: #0: 10.395
+[2025-01-10 06:10:06,217][00763] Avg episode reward: 23.728, avg true_objective: 10.395
+[2025-01-10 06:10:06,295][00763] Num frames 6300...
+[2025-01-10 06:10:06,414][00763] Num frames 6400...
+[2025-01-10 06:10:06,534][00763] Num frames 6500...
+[2025-01-10 06:10:06,656][00763] Num frames 6600...
+[2025-01-10 06:10:06,776][00763] Num frames 6700...
+[2025-01-10 06:10:06,897][00763] Num frames 6800...
+[2025-01-10 06:10:07,028][00763] Num frames 6900...
+[2025-01-10 06:10:07,155][00763] Num frames 7000...
+[2025-01-10 06:10:07,273][00763] Num frames 7100...
+[2025-01-10 06:10:07,405][00763] Num frames 7200...
+[2025-01-10 06:10:07,530][00763] Num frames 7300...
+[2025-01-10 06:10:07,652][00763] Num frames 7400...
+[2025-01-10 06:10:07,814][00763] Avg episode rewards: #0: 24.837, true rewards: #0: 10.694
+[2025-01-10 06:10:07,817][00763] Avg episode reward: 24.837, avg true_objective: 10.694
+[2025-01-10 06:10:07,837][00763] Num frames 7500...
+[2025-01-10 06:10:07,958][00763] Num frames 7600...
+[2025-01-10 06:10:08,148][00763] Num frames 7700...
+[2025-01-10 06:10:08,333][00763] Num frames 7800...
+[2025-01-10 06:10:08,501][00763] Num frames 7900...
+[2025-01-10 06:10:08,666][00763] Num frames 8000...
+[2025-01-10 06:10:08,837][00763] Num frames 8100...
+[2025-01-10 06:10:09,004][00763] Num frames 8200...
+[2025-01-10 06:10:09,174][00763] Num frames 8300...
+[2025-01-10 06:10:09,352][00763] Num frames 8400...
+[2025-01-10 06:10:09,528][00763] Num frames 8500...
+[2025-01-10 06:10:09,709][00763] Num frames 8600...
+[2025-01-10 06:10:09,893][00763] Num frames 8700...
+[2025-01-10 06:10:10,066][00763] Num frames 8800...
+[2025-01-10 06:10:10,244][00763] Num frames 8900...
+[2025-01-10 06:10:10,405][00763] Num frames 9000...
+[2025-01-10 06:10:10,526][00763] Num frames 9100...
+[2025-01-10 06:10:10,648][00763] Num frames 9200...
+[2025-01-10 06:10:10,773][00763] Num frames 9300...
+[2025-01-10 06:10:10,888][00763] Num frames 9400...
+[2025-01-10 06:10:11,012][00763] Num frames 9500...
+[2025-01-10 06:10:11,171][00763] Avg episode rewards: #0: 28.857, true rewards: #0: 11.982
+[2025-01-10 06:10:11,173][00763] Avg episode reward: 28.857, avg true_objective: 11.982
+[2025-01-10 06:10:11,195][00763] Num frames 9600...
+[2025-01-10 06:10:11,320][00763] Num frames 9700...
+[2025-01-10 06:10:11,439][00763] Num frames 9800...
+[2025-01-10 06:10:11,557][00763] Num frames 9900...
+[2025-01-10 06:10:11,697][00763] Avg episode rewards: #0: 26.078, true rewards: #0: 11.078
+[2025-01-10 06:10:11,699][00763] Avg episode reward: 26.078, avg true_objective: 11.078
+[2025-01-10 06:10:11,739][00763] Num frames 10000...
+[2025-01-10 06:10:11,863][00763] Num frames 10100...
+[2025-01-10 06:10:11,981][00763] Num frames 10200...
+[2025-01-10 06:10:12,109][00763] Num frames 10300...
+[2025-01-10 06:10:12,226][00763] Num frames 10400...
+[2025-01-10 06:10:12,352][00763] Num frames 10500...
+[2025-01-10 06:10:12,474][00763] Num frames 10600...
+[2025-01-10 06:10:12,593][00763] Num frames 10700...
+[2025-01-10 06:10:12,712][00763] Num frames 10800...
+[2025-01-10 06:10:12,838][00763] Num frames 10900...
+[2025-01-10 06:10:12,959][00763] Num frames 11000...
+[2025-01-10 06:10:13,087][00763] Num frames 11100...
+[2025-01-10 06:10:13,214][00763] Num frames 11200...
+[2025-01-10 06:10:13,337][00763] Num frames 11300...
+[2025-01-10 06:10:13,469][00763] Num frames 11400...
+[2025-01-10 06:10:13,544][00763] Avg episode rewards: #0: 27.115, true rewards: #0: 11.415
+[2025-01-10 06:10:13,547][00763] Avg episode reward: 27.115, avg true_objective: 11.415
+[2025-01-10 06:11:17,441][00763] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2025-01-10 06:13:32,898][00763] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-01-10 06:13:32,899][00763] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-01-10 06:13:32,901][00763] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-01-10 06:13:32,903][00763] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-01-10 06:13:32,903][00763] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-01-10 06:13:32,905][00763] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-01-10 06:13:32,906][00763] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-01-10 06:13:32,907][00763] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-01-10 06:13:32,909][00763] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-01-10 06:13:32,911][00763] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-01-10 06:13:32,915][00763] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-01-10 06:13:32,916][00763] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-01-10 06:13:32,917][00763] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-01-10 06:13:32,918][00763] Adding new argument 'enjoy_script'=None that is not in the saved config file!
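The second override block above configures the same evaluation but with push_to_hub=True and a target hf_repository. A hedged sketch of the kind of invocation that produces these overrides; the flag names mirror the logged arguments, while the module path and environment name are assumptions from the Hugging Face Deep RL course setup, not confirmed by this log:

import subprocess
import sys

cmd = [
    sys.executable, "-m", "sf_examples.vizdoom.enjoy_vizdoom",  # assumed entry point
    "--env=doom_health_gathering_supreme",                      # assumed env id
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",
    "--num_workers=1",
    "--no_render",
    "--save_video",
    "--max_num_episodes=10",
    "--push_to_hub",
    "--hf_repository=ThomasSimonini/rl_course_vizdoom_health_gathering_supreme",
]
subprocess.run(cmd, check=True)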
+[2025-01-10 06:13:32,919][00763] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-01-10 06:13:32,958][00763] RunningMeanStd input shape: (3, 72, 128)
+[2025-01-10 06:13:32,959][00763] RunningMeanStd input shape: (1,)
+[2025-01-10 06:13:32,973][00763] ConvEncoder: input_channels=3
+[2025-01-10 06:13:33,022][00763] Conv encoder output size: 512
+[2025-01-10 06:13:33,024][00763] Policy head output size: 512
+[2025-01-10 06:13:33,045][00763] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-01-10 06:13:33,447][00763] Num frames 100...
+[2025-01-10 06:13:33,565][00763] Num frames 200...
+[2025-01-10 06:13:33,685][00763] Num frames 300...
+[2025-01-10 06:13:33,800][00763] Num frames 400...
+[2025-01-10 06:13:33,917][00763] Num frames 500...
+[2025-01-10 06:13:34,046][00763] Num frames 600...
+[2025-01-10 06:13:34,167][00763] Num frames 700...
+[2025-01-10 06:13:34,286][00763] Num frames 800...
+[2025-01-10 06:13:34,417][00763] Avg episode rewards: #0: 16.640, true rewards: #0: 8.640
+[2025-01-10 06:13:34,419][00763] Avg episode reward: 16.640, avg true_objective: 8.640
+[2025-01-10 06:13:34,463][00763] Num frames 900...
+[2025-01-10 06:13:34,586][00763] Num frames 1000...
+[2025-01-10 06:13:34,706][00763] Num frames 1100...
+[2025-01-10 06:13:34,826][00763] Num frames 1200...
+[2025-01-10 06:13:34,944][00763] Num frames 1300...
+[2025-01-10 06:13:35,079][00763] Num frames 1400...
+[2025-01-10 06:13:35,181][00763] Avg episode rewards: #0: 13.700, true rewards: #0: 7.200
+[2025-01-10 06:13:35,183][00763] Avg episode reward: 13.700, avg true_objective: 7.200
+[2025-01-10 06:13:35,254][00763] Num frames 1500...
+[2025-01-10 06:13:35,370][00763] Num frames 1600...
+[2025-01-10 06:13:35,489][00763] Num frames 1700...
+[2025-01-10 06:13:35,608][00763] Num frames 1800...
+[2025-01-10 06:13:35,725][00763] Num frames 1900...
+[2025-01-10 06:13:35,842][00763] Num frames 2000...
+[2025-01-10 06:13:35,961][00763] Num frames 2100...
+[2025-01-10 06:13:36,098][00763] Num frames 2200...
+[2025-01-10 06:13:36,217][00763] Num frames 2300...
+[2025-01-10 06:13:36,338][00763] Num frames 2400...
+[2025-01-10 06:13:36,391][00763] Avg episode rewards: #0: 15.667, true rewards: #0: 8.000
+[2025-01-10 06:13:36,392][00763] Avg episode reward: 15.667, avg true_objective: 8.000
+[2025-01-10 06:13:36,509][00763] Num frames 2500...
+[2025-01-10 06:13:36,634][00763] Num frames 2600...
+[2025-01-10 06:13:36,754][00763] Num frames 2700...
+[2025-01-10 06:13:36,874][00763] Num frames 2800...
+[2025-01-10 06:13:36,996][00763] Num frames 2900...
+[2025-01-10 06:13:37,126][00763] Num frames 3000...
+[2025-01-10 06:13:37,251][00763] Num frames 3100...
+[2025-01-10 06:13:37,381][00763] Num frames 3200...
+[2025-01-10 06:13:37,499][00763] Num frames 3300...
+[2025-01-10 06:13:37,622][00763] Num frames 3400...
+[2025-01-10 06:13:37,740][00763] Num frames 3500...
+[2025-01-10 06:13:37,863][00763] Num frames 3600...
+[2025-01-10 06:13:37,986][00763] Num frames 3700...
+[2025-01-10 06:13:38,143][00763] Avg episode rewards: #0: 19.940, true rewards: #0: 9.440
+[2025-01-10 06:13:38,145][00763] Avg episode reward: 19.940, avg true_objective: 9.440
+[2025-01-10 06:13:38,175][00763] Num frames 3800...
+[2025-01-10 06:13:38,294][00763] Num frames 3900...
+[2025-01-10 06:13:38,410][00763] Num frames 4000...
+[2025-01-10 06:13:38,529][00763] Num frames 4100...
+[2025-01-10 06:13:38,649][00763] Num frames 4200...
+[2025-01-10 06:13:51,050][00763] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-01-10 06:13:51,051][00763] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-01-10 06:13:51,054][00763] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-01-10 06:13:51,056][00763] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-01-10 06:13:51,058][00763] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-01-10 06:13:51,059][00763] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-01-10 06:13:51,060][00763] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-01-10 06:13:51,062][00763] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-01-10 06:13:51,063][00763] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-01-10 06:13:51,064][00763] Adding new argument 'hf_repository'='sErial03/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-01-10 06:13:51,065][00763] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-01-10 06:13:51,066][00763] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-01-10 06:13:51,067][00763] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-01-10 06:13:51,068][00763] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-01-10 06:13:51,069][00763] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-01-10 06:13:51,107][00763] RunningMeanStd input shape: (3, 72, 128)
+[2025-01-10 06:13:51,109][00763] RunningMeanStd input shape: (1,)
+[2025-01-10 06:13:51,121][00763] ConvEncoder: input_channels=3
+[2025-01-10 06:13:51,156][00763] Conv encoder output size: 512
+[2025-01-10 06:13:51,157][00763] Policy head output size: 512
+[2025-01-10 06:13:51,175][00763] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-01-10 06:13:51,568][00763] Num frames 100...
+[2025-01-10 06:13:51,687][00763] Num frames 200...
+[2025-01-10 06:13:51,806][00763] Num frames 300...
+[2025-01-10 06:13:51,928][00763] Num frames 400...
+[2025-01-10 06:13:52,062][00763] Num frames 500...
+[2025-01-10 06:13:52,178][00763] Avg episode rewards: #0: 9.440, true rewards: #0: 5.440
+[2025-01-10 06:13:52,180][00763] Avg episode reward: 9.440, avg true_objective: 5.440
+[2025-01-10 06:13:52,248][00763] Num frames 600...
+[2025-01-10 06:13:52,374][00763] Num frames 700...
+[2025-01-10 06:13:52,533][00763] Num frames 800...
+[2025-01-10 06:13:52,697][00763] Num frames 900...
+[2025-01-10 06:13:52,862][00763] Num frames 1000...
+[2025-01-10 06:13:53,033][00763] Num frames 1100...
+[2025-01-10 06:13:53,128][00763] Avg episode rewards: #0: 10.100, true rewards: #0: 5.600
+[2025-01-10 06:13:53,129][00763] Avg episode reward: 10.100, avg true_objective: 5.600
+[2025-01-10 06:13:53,261][00763] Num frames 1200...
+[2025-01-10 06:13:53,421][00763] Num frames 1300...
+[2025-01-10 06:13:53,612][00763] Avg episode rewards: #0: 8.283, true rewards: #0: 4.617
+[2025-01-10 06:13:53,616][00763] Avg episode reward: 8.283, avg true_objective: 4.617
+[2025-01-10 06:13:53,643][00763] Num frames 1400...
+[2025-01-10 06:13:53,806][00763] Num frames 1500...
+[2025-01-10 06:13:53,969][00763] Num frames 1600...
+[2025-01-10 06:13:54,147][00763] Num frames 1700...
+[2025-01-10 06:13:54,322][00763] Num frames 1800...
+[2025-01-10 06:13:54,506][00763] Num frames 1900...
+[2025-01-10 06:13:54,680][00763] Num frames 2000...
+[2025-01-10 06:13:54,811][00763] Num frames 2100...
+[2025-01-10 06:13:54,937][00763] Num frames 2200...
+[2025-01-10 06:13:55,017][00763] Avg episode rewards: #0: 10.300, true rewards: #0: 5.550
+[2025-01-10 06:13:55,018][00763] Avg episode reward: 10.300, avg true_objective: 5.550
+[2025-01-10 06:13:55,114][00763] Num frames 2300...
+[2025-01-10 06:13:55,232][00763] Num frames 2400...
+[2025-01-10 06:13:55,350][00763] Num frames 2500...
+[2025-01-10 06:13:55,476][00763] Num frames 2600...
+[2025-01-10 06:13:55,573][00763] Avg episode rewards: #0: 9.472, true rewards: #0: 5.272
+[2025-01-10 06:13:55,574][00763] Avg episode reward: 9.472, avg true_objective: 5.272
+[2025-01-10 06:13:55,651][00763] Num frames 2700...
+[2025-01-10 06:13:55,771][00763] Num frames 2800...
+[2025-01-10 06:13:55,888][00763] Num frames 2900...
+[2025-01-10 06:13:56,016][00763] Num frames 3000...
+[2025-01-10 06:13:56,143][00763] Num frames 3100...
+[2025-01-10 06:13:56,262][00763] Num frames 3200...
+[2025-01-10 06:13:56,378][00763] Num frames 3300...
+[2025-01-10 06:13:56,527][00763] Avg episode rewards: #0: 10.287, true rewards: #0: 5.620
+[2025-01-10 06:13:56,528][00763] Avg episode reward: 10.287, avg true_objective: 5.620
+[2025-01-10 06:13:56,565][00763] Num frames 3400...
+[2025-01-10 06:13:56,684][00763] Num frames 3500...
+[2025-01-10 06:13:56,803][00763] Num frames 3600...
+[2025-01-10 06:13:56,919][00763] Num frames 3700...
+[2025-01-10 06:13:57,044][00763] Num frames 3800...
+[2025-01-10 06:13:57,163][00763] Num frames 3900...
+[2025-01-10 06:13:57,284][00763] Num frames 4000...
+[2025-01-10 06:13:57,403][00763] Num frames 4100...
+[2025-01-10 06:13:57,535][00763] Num frames 4200...
+[2025-01-10 06:13:57,655][00763] Num frames 4300...
+[2025-01-10 06:13:57,774][00763] Num frames 4400...
+[2025-01-10 06:13:57,895][00763] Num frames 4500...
+[2025-01-10 06:13:58,062][00763] Avg episode rewards: #0: 12.697, true rewards: #0: 6.554
+[2025-01-10 06:13:58,063][00763] Avg episode reward: 12.697, avg true_objective: 6.554
+[2025-01-10 06:13:58,080][00763] Num frames 4600...
+[2025-01-10 06:13:58,202][00763] Num frames 4700...
+[2025-01-10 06:13:58,318][00763] Num frames 4800...
+[2025-01-10 06:13:58,435][00763] Num frames 4900...
+[2025-01-10 06:13:58,564][00763] Num frames 5000...
+[2025-01-10 06:13:58,683][00763] Num frames 5100...
+[2025-01-10 06:13:58,801][00763] Num frames 5200...
+[2025-01-10 06:13:58,919][00763] Num frames 5300...
+[2025-01-10 06:13:59,042][00763] Num frames 5400...
+[2025-01-10 06:13:59,163][00763] Num frames 5500...
+[2025-01-10 06:13:59,284][00763] Num frames 5600...
+[2025-01-10 06:13:59,402][00763] Num frames 5700...
+[2025-01-10 06:13:59,523][00763] Num frames 5800...
+[2025-01-10 06:13:59,650][00763] Num frames 5900...
+[2025-01-10 06:13:59,713][00763] Avg episode rewards: #0: 15.131, true rewards: #0: 7.381
+[2025-01-10 06:13:59,715][00763] Avg episode reward: 15.131, avg true_objective: 7.381
+[2025-01-10 06:13:59,827][00763] Num frames 6000...
+[2025-01-10 06:13:59,945][00763] Num frames 6100...
+[2025-01-10 06:14:00,075][00763] Num frames 6200...
+[2025-01-10 06:14:00,196][00763] Num frames 6300...
+[2025-01-10 06:14:00,314][00763] Num frames 6400...
+[2025-01-10 06:14:00,433][00763] Num frames 6500...
+[2025-01-10 06:14:00,558][00763] Num frames 6600...
+[2025-01-10 06:14:00,681][00763] Num frames 6700...
+[2025-01-10 06:14:00,817][00763] Avg episode rewards: #0: 15.743, true rewards: #0: 7.521
+[2025-01-10 06:14:00,818][00763] Avg episode reward: 15.743, avg true_objective: 7.521
+[2025-01-10 06:14:00,859][00763] Num frames 6800...
+[2025-01-10 06:14:00,977][00763] Num frames 6900...
+[2025-01-10 06:14:01,103][00763] Num frames 7000...
+[2025-01-10 06:14:01,225][00763] Num frames 7100...
+[2025-01-10 06:14:01,342][00763] Num frames 7200...
+[2025-01-10 06:14:01,459][00763] Num frames 7300...
+[2025-01-10 06:14:01,611][00763] Avg episode rewards: #0: 15.177, true rewards: #0: 7.377
+[2025-01-10 06:14:01,613][00763] Avg episode reward: 15.177, avg true_objective: 7.377
+[2025-01-10 06:14:41,163][00763] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
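The evaluation loop prints a running mean after every finished episode, tracking both the shaped episode reward and the environment's "true objective" (the final pair of lines reports 15.177 and 7.377 over the ten episodes of this run). A minimal sketch of that bookkeeping, illustrative only rather than the actual enjoy-script internals:

episode_rewards, true_objectives = [], []

def finish_episode(reward, true_objective):
    # called once per finished evaluation episode
    episode_rewards.append(reward)
    true_objectives.append(true_objective)
    n = len(episode_rewards)
    avg_r = sum(episode_rewards) / n
    avg_t = sum(true_objectives) / n
    print(f"Avg episode rewards: #0: {avg_r:.3f}, true rewards: #0: {avg_t:.3f}")
    print(f"Avg episode reward: {avg_r:.3f}, avg true_objective: {avg_t:.3f}")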