[2025-02-16 00:44:37,100][01307] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-16 00:44:37,102][01307] Rollout worker 0 uses device cpu
[2025-02-16 00:44:37,103][01307] Rollout worker 1 uses device cpu
[2025-02-16 00:44:37,106][01307] Rollout worker 2 uses device cpu
[2025-02-16 00:44:37,107][01307] Rollout worker 3 uses device cpu
[2025-02-16 00:44:37,108][01307] Rollout worker 4 uses device cpu
[2025-02-16 00:44:37,109][01307] Rollout worker 5 uses device cpu
[2025-02-16 00:44:37,110][01307] Rollout worker 6 uses device cpu
[2025-02-16 00:44:37,111][01307] Rollout worker 7 uses device cpu
[2025-02-16 00:44:37,260][01307] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-16 00:44:37,262][01307] InferenceWorker_p0-w0: min num requests: 2
[2025-02-16 00:44:37,294][01307] Starting all processes...
[2025-02-16 00:44:37,295][01307] Starting process learner_proc0
[2025-02-16 00:44:37,352][01307] Starting all processes...
[2025-02-16 00:44:37,362][01307] Starting process inference_proc0-0
[2025-02-16 00:44:37,362][01307] Starting process rollout_proc0
[2025-02-16 00:44:37,362][01307] Starting process rollout_proc1
[2025-02-16 00:44:37,362][01307] Starting process rollout_proc2
[2025-02-16 00:44:37,363][01307] Starting process rollout_proc3
[2025-02-16 00:44:37,363][01307] Starting process rollout_proc4
[2025-02-16 00:44:37,363][01307] Starting process rollout_proc5
[2025-02-16 00:44:37,363][01307] Starting process rollout_proc6
[2025-02-16 00:44:37,363][01307] Starting process rollout_proc7
[2025-02-16 00:44:52,444][03416] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-16 00:44:52,450][03416] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-16 00:44:52,520][03416] Num visible devices: 1
[2025-02-16 00:44:52,575][03416] Starting seed is not provided
[2025-02-16 00:44:52,576][03416] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-16 00:44:52,577][03416] Initializing actor-critic model on device cuda:0
[2025-02-16 00:44:52,578][03416] RunningMeanStd input shape: (3, 72, 128)
[2025-02-16 00:44:52,582][03416] RunningMeanStd input shape: (1,)
[2025-02-16 00:44:52,662][03416] ConvEncoder: input_channels=3
[2025-02-16 00:44:53,026][03437] Worker 7 uses CPU cores [1]
[2025-02-16 00:44:53,084][03429] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-16 00:44:53,088][03429] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-16 00:44:53,191][03429] Num visible devices: 1
[2025-02-16 00:44:53,206][03432] Worker 2 uses CPU cores [0]
[2025-02-16 00:44:53,466][03430] Worker 0 uses CPU cores [0]
[2025-02-16 00:44:53,505][03434] Worker 4 uses CPU cores [0]
[2025-02-16 00:44:53,565][03436] Worker 6 uses CPU cores [0]
[2025-02-16 00:44:53,623][03435] Worker 5 uses CPU cores [1]
[2025-02-16 00:44:53,677][03416] Conv encoder output size: 512
[2025-02-16 00:44:53,678][03416] Policy head output size: 512
[2025-02-16 00:44:53,687][03431] Worker 1 uses CPU cores [1]
[2025-02-16 00:44:53,767][03416] Created Actor Critic model with architecture:
[2025-02-16 00:44:53,768][03416] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-02-16 00:44:53,798][03433] Worker 3 uses CPU cores [1]
[2025-02-16 00:44:54,177][03416] Using optimizer
[2025-02-16 00:44:57,261][01307] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-16 00:44:57,269][01307] Heartbeat connected on RolloutWorker_w0
[2025-02-16 00:44:57,272][01307] Heartbeat connected on RolloutWorker_w1
[2025-02-16 00:44:57,275][01307] Heartbeat connected on RolloutWorker_w2
[2025-02-16 00:44:57,279][01307] Heartbeat connected on RolloutWorker_w3
[2025-02-16 00:44:57,283][01307] Heartbeat connected on RolloutWorker_w4
[2025-02-16 00:44:57,287][01307] Heartbeat connected on RolloutWorker_w5
[2025-02-16 00:44:57,293][01307] Heartbeat connected on RolloutWorker_w7
[2025-02-16 00:44:57,294][01307] Heartbeat connected on RolloutWorker_w6
[2025-02-16 00:44:57,406][01307] Heartbeat connected on Batcher_0
[2025-02-16 00:44:58,784][03416] No checkpoints found
[2025-02-16 00:44:58,784][03416] Did not load from checkpoint, starting from scratch!
[2025-02-16 00:44:58,785][03416] Initialized policy 0 weights for model version 0
[2025-02-16 00:44:58,788][03416] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-16 00:44:58,796][03416] LearnerWorker_p0 finished initialization!
[2025-02-16 00:44:58,796][01307] Heartbeat connected on LearnerWorker_p0
[2025-02-16 00:44:59,044][03429] RunningMeanStd input shape: (3, 72, 128)
[2025-02-16 00:44:59,046][03429] RunningMeanStd input shape: (1,)
[2025-02-16 00:44:59,060][03429] ConvEncoder: input_channels=3
[2025-02-16 00:44:59,172][03429] Conv encoder output size: 512
[2025-02-16 00:44:59,172][03429] Policy head output size: 512
[2025-02-16 00:44:59,207][01307] Inference worker 0-0 is ready!
[2025-02-16 00:44:59,210][01307] All inference workers are ready! Signal rollout workers to start!
[2025-02-16 00:44:59,499][03436] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-16 00:44:59,513][03435] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-16 00:44:59,542][03430] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-16 00:44:59,603][03433] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-16 00:44:59,609][03432] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-16 00:44:59,664][03434] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-16 00:44:59,721][03431] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-16 00:44:59,786][03437] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-16 00:45:00,760][03433] Decorrelating experience for 0 frames...
[2025-02-16 00:45:00,759][03430] Decorrelating experience for 0 frames...
[2025-02-16 00:45:00,761][03435] Decorrelating experience for 0 frames...
[2025-02-16 00:45:00,760][03436] Decorrelating experience for 0 frames...
[2025-02-16 00:45:01,516][03436] Decorrelating experience for 32 frames...
[2025-02-16 00:45:01,518][03430] Decorrelating experience for 32 frames...
[2025-02-16 00:45:01,854][01307] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-16 00:45:02,141][03436] Decorrelating experience for 64 frames...
[2025-02-16 00:45:02,181][03431] Decorrelating experience for 0 frames...
[2025-02-16 00:45:02,236][03435] Decorrelating experience for 32 frames...
[2025-02-16 00:45:02,241][03433] Decorrelating experience for 32 frames...
[2025-02-16 00:45:02,260][03437] Decorrelating experience for 0 frames...
[2025-02-16 00:45:02,958][03432] Decorrelating experience for 0 frames...
[2025-02-16 00:45:02,965][03436] Decorrelating experience for 96 frames...
[2025-02-16 00:45:03,722][03431] Decorrelating experience for 32 frames...
[2025-02-16 00:45:03,824][03437] Decorrelating experience for 32 frames...
[2025-02-16 00:45:04,318][03435] Decorrelating experience for 64 frames...
[2025-02-16 00:45:04,337][03433] Decorrelating experience for 64 frames...
[2025-02-16 00:45:04,606][03432] Decorrelating experience for 32 frames...
[2025-02-16 00:45:04,663][03430] Decorrelating experience for 64 frames...
[2025-02-16 00:45:05,596][03434] Decorrelating experience for 0 frames...
[2025-02-16 00:45:05,797][03431] Decorrelating experience for 64 frames...
[2025-02-16 00:45:05,921][03437] Decorrelating experience for 64 frames...
[2025-02-16 00:45:06,356][03435] Decorrelating experience for 96 frames...
[2025-02-16 00:45:06,854][01307] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 41.6. Samples: 208. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-16 00:45:06,859][01307] Avg episode reward: [(0, '3.450')]
[2025-02-16 00:45:07,200][03430] Decorrelating experience for 96 frames...
[2025-02-16 00:45:07,804][03432] Decorrelating experience for 64 frames...
[2025-02-16 00:45:08,884][03434] Decorrelating experience for 32 frames...
[2025-02-16 00:45:09,085][03431] Decorrelating experience for 96 frames...
[2025-02-16 00:45:09,324][03437] Decorrelating experience for 96 frames...
[2025-02-16 00:45:10,177][03433] Decorrelating experience for 96 frames...
[2025-02-16 00:45:11,854][01307] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 186.0. Samples: 1860. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-16 00:45:11,856][01307] Avg episode reward: [(0, '3.190')]
[2025-02-16 00:45:12,244][03432] Decorrelating experience for 96 frames...
[2025-02-16 00:45:12,329][03416] Signal inference workers to stop experience collection...
[2025-02-16 00:45:12,347][03429] InferenceWorker_p0-w0: stopping experience collection
[2025-02-16 00:45:12,777][03434] Decorrelating experience for 64 frames...
[2025-02-16 00:45:13,189][03434] Decorrelating experience for 96 frames...
[2025-02-16 00:45:13,394][03416] Signal inference workers to resume experience collection...
[2025-02-16 00:45:13,395][03429] InferenceWorker_p0-w0: resuming experience collection
[2025-02-16 00:45:16,854][01307] Fps is (10 sec: 2048.0, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 20480. Throughput: 0: 205.9. Samples: 3088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:45:16,861][01307] Avg episode reward: [(0, '3.418')]
[2025-02-16 00:45:21,855][01307] Fps is (10 sec: 3686.0, 60 sec: 1843.1, 300 sec: 1843.1). Total num frames: 36864. Throughput: 0: 461.4. Samples: 9228. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:45:21,860][01307] Avg episode reward: [(0, '3.769')]
[2025-02-16 00:45:22,455][03429] Updated weights for policy 0, policy_version 10 (0.0016)
[2025-02-16 00:45:26,854][01307] Fps is (10 sec: 3276.8, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 559.8. Samples: 13996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:45:26,856][01307] Avg episode reward: [(0, '4.359')]
[2025-02-16 00:45:31,854][01307] Fps is (10 sec: 4096.4, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 77824. Throughput: 0: 581.5. Samples: 17444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:45:31,856][01307] Avg episode reward: [(0, '4.530')]
[2025-02-16 00:45:32,226][03429] Updated weights for policy 0, policy_version 20 (0.0021)
[2025-02-16 00:45:36,854][01307] Fps is (10 sec: 4505.6, 60 sec: 2808.7, 300 sec: 2808.7). Total num frames: 98304. Throughput: 0: 690.2. Samples: 24156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:45:36,860][01307] Avg episode reward: [(0, '4.449')]
[2025-02-16 00:45:41,854][01307] Fps is (10 sec: 3686.3, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 114688. Throughput: 0: 728.5. Samples: 29142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:45:41,860][01307] Avg episode reward: [(0, '4.379')]
[2025-02-16 00:45:41,866][03416] Saving new best policy, reward=4.379!
[2025-02-16 00:45:42,971][03429] Updated weights for policy 0, policy_version 30 (0.0034)
[2025-02-16 00:45:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 3094.8, 300 sec: 3094.8). Total num frames: 139264. Throughput: 0: 724.4. Samples: 32598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:45:46,857][01307] Avg episode reward: [(0, '4.483')]
[2025-02-16 00:45:46,862][03416] Saving new best policy, reward=4.483!
[2025-02-16 00:45:51,854][01307] Fps is (10 sec: 4096.1, 60 sec: 3113.0, 300 sec: 3113.0). Total num frames: 155648. Throughput: 0: 862.1. Samples: 39002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:45:51,856][01307] Avg episode reward: [(0, '4.657')]
[2025-02-16 00:45:51,863][03416] Saving new best policy, reward=4.657!
[2025-02-16 00:45:53,858][03429] Updated weights for policy 0, policy_version 40 (0.0026)
[2025-02-16 00:45:56,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3202.3, 300 sec: 3202.3). Total num frames: 176128. Throughput: 0: 940.0. Samples: 44160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:45:56,860][01307] Avg episode reward: [(0, '4.527')]
[2025-02-16 00:46:01,854][01307] Fps is (10 sec: 4505.5, 60 sec: 3345.1, 300 sec: 3345.1). Total num frames: 200704. Throughput: 0: 989.1. Samples: 47600. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:46:01,860][01307] Avg episode reward: [(0, '4.511')]
[2025-02-16 00:46:02,723][03429] Updated weights for policy 0, policy_version 50 (0.0014)
[2025-02-16 00:46:06,854][01307] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3339.8). Total num frames: 217088. Throughput: 0: 995.4. Samples: 54020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:46:06,856][01307] Avg episode reward: [(0, '4.543')]
[2025-02-16 00:46:11,854][01307] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3393.8). Total num frames: 237568. Throughput: 0: 1008.4. Samples: 59372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:46:11,858][01307] Avg episode reward: [(0, '4.553')]
[2025-02-16 00:46:13,349][03429] Updated weights for policy 0, policy_version 60 (0.0021)
[2025-02-16 00:46:16,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3495.3). Total num frames: 262144. Throughput: 0: 1009.7. Samples: 62880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:46:16,865][01307] Avg episode reward: [(0, '4.723')]
[2025-02-16 00:46:16,867][03416] Saving new best policy, reward=4.723!
[2025-02-16 00:46:21,856][01307] Fps is (10 sec: 3686.2, 60 sec: 3959.5, 300 sec: 3430.4). Total num frames: 274432. Throughput: 0: 995.0. Samples: 68932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:46:21,865][01307] Avg episode reward: [(0, '4.819')]
[2025-02-16 00:46:21,909][03416] Saving new best policy, reward=4.819!
[2025-02-16 00:46:24,570][03429] Updated weights for policy 0, policy_version 70 (0.0020)
[2025-02-16 00:46:26,857][01307] Fps is (10 sec: 3275.8, 60 sec: 4027.5, 300 sec: 3469.4). Total num frames: 294912. Throughput: 0: 1001.2. Samples: 74200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:46:26,860][01307] Avg episode reward: [(0, '4.786')]
[2025-02-16 00:46:31,854][01307] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3549.9). Total num frames: 319488. Throughput: 0: 997.9. Samples: 77504. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:46:31,861][01307] Avg episode reward: [(0, '4.509')]
[2025-02-16 00:46:31,867][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth...
[2025-02-16 00:46:33,703][03429] Updated weights for policy 0, policy_version 80 (0.0019)
[2025-02-16 00:46:36,854][01307] Fps is (10 sec: 4097.3, 60 sec: 3959.5, 300 sec: 3535.5). Total num frames: 335872. Throughput: 0: 985.5. Samples: 83348. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-16 00:46:36,861][01307] Avg episode reward: [(0, '4.570')]
[2025-02-16 00:46:41,854][01307] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3522.6). Total num frames: 352256. Throughput: 0: 993.7. Samples: 88878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:46:41,859][01307] Avg episode reward: [(0, '4.628')]
[2025-02-16 00:46:44,450][03429] Updated weights for policy 0, policy_version 90 (0.0017)
[2025-02-16 00:46:46,854][01307] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3588.9). Total num frames: 376832. Throughput: 0: 993.6. Samples: 92310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:46:46,857][01307] Avg episode reward: [(0, '4.880')]
[2025-02-16 00:46:46,860][03416] Saving new best policy, reward=4.880!
[2025-02-16 00:46:51,856][01307] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3574.6). Total num frames: 393216. Throughput: 0: 980.8. Samples: 98156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:46:51,858][01307] Avg episode reward: [(0, '5.142')]
[2025-02-16 00:46:51,876][03416] Saving new best policy, reward=5.142!
[2025-02-16 00:46:55,338][03429] Updated weights for policy 0, policy_version 100 (0.0026)
[2025-02-16 00:46:56,854][01307] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3597.4). Total num frames: 413696. Throughput: 0: 993.2. Samples: 104066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 00:46:56,860][01307] Avg episode reward: [(0, '5.176')]
[2025-02-16 00:46:56,862][03416] Saving new best policy, reward=5.176!
[2025-02-16 00:47:01,854][01307] Fps is (10 sec: 4506.3, 60 sec: 3959.5, 300 sec: 3652.3). Total num frames: 438272. Throughput: 0: 993.2. Samples: 107576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 00:47:01,858][01307] Avg episode reward: [(0, '4.903')]
[2025-02-16 00:47:04,526][03429] Updated weights for policy 0, policy_version 110 (0.0020)
[2025-02-16 00:47:06,856][01307] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 3637.2). Total num frames: 454656. Throughput: 0: 992.5. Samples: 113594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:47:06,858][01307] Avg episode reward: [(0, '4.880')]
[2025-02-16 00:47:11,856][01307] Fps is (10 sec: 4095.2, 60 sec: 4027.6, 300 sec: 3686.3). Total num frames: 479232. Throughput: 0: 1011.9. Samples: 119736. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:47:11,862][01307] Avg episode reward: [(0, '4.975')]
[2025-02-16 00:47:14,306][03429] Updated weights for policy 0, policy_version 120 (0.0028)
[2025-02-16 00:47:16,854][01307] Fps is (10 sec: 4916.2, 60 sec: 4027.7, 300 sec: 3731.9). Total num frames: 503808. Throughput: 0: 1019.9. Samples: 123400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:47:16,856][01307] Avg episode reward: [(0, '4.800')]
[2025-02-16 00:47:21,854][01307] Fps is (10 sec: 3687.1, 60 sec: 4027.8, 300 sec: 3686.4). Total num frames: 516096. Throughput: 0: 1018.4. Samples: 129174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:47:21,858][01307] Avg episode reward: [(0, '4.669')]
[2025-02-16 00:47:24,699][03429] Updated weights for policy 0, policy_version 130 (0.0017)
[2025-02-16 00:47:26,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4096.2, 300 sec: 3728.8). Total num frames: 540672. Throughput: 0: 1034.8. Samples: 135446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:47:26,858][01307] Avg episode reward: [(0, '4.777')]
[2025-02-16 00:47:31,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3768.3). Total num frames: 565248. Throughput: 0: 1038.0. Samples: 139020. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-16 00:47:31,859][01307] Avg episode reward: [(0, '4.690')]
[2025-02-16 00:47:34,101][03429] Updated weights for policy 0, policy_version 140 (0.0016)
[2025-02-16 00:47:36,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3726.0). Total num frames: 577536. Throughput: 0: 1032.7. Samples: 144628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:47:36,858][01307] Avg episode reward: [(0, '4.623')]
[2025-02-16 00:47:41,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3763.2). Total num frames: 602112. Throughput: 0: 1044.6. Samples: 151074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 00:47:41,858][01307] Avg episode reward: [(0, '4.463')]
[2025-02-16 00:47:43,860][03429] Updated weights for policy 0, policy_version 150 (0.0017)
[2025-02-16 00:47:46,855][01307] Fps is (10 sec: 4914.7, 60 sec: 4164.2, 300 sec: 3798.1). Total num frames: 626688. Throughput: 0: 1044.5. Samples: 154580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:47:46,857][01307] Avg episode reward: [(0, '4.569')]
[2025-02-16 00:47:51,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 3758.7). Total num frames: 638976. Throughput: 0: 1029.4. Samples: 159916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:47:51,860][01307] Avg episode reward: [(0, '4.682')]
[2025-02-16 00:47:54,677][03429] Updated weights for policy 0, policy_version 160 (0.0014)
[2025-02-16 00:47:56,854][01307] Fps is (10 sec: 3686.8, 60 sec: 4164.3, 300 sec: 3791.7). Total num frames: 663552. Throughput: 0: 1037.2. Samples: 166408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 00:47:56,856][01307] Avg episode reward: [(0, '4.657')]
[2025-02-16 00:48:01,855][01307] Fps is (10 sec: 4914.7, 60 sec: 4164.2, 300 sec: 3822.9). Total num frames: 688128. Throughput: 0: 1032.2. Samples: 169852. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:48:01,857][01307] Avg episode reward: [(0, '4.509')]
[2025-02-16 00:48:04,580][03429] Updated weights for policy 0, policy_version 170 (0.0013)
[2025-02-16 00:48:06,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4164.4, 300 sec: 3808.2). Total num frames: 704512. Throughput: 0: 1019.7. Samples: 175060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 00:48:06,857][01307] Avg episode reward: [(0, '4.394')]
[2025-02-16 00:48:11,854][01307] Fps is (10 sec: 3686.8, 60 sec: 4096.1, 300 sec: 3815.7). Total num frames: 724992. Throughput: 0: 1032.6. Samples: 181914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:48:11,858][01307] Avg episode reward: [(0, '4.513')]
[2025-02-16 00:48:13,779][03429] Updated weights for policy 0, policy_version 180 (0.0019)
[2025-02-16 00:48:16,860][01307] Fps is (10 sec: 4503.0, 60 sec: 4095.6, 300 sec: 3843.8). Total num frames: 749568. Throughput: 0: 1032.2. Samples: 185474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 00:48:16,862][01307] Avg episode reward: [(0, '4.918')]
[2025-02-16 00:48:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3829.8). Total num frames: 765952. Throughput: 0: 1015.8. Samples: 190340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:48:21,856][01307] Avg episode reward: [(0, '5.077')]
[2025-02-16 00:48:24,380][03429] Updated weights for policy 0, policy_version 190 (0.0012)
[2025-02-16 00:48:26,854][01307] Fps is (10 sec: 3688.6, 60 sec: 4096.0, 300 sec: 3836.3). Total num frames: 786432. Throughput: 0: 1023.2. Samples: 197116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:48:26,856][01307] Avg episode reward: [(0, '4.991')]
[2025-02-16 00:48:31,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3861.9). Total num frames: 811008. Throughput: 0: 1026.9. Samples: 200788. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:48:31,856][01307] Avg episode reward: [(0, '4.954')]
[2025-02-16 00:48:31,862][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000198_811008.pth...
[2025-02-16 00:48:34,792][03429] Updated weights for policy 0, policy_version 200 (0.0015)
[2025-02-16 00:48:36,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3848.3). Total num frames: 827392. Throughput: 0: 1020.6. Samples: 205844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:48:36,857][01307] Avg episode reward: [(0, '4.926')]
[2025-02-16 00:48:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3872.6). Total num frames: 851968. Throughput: 0: 1039.1. Samples: 213166. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:48:41,856][01307] Avg episode reward: [(0, '4.922')]
[2025-02-16 00:48:43,221][03429] Updated weights for policy 0, policy_version 210 (0.0014)
[2025-02-16 00:48:46,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3877.5). Total num frames: 872448. Throughput: 0: 1042.5. Samples: 216764. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:48:46,856][01307] Avg episode reward: [(0, '4.996')]
[2025-02-16 00:48:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3882.3). Total num frames: 892928. Throughput: 0: 1041.4. Samples: 221922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 00:48:51,856][01307] Avg episode reward: [(0, '4.992')]
[2025-02-16 00:48:53,535][03429] Updated weights for policy 0, policy_version 220 (0.0020)
[2025-02-16 00:48:56,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3904.3). Total num frames: 917504. Throughput: 0: 1048.1. Samples: 229078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:48:56,860][01307] Avg episode reward: [(0, '5.327')]
[2025-02-16 00:48:56,864][03416] Saving new best policy, reward=5.327!
[2025-02-16 00:49:01,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3891.2). Total num frames: 933888. Throughput: 0: 1048.8. Samples: 232664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:49:01,861][01307] Avg episode reward: [(0, '5.185')]
[2025-02-16 00:49:03,874][03429] Updated weights for policy 0, policy_version 230 (0.0026)
[2025-02-16 00:49:06,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3895.4). Total num frames: 954368. Throughput: 0: 1054.7. Samples: 237800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:49:06,856][01307] Avg episode reward: [(0, '5.007')]
[2025-02-16 00:49:11,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3915.8). Total num frames: 978944. Throughput: 0: 1067.4. Samples: 245150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:49:11,856][01307] Avg episode reward: [(0, '5.148')]
[2025-02-16 00:49:12,281][03429] Updated weights for policy 0, policy_version 240 (0.0018)
[2025-02-16 00:49:16,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4164.7, 300 sec: 3919.3). Total num frames: 999424. Throughput: 0: 1061.0. Samples: 248532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 00:49:16,861][01307] Avg episode reward: [(0, '5.148')]
[2025-02-16 00:49:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3922.7). Total num frames: 1019904. Throughput: 0: 1068.0. Samples: 253906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:49:21,858][01307] Avg episode reward: [(0, '5.223')]
[2025-02-16 00:49:22,650][03429] Updated weights for policy 0, policy_version 250 (0.0034)
[2025-02-16 00:49:26,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 3941.4). Total num frames: 1044480. Throughput: 0: 1062.6. Samples: 260984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 00:49:26,856][01307] Avg episode reward: [(0, '5.319')]
[2025-02-16 00:49:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3929.1). Total num frames: 1060864. Throughput: 0: 1052.7. Samples: 264136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 00:49:31,860][01307] Avg episode reward: [(0, '5.432')]
[2025-02-16 00:49:31,873][03416] Saving new best policy, reward=5.432!
[2025-02-16 00:49:32,865][03429] Updated weights for policy 0, policy_version 260 (0.0017)
[2025-02-16 00:49:36,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4232.5, 300 sec: 3932.2). Total num frames: 1081344. Throughput: 0: 1059.1. Samples: 269580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:49:36,856][01307] Avg episode reward: [(0, '5.354')]
[2025-02-16 00:49:41,573][03429] Updated weights for policy 0, policy_version 270 (0.0016)
[2025-02-16 00:49:41,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3949.7). Total num frames: 1105920. Throughput: 0: 1058.0. Samples: 276688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:49:41,857][01307] Avg episode reward: [(0, '5.722')]
[2025-02-16 00:49:41,861][03416] Saving new best policy, reward=5.722!
[2025-02-16 00:49:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3937.9). Total num frames: 1122304. Throughput: 0: 1040.9. Samples: 279506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:49:46,857][01307] Avg episode reward: [(0, '5.933')]
[2025-02-16 00:49:46,860][03416] Saving new best policy, reward=5.933!
[2025-02-16 00:49:51,856][01307] Fps is (10 sec: 3685.7, 60 sec: 4164.1, 300 sec: 3940.6). Total num frames: 1142784. Throughput: 0: 1047.5. Samples: 284940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:49:51,867][01307] Avg episode reward: [(0, '5.927')]
[2025-02-16 00:49:52,293][03429] Updated weights for policy 0, policy_version 280 (0.0018)
[2025-02-16 00:49:56,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3957.2). Total num frames: 1167360. Throughput: 0: 1038.4. Samples: 291878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 00:49:56,856][01307] Avg episode reward: [(0, '6.028')]
[2025-02-16 00:49:56,859][03416] Saving new best policy, reward=6.028!
[2025-02-16 00:50:01,857][01307] Fps is (10 sec: 3686.0, 60 sec: 4095.8, 300 sec: 3998.8). Total num frames: 1179648. Throughput: 0: 1019.4. Samples: 294410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 00:50:01,859][01307] Avg episode reward: [(0, '6.449')]
[2025-02-16 00:50:01,866][03416] Saving new best policy, reward=6.449!
[2025-02-16 00:50:03,114][03429] Updated weights for policy 0, policy_version 290 (0.0021)
[2025-02-16 00:50:06,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1204224. Throughput: 0: 1024.7. Samples: 300016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:50:06,858][01307] Avg episode reward: [(0, '7.055')]
[2025-02-16 00:50:06,862][03416] Saving new best policy, reward=7.055!
[2025-02-16 00:50:11,814][03429] Updated weights for policy 0, policy_version 300 (0.0014)
[2025-02-16 00:50:11,854][01307] Fps is (10 sec: 4916.7, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 1228800. Throughput: 0: 1023.7. Samples: 307050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:50:11,858][01307] Avg episode reward: [(0, '8.436')]
[2025-02-16 00:50:11,864][03416] Saving new best policy, reward=8.436!
[2025-02-16 00:50:16,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 1241088. Throughput: 0: 1010.0. Samples: 309584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:50:16,858][01307] Avg episode reward: [(0, '8.516')]
[2025-02-16 00:50:16,861][03416] Saving new best policy, reward=8.516!
[2025-02-16 00:50:21,854][01307] Fps is (10 sec: 3686.3, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 1265664. Throughput: 0: 1019.5. Samples: 315456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:50:21,857][01307] Avg episode reward: [(0, '8.830')]
[2025-02-16 00:50:21,863][03416] Saving new best policy, reward=8.830!
[2025-02-16 00:50:22,529][03429] Updated weights for policy 0, policy_version 310 (0.0029)
[2025-02-16 00:50:26,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 1286144. Throughput: 0: 1015.8. Samples: 322400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 00:50:26,859][01307] Avg episode reward: [(0, '8.583')]
[2025-02-16 00:50:31,854][01307] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 1302528. Throughput: 0: 1005.5. Samples: 324752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:50:31,859][01307] Avg episode reward: [(0, '8.402')]
[2025-02-16 00:50:31,866][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000318_1302528.pth...
[2025-02-16 00:50:31,999][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth
[2025-02-16 00:50:32,977][03429] Updated weights for policy 0, policy_version 320 (0.0012)
[2025-02-16 00:50:36,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 1327104. Throughput: 0: 1018.2. Samples: 330758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 00:50:36,860][01307] Avg episode reward: [(0, '9.271')]
[2025-02-16 00:50:36,863][03416] Saving new best policy, reward=9.271!
[2025-02-16 00:50:41,775][03429] Updated weights for policy 0, policy_version 330 (0.0019)
[2025-02-16 00:50:41,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 1351680. Throughput: 0: 1022.6. Samples: 337894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:50:41,856][01307] Avg episode reward: [(0, '9.628')]
[2025-02-16 00:50:41,865][03416] Saving new best policy, reward=9.628!
[2025-02-16 00:50:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 1368064. Throughput: 0: 1013.5. Samples: 340014. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 00:50:46,859][01307] Avg episode reward: [(0, '10.712')]
[2025-02-16 00:50:46,864][03416] Saving new best policy, reward=10.712!
[2025-02-16 00:50:51,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4109.9). Total num frames: 1388544. Throughput: 0: 1032.5. Samples: 346480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 00:50:51,863][01307] Avg episode reward: [(0, '9.704')]
[2025-02-16 00:50:52,193][03429] Updated weights for policy 0, policy_version 340 (0.0015)
[2025-02-16 00:50:56,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 1409024. Throughput: 0: 1025.3. Samples: 353190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:50:56,861][01307] Avg episode reward: [(0, '9.456')]
[2025-02-16 00:51:01,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.5, 300 sec: 4109.9). Total num frames: 1429504. Throughput: 0: 1016.2. Samples: 355314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:51:01,858][01307] Avg episode reward: [(0, '9.349')]
[2025-02-16 00:51:02,653][03429] Updated weights for policy 0, policy_version 350 (0.0020)
[2025-02-16 00:51:06,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 1449984. Throughput: 0: 1035.6. Samples: 362056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:51:06,858][01307] Avg episode reward: [(0, '10.067')]
[2025-02-16 00:51:11,738][03429] Updated weights for policy 0, policy_version 360 (0.0016)
[2025-02-16 00:51:11,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 1474560. Throughput: 0: 1031.4. Samples: 368814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:51:11,857][01307] Avg episode reward: [(0, '10.304')]
[2025-02-16 00:51:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 1490944. Throughput: 0: 1027.9. Samples: 371006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:51:16,860][01307] Avg episode reward: [(0, '10.338')]
[2025-02-16 00:51:21,540][03429] Updated weights for policy 0, policy_version 370 (0.0016)
[2025-02-16 00:51:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 1515520. Throughput: 0: 1048.2. Samples: 377926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 00:51:21,856][01307] Avg episode reward: [(0, '11.396')]
[2025-02-16 00:51:21,866][03416] Saving new best policy, reward=11.396!
[2025-02-16 00:51:26,854][01307] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 1536000. Throughput: 0: 1031.2. Samples: 384296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 00:51:26,858][01307] Avg episode reward: [(0, '12.082')]
[2025-02-16 00:51:26,864][03416] Saving new best policy, reward=12.082!
[2025-02-16 00:51:31,854][01307] Fps is (10 sec: 3686.3, 60 sec: 4164.3, 300 sec: 4123.8).
Total num frames: 1552384. Throughput: 0: 1029.6. Samples: 386348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:51:31,860][01307] Avg episode reward: [(0, '12.530')] [2025-02-16 00:51:31,867][03416] Saving new best policy, reward=12.530! [2025-02-16 00:51:32,189][03429] Updated weights for policy 0, policy_version 380 (0.0023) [2025-02-16 00:51:36,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 1576960. Throughput: 0: 1040.3. Samples: 393294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:51:36,856][01307] Avg episode reward: [(0, '13.589')] [2025-02-16 00:51:36,863][03416] Saving new best policy, reward=13.589! [2025-02-16 00:51:41,666][03429] Updated weights for policy 0, policy_version 390 (0.0023) [2025-02-16 00:51:41,854][01307] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 1597440. Throughput: 0: 1033.3. Samples: 399690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:51:41,856][01307] Avg episode reward: [(0, '12.906')] [2025-02-16 00:51:46,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 1613824. Throughput: 0: 1031.2. Samples: 401720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:51:46,857][01307] Avg episode reward: [(0, '11.932')] [2025-02-16 00:51:51,507][03429] Updated weights for policy 0, policy_version 400 (0.0020) [2025-02-16 00:51:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 1638400. Throughput: 0: 1037.2. Samples: 408730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:51:51,859][01307] Avg episode reward: [(0, '10.342')] [2025-02-16 00:51:56,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 1654784. Throughput: 0: 1016.0. Samples: 414536. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:51:56,860][01307] Avg episode reward: [(0, '10.793')] [2025-02-16 00:52:01,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 1675264. Throughput: 0: 1015.3. Samples: 416696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:52:01,862][01307] Avg episode reward: [(0, '12.088')] [2025-02-16 00:52:02,244][03429] Updated weights for policy 0, policy_version 410 (0.0013) [2025-02-16 00:52:06,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 1699840. Throughput: 0: 1020.6. Samples: 423852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:52:06,859][01307] Avg episode reward: [(0, '12.770')] [2025-02-16 00:52:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 1716224. Throughput: 0: 1008.5. Samples: 429680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:52:11,856][01307] Avg episode reward: [(0, '13.791')] [2025-02-16 00:52:11,871][03416] Saving new best policy, reward=13.791! [2025-02-16 00:52:12,820][03429] Updated weights for policy 0, policy_version 420 (0.0025) [2025-02-16 00:52:16,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 1736704. Throughput: 0: 1015.2. Samples: 432032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:52:16,856][01307] Avg episode reward: [(0, '13.545')] [2025-02-16 00:52:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 1757184. Throughput: 0: 1011.3. Samples: 438804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:52:21,859][01307] Avg episode reward: [(0, '11.706')] [2025-02-16 00:52:22,053][03429] Updated weights for policy 0, policy_version 430 (0.0017) [2025-02-16 00:52:26,856][01307] Fps is (10 sec: 3685.6, 60 sec: 3959.3, 300 sec: 4096.0). Total num frames: 1773568. Throughput: 0: 987.6. Samples: 444134. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:52:26,858][01307] Avg episode reward: [(0, '12.098')] [2025-02-16 00:52:31,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 1794048. Throughput: 0: 998.4. Samples: 446648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:52:31,856][01307] Avg episode reward: [(0, '12.086')] [2025-02-16 00:52:31,865][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000438_1794048.pth... [2025-02-16 00:52:31,993][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000198_811008.pth [2025-02-16 00:52:33,172][03429] Updated weights for policy 0, policy_version 440 (0.0024) [2025-02-16 00:52:36,854][01307] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 4109.9). Total num frames: 1814528. Throughput: 0: 991.2. Samples: 453336. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 00:52:36,860][01307] Avg episode reward: [(0, '12.692')] [2025-02-16 00:52:41,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4082.1). Total num frames: 1830912. Throughput: 0: 979.1. Samples: 458596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:52:41,858][01307] Avg episode reward: [(0, '13.418')] [2025-02-16 00:52:44,189][03429] Updated weights for policy 0, policy_version 450 (0.0012) [2025-02-16 00:52:46,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4109.9). Total num frames: 1851392. Throughput: 0: 991.6. Samples: 461316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:52:46,861][01307] Avg episode reward: [(0, '14.096')] [2025-02-16 00:52:46,864][03416] Saving new best policy, reward=14.096! [2025-02-16 00:52:51,854][01307] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 4109.9). Total num frames: 1875968. Throughput: 0: 981.7. Samples: 468028. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 00:52:51,857][01307] Avg episode reward: [(0, '14.747')] [2025-02-16 00:52:51,873][03416] Saving new best policy, reward=14.747! [2025-02-16 00:52:54,021][03429] Updated weights for policy 0, policy_version 460 (0.0022) [2025-02-16 00:52:56,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4068.2). Total num frames: 1888256. Throughput: 0: 963.6. Samples: 473044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:52:56,862][01307] Avg episode reward: [(0, '15.352')] [2025-02-16 00:52:56,863][03416] Saving new best policy, reward=15.352! [2025-02-16 00:53:01,854][01307] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 4082.1). Total num frames: 1908736. Throughput: 0: 970.1. Samples: 475688. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 00:53:01,860][01307] Avg episode reward: [(0, '16.693')] [2025-02-16 00:53:01,868][03416] Saving new best policy, reward=16.693! [2025-02-16 00:53:04,706][03429] Updated weights for policy 0, policy_version 470 (0.0029) [2025-02-16 00:53:06,854][01307] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 4096.0). Total num frames: 1933312. Throughput: 0: 970.0. Samples: 482454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:53:06,858][01307] Avg episode reward: [(0, '15.901')] [2025-02-16 00:53:11,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 4054.4). Total num frames: 1945600. Throughput: 0: 962.8. Samples: 487460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:53:11,856][01307] Avg episode reward: [(0, '16.798')] [2025-02-16 00:53:11,864][03416] Saving new best policy, reward=16.798! [2025-02-16 00:53:15,832][03429] Updated weights for policy 0, policy_version 480 (0.0021) [2025-02-16 00:53:16,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4082.1). Total num frames: 1970176. Throughput: 0: 970.7. Samples: 490328. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:53:16,856][01307] Avg episode reward: [(0, '16.255')] [2025-02-16 00:53:21,854][01307] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 4082.1). Total num frames: 1990656. Throughput: 0: 972.7. Samples: 497106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 00:53:21,860][01307] Avg episode reward: [(0, '15.046')] [2025-02-16 00:53:26,804][03429] Updated weights for policy 0, policy_version 490 (0.0025) [2025-02-16 00:53:26,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 4054.3). Total num frames: 2007040. Throughput: 0: 963.7. Samples: 501964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:53:26,861][01307] Avg episode reward: [(0, '15.843')] [2025-02-16 00:53:31,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4068.2). Total num frames: 2027520. Throughput: 0: 973.4. Samples: 505120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:53:31,862][01307] Avg episode reward: [(0, '16.282')] [2025-02-16 00:53:35,844][03429] Updated weights for policy 0, policy_version 500 (0.0016) [2025-02-16 00:53:36,854][01307] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 2052096. Throughput: 0: 973.5. Samples: 511836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:53:36,862][01307] Avg episode reward: [(0, '16.629')] [2025-02-16 00:53:41,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4040.5). Total num frames: 2064384. Throughput: 0: 968.2. Samples: 516614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 00:53:41,857][01307] Avg episode reward: [(0, '16.991')] [2025-02-16 00:53:41,866][03416] Saving new best policy, reward=16.991! [2025-02-16 00:53:46,854][01307] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4040.5). Total num frames: 2084864. Throughput: 0: 978.8. Samples: 519736. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:53:46,861][01307] Avg episode reward: [(0, '19.314')] [2025-02-16 00:53:46,867][03416] Saving new best policy, reward=19.314! [2025-02-16 00:53:47,088][03429] Updated weights for policy 0, policy_version 510 (0.0018) [2025-02-16 00:53:51,855][01307] Fps is (10 sec: 4505.2, 60 sec: 3891.2, 300 sec: 4040.4). Total num frames: 2109440. Throughput: 0: 976.2. Samples: 526386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:53:51,859][01307] Avg episode reward: [(0, '19.902')] [2025-02-16 00:53:51,870][03416] Saving new best policy, reward=19.902! [2025-02-16 00:53:56,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2121728. Throughput: 0: 964.9. Samples: 530880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:53:56,857][01307] Avg episode reward: [(0, '18.659')] [2025-02-16 00:53:58,219][03429] Updated weights for policy 0, policy_version 520 (0.0021) [2025-02-16 00:54:01,854][01307] Fps is (10 sec: 3277.1, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2142208. Throughput: 0: 974.3. Samples: 534172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:54:01,856][01307] Avg episode reward: [(0, '18.494')] [2025-02-16 00:54:06,854][01307] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2166784. Throughput: 0: 974.0. Samples: 540934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:54:06,856][01307] Avg episode reward: [(0, '18.307')] [2025-02-16 00:54:07,909][03429] Updated weights for policy 0, policy_version 530 (0.0027) [2025-02-16 00:54:11,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2179072. Throughput: 0: 970.0. Samples: 545614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:54:11,861][01307] Avg episode reward: [(0, '17.813')] [2025-02-16 00:54:16,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4012.7). 
Total num frames: 2203648. Throughput: 0: 973.2. Samples: 548912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:54:16,858][01307] Avg episode reward: [(0, '18.574')] [2025-02-16 00:54:18,361][03429] Updated weights for policy 0, policy_version 540 (0.0014) [2025-02-16 00:54:21,854][01307] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2224128. Throughput: 0: 973.4. Samples: 555640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:54:21,857][01307] Avg episode reward: [(0, '18.271')] [2025-02-16 00:54:26,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2240512. Throughput: 0: 970.8. Samples: 560300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 00:54:26,859][01307] Avg episode reward: [(0, '18.452')] [2025-02-16 00:54:29,434][03429] Updated weights for policy 0, policy_version 550 (0.0013) [2025-02-16 00:54:31,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2260992. Throughput: 0: 975.2. Samples: 563618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:54:31,856][01307] Avg episode reward: [(0, '19.152')] [2025-02-16 00:54:31,868][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000552_2260992.pth... [2025-02-16 00:54:31,994][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000318_1302528.pth [2025-02-16 00:54:36,854][01307] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3984.9). Total num frames: 2281472. Throughput: 0: 970.9. Samples: 570076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:54:36,859][01307] Avg episode reward: [(0, '18.440')] [2025-02-16 00:54:40,369][03429] Updated weights for policy 0, policy_version 560 (0.0015) [2025-02-16 00:54:41,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 2297856. Throughput: 0: 979.5. Samples: 574956. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:54:41,861][01307] Avg episode reward: [(0, '18.770')] [2025-02-16 00:54:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2322432. Throughput: 0: 979.9. Samples: 578268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:54:46,859][01307] Avg episode reward: [(0, '19.515')] [2025-02-16 00:54:49,557][03429] Updated weights for policy 0, policy_version 570 (0.0024) [2025-02-16 00:54:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3971.0). Total num frames: 2338816. Throughput: 0: 970.2. Samples: 584594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:54:51,856][01307] Avg episode reward: [(0, '19.194')] [2025-02-16 00:54:56,854][01307] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3985.0). Total num frames: 2355200. Throughput: 0: 977.9. Samples: 589618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:54:56,856][01307] Avg episode reward: [(0, '18.720')] [2025-02-16 00:55:00,707][03429] Updated weights for policy 0, policy_version 580 (0.0012) [2025-02-16 00:55:01,854][01307] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2379776. Throughput: 0: 978.5. Samples: 592946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-16 00:55:01,856][01307] Avg episode reward: [(0, '17.810')] [2025-02-16 00:55:06,854][01307] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 2396160. Throughput: 0: 963.2. Samples: 598982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:55:06,860][01307] Avg episode reward: [(0, '17.714')] [2025-02-16 00:55:11,451][03429] Updated weights for policy 0, policy_version 590 (0.0036) [2025-02-16 00:55:11,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2416640. Throughput: 0: 982.8. Samples: 604526. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:55:11,856][01307] Avg episode reward: [(0, '18.384')] [2025-02-16 00:55:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2437120. Throughput: 0: 981.6. Samples: 607792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:55:16,862][01307] Avg episode reward: [(0, '17.071')] [2025-02-16 00:55:21,860][01307] Fps is (10 sec: 3684.2, 60 sec: 3822.5, 300 sec: 3957.1). Total num frames: 2453504. Throughput: 0: 966.4. Samples: 613568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:55:21,863][01307] Avg episode reward: [(0, '16.764')] [2025-02-16 00:55:22,159][03429] Updated weights for policy 0, policy_version 600 (0.0023) [2025-02-16 00:55:26,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2473984. Throughput: 0: 982.5. Samples: 619170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:55:26,856][01307] Avg episode reward: [(0, '17.108')] [2025-02-16 00:55:31,651][03429] Updated weights for policy 0, policy_version 610 (0.0029) [2025-02-16 00:55:31,854][01307] Fps is (10 sec: 4508.3, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2498560. Throughput: 0: 983.0. Samples: 622502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:55:31,856][01307] Avg episode reward: [(0, '17.775')] [2025-02-16 00:55:36,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2510848. Throughput: 0: 961.3. Samples: 627854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:55:36,863][01307] Avg episode reward: [(0, '18.362')] [2025-02-16 00:55:41,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2535424. Throughput: 0: 986.5. Samples: 634012. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:55:41,858][01307] Avg episode reward: [(0, '18.655')] [2025-02-16 00:55:42,519][03429] Updated weights for policy 0, policy_version 620 (0.0013) [2025-02-16 00:55:46,859][01307] Fps is (10 sec: 4503.3, 60 sec: 3890.9, 300 sec: 3957.1). Total num frames: 2555904. Throughput: 0: 987.8. Samples: 637404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:55:46,864][01307] Avg episode reward: [(0, '19.948')] [2025-02-16 00:55:46,872][03416] Saving new best policy, reward=19.948! [2025-02-16 00:55:51,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2572288. Throughput: 0: 964.4. Samples: 642380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 00:55:51,860][01307] Avg episode reward: [(0, '21.257')] [2025-02-16 00:55:51,866][03416] Saving new best policy, reward=21.257! [2025-02-16 00:55:53,372][03429] Updated weights for policy 0, policy_version 630 (0.0026) [2025-02-16 00:55:56,854][01307] Fps is (10 sec: 3688.2, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2592768. Throughput: 0: 978.9. Samples: 648576. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 00:55:56,862][01307] Avg episode reward: [(0, '21.603')] [2025-02-16 00:55:56,864][03416] Saving new best policy, reward=21.603! [2025-02-16 00:56:01,854][01307] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2613248. Throughput: 0: 978.4. Samples: 651820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:56:01,856][01307] Avg episode reward: [(0, '20.933')] [2025-02-16 00:56:04,140][03429] Updated weights for policy 0, policy_version 640 (0.0027) [2025-02-16 00:56:06,854][01307] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2629632. Throughput: 0: 956.2. Samples: 656590. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 00:56:06,859][01307] Avg episode reward: [(0, '21.944')] [2025-02-16 00:56:06,862][03416] Saving new best policy, reward=21.944! [2025-02-16 00:56:11,854][01307] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2650112. Throughput: 0: 977.6. Samples: 663164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:56:11,857][01307] Avg episode reward: [(0, '20.313')] [2025-02-16 00:56:13,919][03429] Updated weights for policy 0, policy_version 650 (0.0017) [2025-02-16 00:56:16,858][01307] Fps is (10 sec: 4094.4, 60 sec: 3890.9, 300 sec: 3915.4). Total num frames: 2670592. Throughput: 0: 977.1. Samples: 666474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:56:16,860][01307] Avg episode reward: [(0, '18.845')] [2025-02-16 00:56:21,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.6, 300 sec: 3901.6). Total num frames: 2686976. Throughput: 0: 963.8. Samples: 671224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:56:21,860][01307] Avg episode reward: [(0, '17.956')] [2025-02-16 00:56:24,887][03429] Updated weights for policy 0, policy_version 660 (0.0020) [2025-02-16 00:56:26,854][01307] Fps is (10 sec: 3687.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2707456. Throughput: 0: 974.1. Samples: 677848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:56:26,858][01307] Avg episode reward: [(0, '16.898')] [2025-02-16 00:56:31,857][01307] Fps is (10 sec: 4094.7, 60 sec: 3822.7, 300 sec: 3901.6). Total num frames: 2727936. Throughput: 0: 971.9. Samples: 681138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:56:31,859][01307] Avg episode reward: [(0, '16.523')] [2025-02-16 00:56:31,870][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000666_2727936.pth... 
[2025-02-16 00:56:32,083][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000438_1794048.pth [2025-02-16 00:56:36,056][03429] Updated weights for policy 0, policy_version 670 (0.0015) [2025-02-16 00:56:36,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2744320. Throughput: 0: 964.9. Samples: 685800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:56:36,857][01307] Avg episode reward: [(0, '16.391')] [2025-02-16 00:56:41,854][01307] Fps is (10 sec: 4097.3, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2768896. Throughput: 0: 978.5. Samples: 692610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:56:41,856][01307] Avg episode reward: [(0, '16.733')] [2025-02-16 00:56:45,939][03429] Updated weights for policy 0, policy_version 680 (0.0018) [2025-02-16 00:56:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 3823.3, 300 sec: 3887.7). Total num frames: 2785280. Throughput: 0: 977.7. Samples: 695818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:56:46,860][01307] Avg episode reward: [(0, '16.475')] [2025-02-16 00:56:51,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2805760. Throughput: 0: 980.4. Samples: 700708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 00:56:51,862][01307] Avg episode reward: [(0, '17.829')] [2025-02-16 00:56:55,831][03429] Updated weights for policy 0, policy_version 690 (0.0013) [2025-02-16 00:56:56,854][01307] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2830336. Throughput: 0: 985.3. Samples: 707502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:56:56,856][01307] Avg episode reward: [(0, '17.970')] [2025-02-16 00:57:01,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 2842624. Throughput: 0: 976.0. Samples: 710388. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:57:01,859][01307] Avg episode reward: [(0, '18.638')] [2025-02-16 00:57:06,682][03429] Updated weights for policy 0, policy_version 700 (0.0027) [2025-02-16 00:57:06,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2867200. Throughput: 0: 985.0. Samples: 715550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 00:57:06,856][01307] Avg episode reward: [(0, '18.722')] [2025-02-16 00:57:11,854][01307] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2887680. Throughput: 0: 989.8. Samples: 722388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:57:11,859][01307] Avg episode reward: [(0, '20.095')] [2025-02-16 00:57:16,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3887.7). Total num frames: 2904064. Throughput: 0: 974.4. Samples: 724982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:57:16,856][01307] Avg episode reward: [(0, '20.820')] [2025-02-16 00:57:17,719][03429] Updated weights for policy 0, policy_version 710 (0.0016) [2025-02-16 00:57:21,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2924544. Throughput: 0: 994.6. Samples: 730558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:57:21,859][01307] Avg episode reward: [(0, '21.967')] [2025-02-16 00:57:21,865][03416] Saving new best policy, reward=21.967! [2025-02-16 00:57:26,854][01307] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2945024. Throughput: 0: 991.0. Samples: 737206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:57:26,859][01307] Avg episode reward: [(0, '22.838')] [2025-02-16 00:57:26,863][03416] Saving new best policy, reward=22.838! [2025-02-16 00:57:27,128][03429] Updated weights for policy 0, policy_version 720 (0.0020) [2025-02-16 00:57:31,854][01307] Fps is (10 sec: 3686.3, 60 sec: 3891.4, 300 sec: 3887.7). 
Total num frames: 2961408. Throughput: 0: 966.3. Samples: 739300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:57:31,859][01307] Avg episode reward: [(0, '22.435')] [2025-02-16 00:57:36,854][01307] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2981888. Throughput: 0: 986.7. Samples: 745108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:57:36,856][01307] Avg episode reward: [(0, '21.931')] [2025-02-16 00:57:37,873][03429] Updated weights for policy 0, policy_version 730 (0.0023) [2025-02-16 00:57:41,854][01307] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3006464. Throughput: 0: 986.9. Samples: 751912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:57:41,859][01307] Avg episode reward: [(0, '21.923')] [2025-02-16 00:57:46,856][01307] Fps is (10 sec: 4095.2, 60 sec: 3959.3, 300 sec: 3887.7). Total num frames: 3022848. Throughput: 0: 968.8. Samples: 753986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:57:46,858][01307] Avg episode reward: [(0, '22.011')] [2025-02-16 00:57:48,713][03429] Updated weights for policy 0, policy_version 740 (0.0015) [2025-02-16 00:57:51,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3043328. Throughput: 0: 989.6. Samples: 760080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:57:51,856][01307] Avg episode reward: [(0, '22.807')] [2025-02-16 00:57:56,857][01307] Fps is (10 sec: 4095.4, 60 sec: 3891.0, 300 sec: 3915.5). Total num frames: 3063808. Throughput: 0: 982.5. Samples: 766602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:57:56,860][01307] Avg episode reward: [(0, '23.101')] [2025-02-16 00:57:56,865][03416] Saving new best policy, reward=23.101! [2025-02-16 00:57:59,193][03429] Updated weights for policy 0, policy_version 750 (0.0018) [2025-02-16 00:58:01,854][01307] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). 
Total num frames: 3080192. Throughput: 0: 969.1. Samples: 768590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:58:01,859][01307] Avg episode reward: [(0, '23.891')] [2025-02-16 00:58:01,865][03416] Saving new best policy, reward=23.891! [2025-02-16 00:58:06,854][01307] Fps is (10 sec: 4097.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3104768. Throughput: 0: 992.3. Samples: 775212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:58:06,856][01307] Avg episode reward: [(0, '24.706')] [2025-02-16 00:58:06,864][03416] Saving new best policy, reward=24.706! [2025-02-16 00:58:08,177][03429] Updated weights for policy 0, policy_version 760 (0.0025) [2025-02-16 00:58:11,854][01307] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3125248. Throughput: 0: 996.0. Samples: 782026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-16 00:58:11,861][01307] Avg episode reward: [(0, '22.994')] [2025-02-16 00:58:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 3145728. Throughput: 0: 997.8. Samples: 784202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-16 00:58:16,859][01307] Avg episode reward: [(0, '22.560')] [2025-02-16 00:58:18,296][03429] Updated weights for policy 0, policy_version 770 (0.0014) [2025-02-16 00:58:21,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 3170304. Throughput: 0: 1030.0. Samples: 791456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:58:21,861][01307] Avg episode reward: [(0, '21.817')] [2025-02-16 00:58:26,855][01307] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 3943.3). Total num frames: 3190784. Throughput: 0: 1024.6. Samples: 798018. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:58:26,857][01307] Avg episode reward: [(0, '22.953')] [2025-02-16 00:58:27,950][03429] Updated weights for policy 0, policy_version 780 (0.0017) [2025-02-16 00:58:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3929.4). Total num frames: 3211264. Throughput: 0: 1029.6. Samples: 800314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:58:31,857][01307] Avg episode reward: [(0, '22.953')] [2025-02-16 00:58:31,868][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000784_3211264.pth... [2025-02-16 00:58:31,990][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000552_2260992.pth [2025-02-16 00:58:36,854][01307] Fps is (10 sec: 4096.4, 60 sec: 4164.3, 300 sec: 3957.2). Total num frames: 3231744. Throughput: 0: 1056.6. Samples: 807626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:58:36,856][01307] Avg episode reward: [(0, '23.957')] [2025-02-16 00:58:36,929][03429] Updated weights for policy 0, policy_version 790 (0.0015) [2025-02-16 00:58:41,859][01307] Fps is (10 sec: 4093.7, 60 sec: 4095.6, 300 sec: 3957.1). Total num frames: 3252224. Throughput: 0: 1049.2. Samples: 813816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:58:41,865][01307] Avg episode reward: [(0, '23.501')] [2025-02-16 00:58:46,840][03429] Updated weights for policy 0, policy_version 800 (0.0022) [2025-02-16 00:58:46,854][01307] Fps is (10 sec: 4505.5, 60 sec: 4232.7, 300 sec: 3957.2). Total num frames: 3276800. Throughput: 0: 1067.7. Samples: 816638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:58:46,856][01307] Avg episode reward: [(0, '22.401')] [2025-02-16 00:58:51,854][01307] Fps is (10 sec: 4508.1, 60 sec: 4232.5, 300 sec: 3984.9). Total num frames: 3297280. Throughput: 0: 1080.6. Samples: 823840. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:58:51,856][01307] Avg episode reward: [(0, '20.673')] [2025-02-16 00:58:56,854][01307] Fps is (10 sec: 3686.5, 60 sec: 4164.5, 300 sec: 3971.0). Total num frames: 3313664. Throughput: 0: 1057.9. Samples: 829630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-02-16 00:58:56,856][01307] Avg episode reward: [(0, '21.676')] [2025-02-16 00:58:57,025][03429] Updated weights for policy 0, policy_version 810 (0.0021) [2025-02-16 00:59:01,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 3971.0). Total num frames: 3338240. Throughput: 0: 1073.9. Samples: 832526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 00:59:01,859][01307] Avg episode reward: [(0, '20.079')] [2025-02-16 00:59:05,664][03429] Updated weights for policy 0, policy_version 820 (0.0014) [2025-02-16 00:59:06,859][01307] Fps is (10 sec: 4912.7, 60 sec: 4300.4, 300 sec: 4012.6). Total num frames: 3362816. Throughput: 0: 1076.4. Samples: 839898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-02-16 00:59:06,861][01307] Avg episode reward: [(0, '21.440')] [2025-02-16 00:59:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3984.9). Total num frames: 3379200. Throughput: 0: 1055.4. Samples: 845510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 00:59:11,857][01307] Avg episode reward: [(0, '21.573')] [2025-02-16 00:59:15,533][03429] Updated weights for policy 0, policy_version 830 (0.0022) [2025-02-16 00:59:16,854][01307] Fps is (10 sec: 4098.1, 60 sec: 4300.8, 300 sec: 3998.8). Total num frames: 3403776. Throughput: 0: 1079.5. Samples: 848890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-02-16 00:59:16,859][01307] Avg episode reward: [(0, '22.689')] [2025-02-16 00:59:21,856][01307] Fps is (10 sec: 4914.2, 60 sec: 4300.7, 300 sec: 4026.5). Total num frames: 3428352. Throughput: 0: 1079.2. Samples: 856194. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 00:59:21,863][01307] Avg episode reward: [(0, '22.375')] [2025-02-16 00:59:25,374][03429] Updated weights for policy 0, policy_version 840 (0.0016) [2025-02-16 00:59:26,857][01307] Fps is (10 sec: 4094.6, 60 sec: 4232.4, 300 sec: 4012.6). Total num frames: 3444736. Throughput: 0: 1062.2. Samples: 861612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 00:59:26,860][01307] Avg episode reward: [(0, '22.317')] [2025-02-16 00:59:31,854][01307] Fps is (10 sec: 4096.9, 60 sec: 4300.8, 300 sec: 4026.6). Total num frames: 3469312. Throughput: 0: 1077.2. Samples: 865114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 00:59:31,859][01307] Avg episode reward: [(0, '22.294')] [2025-02-16 00:59:34,054][03429] Updated weights for policy 0, policy_version 850 (0.0018) [2025-02-16 00:59:36,854][01307] Fps is (10 sec: 4916.9, 60 sec: 4369.1, 300 sec: 4054.3). Total num frames: 3493888. Throughput: 0: 1083.2. Samples: 872584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:59:36,858][01307] Avg episode reward: [(0, '22.930')] [2025-02-16 00:59:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4301.2, 300 sec: 4026.6). Total num frames: 3510272. Throughput: 0: 1074.0. Samples: 877962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 00:59:41,858][01307] Avg episode reward: [(0, '23.344')] [2025-02-16 00:59:43,826][03429] Updated weights for policy 0, policy_version 860 (0.0023) [2025-02-16 00:59:46,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4300.8, 300 sec: 4054.3). Total num frames: 3534848. Throughput: 0: 1091.3. Samples: 881634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 00:59:46,858][01307] Avg episode reward: [(0, '23.805')] [2025-02-16 00:59:51,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4068.2). Total num frames: 3555328. Throughput: 0: 1091.6. Samples: 889014. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:59:51,860][01307] Avg episode reward: [(0, '23.154')] [2025-02-16 00:59:53,499][03429] Updated weights for policy 0, policy_version 870 (0.0014) [2025-02-16 00:59:56,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4369.1, 300 sec: 4054.3). Total num frames: 3575808. Throughput: 0: 1088.2. Samples: 894480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 00:59:56,856][01307] Avg episode reward: [(0, '23.270')] [2025-02-16 01:00:01,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4082.1). Total num frames: 3600384. Throughput: 0: 1091.1. Samples: 897990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:00:01,857][01307] Avg episode reward: [(0, '23.388')] [2025-02-16 01:00:02,231][03429] Updated weights for policy 0, policy_version 880 (0.0024) [2025-02-16 01:00:06,855][01307] Fps is (10 sec: 4505.0, 60 sec: 4301.1, 300 sec: 4082.1). Total num frames: 3620864. Throughput: 0: 1085.0. Samples: 905020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 01:00:06,860][01307] Avg episode reward: [(0, '22.849')] [2025-02-16 01:00:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4082.1). Total num frames: 3641344. Throughput: 0: 1093.2. Samples: 910802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:00:11,856][01307] Avg episode reward: [(0, '22.706')] [2025-02-16 01:00:12,191][03429] Updated weights for policy 0, policy_version 890 (0.0021) [2025-02-16 01:00:16,854][01307] Fps is (10 sec: 4506.2, 60 sec: 4369.1, 300 sec: 4110.0). Total num frames: 3665920. Throughput: 0: 1097.3. Samples: 914492. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 01:00:16,858][01307] Avg episode reward: [(0, '24.336')] [2025-02-16 01:00:21,525][03429] Updated weights for policy 0, policy_version 900 (0.0028) [2025-02-16 01:00:21,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4301.0, 300 sec: 4109.9). Total num frames: 3686400. Throughput: 0: 1080.6. 
Samples: 921212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:00:21,859][01307] Avg episode reward: [(0, '24.144')] [2025-02-16 01:00:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.3, 300 sec: 4096.0). Total num frames: 3706880. Throughput: 0: 1094.1. Samples: 927198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:00:26,856][01307] Avg episode reward: [(0, '24.656')] [2025-02-16 01:00:30,506][03429] Updated weights for policy 0, policy_version 910 (0.0015) [2025-02-16 01:00:31,854][01307] Fps is (10 sec: 4505.5, 60 sec: 4369.1, 300 sec: 4137.7). Total num frames: 3731456. Throughput: 0: 1092.3. Samples: 930786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:00:31,862][01307] Avg episode reward: [(0, '25.087')] [2025-02-16 01:00:31,874][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000911_3731456.pth... [2025-02-16 01:00:32,023][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000666_2727936.pth [2025-02-16 01:00:32,042][03416] Saving new best policy, reward=25.087! [2025-02-16 01:00:36,856][01307] Fps is (10 sec: 4095.2, 60 sec: 4232.4, 300 sec: 4109.9). Total num frames: 3747840. Throughput: 0: 1065.9. Samples: 936982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:00:36,858][01307] Avg episode reward: [(0, '25.603')] [2025-02-16 01:00:36,860][03416] Saving new best policy, reward=25.603! [2025-02-16 01:00:40,784][03429] Updated weights for policy 0, policy_version 920 (0.0020) [2025-02-16 01:00:41,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4369.1, 300 sec: 4123.8). Total num frames: 3772416. Throughput: 0: 1084.8. Samples: 943294. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 01:00:41,859][01307] Avg episode reward: [(0, '25.379')] [2025-02-16 01:00:46,854][01307] Fps is (10 sec: 4916.2, 60 sec: 4369.1, 300 sec: 4151.5). Total num frames: 3796992. Throughput: 0: 1087.2. Samples: 946914. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:00:46,856][01307] Avg episode reward: [(0, '25.952')] [2025-02-16 01:00:46,865][03416] Saving new best policy, reward=25.952! [2025-02-16 01:00:50,205][03429] Updated weights for policy 0, policy_version 930 (0.0037) [2025-02-16 01:00:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4137.7). Total num frames: 3813376. Throughput: 0: 1064.2. Samples: 952908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:00:51,856][01307] Avg episode reward: [(0, '24.946')] [2025-02-16 01:00:56,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4300.8, 300 sec: 4137.7). Total num frames: 3833856. Throughput: 0: 1079.5. Samples: 959380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:00:56,856][01307] Avg episode reward: [(0, '24.965')] [2025-02-16 01:00:59,452][03429] Updated weights for policy 0, policy_version 940 (0.0025) [2025-02-16 01:01:01,856][01307] Fps is (10 sec: 4504.8, 60 sec: 4300.7, 300 sec: 4165.4). Total num frames: 3858432. Throughput: 0: 1077.6. Samples: 962986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:01:01,858][01307] Avg episode reward: [(0, '24.233')] [2025-02-16 01:01:06,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.6, 300 sec: 4151.5). Total num frames: 3874816. Throughput: 0: 1047.1. Samples: 968332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:01:06,861][01307] Avg episode reward: [(0, '24.059')] [2025-02-16 01:01:09,850][03429] Updated weights for policy 0, policy_version 950 (0.0025) [2025-02-16 01:01:11,854][01307] Fps is (10 sec: 4096.7, 60 sec: 4300.8, 300 sec: 4165.5). Total num frames: 3899392. Throughput: 0: 1064.1. Samples: 975082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:01:11,856][01307] Avg episode reward: [(0, '24.621')] [2025-02-16 01:01:16,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 3919872. Throughput: 0: 1063.6. Samples: 978646. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:01:16,856][01307] Avg episode reward: [(0, '24.453')] [2025-02-16 01:01:19,807][03429] Updated weights for policy 0, policy_version 960 (0.0025) [2025-02-16 01:01:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 3940352. Throughput: 0: 1042.9. Samples: 983912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:01:21,856][01307] Avg episode reward: [(0, '24.412')] [2025-02-16 01:01:26,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4193.2). Total num frames: 3964928. Throughput: 0: 1063.2. Samples: 991140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:01:26,860][01307] Avg episode reward: [(0, '25.836')] [2025-02-16 01:01:28,493][03429] Updated weights for policy 0, policy_version 970 (0.0028) [2025-02-16 01:01:31,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4207.1). Total num frames: 3985408. Throughput: 0: 1060.8. Samples: 994648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:01:31,863][01307] Avg episode reward: [(0, '25.940')] [2025-02-16 01:01:36,854][01307] Fps is (10 sec: 3686.3, 60 sec: 4232.7, 300 sec: 4179.3). Total num frames: 4001792. Throughput: 0: 1041.7. Samples: 999784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 01:01:36,860][01307] Avg episode reward: [(0, '25.314')] [2025-02-16 01:01:39,019][03429] Updated weights for policy 0, policy_version 980 (0.0021) [2025-02-16 01:01:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4207.1). Total num frames: 4026368. Throughput: 0: 1052.2. Samples: 1006730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:01:41,861][01307] Avg episode reward: [(0, '24.055')] [2025-02-16 01:01:46,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4193.2). Total num frames: 4042752. Throughput: 0: 1049.3. Samples: 1010204. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:01:46,856][01307] Avg episode reward: [(0, '23.876')] [2025-02-16 01:01:49,485][03429] Updated weights for policy 0, policy_version 990 (0.0030) [2025-02-16 01:01:51,854][01307] Fps is (10 sec: 3686.3, 60 sec: 4164.2, 300 sec: 4179.3). Total num frames: 4063232. Throughput: 0: 1040.8. Samples: 1015170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:01:51,864][01307] Avg episode reward: [(0, '23.355')] [2025-02-16 01:01:56,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4221.0). Total num frames: 4087808. Throughput: 0: 1047.1. Samples: 1022200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:01:56,862][01307] Avg episode reward: [(0, '23.114')] [2025-02-16 01:01:58,268][03429] Updated weights for policy 0, policy_version 1000 (0.0012) [2025-02-16 01:02:01,855][01307] Fps is (10 sec: 4095.7, 60 sec: 4096.1, 300 sec: 4193.2). Total num frames: 4104192. Throughput: 0: 1043.2. Samples: 1025590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:02:01,857][01307] Avg episode reward: [(0, '24.883')] [2025-02-16 01:02:06,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4193.2). Total num frames: 4124672. Throughput: 0: 1036.0. Samples: 1030534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:02:06,863][01307] Avg episode reward: [(0, '24.960')] [2025-02-16 01:02:08,769][03429] Updated weights for policy 0, policy_version 1010 (0.0022) [2025-02-16 01:02:11,854][01307] Fps is (10 sec: 4505.9, 60 sec: 4164.2, 300 sec: 4221.0). Total num frames: 4149248. Throughput: 0: 1038.5. Samples: 1037872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:02:11,857][01307] Avg episode reward: [(0, '26.123')] [2025-02-16 01:02:11,867][03416] Saving new best policy, reward=26.123! [2025-02-16 01:02:16,856][01307] Fps is (10 sec: 4095.2, 60 sec: 4095.9, 300 sec: 4207.1). Total num frames: 4165632. Throughput: 0: 1032.6. Samples: 1041116. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:02:16,863][01307] Avg episode reward: [(0, '25.746')] [2025-02-16 01:02:18,899][03429] Updated weights for policy 0, policy_version 1020 (0.0021) [2025-02-16 01:02:21,854][01307] Fps is (10 sec: 4096.2, 60 sec: 4164.3, 300 sec: 4221.0). Total num frames: 4190208. Throughput: 0: 1045.8. Samples: 1046844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:02:21,856][01307] Avg episode reward: [(0, '26.335')] [2025-02-16 01:02:21,867][03416] Saving new best policy, reward=26.335! [2025-02-16 01:02:26,854][01307] Fps is (10 sec: 4916.2, 60 sec: 4164.3, 300 sec: 4248.7). Total num frames: 4214784. Throughput: 0: 1053.3. Samples: 1054128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:02:26,860][01307] Avg episode reward: [(0, '26.343')] [2025-02-16 01:02:26,866][03416] Saving new best policy, reward=26.343! [2025-02-16 01:02:27,363][03429] Updated weights for policy 0, policy_version 1030 (0.0020) [2025-02-16 01:02:31,857][01307] Fps is (10 sec: 4094.6, 60 sec: 4095.8, 300 sec: 4234.8). Total num frames: 4231168. Throughput: 0: 1040.9. Samples: 1057050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:02:31,863][01307] Avg episode reward: [(0, '26.918')] [2025-02-16 01:02:31,881][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001033_4231168.pth... [2025-02-16 01:02:32,042][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000784_3211264.pth [2025-02-16 01:02:32,073][03416] Saving new best policy, reward=26.918! [2025-02-16 01:02:36,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4234.8). Total num frames: 4255744. Throughput: 0: 1056.3. Samples: 1062704. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:02:36,856][01307] Avg episode reward: [(0, '25.630')] [2025-02-16 01:02:37,538][03429] Updated weights for policy 0, policy_version 1040 (0.0019) [2025-02-16 01:02:41,854][01307] Fps is (10 sec: 4916.9, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 4280320. Throughput: 0: 1064.8. Samples: 1070116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:02:41,861][01307] Avg episode reward: [(0, '25.361')] [2025-02-16 01:02:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4248.7). Total num frames: 4296704. Throughput: 0: 1049.9. Samples: 1072834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:02:46,856][01307] Avg episode reward: [(0, '25.885')] [2025-02-16 01:02:47,585][03429] Updated weights for policy 0, policy_version 1050 (0.0022) [2025-02-16 01:02:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4262.7). Total num frames: 4321280. Throughput: 0: 1079.4. Samples: 1079106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:02:51,859][01307] Avg episode reward: [(0, '25.405')] [2025-02-16 01:02:55,828][03429] Updated weights for policy 0, policy_version 1060 (0.0019) [2025-02-16 01:02:56,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4290.4). Total num frames: 4345856. Throughput: 0: 1079.3. Samples: 1086438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:02:56,860][01307] Avg episode reward: [(0, '24.774')] [2025-02-16 01:03:01,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4232.6, 300 sec: 4248.7). Total num frames: 4358144. Throughput: 0: 1059.2. Samples: 1088780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:03:01,858][01307] Avg episode reward: [(0, '24.179')] [2025-02-16 01:03:06,203][03429] Updated weights for policy 0, policy_version 1070 (0.0012) [2025-02-16 01:03:06,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4300.8, 300 sec: 4262.6). Total num frames: 4382720. Throughput: 0: 1071.6. 
Samples: 1095068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:03:06,856][01307] Avg episode reward: [(0, '26.948')] [2025-02-16 01:03:06,859][03416] Saving new best policy, reward=26.948! [2025-02-16 01:03:11,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4276.5). Total num frames: 4407296. Throughput: 0: 1065.3. Samples: 1102068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:03:11,856][01307] Avg episode reward: [(0, '27.340')] [2025-02-16 01:03:11,864][03416] Saving new best policy, reward=27.340! [2025-02-16 01:03:16,701][03429] Updated weights for policy 0, policy_version 1080 (0.0018) [2025-02-16 01:03:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.9, 300 sec: 4248.7). Total num frames: 4423680. Throughput: 0: 1045.0. Samples: 1104070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:03:16,857][01307] Avg episode reward: [(0, '28.115')] [2025-02-16 01:03:16,862][03416] Saving new best policy, reward=28.115! [2025-02-16 01:03:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4262.6). Total num frames: 4448256. Throughput: 0: 1068.0. Samples: 1110762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:03:21,858][01307] Avg episode reward: [(0, '27.438')] [2025-02-16 01:03:25,345][03429] Updated weights for policy 0, policy_version 1090 (0.0026) [2025-02-16 01:03:26,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 4468736. Throughput: 0: 1048.8. Samples: 1117314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:03:26,858][01307] Avg episode reward: [(0, '28.046')] [2025-02-16 01:03:31,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4232.8, 300 sec: 4248.7). Total num frames: 4485120. Throughput: 0: 1034.8. Samples: 1119400. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:03:31,855][01307] Avg episode reward: [(0, '26.395')] [2025-02-16 01:03:36,029][03429] Updated weights for policy 0, policy_version 1100 (0.0023) [2025-02-16 01:03:36,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4248.8). Total num frames: 4505600. Throughput: 0: 1043.2. Samples: 1126052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:03:36,856][01307] Avg episode reward: [(0, '25.108')] [2025-02-16 01:03:41,860][01307] Fps is (10 sec: 4093.5, 60 sec: 4095.6, 300 sec: 4234.8). Total num frames: 4526080. Throughput: 0: 1020.6. Samples: 1132370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:03:41,863][01307] Avg episode reward: [(0, '26.283')] [2025-02-16 01:03:46,159][03429] Updated weights for policy 0, policy_version 1110 (0.0018) [2025-02-16 01:03:46,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4164.2, 300 sec: 4234.8). Total num frames: 4546560. Throughput: 0: 1022.0. Samples: 1134772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:03:46,857][01307] Avg episode reward: [(0, '25.887')] [2025-02-16 01:03:51,854][01307] Fps is (10 sec: 4508.2, 60 sec: 4164.2, 300 sec: 4262.6). Total num frames: 4571136. Throughput: 0: 1046.9. Samples: 1142180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:03:51,861][01307] Avg episode reward: [(0, '28.013')] [2025-02-16 01:03:55,071][03429] Updated weights for policy 0, policy_version 1120 (0.0024) [2025-02-16 01:03:56,854][01307] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4248.7). Total num frames: 4591616. Throughput: 0: 1029.6. Samples: 1148402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:03:56,856][01307] Avg episode reward: [(0, '27.615')] [2025-02-16 01:04:01,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4232.5, 300 sec: 4234.9). Total num frames: 4612096. Throughput: 0: 1045.9. Samples: 1151136. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:04:01,856][01307] Avg episode reward: [(0, '26.566')] [2025-02-16 01:04:04,707][03429] Updated weights for policy 0, policy_version 1130 (0.0020) [2025-02-16 01:04:06,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 4636672. Throughput: 0: 1058.0. Samples: 1158372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:04:06,860][01307] Avg episode reward: [(0, '25.398')] [2025-02-16 01:04:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4234.8). Total num frames: 4653056. Throughput: 0: 1040.7. Samples: 1164144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:04:11,859][01307] Avg episode reward: [(0, '24.949')] [2025-02-16 01:04:14,606][03429] Updated weights for policy 0, policy_version 1140 (0.0020) [2025-02-16 01:04:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4234.9). Total num frames: 4677632. Throughput: 0: 1064.6. Samples: 1167306. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 01:04:16,856][01307] Avg episode reward: [(0, '24.769')] [2025-02-16 01:04:21,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4262.7). Total num frames: 4702208. Throughput: 0: 1081.2. Samples: 1174706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 01:04:21,860][01307] Avg episode reward: [(0, '23.637')] [2025-02-16 01:04:23,495][03429] Updated weights for policy 0, policy_version 1150 (0.0021) [2025-02-16 01:04:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4234.8). Total num frames: 4718592. Throughput: 0: 1063.8. Samples: 1180234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:04:26,856][01307] Avg episode reward: [(0, '23.285')] [2025-02-16 01:04:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4234.8). Total num frames: 4743168. Throughput: 0: 1087.6. Samples: 1183712. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:04:31,856][01307] Avg episode reward: [(0, '23.508')] [2025-02-16 01:04:31,863][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001158_4743168.pth... [2025-02-16 01:04:31,983][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000911_3731456.pth [2025-02-16 01:04:33,248][03429] Updated weights for policy 0, policy_version 1160 (0.0021) [2025-02-16 01:04:36,854][01307] Fps is (10 sec: 4915.1, 60 sec: 4369.1, 300 sec: 4262.6). Total num frames: 4767744. Throughput: 0: 1084.2. Samples: 1190970. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 01:04:36,856][01307] Avg episode reward: [(0, '22.692')] [2025-02-16 01:04:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4301.2, 300 sec: 4234.9). Total num frames: 4784128. Throughput: 0: 1064.5. Samples: 1196304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 01:04:41,862][01307] Avg episode reward: [(0, '21.349')] [2025-02-16 01:04:43,086][03429] Updated weights for policy 0, policy_version 1170 (0.0020) [2025-02-16 01:04:46,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4369.1, 300 sec: 4248.7). Total num frames: 4808704. Throughput: 0: 1085.2. Samples: 1199972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 01:04:46,860][01307] Avg episode reward: [(0, '23.267')] [2025-02-16 01:04:51,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4248.7). Total num frames: 4829184. Throughput: 0: 1088.6. Samples: 1207360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:04:51,858][01307] Avg episode reward: [(0, '23.547')] [2025-02-16 01:04:51,882][03429] Updated weights for policy 0, policy_version 1180 (0.0014) [2025-02-16 01:04:56,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4234.8). Total num frames: 4849664. Throughput: 0: 1079.8. Samples: 1212734. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 01:04:56,861][01307] Avg episode reward: [(0, '22.596')] [2025-02-16 01:05:01,602][03429] Updated weights for policy 0, policy_version 1190 (0.0012) [2025-02-16 01:05:01,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4248.7). Total num frames: 4874240. Throughput: 0: 1091.4. Samples: 1216420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-16 01:05:01,857][01307] Avg episode reward: [(0, '23.634')] [2025-02-16 01:05:06,858][01307] Fps is (10 sec: 4503.9, 60 sec: 4300.5, 300 sec: 4248.7). Total num frames: 4894720. Throughput: 0: 1087.4. Samples: 1223642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:05:06,863][01307] Avg episode reward: [(0, '24.117')] [2025-02-16 01:05:11,559][03429] Updated weights for policy 0, policy_version 1200 (0.0018) [2025-02-16 01:05:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4234.8). Total num frames: 4915200. Throughput: 0: 1084.4. Samples: 1229034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:05:11,857][01307] Avg episode reward: [(0, '24.410')] [2025-02-16 01:05:16,854][01307] Fps is (10 sec: 4507.3, 60 sec: 4369.1, 300 sec: 4248.7). Total num frames: 4939776. Throughput: 0: 1088.5. Samples: 1232696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:05:16,861][01307] Avg episode reward: [(0, '25.609')] [2025-02-16 01:05:20,038][03429] Updated weights for policy 0, policy_version 1210 (0.0021) [2025-02-16 01:05:21,854][01307] Fps is (10 sec: 4505.5, 60 sec: 4300.8, 300 sec: 4248.7). Total num frames: 4960256. Throughput: 0: 1083.0. Samples: 1239706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:05:21,861][01307] Avg episode reward: [(0, '26.437')] [2025-02-16 01:05:26,855][01307] Fps is (10 sec: 4095.6, 60 sec: 4369.0, 300 sec: 4234.8). Total num frames: 4980736. Throughput: 0: 1092.4. Samples: 1245464. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 01:05:26,857][01307] Avg episode reward: [(0, '26.693')]
[2025-02-16 01:05:29,867][03429] Updated weights for policy 0, policy_version 1220 (0.0018)
[2025-02-16 01:05:31,854][01307] Fps is (10 sec: 4505.7, 60 sec: 4369.1, 300 sec: 4262.6). Total num frames: 5005312. Throughput: 0: 1092.0. Samples: 1249114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:05:31,856][01307] Avg episode reward: [(0, '25.406')]
[2025-02-16 01:05:36,858][01307] Fps is (10 sec: 4094.8, 60 sec: 4232.3, 300 sec: 4234.8). Total num frames: 5021696. Throughput: 0: 1072.3. Samples: 1255618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:05:36,862][01307] Avg episode reward: [(0, '24.798')]
[2025-02-16 01:05:39,862][03429] Updated weights for policy 0, policy_version 1230 (0.0015)
[2025-02-16 01:05:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4234.8). Total num frames: 5046272. Throughput: 0: 1090.1. Samples: 1261788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:05:41,856][01307] Avg episode reward: [(0, '24.701')]
[2025-02-16 01:05:46,854][01307] Fps is (10 sec: 4917.2, 60 sec: 4369.1, 300 sec: 4262.6). Total num frames: 5070848. Throughput: 0: 1089.9. Samples: 1265464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:05:46,856][01307] Avg episode reward: [(0, '25.362')]
[2025-02-16 01:05:48,234][03429] Updated weights for policy 0, policy_version 1240 (0.0014)
[2025-02-16 01:05:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4248.7). Total num frames: 5087232. Throughput: 0: 1066.9. Samples: 1271648. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 01:05:51,860][01307] Avg episode reward: [(0, '24.766')]
[2025-02-16 01:05:56,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4248.8). Total num frames: 5111808. Throughput: 0: 1092.0. Samples: 1278176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:05:56,861][01307] Avg episode reward: [(0, '26.155')]
[2025-02-16 01:05:58,247][03429] Updated weights for policy 0, policy_version 1250 (0.0028)
[2025-02-16 01:06:01,855][01307] Fps is (10 sec: 4914.7, 60 sec: 4369.0, 300 sec: 4276.5). Total num frames: 5136384. Throughput: 0: 1090.4. Samples: 1281766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:06:01,862][01307] Avg episode reward: [(0, '27.195')]
[2025-02-16 01:06:06,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4301.1, 300 sec: 4248.7). Total num frames: 5152768. Throughput: 0: 1061.9. Samples: 1287490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:06:06,861][01307] Avg episode reward: [(0, '27.597')]
[2025-02-16 01:06:08,415][03429] Updated weights for policy 0, policy_version 1260 (0.0015)
[2025-02-16 01:06:11,854][01307] Fps is (10 sec: 4096.4, 60 sec: 4369.1, 300 sec: 4262.6). Total num frames: 5177344. Throughput: 0: 1086.7. Samples: 1294366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:06:11,856][01307] Avg episode reward: [(0, '26.864')]
[2025-02-16 01:06:16,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4262.6). Total num frames: 5197824. Throughput: 0: 1087.3. Samples: 1298044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:06:16,856][01307] Avg episode reward: [(0, '27.879')]
[2025-02-16 01:06:16,894][03429] Updated weights for policy 0, policy_version 1270 (0.0017)
[2025-02-16 01:06:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4248.7). Total num frames: 5218304. Throughput: 0: 1066.4. Samples: 1303602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:06:21,856][01307] Avg episode reward: [(0, '26.857')]
[2025-02-16 01:06:26,745][03429] Updated weights for policy 0, policy_version 1280 (0.0017)
[2025-02-16 01:06:26,854][01307] Fps is (10 sec: 4505.4, 60 sec: 4369.1, 300 sec: 4262.6). Total num frames: 5242880. Throughput: 0: 1088.7. Samples: 1310782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:06:26,858][01307] Avg episode reward: [(0, '26.913')]
[2025-02-16 01:06:31,856][01307] Fps is (10 sec: 4504.9, 60 sec: 4300.7, 300 sec: 4276.5). Total num frames: 5263360. Throughput: 0: 1088.2. Samples: 1314436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:06:31,863][01307] Avg episode reward: [(0, '27.885')]
[2025-02-16 01:06:31,873][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001285_5263360.pth...
[2025-02-16 01:06:32,031][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001033_4231168.pth
[2025-02-16 01:06:36,854][01307] Fps is (10 sec: 3686.5, 60 sec: 4301.1, 300 sec: 4248.7). Total num frames: 5279744. Throughput: 0: 1064.3. Samples: 1319540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:06:36,860][01307] Avg episode reward: [(0, '28.089')]
[2025-02-16 01:06:36,900][03429] Updated weights for policy 0, policy_version 1290 (0.0014)
[2025-02-16 01:06:41,854][01307] Fps is (10 sec: 4506.3, 60 sec: 4369.1, 300 sec: 4290.4). Total num frames: 5308416. Throughput: 0: 1082.7. Samples: 1326896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:06:41,860][01307] Avg episode reward: [(0, '28.784')]
[2025-02-16 01:06:41,867][03416] Saving new best policy, reward=28.784!
[2025-02-16 01:06:45,180][03429] Updated weights for policy 0, policy_version 1300 (0.0019)
[2025-02-16 01:06:46,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4290.4). Total num frames: 5328896. Throughput: 0: 1083.4. Samples: 1330520. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 01:06:46,856][01307] Avg episode reward: [(0, '28.403')]
[2025-02-16 01:06:51,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4300.8, 300 sec: 4262.6). Total num frames: 5345280. Throughput: 0: 1075.0. Samples: 1335866. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 01:06:51,858][01307] Avg episode reward: [(0, '28.355')]
[2025-02-16 01:06:55,235][03429] Updated weights for policy 0, policy_version 1310 (0.0026)
[2025-02-16 01:06:56,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4304.3). Total num frames: 5373952. Throughput: 0: 1087.4. Samples: 1343298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:06:56,859][01307] Avg episode reward: [(0, '29.052')]
[2025-02-16 01:06:56,861][03416] Saving new best policy, reward=29.052!
[2025-02-16 01:07:01,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.6, 300 sec: 4290.4). Total num frames: 5390336. Throughput: 0: 1087.0. Samples: 1346958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:07:01,858][01307] Avg episode reward: [(0, '28.451')]
[2025-02-16 01:07:05,388][03429] Updated weights for policy 0, policy_version 1320 (0.0023)
[2025-02-16 01:07:06,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4300.8, 300 sec: 4276.5). Total num frames: 5410816. Throughput: 0: 1077.4. Samples: 1352086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 01:07:06,860][01307] Avg episode reward: [(0, '27.987')]
[2025-02-16 01:07:11,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 5435392. Throughput: 0: 1082.3. Samples: 1359486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:07:11,861][01307] Avg episode reward: [(0, '27.602')]
[2025-02-16 01:07:13,874][03429] Updated weights for policy 0, policy_version 1330 (0.0018)
[2025-02-16 01:07:16,858][01307] Fps is (10 sec: 4503.8, 60 sec: 4300.5, 300 sec: 4290.3). Total num frames: 5455872. Throughput: 0: 1077.3. Samples: 1362916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:07:16,865][01307] Avg episode reward: [(0, '27.099')]
[2025-02-16 01:07:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4276.5). Total num frames: 5476352. Throughput: 0: 1088.9. Samples: 1368540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:07:21,856][01307] Avg episode reward: [(0, '26.937')]
[2025-02-16 01:07:23,668][03429] Updated weights for policy 0, policy_version 1340 (0.0022)
[2025-02-16 01:07:26,854][01307] Fps is (10 sec: 4507.3, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 5500928. Throughput: 0: 1088.7. Samples: 1375888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:07:26,857][01307] Avg episode reward: [(0, '29.071')]
[2025-02-16 01:07:26,859][03416] Saving new best policy, reward=29.071!
[2025-02-16 01:07:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.6, 300 sec: 4276.5). Total num frames: 5517312. Throughput: 0: 1074.5. Samples: 1378874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:07:31,862][01307] Avg episode reward: [(0, '29.061')]
[2025-02-16 01:07:33,790][03429] Updated weights for policy 0, policy_version 1350 (0.0031)
[2025-02-16 01:07:36,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4369.1, 300 sec: 4276.5). Total num frames: 5541888. Throughput: 0: 1086.1. Samples: 1384742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:07:36,856][01307] Avg episode reward: [(0, '28.118')]
[2025-02-16 01:07:41,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 5566464. Throughput: 0: 1084.6. Samples: 1392104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:07:41,856][01307] Avg episode reward: [(0, '28.839')]
[2025-02-16 01:07:42,314][03429] Updated weights for policy 0, policy_version 1360 (0.0019)
[2025-02-16 01:07:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4276.5). Total num frames: 5582848. Throughput: 0: 1061.7. Samples: 1394734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:07:46,862][01307] Avg episode reward: [(0, '27.861')]
[2025-02-16 01:07:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4276.5). Total num frames: 5607424. Throughput: 0: 1091.8. Samples: 1401218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:07:51,862][01307] Avg episode reward: [(0, '28.318')]
[2025-02-16 01:07:51,953][03429] Updated weights for policy 0, policy_version 1370 (0.0016)
[2025-02-16 01:07:56,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 5632000. Throughput: 0: 1091.7. Samples: 1408612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:07:56,856][01307] Avg episode reward: [(0, '27.882')]
[2025-02-16 01:08:01,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4290.4). Total num frames: 5648384. Throughput: 0: 1067.0. Samples: 1410928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 01:08:01,866][03429] Updated weights for policy 0, policy_version 1380 (0.0012)
[2025-02-16 01:08:01,861][01307] Avg episode reward: [(0, '27.516')]
[2025-02-16 01:08:06,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4369.0, 300 sec: 4290.4). Total num frames: 5672960. Throughput: 0: 1088.5. Samples: 1417524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 01:08:06,859][01307] Avg episode reward: [(0, '28.341')]
[2025-02-16 01:08:10,395][03429] Updated weights for policy 0, policy_version 1390 (0.0016)
[2025-02-16 01:08:11,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 5697536. Throughput: 0: 1082.4. Samples: 1424596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:08:11,862][01307] Avg episode reward: [(0, '29.435')]
[2025-02-16 01:08:11,869][03416] Saving new best policy, reward=29.435!
[2025-02-16 01:08:16,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4301.1, 300 sec: 4290.4). Total num frames: 5713920. Throughput: 0: 1065.6. Samples: 1426826. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 01:08:16,861][01307] Avg episode reward: [(0, '29.450')]
[2025-02-16 01:08:16,864][03416] Saving new best policy, reward=29.450!
[2025-02-16 01:08:20,584][03429] Updated weights for policy 0, policy_version 1400 (0.0022)
[2025-02-16 01:08:21,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4369.1, 300 sec: 4304.3). Total num frames: 5738496. Throughput: 0: 1091.9. Samples: 1433878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:08:21,864][01307] Avg episode reward: [(0, '28.865')]
[2025-02-16 01:08:26,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 5758976. Throughput: 0: 1075.9. Samples: 1440520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:08:26,862][01307] Avg episode reward: [(0, '29.156')]
[2025-02-16 01:08:30,607][03429] Updated weights for policy 0, policy_version 1410 (0.0013)
[2025-02-16 01:08:31,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 5779456. Throughput: 0: 1068.1. Samples: 1442800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:08:31,861][01307] Avg episode reward: [(0, '29.896')]
[2025-02-16 01:08:31,873][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001411_5779456.pth...
[2025-02-16 01:08:31,993][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001158_4743168.pth
[2025-02-16 01:08:32,008][03416] Saving new best policy, reward=29.896!
[2025-02-16 01:08:36,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4332.1). Total num frames: 5804032. Throughput: 0: 1081.9. Samples: 1449904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:08:36,861][01307] Avg episode reward: [(0, '29.143')]
[2025-02-16 01:08:39,190][03429] Updated weights for policy 0, policy_version 1420 (0.0020)
[2025-02-16 01:08:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4318.2). Total num frames: 5820416. Throughput: 0: 1060.6. Samples: 1456338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:08:41,862][01307] Avg episode reward: [(0, '29.367')]
[2025-02-16 01:08:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 5844992. Throughput: 0: 1065.8. Samples: 1458890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:08:46,861][01307] Avg episode reward: [(0, '29.236')]
[2025-02-16 01:08:49,090][03429] Updated weights for policy 0, policy_version 1430 (0.0012)
[2025-02-16 01:08:51,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4332.0). Total num frames: 5869568. Throughput: 0: 1085.5. Samples: 1466370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:08:51,861][01307] Avg episode reward: [(0, '28.873')]
[2025-02-16 01:08:56,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4318.2). Total num frames: 5885952. Throughput: 0: 1062.0. Samples: 1472388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:08:56,861][01307] Avg episode reward: [(0, '28.042')]
[2025-02-16 01:08:59,082][03429] Updated weights for policy 0, policy_version 1440 (0.0024)
[2025-02-16 01:09:01,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 5910528. Throughput: 0: 1078.4. Samples: 1475356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:09:01,857][01307] Avg episode reward: [(0, '28.037')]
[2025-02-16 01:09:06,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 5935104. Throughput: 0: 1083.5. Samples: 1482636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:09:06,856][01307] Avg episode reward: [(0, '26.542')]
[2025-02-16 01:09:07,540][03429] Updated weights for policy 0, policy_version 1450 (0.0020)
[2025-02-16 01:09:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4318.2). Total num frames: 5951488. Throughput: 0: 1064.4. Samples: 1488420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:09:11,856][01307] Avg episode reward: [(0, '27.478')]
[2025-02-16 01:09:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 5976064. Throughput: 0: 1086.0. Samples: 1491672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:09:16,856][01307] Avg episode reward: [(0, '25.691')]
[2025-02-16 01:09:17,363][03429] Updated weights for policy 0, policy_version 1460 (0.0024)
[2025-02-16 01:09:21,854][01307] Fps is (10 sec: 4915.3, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 6000640. Throughput: 0: 1093.9. Samples: 1499128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 01:09:21,858][01307] Avg episode reward: [(0, '26.319')]
[2025-02-16 01:09:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 6017024. Throughput: 0: 1070.9. Samples: 1504530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 01:09:26,859][01307] Avg episode reward: [(0, '25.354')]
[2025-02-16 01:09:27,278][03429] Updated weights for policy 0, policy_version 1470 (0.0017)
[2025-02-16 01:09:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 6041600. Throughput: 0: 1094.9. Samples: 1508160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:09:31,856][01307] Avg episode reward: [(0, '25.870')]
[2025-02-16 01:09:35,712][03429] Updated weights for policy 0, policy_version 1480 (0.0013)
[2025-02-16 01:09:36,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 6066176. Throughput: 0: 1090.3. Samples: 1515432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 01:09:36,856][01307] Avg episode reward: [(0, '25.063')]
[2025-02-16 01:09:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 6082560. Throughput: 0: 1074.0. Samples: 1520716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:09:41,856][01307] Avg episode reward: [(0, '25.742')]
[2025-02-16 01:09:45,724][03429] Updated weights for policy 0, policy_version 1490 (0.0025)
[2025-02-16 01:09:46,854][01307] Fps is (10 sec: 4095.8, 60 sec: 4369.0, 300 sec: 4332.0). Total num frames: 6107136. Throughput: 0: 1090.8. Samples: 1524442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:09:46,857][01307] Avg episode reward: [(0, '24.787')]
[2025-02-16 01:09:51,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4332.0). Total num frames: 6127616. Throughput: 0: 1094.8. Samples: 1531900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:09:51,863][01307] Avg episode reward: [(0, '25.737')]
[2025-02-16 01:09:55,872][03429] Updated weights for policy 0, policy_version 1500 (0.0016)
[2025-02-16 01:09:56,854][01307] Fps is (10 sec: 4096.2, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 6148096. Throughput: 0: 1083.2. Samples: 1537166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:09:56,856][01307] Avg episode reward: [(0, '25.175')]
[2025-02-16 01:10:01,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4332.1). Total num frames: 6172672. Throughput: 0: 1092.4. Samples: 1540830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:10:01,861][01307] Avg episode reward: [(0, '25.625')]
[2025-02-16 01:10:04,263][03429] Updated weights for policy 0, policy_version 1510 (0.0026)
[2025-02-16 01:10:06,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4332.0). Total num frames: 6193152. Throughput: 0: 1082.8. Samples: 1547852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 01:10:06,861][01307] Avg episode reward: [(0, '26.939')]
[2025-02-16 01:10:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 6213632. Throughput: 0: 1085.7. Samples: 1553388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:10:11,863][01307] Avg episode reward: [(0, '29.741')]
[2025-02-16 01:10:14,238][03429] Updated weights for policy 0, policy_version 1520 (0.0016)
[2025-02-16 01:10:16,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4332.0). Total num frames: 6238208. Throughput: 0: 1086.3. Samples: 1557044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:10:16,856][01307] Avg episode reward: [(0, '29.869')]
[2025-02-16 01:10:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4318.2). Total num frames: 6254592. Throughput: 0: 1074.5. Samples: 1563786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:10:21,859][01307] Avg episode reward: [(0, '30.799')]
[2025-02-16 01:10:21,866][03416] Saving new best policy, reward=30.799!
[2025-02-16 01:10:24,203][03429] Updated weights for policy 0, policy_version 1530 (0.0030)
[2025-02-16 01:10:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 6279168. Throughput: 0: 1089.3. Samples: 1569734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 01:10:26,860][01307] Avg episode reward: [(0, '31.634')]
[2025-02-16 01:10:26,863][03416] Saving new best policy, reward=31.634!
[2025-02-16 01:10:31,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4332.1). Total num frames: 6299648. Throughput: 0: 1086.2. Samples: 1573320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 01:10:31,861][01307] Avg episode reward: [(0, '29.520')]
[2025-02-16 01:10:31,871][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001539_6303744.pth...
[2025-02-16 01:10:31,994][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001285_5263360.pth
[2025-02-16 01:10:32,784][03429] Updated weights for policy 0, policy_version 1540 (0.0024)
[2025-02-16 01:10:36,854][01307] Fps is (10 sec: 4095.8, 60 sec: 4232.5, 300 sec: 4318.1). Total num frames: 6320128. Throughput: 0: 1058.3. Samples: 1579526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:10:36,859][01307] Avg episode reward: [(0, '27.085')]
[2025-02-16 01:10:41,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 6340608. Throughput: 0: 1081.2. Samples: 1585820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:10:41,860][01307] Avg episode reward: [(0, '27.950')]
[2025-02-16 01:10:42,771][03429] Updated weights for policy 0, policy_version 1550 (0.0025)
[2025-02-16 01:10:46,854][01307] Fps is (10 sec: 4505.8, 60 sec: 4300.8, 300 sec: 4332.0). Total num frames: 6365184. Throughput: 0: 1083.1. Samples: 1589568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:10:46,858][01307] Avg episode reward: [(0, '25.802')]
[2025-02-16 01:10:51,854][01307] Fps is (10 sec: 4505.7, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 6385664. Throughput: 0: 1062.8. Samples: 1595676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:10:51,856][01307] Avg episode reward: [(0, '26.289')]
[2025-02-16 01:10:52,645][03429] Updated weights for policy 0, policy_version 1560 (0.0031)
[2025-02-16 01:10:56,856][01307] Fps is (10 sec: 4504.7, 60 sec: 4368.9, 300 sec: 4318.1). Total num frames: 6410240. Throughput: 0: 1088.7. Samples: 1602382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:10:56,864][01307] Avg episode reward: [(0, '27.292')]
[2025-02-16 01:11:01,100][03429] Updated weights for policy 0, policy_version 1570 (0.0012)
[2025-02-16 01:11:01,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4332.0). Total num frames: 6430720. Throughput: 0: 1088.8. Samples: 1606042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 01:11:01,860][01307] Avg episode reward: [(0, '29.053')]
[2025-02-16 01:11:06,854][01307] Fps is (10 sec: 3687.1, 60 sec: 4232.5, 300 sec: 4304.3). Total num frames: 6447104. Throughput: 0: 1065.3. Samples: 1611726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:11:06,860][01307] Avg episode reward: [(0, '30.212')]
[2025-02-16 01:11:11,111][03429] Updated weights for policy 0, policy_version 1580 (0.0015)
[2025-02-16 01:11:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 6471680. Throughput: 0: 1086.1. Samples: 1618610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:11:11,862][01307] Avg episode reward: [(0, '31.822')]
[2025-02-16 01:11:11,938][03416] Saving new best policy, reward=31.822!
[2025-02-16 01:11:16,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4332.0). Total num frames: 6496256. Throughput: 0: 1086.7. Samples: 1622222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:11:16,856][01307] Avg episode reward: [(0, '31.266')]
[2025-02-16 01:11:21,208][03429] Updated weights for policy 0, policy_version 1590 (0.0025)
[2025-02-16 01:11:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 6512640. Throughput: 0: 1068.3. Samples: 1627600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:11:21,861][01307] Avg episode reward: [(0, '32.036')]
[2025-02-16 01:11:21,870][03416] Saving new best policy, reward=32.036!
[2025-02-16 01:11:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 6537216. Throughput: 0: 1089.3. Samples: 1634836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:11:26,856][01307] Avg episode reward: [(0, '29.757')]
[2025-02-16 01:11:29,593][03429] Updated weights for policy 0, policy_version 1600 (0.0032)
[2025-02-16 01:11:31,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 6561792. Throughput: 0: 1088.5. Samples: 1638550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:11:31,856][01307] Avg episode reward: [(0, '28.636')]
[2025-02-16 01:11:36,859][01307] Fps is (10 sec: 4093.9, 60 sec: 4300.5, 300 sec: 4304.2). Total num frames: 6578176. Throughput: 0: 1067.9. Samples: 1643738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:11:36,862][01307] Avg episode reward: [(0, '27.466')]
[2025-02-16 01:11:39,681][03429] Updated weights for policy 0, policy_version 1610 (0.0031)
[2025-02-16 01:11:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 6602752. Throughput: 0: 1083.1. Samples: 1651118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:11:41,860][01307] Avg episode reward: [(0, '27.892')]
[2025-02-16 01:11:46,856][01307] Fps is (10 sec: 4507.0, 60 sec: 4300.7, 300 sec: 4332.0). Total num frames: 6623232. Throughput: 0: 1083.0. Samples: 1654778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:11:46,858][01307] Avg episode reward: [(0, '26.747')]
[2025-02-16 01:11:49,695][03429] Updated weights for policy 0, policy_version 1620 (0.0016)
[2025-02-16 01:11:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 6643712. Throughput: 0: 1076.5. Samples: 1660170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 01:11:51,862][01307] Avg episode reward: [(0, '27.937')]
[2025-02-16 01:11:56,854][01307] Fps is (10 sec: 4506.5, 60 sec: 4300.9, 300 sec: 4332.0). Total num frames: 6668288. Throughput: 0: 1087.6. Samples: 1667554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:11:56,856][01307] Avg episode reward: [(0, '28.659')]
[2025-02-16 01:11:57,921][03429] Updated weights for policy 0, policy_version 1630 (0.0013)
[2025-02-16 01:12:01,859][01307] Fps is (10 sec: 4503.4, 60 sec: 4300.4, 300 sec: 4332.0). Total num frames: 6688768. Throughput: 0: 1090.2. Samples: 1671288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:12:01,861][01307] Avg episode reward: [(0, '28.244')]
[2025-02-16 01:12:06,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4369.0, 300 sec: 4318.2). Total num frames: 6709248. Throughput: 0: 1087.1. Samples: 1676520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:12:06,860][01307] Avg episode reward: [(0, '29.654')]
[2025-02-16 01:12:07,999][03429] Updated weights for policy 0, policy_version 1640 (0.0034)
[2025-02-16 01:12:11,854][01307] Fps is (10 sec: 4507.8, 60 sec: 4369.1, 300 sec: 4332.1). Total num frames: 6733824. Throughput: 0: 1091.9. Samples: 1683970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 01:12:11,861][01307] Avg episode reward: [(0, '30.604')]
[2025-02-16 01:12:16,857][01307] Fps is (10 sec: 4504.4, 60 sec: 4300.6, 300 sec: 4332.0). Total num frames: 6754304. Throughput: 0: 1082.1. Samples: 1687248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:12:16,859][01307] Avg episode reward: [(0, '31.683')]
[2025-02-16 01:12:17,992][03429] Updated weights for policy 0, policy_version 1650 (0.0018)
[2025-02-16 01:12:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 6774784. Throughput: 0: 1094.7. Samples: 1692992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:12:21,862][01307] Avg episode reward: [(0, '31.309')]
[2025-02-16 01:12:26,233][03429] Updated weights for policy 0, policy_version 1660 (0.0021)
[2025-02-16 01:12:26,854][01307] Fps is (10 sec: 4506.9, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 6799360. Throughput: 0: 1095.5. Samples: 1700414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:12:26,857][01307] Avg episode reward: [(0, '30.899')]
[2025-02-16 01:12:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4318.2). Total num frames: 6815744. Throughput: 0: 1078.7. Samples: 1703318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 01:12:31,857][01307] Avg episode reward: [(0, '31.172')]
[2025-02-16 01:12:31,865][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001664_6815744.pth...
[2025-02-16 01:12:32,019][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001411_5779456.pth
[2025-02-16 01:12:36,384][03429] Updated weights for policy 0, policy_version 1670 (0.0028)
[2025-02-16 01:12:36,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.4, 300 sec: 4318.2). Total num frames: 6840320. Throughput: 0: 1088.8. Samples: 1709166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:12:36,857][01307] Avg episode reward: [(0, '30.423')]
[2025-02-16 01:12:41,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 6864896. Throughput: 0: 1087.7. Samples: 1716502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:12:41,856][01307] Avg episode reward: [(0, '29.792')]
[2025-02-16 01:12:46,463][03429] Updated weights for policy 0, policy_version 1680 (0.0030)
[2025-02-16 01:12:46,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4300.9, 300 sec: 4318.2). Total num frames: 6881280. Throughput: 0: 1062.1. Samples: 1719076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 01:12:46,856][01307] Avg episode reward: [(0, '30.055')]
[2025-02-16 01:12:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 6905856. Throughput: 0: 1089.6. Samples: 1725552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:12:51,857][01307] Avg episode reward: [(0, '30.722')]
[2025-02-16 01:12:54,774][03429] Updated weights for policy 0, policy_version 1690 (0.0025)
[2025-02-16 01:12:56,856][01307] Fps is (10 sec: 4914.2, 60 sec: 4368.9, 300 sec: 4345.9). Total num frames: 6930432. Throughput: 0: 1088.3. Samples: 1732946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:12:56,861][01307] Avg episode reward: [(0, '31.509')]
[2025-02-16 01:13:01,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4301.2, 300 sec: 4318.2). Total num frames: 6946816. Throughput: 0: 1065.6. Samples: 1735196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 01:13:01,863][01307] Avg episode reward: [(0, '30.169')]
[2025-02-16 01:13:04,803][03429] Updated weights for policy 0, policy_version 1700 (0.0013)
[2025-02-16 01:13:06,854][01307] Fps is (10 sec: 4096.9, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 6971392. Throughput: 0: 1085.9. Samples: 1741858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:13:06,856][01307] Avg episode reward: [(0, '30.029')]
[2025-02-16 01:13:11,858][01307] Fps is (10 sec: 4503.8, 60 sec: 4300.5, 300 sec: 4332.0). Total num frames: 6991872. Throughput: 0: 1078.6. Samples: 1748956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2025-02-16 01:13:11,864][01307] Avg episode reward: [(0, '29.846')]
[2025-02-16 01:13:14,851][03429] Updated weights for policy 0, policy_version 1710 (0.0020)
[2025-02-16 01:13:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4301.0, 300 sec: 4318.2). Total num frames: 7012352. Throughput: 0: 1063.0. Samples: 1751152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 01:13:16,859][01307] Avg episode reward: [(0, '29.117')]
[2025-02-16 01:13:21,854][01307] Fps is (10 sec: 4507.4, 60 sec: 4369.1, 300 sec: 4332.0). Total num frames: 7036928. Throughput: 0: 1093.1. Samples: 1758356. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-16 01:13:21,856][01307] Avg episode reward: [(0, '27.797')]
[2025-02-16 01:13:23,268][03429] Updated weights for policy 0, policy_version 1720 (0.0019)
[2025-02-16 01:13:26,858][01307] Fps is (10 sec: 4503.8, 60 sec: 4300.5, 300 sec: 4332.0). Total num frames: 7057408. Throughput: 0: 1078.7. Samples: 1765046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:13:26,860][01307] Avg episode reward: [(0, '26.943')]
[2025-02-16 01:13:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 7077888. Throughput: 0: 1074.1. Samples: 1767412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-16 01:13:31,857][01307] Avg episode reward: [(0, '26.884')]
[2025-02-16 01:13:33,040][03429] Updated weights for policy 0, policy_version 1730 (0.0026)
[2025-02-16 01:13:36,854][01307] Fps is (10 sec: 4507.4, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 7102464. Throughput: 0: 1095.5. Samples: 1774850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 01:13:36,856][01307] Avg episode reward: [(0, '27.204')]
[2025-02-16 01:13:41,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4332.0). Total num frames: 7122944. Throughput: 0: 1068.8. Samples: 1781038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:13:41,856][01307] Avg episode reward: [(0, '27.099')]
[2025-02-16 01:13:42,971][03429] Updated weights for policy 0, policy_version 1740 (0.0017)
[2025-02-16 01:13:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 7143424. Throughput: 0: 1078.5. Samples: 1783728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:13:46,856][01307] Avg episode reward: [(0, '26.584')]
[2025-02-16 01:13:51,332][03429] Updated weights for policy 0, policy_version 1750 (0.0013)
[2025-02-16 01:13:51,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 7168000. Throughput: 0: 1096.3. Samples: 1791192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:13:51,862][01307] Avg episode reward: [(0, '27.624')]
[2025-02-16 01:13:56,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.7, 300 sec: 4318.2). Total num frames: 7184384. Throughput: 0: 1070.6. Samples: 1797130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 01:13:56,861][01307] Avg episode reward: [(0, '26.635')]
[2025-02-16 01:14:01,241][03429] Updated weights for policy 0, policy_version 1760 (0.0019)
[2025-02-16 01:14:01,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 7208960. Throughput: 0: 1090.0. Samples: 1800204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-16 01:14:01,860][01307] Avg episode reward: [(0, '25.008')]
[2025-02-16 01:14:06,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 7233536. Throughput: 0: 1096.3. Samples: 1807688. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 01:14:06,862][01307] Avg episode reward: [(0, '25.109')]
[2025-02-16 01:14:11,121][03429] Updated weights for policy 0, policy_version 1770 (0.0017)
[2025-02-16 01:14:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4301.1, 300 sec: 4318.2). Total num frames: 7249920. Throughput: 0: 1067.8. Samples: 1813094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:14:11,861][01307] Avg episode reward: [(0, '25.850')]
[2025-02-16 01:14:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 7274496. Throughput: 0: 1090.8. Samples: 1816500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:14:16,858][01307] Avg episode reward: [(0, '25.987')]
[2025-02-16 01:14:19,651][03429] Updated weights for policy 0, policy_version 1780 (0.0021)
[2025-02-16 01:14:21,855][01307] Fps is (10 sec: 4914.7, 60 sec: 4369.0, 300 sec: 4345.9). Total num frames: 7299072. Throughput: 0: 1089.5. Samples: 1823880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-16 01:14:21,862][01307] Avg episode reward: [(0, '25.389')]
[2025-02-16 01:14:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4301.1, 300 sec: 4318.2). Total num frames: 7315456. Throughput: 0: 1070.6. Samples: 1829216. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-16 01:14:26,856][01307] Avg episode reward: [(0, '25.163')]
[2025-02-16 01:14:29,817][03429] Updated weights for policy 0, policy_version 1790 (0.0021)
[2025-02-16 01:14:31,854][01307] Fps is (10 sec: 4096.4, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 7340032. Throughput: 0: 1090.0. Samples: 1832780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:14:31,862][01307] Avg episode reward: [(0, '25.841')]
[2025-02-16 01:14:31,872][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001792_7340032.pth...
[2025-02-16 01:14:31,993][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001539_6303744.pth
[2025-02-16 01:14:36,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4332.0). Total num frames: 7360512. Throughput: 0: 1088.0. Samples: 1840150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:14:36,856][01307] Avg episode reward: [(0, '25.745')]
[2025-02-16 01:14:39,904][03429] Updated weights for policy 0, policy_version 1800 (0.0017)
[2025-02-16 01:14:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 7380992. Throughput: 0: 1071.8. Samples: 1845362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-16 01:14:41,861][01307] Avg episode reward: [(0, '24.925')]
[2025-02-16 01:14:46,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4332.0). Total num frames: 7405568. Throughput: 0: 1084.4. Samples: 1849004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-16 01:14:46,856][01307] Avg episode reward: [(0, '26.697')]
[2025-02-16 01:14:48,150][03429] Updated weights for policy 0, policy_version 1810 (0.0022)
[2025-02-16 01:14:51,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4332.0). Total num frames: 7426048. Throughput: 0: 1084.7. Samples: 1856500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:14:51,856][01307] Avg episode reward: [(0, '25.899')]
[2025-02-16 01:14:56,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 7446528. Throughput: 0: 1083.6. Samples: 1861858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:14:56,858][01307] Avg episode reward: [(0, '26.471')]
[2025-02-16 01:14:58,194][03429] Updated weights for policy 0, policy_version 1820 (0.0032)
[2025-02-16 01:15:01,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4332.0). Total num frames: 7471104. Throughput: 0: 1091.5. Samples: 1865616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:15:01,856][01307] Avg episode reward: [(0, '25.779')]
[2025-02-16 01:15:06,856][01307] Fps is (10 sec: 4504.7, 60 sec: 4300.7, 300 sec: 4332.0). Total num frames: 7491584. Throughput: 0: 1083.4. Samples: 1872636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:15:06,862][01307] Avg episode reward: [(0, '25.964')]
[2025-02-16 01:15:07,668][03429] Updated weights for policy 0, policy_version 1830 (0.0016)
[2025-02-16 01:15:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 7512064. Throughput: 0: 1092.0. Samples: 1878358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:15:11,856][01307] Avg episode reward: [(0, '25.878')]
[2025-02-16 01:15:16,360][03429] Updated weights for policy 0, policy_version 1840 (0.0016)
[2025-02-16 01:15:16,854][01307] Fps is (10 sec: 4506.5, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 7536640. Throughput: 0: 1095.3. Samples: 1882070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-16 01:15:16,858][01307] Avg episode reward: [(0, '26.617')]
[2025-02-16 01:15:21,854][01307] Fps is (10 sec: 4505.7, 60 sec: 4300.9, 300 sec: 4332.0). Total num frames: 7557120. Throughput: 0: 1081.6. Samples: 1888822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-16 01:15:21,857][01307] Avg episode reward: [(0, '27.322')]
[2025-02-16 01:15:26,423][03429] Updated weights for policy 0, policy_version 1850 (0.0016)
[2025-02-16 01:15:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4332.0). Total num frames: 7577600. Throughput: 0: 1100.9.
Samples: 1894902. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 01:15:26,862][01307] Avg episode reward: [(0, '28.527')] [2025-02-16 01:15:31,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4345.9). Total num frames: 7602176. Throughput: 0: 1102.5. Samples: 1898618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:15:31,864][01307] Avg episode reward: [(0, '28.878')] [2025-02-16 01:15:36,080][03429] Updated weights for policy 0, policy_version 1860 (0.0027) [2025-02-16 01:15:36,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4332.0). Total num frames: 7618560. Throughput: 0: 1068.2. Samples: 1904570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:15:36,860][01307] Avg episode reward: [(0, '28.668')] [2025-02-16 01:15:41,854][01307] Fps is (10 sec: 3686.3, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 7639040. Throughput: 0: 1080.9. Samples: 1910498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:15:41,857][01307] Avg episode reward: [(0, '28.590')] [2025-02-16 01:15:45,286][03429] Updated weights for policy 0, policy_version 1870 (0.0025) [2025-02-16 01:15:46,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4332.0). Total num frames: 7663616. Throughput: 0: 1077.6. Samples: 1914110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 01:15:46,862][01307] Avg episode reward: [(0, '26.811')] [2025-02-16 01:15:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4304.3). Total num frames: 7680000. Throughput: 0: 1056.4. Samples: 1920172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 01:15:51,858][01307] Avg episode reward: [(0, '25.986')] [2025-02-16 01:15:55,426][03429] Updated weights for policy 0, policy_version 1880 (0.0019) [2025-02-16 01:15:56,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 7704576. Throughput: 0: 1072.8. Samples: 1926632. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:15:56,856][01307] Avg episode reward: [(0, '25.820')] [2025-02-16 01:16:01,854][01307] Fps is (10 sec: 4915.3, 60 sec: 4300.8, 300 sec: 4345.9). Total num frames: 7729152. Throughput: 0: 1072.0. Samples: 1930312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:16:01,857][01307] Avg episode reward: [(0, '25.646')] [2025-02-16 01:16:05,202][03429] Updated weights for policy 0, policy_version 1890 (0.0015) [2025-02-16 01:16:06,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.7, 300 sec: 4318.2). Total num frames: 7745536. Throughput: 0: 1047.5. Samples: 1935960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:16:06,858][01307] Avg episode reward: [(0, '25.981')] [2025-02-16 01:16:11,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4232.5, 300 sec: 4304.3). Total num frames: 7766016. Throughput: 0: 1054.9. Samples: 1942372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:16:11,861][01307] Avg episode reward: [(0, '26.749')] [2025-02-16 01:16:14,662][03429] Updated weights for policy 0, policy_version 1900 (0.0014) [2025-02-16 01:16:16,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4332.0). Total num frames: 7790592. Throughput: 0: 1048.6. Samples: 1945804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:16:16,858][01307] Avg episode reward: [(0, '27.040')] [2025-02-16 01:16:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4304.3). Total num frames: 7806976. Throughput: 0: 1031.4. Samples: 1950982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:16:21,856][01307] Avg episode reward: [(0, '27.198')] [2025-02-16 01:16:25,045][03429] Updated weights for policy 0, policy_version 1910 (0.0015) [2025-02-16 01:16:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4304.3). Total num frames: 7831552. Throughput: 0: 1049.7. Samples: 1957736. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:16:26,856][01307] Avg episode reward: [(0, '25.645')] [2025-02-16 01:16:31,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4318.2). Total num frames: 7852032. Throughput: 0: 1047.5. Samples: 1961248. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 01:16:31,858][01307] Avg episode reward: [(0, '26.266')] [2025-02-16 01:16:31,867][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001917_7852032.pth... [2025-02-16 01:16:32,021][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001664_6815744.pth [2025-02-16 01:16:35,531][03429] Updated weights for policy 0, policy_version 1920 (0.0015) [2025-02-16 01:16:36,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4290.4). Total num frames: 7868416. Throughput: 0: 1023.6. Samples: 1966234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:16:36,862][01307] Avg episode reward: [(0, '26.614')] [2025-02-16 01:16:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.6, 300 sec: 4304.3). Total num frames: 7892992. Throughput: 0: 1039.1. Samples: 1973390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:16:41,856][01307] Avg episode reward: [(0, '26.592')] [2025-02-16 01:16:44,187][03429] Updated weights for policy 0, policy_version 1930 (0.0012) [2025-02-16 01:16:46,856][01307] Fps is (10 sec: 4504.6, 60 sec: 4164.1, 300 sec: 4304.2). Total num frames: 7913472. Throughput: 0: 1037.2. Samples: 1976990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:16:46,859][01307] Avg episode reward: [(0, '27.235')] [2025-02-16 01:16:51,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4232.5, 300 sec: 4290.4). Total num frames: 7933952. Throughput: 0: 1031.2. Samples: 1982364. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:16:51,863][01307] Avg episode reward: [(0, '28.007')] [2025-02-16 01:16:54,218][03429] Updated weights for policy 0, policy_version 1940 (0.0019) [2025-02-16 01:16:56,854][01307] Fps is (10 sec: 4506.5, 60 sec: 4232.5, 300 sec: 4304.3). Total num frames: 7958528. Throughput: 0: 1052.8. Samples: 1989750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:16:56,856][01307] Avg episode reward: [(0, '28.874')] [2025-02-16 01:17:01,854][01307] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4304.3). Total num frames: 7979008. Throughput: 0: 1057.7. Samples: 1993400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:17:01,856][01307] Avg episode reward: [(0, '28.476')] [2025-02-16 01:17:04,197][03429] Updated weights for policy 0, policy_version 1950 (0.0020) [2025-02-16 01:17:06,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4232.5, 300 sec: 4290.4). Total num frames: 7999488. Throughput: 0: 1060.0. Samples: 1998684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:17:06,856][01307] Avg episode reward: [(0, '27.797')] [2025-02-16 01:17:11,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 8024064. Throughput: 0: 1072.5. Samples: 2006000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:17:11,856][01307] Avg episode reward: [(0, '27.858')] [2025-02-16 01:17:12,530][03429] Updated weights for policy 0, policy_version 1960 (0.0028) [2025-02-16 01:17:16,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 4290.4). Total num frames: 8040448. Throughput: 0: 1067.3. Samples: 2009276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:17:16,858][01307] Avg episode reward: [(0, '27.887')] [2025-02-16 01:17:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4290.4). Total num frames: 8065024. Throughput: 0: 1083.5. Samples: 2014990. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:17:21,860][01307] Avg episode reward: [(0, '28.551')] [2025-02-16 01:17:22,618][03429] Updated weights for policy 0, policy_version 1970 (0.0020) [2025-02-16 01:17:26,854][01307] Fps is (10 sec: 4915.1, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 8089600. Throughput: 0: 1086.9. Samples: 2022302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:17:26,857][01307] Avg episode reward: [(0, '29.425')] [2025-02-16 01:17:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4290.4). Total num frames: 8105984. Throughput: 0: 1070.3. Samples: 2025152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:17:31,862][01307] Avg episode reward: [(0, '27.953')] [2025-02-16 01:17:32,655][03429] Updated weights for policy 0, policy_version 1980 (0.0016) [2025-02-16 01:17:36,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4369.1, 300 sec: 4290.4). Total num frames: 8130560. Throughput: 0: 1084.9. Samples: 2031186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:17:36,861][01307] Avg episode reward: [(0, '28.284')] [2025-02-16 01:17:41,280][03429] Updated weights for policy 0, policy_version 1990 (0.0018) [2025-02-16 01:17:41,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 8151040. Throughput: 0: 1075.9. Samples: 2038164. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 01:17:41,860][01307] Avg episode reward: [(0, '27.345')] [2025-02-16 01:17:46,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4232.7, 300 sec: 4276.5). Total num frames: 8167424. Throughput: 0: 1050.0. Samples: 2040648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:17:46,856][01307] Avg episode reward: [(0, '27.376')] [2025-02-16 01:17:51,418][03429] Updated weights for policy 0, policy_version 2000 (0.0041) [2025-02-16 01:17:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4276.5). Total num frames: 8192000. Throughput: 0: 1075.8. 
Samples: 2047096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:17:51,856][01307] Avg episode reward: [(0, '26.667')] [2025-02-16 01:17:56,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 8216576. Throughput: 0: 1077.4. Samples: 2054482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:17:56,861][01307] Avg episode reward: [(0, '25.721')] [2025-02-16 01:18:01,610][03429] Updated weights for policy 0, policy_version 2010 (0.0012) [2025-02-16 01:18:01,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4232.5, 300 sec: 4276.5). Total num frames: 8232960. Throughput: 0: 1051.4. Samples: 2056590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:18:01,863][01307] Avg episode reward: [(0, '26.668')] [2025-02-16 01:18:06,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4232.5, 300 sec: 4276.6). Total num frames: 8253440. Throughput: 0: 1065.0. Samples: 2062914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:18:06,856][01307] Avg episode reward: [(0, '24.929')] [2025-02-16 01:18:10,539][03429] Updated weights for policy 0, policy_version 2020 (0.0016) [2025-02-16 01:18:11,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4276.5). Total num frames: 8273920. Throughput: 0: 1052.0. Samples: 2069642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:18:11,856][01307] Avg episode reward: [(0, '25.271')] [2025-02-16 01:18:16,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 8294400. Throughput: 0: 1037.6. Samples: 2071842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:18:16,857][01307] Avg episode reward: [(0, '25.873')] [2025-02-16 01:18:20,594][03429] Updated weights for policy 0, policy_version 2030 (0.0023) [2025-02-16 01:18:21,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4276.6). Total num frames: 8318976. Throughput: 0: 1059.2. Samples: 2078848. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:18:21,856][01307] Avg episode reward: [(0, '25.646')] [2025-02-16 01:18:26,854][01307] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4276.5). Total num frames: 8339456. Throughput: 0: 1048.6. Samples: 2085352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:18:26,858][01307] Avg episode reward: [(0, '26.429')] [2025-02-16 01:18:31,072][03429] Updated weights for policy 0, policy_version 2040 (0.0024) [2025-02-16 01:18:31,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4248.7). Total num frames: 8355840. Throughput: 0: 1042.2. Samples: 2087548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:18:31,856][01307] Avg episode reward: [(0, '26.117')] [2025-02-16 01:18:31,886][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002041_8359936.pth... [2025-02-16 01:18:32,015][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001792_7340032.pth [2025-02-16 01:18:36,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4262.6). Total num frames: 8380416. Throughput: 0: 1054.6. Samples: 2094554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:18:36,856][01307] Avg episode reward: [(0, '27.149')] [2025-02-16 01:18:40,328][03429] Updated weights for policy 0, policy_version 2050 (0.0013) [2025-02-16 01:18:41,860][01307] Fps is (10 sec: 4502.9, 60 sec: 4163.8, 300 sec: 4262.5). Total num frames: 8400896. Throughput: 0: 1020.4. Samples: 2100408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:18:41,862][01307] Avg episode reward: [(0, '26.336')] [2025-02-16 01:18:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4248.7). Total num frames: 8421376. Throughput: 0: 1029.2. Samples: 2102904. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:18:46,856][01307] Avg episode reward: [(0, '26.151')] [2025-02-16 01:18:50,106][03429] Updated weights for policy 0, policy_version 2060 (0.0022) [2025-02-16 01:18:51,854][01307] Fps is (10 sec: 4508.4, 60 sec: 4232.5, 300 sec: 4276.5). Total num frames: 8445952. Throughput: 0: 1049.9. Samples: 2110158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:18:51,863][01307] Avg episode reward: [(0, '27.781')] [2025-02-16 01:18:56,855][01307] Fps is (10 sec: 4095.4, 60 sec: 4095.9, 300 sec: 4248.7). Total num frames: 8462336. Throughput: 0: 1029.6. Samples: 2115974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:18:56,863][01307] Avg episode reward: [(0, '28.108')] [2025-02-16 01:19:00,281][03429] Updated weights for policy 0, policy_version 2070 (0.0017) [2025-02-16 01:19:01,854][01307] Fps is (10 sec: 3686.3, 60 sec: 4164.3, 300 sec: 4234.8). Total num frames: 8482816. Throughput: 0: 1046.7. Samples: 2118944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:19:01,860][01307] Avg episode reward: [(0, '28.764')] [2025-02-16 01:19:06,854][01307] Fps is (10 sec: 4506.3, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 8507392. Throughput: 0: 1055.3. Samples: 2126338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:19:06,859][01307] Avg episode reward: [(0, '30.049')] [2025-02-16 01:19:09,213][03429] Updated weights for policy 0, policy_version 2080 (0.0020) [2025-02-16 01:19:11,855][01307] Fps is (10 sec: 4095.7, 60 sec: 4164.2, 300 sec: 4234.8). Total num frames: 8523776. Throughput: 0: 1037.9. Samples: 2132058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:19:11,857][01307] Avg episode reward: [(0, '30.582')] [2025-02-16 01:19:16,855][01307] Fps is (10 sec: 4095.7, 60 sec: 4232.5, 300 sec: 4234.9). Total num frames: 8548352. Throughput: 0: 1060.1. Samples: 2135252. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:19:16,861][01307] Avg episode reward: [(0, '31.589')] [2025-02-16 01:19:18,724][03429] Updated weights for policy 0, policy_version 2090 (0.0020) [2025-02-16 01:19:21,854][01307] Fps is (10 sec: 4915.7, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 8572928. Throughput: 0: 1070.1. Samples: 2142708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:19:21,859][01307] Avg episode reward: [(0, '31.365')] [2025-02-16 01:19:26,854][01307] Fps is (10 sec: 4096.3, 60 sec: 4164.3, 300 sec: 4234.8). Total num frames: 8589312. Throughput: 0: 1061.7. Samples: 2148176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:19:26,857][01307] Avg episode reward: [(0, '29.540')] [2025-02-16 01:19:28,588][03429] Updated weights for policy 0, policy_version 2100 (0.0017) [2025-02-16 01:19:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4248.7). Total num frames: 8613888. Throughput: 0: 1086.3. Samples: 2151786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:19:31,860][01307] Avg episode reward: [(0, '26.517')] [2025-02-16 01:19:36,855][01307] Fps is (10 sec: 4914.8, 60 sec: 4300.7, 300 sec: 4262.6). Total num frames: 8638464. Throughput: 0: 1091.4. Samples: 2159274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:19:36,857][01307] Avg episode reward: [(0, '25.671')] [2025-02-16 01:19:36,996][03429] Updated weights for policy 0, policy_version 2110 (0.0016) [2025-02-16 01:19:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4233.0, 300 sec: 4234.8). Total num frames: 8654848. Throughput: 0: 1078.0. Samples: 2164482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:19:41,857][01307] Avg episode reward: [(0, '24.173')] [2025-02-16 01:19:46,854][01307] Fps is (10 sec: 4096.3, 60 sec: 4300.8, 300 sec: 4248.7). Total num frames: 8679424. Throughput: 0: 1095.0. Samples: 2168220. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:19:46,861][01307] Avg episode reward: [(0, '24.156')] [2025-02-16 01:19:46,874][03429] Updated weights for policy 0, policy_version 2120 (0.0019) [2025-02-16 01:19:51,857][01307] Fps is (10 sec: 4913.7, 60 sec: 4300.6, 300 sec: 4262.6). Total num frames: 8704000. Throughput: 0: 1093.6. Samples: 2175552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:19:51,861][01307] Avg episode reward: [(0, '26.817')] [2025-02-16 01:19:56,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.9, 300 sec: 4234.9). Total num frames: 8720384. Throughput: 0: 1085.4. Samples: 2180902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:19:56,856][01307] Avg episode reward: [(0, '27.098')] [2025-02-16 01:19:56,874][03429] Updated weights for policy 0, policy_version 2130 (0.0025) [2025-02-16 01:20:01,854][01307] Fps is (10 sec: 4097.3, 60 sec: 4369.1, 300 sec: 4248.8). Total num frames: 8744960. Throughput: 0: 1093.0. Samples: 2184436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:20:01,856][01307] Avg episode reward: [(0, '28.414')] [2025-02-16 01:20:05,455][03429] Updated weights for policy 0, policy_version 2140 (0.0017) [2025-02-16 01:20:06,854][01307] Fps is (10 sec: 4915.1, 60 sec: 4369.1, 300 sec: 4262.6). Total num frames: 8769536. Throughput: 0: 1090.4. Samples: 2191776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:20:06,857][01307] Avg episode reward: [(0, '28.383')] [2025-02-16 01:20:11,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4369.1, 300 sec: 4234.8). Total num frames: 8785920. Throughput: 0: 1086.0. Samples: 2197046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:20:11,856][01307] Avg episode reward: [(0, '30.849')] [2025-02-16 01:20:15,517][03429] Updated weights for policy 0, policy_version 2150 (0.0012) [2025-02-16 01:20:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4248.7). Total num frames: 8810496. Throughput: 0: 1087.4. 
Samples: 2200718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:20:16,858][01307] Avg episode reward: [(0, '29.090')] [2025-02-16 01:20:21,854][01307] Fps is (10 sec: 4505.7, 60 sec: 4300.8, 300 sec: 4248.7). Total num frames: 8830976. Throughput: 0: 1079.0. Samples: 2207828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 01:20:21,864][01307] Avg episode reward: [(0, '29.147')] [2025-02-16 01:20:25,214][03429] Updated weights for policy 0, policy_version 2160 (0.0015) [2025-02-16 01:20:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4234.8). Total num frames: 8851456. Throughput: 0: 1090.8. Samples: 2213566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:20:26,856][01307] Avg episode reward: [(0, '29.346')] [2025-02-16 01:20:31,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4262.6). Total num frames: 8876032. Throughput: 0: 1088.0. Samples: 2217182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:20:31,857][01307] Avg episode reward: [(0, '27.203')] [2025-02-16 01:20:31,870][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002167_8876032.pth... [2025-02-16 01:20:32,002][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001917_7852032.pth [2025-02-16 01:20:33,864][03429] Updated weights for policy 0, policy_version 2170 (0.0018) [2025-02-16 01:20:36,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.9, 300 sec: 4262.6). Total num frames: 8896512. Throughput: 0: 1068.7. Samples: 2223640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:20:36,860][01307] Avg episode reward: [(0, '25.844')] [2025-02-16 01:20:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4248.7). Total num frames: 8916992. Throughput: 0: 1081.9. Samples: 2229588. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:20:41,860][01307] Avg episode reward: [(0, '24.710')] [2025-02-16 01:20:44,020][03429] Updated weights for policy 0, policy_version 2180 (0.0016) [2025-02-16 01:20:46,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4276.5). Total num frames: 8941568. Throughput: 0: 1080.9. Samples: 2233078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:20:46,860][01307] Avg episode reward: [(0, '26.352')] [2025-02-16 01:20:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.7, 300 sec: 4248.7). Total num frames: 8957952. Throughput: 0: 1059.0. Samples: 2239430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:20:51,857][01307] Avg episode reward: [(0, '25.334')] [2025-02-16 01:20:54,021][03429] Updated weights for policy 0, policy_version 2190 (0.0018) [2025-02-16 01:20:56,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4248.7). Total num frames: 8982528. Throughput: 0: 1082.6. Samples: 2245762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:20:56,855][01307] Avg episode reward: [(0, '24.870')] [2025-02-16 01:21:01,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4276.5). Total num frames: 9007104. Throughput: 0: 1082.8. Samples: 2249444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:21:01,864][01307] Avg episode reward: [(0, '25.776')] [2025-02-16 01:21:02,501][03429] Updated weights for policy 0, policy_version 2200 (0.0016) [2025-02-16 01:21:06,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 9023488. Throughput: 0: 1055.4. Samples: 2255320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:21:06,856][01307] Avg episode reward: [(0, '26.997')] [2025-02-16 01:21:11,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4369.1, 300 sec: 4262.6). Total num frames: 9048064. Throughput: 0: 1077.0. Samples: 2262030. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:21:11,860][01307] Avg episode reward: [(0, '25.765')] [2025-02-16 01:21:12,544][03429] Updated weights for policy 0, policy_version 2210 (0.0021) [2025-02-16 01:21:16,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4276.5). Total num frames: 9068544. Throughput: 0: 1077.6. Samples: 2265674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:21:16,856][01307] Avg episode reward: [(0, '26.619')] [2025-02-16 01:21:21,854][01307] Fps is (10 sec: 3686.5, 60 sec: 4232.5, 300 sec: 4248.7). Total num frames: 9084928. Throughput: 0: 1050.4. Samples: 2270906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:21:21,856][01307] Avg episode reward: [(0, '28.693')] [2025-02-16 01:21:22,994][03429] Updated weights for policy 0, policy_version 2220 (0.0012) [2025-02-16 01:21:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4262.6). Total num frames: 9109504. Throughput: 0: 1070.9. Samples: 2277780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:21:26,856][01307] Avg episode reward: [(0, '29.841')] [2025-02-16 01:21:31,854][01307] Fps is (10 sec: 4505.5, 60 sec: 4232.5, 300 sec: 4276.5). Total num frames: 9129984. Throughput: 0: 1071.1. Samples: 2281280. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-16 01:21:31,857][01307] Avg episode reward: [(0, '28.697')] [2025-02-16 01:21:32,275][03429] Updated weights for policy 0, policy_version 2230 (0.0017) [2025-02-16 01:21:36,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 9150464. Throughput: 0: 1042.6. Samples: 2286346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:21:36,862][01307] Avg episode reward: [(0, '28.932')] [2025-02-16 01:21:41,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 9170944. Throughput: 0: 1057.4. Samples: 2293346. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:21:41,861][01307] Avg episode reward: [(0, '28.301')] [2025-02-16 01:21:42,240][03429] Updated weights for policy 0, policy_version 2240 (0.0018) [2025-02-16 01:21:46,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4164.2, 300 sec: 4262.6). Total num frames: 9191424. Throughput: 0: 1053.5. Samples: 2296852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:21:46,862][01307] Avg episode reward: [(0, '27.995')] [2025-02-16 01:21:51,857][01307] Fps is (10 sec: 4094.6, 60 sec: 4232.3, 300 sec: 4248.7). Total num frames: 9211904. Throughput: 0: 1043.4. Samples: 2302276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:21:51,860][01307] Avg episode reward: [(0, '25.623')] [2025-02-16 01:21:52,238][03429] Updated weights for policy 0, policy_version 2250 (0.0020) [2025-02-16 01:21:56,854][01307] Fps is (10 sec: 4505.7, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 9236480. Throughput: 0: 1057.5. Samples: 2309616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:21:56,858][01307] Avg episode reward: [(0, '26.332')] [2025-02-16 01:22:01,552][03429] Updated weights for policy 0, policy_version 2260 (0.0014) [2025-02-16 01:22:01,854][01307] Fps is (10 sec: 4507.1, 60 sec: 4164.3, 300 sec: 4262.6). Total num frames: 9256960. Throughput: 0: 1056.9. Samples: 2313236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:22:01,856][01307] Avg episode reward: [(0, '26.060')] [2025-02-16 01:22:06,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4232.5, 300 sec: 4248.7). Total num frames: 9277440. Throughput: 0: 1057.9. Samples: 2318510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:22:06,857][01307] Avg episode reward: [(0, '26.632')] [2025-02-16 01:22:10,614][03429] Updated weights for policy 0, policy_version 2270 (0.0019) [2025-02-16 01:22:11,854][01307] Fps is (10 sec: 4505.5, 60 sec: 4232.5, 300 sec: 4276.5). Total num frames: 9302016. Throughput: 0: 1069.6. 
Samples: 2325912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:22:11,862][01307] Avg episode reward: [(0, '25.674')] [2025-02-16 01:22:16,854][01307] Fps is (10 sec: 4505.5, 60 sec: 4232.5, 300 sec: 4262.6). Total num frames: 9322496. Throughput: 0: 1063.7. Samples: 2329148. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 01:22:16,860][01307] Avg episode reward: [(0, '25.490')] [2025-02-16 01:22:20,407][03429] Updated weights for policy 0, policy_version 2280 (0.0015) [2025-02-16 01:22:21,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4300.8, 300 sec: 4248.7). Total num frames: 9342976. Throughput: 0: 1080.1. Samples: 2334950. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-16 01:22:21,856][01307] Avg episode reward: [(0, '26.595')] [2025-02-16 01:22:26,854][01307] Fps is (10 sec: 4505.8, 60 sec: 4300.8, 300 sec: 4276.5). Total num frames: 9367552. Throughput: 0: 1089.8. Samples: 2342386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:22:26,860][01307] Avg episode reward: [(0, '27.028')] [2025-02-16 01:22:29,668][03429] Updated weights for policy 0, policy_version 2290 (0.0021) [2025-02-16 01:22:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.6, 300 sec: 4248.7). Total num frames: 9383936. Throughput: 0: 1075.6. Samples: 2345254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:22:31,856][01307] Avg episode reward: [(0, '26.695')] [2025-02-16 01:22:31,862][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002291_9383936.pth... [2025-02-16 01:22:31,994][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002041_8359936.pth [2025-02-16 01:22:36,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4262.6). Total num frames: 9408512. Throughput: 0: 1091.6. Samples: 2351394. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:22:36,863][01307] Avg episode reward: [(0, '27.250')] [2025-02-16 01:22:38,891][03429] Updated weights for policy 0, policy_version 2300 (0.0017) [2025-02-16 01:22:41,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4290.4). Total num frames: 9433088. Throughput: 0: 1091.5. Samples: 2358734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:22:41,858][01307] Avg episode reward: [(0, '27.877')] [2025-02-16 01:22:46,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4262.6). Total num frames: 9449472. Throughput: 0: 1066.4. Samples: 2361226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:22:46,862][01307] Avg episode reward: [(0, '27.497')] [2025-02-16 01:22:48,860][03429] Updated weights for policy 0, policy_version 2310 (0.0023) [2025-02-16 01:22:51,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.3, 300 sec: 4262.6). Total num frames: 9474048. Throughput: 0: 1094.7. Samples: 2367772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:22:51,861][01307] Avg episode reward: [(0, '28.638')] [2025-02-16 01:22:56,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4290.4). Total num frames: 9498624. Throughput: 0: 1092.6. Samples: 2375078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:22:56,856][01307] Avg episode reward: [(0, '28.870')] [2025-02-16 01:22:57,898][03429] Updated weights for policy 0, policy_version 2320 (0.0021) [2025-02-16 01:23:01,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4276.5). Total num frames: 9515008. Throughput: 0: 1071.5. Samples: 2377366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:23:01,857][01307] Avg episode reward: [(0, '28.887')] [2025-02-16 01:23:06,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4290.4). Total num frames: 9539584. Throughput: 0: 1096.5. Samples: 2384294. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:23:06,859][01307] Avg episode reward: [(0, '29.266')] [2025-02-16 01:23:07,008][03429] Updated weights for policy 0, policy_version 2330 (0.0028) [2025-02-16 01:23:11,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4304.3). Total num frames: 9564160. Throughput: 0: 1085.6. Samples: 2391236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:23:11,856][01307] Avg episode reward: [(0, '28.020')] [2025-02-16 01:23:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4276.5). Total num frames: 9580544. Throughput: 0: 1071.8. Samples: 2393484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:23:16,861][01307] Avg episode reward: [(0, '27.665')] [2025-02-16 01:23:17,014][03429] Updated weights for policy 0, policy_version 2340 (0.0016) [2025-02-16 01:23:21,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4290.4). Total num frames: 9605120. Throughput: 0: 1094.9. Samples: 2400664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:23:21,862][01307] Avg episode reward: [(0, '26.100')] [2025-02-16 01:23:25,670][03429] Updated weights for policy 0, policy_version 2350 (0.0024) [2025-02-16 01:23:26,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 9625600. Throughput: 0: 1080.3. Samples: 2407346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:23:26,856][01307] Avg episode reward: [(0, '26.961')] [2025-02-16 01:23:31,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4290.4). Total num frames: 9646080. Throughput: 0: 1076.7. Samples: 2409676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-16 01:23:31,864][01307] Avg episode reward: [(0, '26.275')] [2025-02-16 01:23:35,363][03429] Updated weights for policy 0, policy_version 2360 (0.0012) [2025-02-16 01:23:36,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4304.4). Total num frames: 9670656. Throughput: 0: 1098.9. 
Samples: 2417224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-16 01:23:36,856][01307] Avg episode reward: [(0, '27.880')] [2025-02-16 01:23:41,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 9691136. Throughput: 0: 1074.8. Samples: 2423442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:23:41,864][01307] Avg episode reward: [(0, '29.475')] [2025-02-16 01:23:45,343][03429] Updated weights for policy 0, policy_version 2370 (0.0020) [2025-02-16 01:23:46,854][01307] Fps is (10 sec: 4095.9, 60 sec: 4369.1, 300 sec: 4290.4). Total num frames: 9711616. Throughput: 0: 1083.9. Samples: 2426140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:23:46,856][01307] Avg episode reward: [(0, '29.378')] [2025-02-16 01:23:51,854][01307] Fps is (10 sec: 4505.5, 60 sec: 4369.1, 300 sec: 4318.2). Total num frames: 9736192. Throughput: 0: 1095.4. Samples: 2433588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:23:51,862][01307] Avg episode reward: [(0, '29.889')] [2025-02-16 01:23:53,901][03429] Updated weights for policy 0, policy_version 2380 (0.0014) [2025-02-16 01:23:56,854][01307] Fps is (10 sec: 4505.7, 60 sec: 4300.8, 300 sec: 4318.2). Total num frames: 9756672. Throughput: 0: 1072.5. Samples: 2439500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:23:56,856][01307] Avg episode reward: [(0, '30.907')] [2025-02-16 01:24:01,854][01307] Fps is (10 sec: 4096.1, 60 sec: 4369.1, 300 sec: 4304.3). Total num frames: 9777152. Throughput: 0: 1086.6. Samples: 2442380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:24:01,861][01307] Avg episode reward: [(0, '30.331')] [2025-02-16 01:24:03,799][03429] Updated weights for policy 0, policy_version 2390 (0.0026) [2025-02-16 01:24:06,854][01307] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4332.1). Total num frames: 9801728. Throughput: 0: 1090.4. Samples: 2449734. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-16 01:24:06,856][01307] Avg episode reward: [(0, '26.211')] [2025-02-16 01:24:11,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4304.3). Total num frames: 9818112. Throughput: 0: 1068.3. Samples: 2455420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:24:11,856][01307] Avg episode reward: [(0, '26.045')] [2025-02-16 01:24:13,840][03429] Updated weights for policy 0, policy_version 2400 (0.0031) [2025-02-16 01:24:16,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4304.3). Total num frames: 9842688. Throughput: 0: 1088.6. Samples: 2458664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-16 01:24:16,856][01307] Avg episode reward: [(0, '26.159')] [2025-02-16 01:24:21,854][01307] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4332.0). Total num frames: 9867264. Throughput: 0: 1084.6. Samples: 2466030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:24:21,856][01307] Avg episode reward: [(0, '25.502')] [2025-02-16 01:24:22,357][03429] Updated weights for policy 0, policy_version 2410 (0.0027) [2025-02-16 01:24:26,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4304.3). Total num frames: 9883648. Throughput: 0: 1059.7. Samples: 2471130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-16 01:24:26,856][01307] Avg episode reward: [(0, '24.530')] [2025-02-16 01:24:31,854][01307] Fps is (10 sec: 3686.4, 60 sec: 4300.8, 300 sec: 4290.4). Total num frames: 9904128. Throughput: 0: 1074.4. Samples: 2474488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:24:31,856][01307] Avg episode reward: [(0, '26.203')] [2025-02-16 01:24:31,867][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002419_9908224.pth... 
[2025-02-16 01:24:31,991][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002167_8876032.pth [2025-02-16 01:24:32,852][03429] Updated weights for policy 0, policy_version 2420 (0.0034) [2025-02-16 01:24:36,857][01307] Fps is (10 sec: 4504.0, 60 sec: 4300.5, 300 sec: 4318.1). Total num frames: 9928704. Throughput: 0: 1064.7. Samples: 2481504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:24:36,860][01307] Avg episode reward: [(0, '27.082')] [2025-02-16 01:24:41,854][01307] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4290.4). Total num frames: 9945088. Throughput: 0: 1046.8. Samples: 2486608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:24:41,856][01307] Avg episode reward: [(0, '26.672')] [2025-02-16 01:24:43,330][03429] Updated weights for policy 0, policy_version 2430 (0.0021) [2025-02-16 01:24:46,854][01307] Fps is (10 sec: 4097.4, 60 sec: 4300.8, 300 sec: 4290.4). Total num frames: 9969664. Throughput: 0: 1057.2. Samples: 2489954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:24:46,857][01307] Avg episode reward: [(0, '28.125')] [2025-02-16 01:24:51,858][01307] Fps is (10 sec: 4503.9, 60 sec: 4232.3, 300 sec: 4304.2). Total num frames: 9990144. Throughput: 0: 1052.3. Samples: 2497090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-16 01:24:51,860][01307] Avg episode reward: [(0, '29.560')] [2025-02-16 01:24:52,758][03429] Updated weights for policy 0, policy_version 2440 (0.0025) [2025-02-16 01:24:56,146][03416] Stopping Batcher_0... [2025-02-16 01:24:56,147][03416] Loop batcher_evt_loop terminating... [2025-02-16 01:24:56,148][01307] Component Batcher_0 stopped! [2025-02-16 01:24:56,149][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2025-02-16 01:24:56,215][03429] Weights refcount: 2 0 [2025-02-16 01:24:56,217][03429] Stopping InferenceWorker_p0-w0... 
[2025-02-16 01:24:56,218][03429] Loop inference_proc0-0_evt_loop terminating... [2025-02-16 01:24:56,219][01307] Component InferenceWorker_p0-w0 stopped! [2025-02-16 01:24:56,270][03416] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002291_9383936.pth [2025-02-16 01:24:56,286][03416] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2025-02-16 01:24:56,475][01307] Component LearnerWorker_p0 stopped! [2025-02-16 01:24:56,485][03416] Stopping LearnerWorker_p0... [2025-02-16 01:24:56,485][03416] Loop learner_proc0_evt_loop terminating... [2025-02-16 01:24:56,606][01307] Component RolloutWorker_w6 stopped! [2025-02-16 01:24:56,614][03436] Stopping RolloutWorker_w6... [2025-02-16 01:24:56,618][01307] Component RolloutWorker_w2 stopped! [2025-02-16 01:24:56,625][03432] Stopping RolloutWorker_w2... [2025-02-16 01:24:56,625][03432] Loop rollout_proc2_evt_loop terminating... [2025-02-16 01:24:56,615][03436] Loop rollout_proc6_evt_loop terminating... [2025-02-16 01:24:56,647][01307] Component RolloutWorker_w4 stopped! [2025-02-16 01:24:56,653][03434] Stopping RolloutWorker_w4... [2025-02-16 01:24:56,654][03434] Loop rollout_proc4_evt_loop terminating... [2025-02-16 01:24:56,659][03437] Stopping RolloutWorker_w7... [2025-02-16 01:24:56,664][03437] Loop rollout_proc7_evt_loop terminating... [2025-02-16 01:24:56,659][01307] Component RolloutWorker_w7 stopped! [2025-02-16 01:24:56,668][01307] Component RolloutWorker_w0 stopped! [2025-02-16 01:24:56,675][03430] Stopping RolloutWorker_w0... [2025-02-16 01:24:56,676][03430] Loop rollout_proc0_evt_loop terminating... [2025-02-16 01:24:56,772][03431] Stopping RolloutWorker_w1... [2025-02-16 01:24:56,773][03431] Loop rollout_proc1_evt_loop terminating... [2025-02-16 01:24:56,777][03433] Stopping RolloutWorker_w3... [2025-02-16 01:24:56,777][03433] Loop rollout_proc3_evt_loop terminating... [2025-02-16 01:24:56,772][01307] Component RolloutWorker_w1 stopped! 
[2025-02-16 01:24:56,782][01307] Component RolloutWorker_w3 stopped! [2025-02-16 01:24:56,805][03435] Stopping RolloutWorker_w5... [2025-02-16 01:24:56,805][03435] Loop rollout_proc5_evt_loop terminating... [2025-02-16 01:24:56,805][01307] Component RolloutWorker_w5 stopped! [2025-02-16 01:24:56,813][01307] Waiting for process learner_proc0 to stop... [2025-02-16 01:24:58,292][01307] Waiting for process inference_proc0-0 to join... [2025-02-16 01:24:58,300][01307] Waiting for process rollout_proc0 to join... [2025-02-16 01:25:00,650][01307] Waiting for process rollout_proc1 to join... [2025-02-16 01:25:00,652][01307] Waiting for process rollout_proc2 to join... [2025-02-16 01:25:00,660][01307] Waiting for process rollout_proc3 to join... [2025-02-16 01:25:00,661][01307] Waiting for process rollout_proc4 to join... [2025-02-16 01:25:00,662][01307] Waiting for process rollout_proc5 to join... [2025-02-16 01:25:00,664][01307] Waiting for process rollout_proc6 to join... [2025-02-16 01:25:00,665][01307] Waiting for process rollout_proc7 to join... 
[2025-02-16 01:25:00,668][01307] Batcher 0 profile tree view:
batching: 61.5927, releasing_batches: 0.0654
[2025-02-16 01:25:00,672][01307] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 957.6630
update_model: 19.6852
  weight_update: 0.0037
one_step: 0.0026
  handle_policy_step: 1332.3044
    deserialize: 32.8836, stack: 6.7227, obs_to_device_normalize: 285.2566, forward: 681.0066, send_messages: 64.9330
    prepare_outputs: 204.2906
      to_cpu: 126.9792
[2025-02-16 01:25:00,675][01307] Learner 0 profile tree view:
misc: 0.0094, prepare_batch: 28.5323
train: 173.4330
  epoch_init: 0.0195, minibatch_init: 0.0228, losses_postprocess: 1.5836, kl_divergence: 1.6913, after_optimizer: 82.5151
  calculate_losses: 60.2850
    losses_init: 0.0180, forward_head: 2.7129, bptt_initial: 40.4449, tail: 2.4629, advantages_returns: 0.7000, losses: 8.5636
    bptt: 4.7594
      bptt_forward_core: 4.5568
  update: 26.0917
    clip: 1.9920
[2025-02-16 01:25:00,676][01307] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.6008, enqueue_policy_requests: 222.1597, env_step: 1923.4436, overhead: 28.0570, complete_rollouts: 16.4786
save_policy_outputs: 43.7273
  split_output_tensors: 17.3239
[2025-02-16 01:25:00,677][01307] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.5845, enqueue_policy_requests: 224.0027, env_step: 1918.8797, overhead: 27.7542, complete_rollouts: 16.2327
save_policy_outputs: 42.5930
  split_output_tensors: 16.5904
[2025-02-16 01:25:00,678][01307] Loop Runner_EvtLoop terminating...
[2025-02-16 01:25:00,679][01307] Runner profile tree view: main_loop: 2423.3856 [2025-02-16 01:25:00,680][01307] Collected {0: 10006528}, FPS: 4129.2 [2025-02-16 01:53:01,285][01307] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-02-16 01:53:01,286][01307] Overriding arg 'num_workers' with value 1 passed from command line [2025-02-16 01:53:01,289][01307] Adding new argument 'no_render'=True that is not in the saved config file! [2025-02-16 01:53:01,291][01307] Adding new argument 'save_video'=True that is not in the saved config file! [2025-02-16 01:53:01,293][01307] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-02-16 01:53:01,295][01307] Adding new argument 'video_name'=None that is not in the saved config file! [2025-02-16 01:53:01,297][01307] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-02-16 01:53:01,298][01307] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-02-16 01:53:01,299][01307] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-02-16 01:53:01,301][01307] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-02-16 01:53:01,302][01307] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-02-16 01:53:01,303][01307] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-02-16 01:53:01,304][01307] Adding new argument 'train_script'=None that is not in the saved config file! [2025-02-16 01:53:01,304][01307] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-02-16 01:53:01,305][01307] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-02-16 01:53:01,339][01307] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-16 01:53:01,343][01307] RunningMeanStd input shape: (3, 72, 128) [2025-02-16 01:53:01,345][01307] RunningMeanStd input shape: (1,) [2025-02-16 01:53:01,362][01307] ConvEncoder: input_channels=3 [2025-02-16 01:53:01,461][01307] Conv encoder output size: 512 [2025-02-16 01:53:01,463][01307] Policy head output size: 512 [2025-02-16 01:53:01,745][01307] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2025-02-16 01:53:02,520][01307] Num frames 100... [2025-02-16 01:53:02,648][01307] Num frames 200... [2025-02-16 01:53:02,784][01307] Num frames 300... [2025-02-16 01:53:02,919][01307] Num frames 400... [2025-02-16 01:53:03,049][01307] Num frames 500... [2025-02-16 01:53:03,185][01307] Num frames 600... [2025-02-16 01:53:03,315][01307] Num frames 700... [2025-02-16 01:53:03,448][01307] Num frames 800... [2025-02-16 01:53:03,582][01307] Num frames 900... [2025-02-16 01:53:03,723][01307] Num frames 1000... [2025-02-16 01:53:03,853][01307] Num frames 1100... [2025-02-16 01:53:03,984][01307] Num frames 1200... [2025-02-16 01:53:04,138][01307] Num frames 1300... [2025-02-16 01:53:04,262][01307] Num frames 1400... [2025-02-16 01:53:04,386][01307] Num frames 1500... [2025-02-16 01:53:04,510][01307] Num frames 1600... [2025-02-16 01:53:04,636][01307] Num frames 1700... [2025-02-16 01:53:04,758][01307] Avg episode rewards: #0: 47.459, true rewards: #0: 17.460 [2025-02-16 01:53:04,760][01307] Avg episode reward: 47.459, avg true_objective: 17.460 [2025-02-16 01:53:04,829][01307] Num frames 1800... [2025-02-16 01:53:04,955][01307] Num frames 1900... [2025-02-16 01:53:05,083][01307] Num frames 2000... [2025-02-16 01:53:05,219][01307] Num frames 2100... [2025-02-16 01:53:05,352][01307] Num frames 2200... 
[2025-02-16 01:53:05,481][01307] Num frames 2300... [2025-02-16 01:53:05,608][01307] Num frames 2400... [2025-02-16 01:53:05,744][01307] Num frames 2500... [2025-02-16 01:53:05,871][01307] Num frames 2600... [2025-02-16 01:53:06,004][01307] Num frames 2700... [2025-02-16 01:53:06,129][01307] Num frames 2800... [2025-02-16 01:53:06,263][01307] Num frames 2900... [2025-02-16 01:53:06,388][01307] Num frames 3000... [2025-02-16 01:53:06,513][01307] Num frames 3100... [2025-02-16 01:53:06,640][01307] Num frames 3200... [2025-02-16 01:53:06,771][01307] Num frames 3300... [2025-02-16 01:53:06,906][01307] Num frames 3400... [2025-02-16 01:53:07,031][01307] Num frames 3500... [2025-02-16 01:53:07,159][01307] Num frames 3600... [2025-02-16 01:53:07,295][01307] Num frames 3700... [2025-02-16 01:53:07,429][01307] Avg episode rewards: #0: 51.309, true rewards: #0: 18.810 [2025-02-16 01:53:07,431][01307] Avg episode reward: 51.309, avg true_objective: 18.810 [2025-02-16 01:53:07,479][01307] Num frames 3800... [2025-02-16 01:53:07,602][01307] Num frames 3900... [2025-02-16 01:53:07,736][01307] Num frames 4000... [2025-02-16 01:53:07,862][01307] Num frames 4100... [2025-02-16 01:53:07,988][01307] Num frames 4200... [2025-02-16 01:53:08,111][01307] Num frames 4300... [2025-02-16 01:53:08,240][01307] Num frames 4400... [2025-02-16 01:53:08,421][01307] Avg episode rewards: #0: 39.993, true rewards: #0: 14.993 [2025-02-16 01:53:08,422][01307] Avg episode reward: 39.993, avg true_objective: 14.993 [2025-02-16 01:53:08,427][01307] Num frames 4500... [2025-02-16 01:53:08,550][01307] Num frames 4600... [2025-02-16 01:53:08,680][01307] Num frames 4700... [2025-02-16 01:53:08,809][01307] Num frames 4800... [2025-02-16 01:53:08,934][01307] Num frames 4900... [2025-02-16 01:53:09,059][01307] Num frames 5000... [2025-02-16 01:53:09,182][01307] Num frames 5100... [2025-02-16 01:53:09,311][01307] Num frames 5200... [2025-02-16 01:53:09,434][01307] Num frames 5300... 
[2025-02-16 01:53:09,559][01307] Num frames 5400... [2025-02-16 01:53:09,690][01307] Num frames 5500... [2025-02-16 01:53:09,826][01307] Num frames 5600... [2025-02-16 01:53:09,957][01307] Num frames 5700... [2025-02-16 01:53:10,084][01307] Num frames 5800... [2025-02-16 01:53:10,209][01307] Num frames 5900... [2025-02-16 01:53:10,343][01307] Num frames 6000... [2025-02-16 01:53:10,516][01307] Num frames 6100... [2025-02-16 01:53:10,693][01307] Num frames 6200... [2025-02-16 01:53:10,862][01307] Num frames 6300... [2025-02-16 01:53:11,035][01307] Num frames 6400... [2025-02-16 01:53:11,237][01307] Avg episode rewards: #0: 42.454, true rewards: #0: 16.205 [2025-02-16 01:53:11,239][01307] Avg episode reward: 42.454, avg true_objective: 16.205 [2025-02-16 01:53:11,270][01307] Num frames 6500... [2025-02-16 01:53:11,442][01307] Num frames 6600... [2025-02-16 01:53:11,608][01307] Num frames 6700... [2025-02-16 01:53:11,801][01307] Num frames 6800... [2025-02-16 01:53:11,976][01307] Num frames 6900... [2025-02-16 01:53:12,152][01307] Num frames 7000... [2025-02-16 01:53:12,318][01307] Num frames 7100... [2025-02-16 01:53:12,450][01307] Avg episode rewards: #0: 37.107, true rewards: #0: 14.308 [2025-02-16 01:53:12,451][01307] Avg episode reward: 37.107, avg true_objective: 14.308 [2025-02-16 01:53:12,511][01307] Num frames 7200... [2025-02-16 01:53:12,635][01307] Num frames 7300... [2025-02-16 01:53:12,766][01307] Num frames 7400... [2025-02-16 01:53:12,900][01307] Num frames 7500... [2025-02-16 01:53:13,005][01307] Avg episode rewards: #0: 31.896, true rewards: #0: 12.563 [2025-02-16 01:53:13,006][01307] Avg episode reward: 31.896, avg true_objective: 12.563 [2025-02-16 01:53:13,084][01307] Num frames 7600... [2025-02-16 01:53:13,208][01307] Num frames 7700... [2025-02-16 01:53:13,330][01307] Num frames 7800... [2025-02-16 01:53:13,466][01307] Num frames 7900... [2025-02-16 01:53:13,595][01307] Num frames 8000... [2025-02-16 01:53:13,734][01307] Num frames 8100... 
[2025-02-16 01:53:13,862][01307] Num frames 8200... [2025-02-16 01:53:13,991][01307] Num frames 8300... [2025-02-16 01:53:14,121][01307] Num frames 8400... [2025-02-16 01:53:14,249][01307] Num frames 8500... [2025-02-16 01:53:14,379][01307] Num frames 8600... [2025-02-16 01:53:14,513][01307] Num frames 8700... [2025-02-16 01:53:14,644][01307] Num frames 8800... [2025-02-16 01:53:14,782][01307] Num frames 8900... [2025-02-16 01:53:14,911][01307] Num frames 9000... [2025-02-16 01:53:15,038][01307] Num frames 9100... [2025-02-16 01:53:15,170][01307] Num frames 9200... [2025-02-16 01:53:15,300][01307] Num frames 9300... [2025-02-16 01:53:15,484][01307] Avg episode rewards: #0: 35.134, true rewards: #0: 13.420 [2025-02-16 01:53:15,486][01307] Avg episode reward: 35.134, avg true_objective: 13.420 [2025-02-16 01:53:15,497][01307] Num frames 9400... [2025-02-16 01:53:15,623][01307] Num frames 9500... [2025-02-16 01:53:15,758][01307] Num frames 9600... [2025-02-16 01:53:15,885][01307] Num frames 9700... [2025-02-16 01:53:16,013][01307] Num frames 9800... [2025-02-16 01:53:16,141][01307] Num frames 9900... [2025-02-16 01:53:16,266][01307] Num frames 10000... [2025-02-16 01:53:16,393][01307] Num frames 10100... [2025-02-16 01:53:16,528][01307] Num frames 10200... [2025-02-16 01:53:16,685][01307] Num frames 10300... [2025-02-16 01:53:16,819][01307] Num frames 10400... [2025-02-16 01:53:16,939][01307] Avg episode rewards: #0: 34.187, true rewards: #0: 13.063 [2025-02-16 01:53:16,941][01307] Avg episode reward: 34.187, avg true_objective: 13.063 [2025-02-16 01:53:17,007][01307] Num frames 10500... [2025-02-16 01:53:17,134][01307] Num frames 10600... [2025-02-16 01:53:17,262][01307] Num frames 10700... [2025-02-16 01:53:17,396][01307] Num frames 10800... [2025-02-16 01:53:17,537][01307] Num frames 10900... [2025-02-16 01:53:17,662][01307] Num frames 11000... [2025-02-16 01:53:17,793][01307] Num frames 11100... [2025-02-16 01:53:17,918][01307] Num frames 11200... 
[2025-02-16 01:53:18,050][01307] Num frames 11300... [2025-02-16 01:53:18,177][01307] Num frames 11400... [2025-02-16 01:53:18,305][01307] Num frames 11500... [2025-02-16 01:53:18,431][01307] Num frames 11600... [2025-02-16 01:53:18,491][01307] Avg episode rewards: #0: 33.113, true rewards: #0: 12.891 [2025-02-16 01:53:18,493][01307] Avg episode reward: 33.113, avg true_objective: 12.891 [2025-02-16 01:53:18,617][01307] Num frames 11700... [2025-02-16 01:53:18,749][01307] Num frames 11800... [2025-02-16 01:53:18,877][01307] Num frames 11900... [2025-02-16 01:53:19,008][01307] Num frames 12000... [2025-02-16 01:53:19,132][01307] Num frames 12100... [2025-02-16 01:53:19,257][01307] Num frames 12200... [2025-02-16 01:53:19,383][01307] Num frames 12300... [2025-02-16 01:53:19,516][01307] Num frames 12400... [2025-02-16 01:53:19,644][01307] Num frames 12500... [2025-02-16 01:53:19,824][01307] Avg episode rewards: #0: 31.892, true rewards: #0: 12.592 [2025-02-16 01:53:19,825][01307] Avg episode reward: 31.892, avg true_objective: 12.592 [2025-02-16 01:54:29,554][01307] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-02-16 01:57:20,350][01307] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-02-16 01:57:20,352][01307] Overriding arg 'num_workers' with value 1 passed from command line [2025-02-16 01:57:20,354][01307] Adding new argument 'no_render'=True that is not in the saved config file! [2025-02-16 01:57:20,356][01307] Adding new argument 'save_video'=True that is not in the saved config file! [2025-02-16 01:57:20,358][01307] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-02-16 01:57:20,359][01307] Adding new argument 'video_name'=None that is not in the saved config file! [2025-02-16 01:57:20,361][01307] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
[2025-02-16 01:57:20,362][01307] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-02-16 01:57:20,363][01307] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-02-16 01:57:20,364][01307] Adding new argument 'hf_repository'='51nd0re1/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-02-16 01:57:20,365][01307] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-02-16 01:57:20,366][01307] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-02-16 01:57:20,367][01307] Adding new argument 'train_script'=None that is not in the saved config file! [2025-02-16 01:57:20,368][01307] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-02-16 01:57:20,369][01307] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-02-16 01:57:20,399][01307] RunningMeanStd input shape: (3, 72, 128) [2025-02-16 01:57:20,400][01307] RunningMeanStd input shape: (1,) [2025-02-16 01:57:20,413][01307] ConvEncoder: input_channels=3 [2025-02-16 01:57:20,446][01307] Conv encoder output size: 512 [2025-02-16 01:57:20,449][01307] Policy head output size: 512 [2025-02-16 01:57:20,468][01307] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2025-02-16 01:57:20,926][01307] Num frames 100... [2025-02-16 01:57:21,053][01307] Num frames 200... [2025-02-16 01:57:21,180][01307] Num frames 300... [2025-02-16 01:57:21,324][01307] Num frames 400... [2025-02-16 01:57:21,447][01307] Num frames 500... [2025-02-16 01:57:21,575][01307] Num frames 600... [2025-02-16 01:57:21,703][01307] Num frames 700... [2025-02-16 01:57:21,834][01307] Num frames 800... [2025-02-16 01:57:21,961][01307] Num frames 900... [2025-02-16 01:57:22,088][01307] Num frames 1000... [2025-02-16 01:57:22,220][01307] Num frames 1100... 
[2025-02-16 01:57:22,350][01307] Num frames 1200... [2025-02-16 01:57:22,476][01307] Num frames 1300... [2025-02-16 01:57:22,602][01307] Num frames 1400... [2025-02-16 01:57:22,737][01307] Num frames 1500... [2025-02-16 01:57:22,872][01307] Num frames 1600... [2025-02-16 01:57:23,000][01307] Num frames 1700... [2025-02-16 01:57:23,129][01307] Num frames 1800... [2025-02-16 01:57:23,255][01307] Num frames 1900... [2025-02-16 01:57:23,380][01307] Num frames 2000... [2025-02-16 01:57:23,510][01307] Num frames 2100... [2025-02-16 01:57:23,562][01307] Avg episode rewards: #0: 59.999, true rewards: #0: 21.000 [2025-02-16 01:57:23,563][01307] Avg episode reward: 59.999, avg true_objective: 21.000 [2025-02-16 01:57:23,692][01307] Num frames 2200... [2025-02-16 01:57:23,828][01307] Num frames 2300... [2025-02-16 01:57:23,957][01307] Num frames 2400... [2025-02-16 01:57:24,085][01307] Num frames 2500... [2025-02-16 01:57:24,213][01307] Num frames 2600... [2025-02-16 01:57:24,338][01307] Num frames 2700... [2025-02-16 01:57:24,463][01307] Num frames 2800... [2025-02-16 01:57:24,600][01307] Num frames 2900... [2025-02-16 01:57:24,769][01307] Avg episode rewards: #0: 38.819, true rewards: #0: 14.820 [2025-02-16 01:57:24,771][01307] Avg episode reward: 38.819, avg true_objective: 14.820 [2025-02-16 01:57:24,848][01307] Num frames 3000... [2025-02-16 01:57:25,026][01307] Num frames 3100... [2025-02-16 01:57:25,188][01307] Num frames 3200... [2025-02-16 01:57:25,350][01307] Num frames 3300... [2025-02-16 01:57:25,517][01307] Num frames 3400... [2025-02-16 01:57:25,678][01307] Num frames 3500... [2025-02-16 01:57:25,847][01307] Num frames 3600... [2025-02-16 01:57:25,933][01307] Avg episode rewards: #0: 29.713, true rewards: #0: 12.047 [2025-02-16 01:57:25,935][01307] Avg episode reward: 29.713, avg true_objective: 12.047 [2025-02-16 01:57:26,087][01307] Num frames 3700... [2025-02-16 01:57:26,269][01307] Num frames 3800... [2025-02-16 01:57:26,450][01307] Num frames 3900... 
[2025-02-16 01:57:26,607][01307] Num frames 4000... [2025-02-16 01:57:26,741][01307] Num frames 4100... [2025-02-16 01:57:26,870][01307] Num frames 4200... [2025-02-16 01:57:27,006][01307] Num frames 4300... [2025-02-16 01:57:27,134][01307] Num frames 4400... [2025-02-16 01:57:27,263][01307] Num frames 4500... [2025-02-16 01:57:27,390][01307] Num frames 4600... [2025-02-16 01:57:27,520][01307] Num frames 4700... [2025-02-16 01:57:27,644][01307] Num frames 4800... [2025-02-16 01:57:27,780][01307] Num frames 4900... [2025-02-16 01:57:27,908][01307] Num frames 5000... [2025-02-16 01:57:28,044][01307] Num frames 5100... [2025-02-16 01:57:28,171][01307] Num frames 5200... [2025-02-16 01:57:28,298][01307] Num frames 5300... [2025-02-16 01:57:28,426][01307] Num frames 5400... [2025-02-16 01:57:28,554][01307] Num frames 5500... [2025-02-16 01:57:28,683][01307] Num frames 5600... [2025-02-16 01:57:28,822][01307] Avg episode rewards: #0: 35.904, true rewards: #0: 14.155 [2025-02-16 01:57:28,823][01307] Avg episode reward: 35.904, avg true_objective: 14.155 [2025-02-16 01:57:28,874][01307] Num frames 5700... [2025-02-16 01:57:29,006][01307] Num frames 5800... [2025-02-16 01:57:29,132][01307] Num frames 5900... [2025-02-16 01:57:29,259][01307] Num frames 6000... [2025-02-16 01:57:29,385][01307] Num frames 6100... [2025-02-16 01:57:29,513][01307] Num frames 6200... [2025-02-16 01:57:29,639][01307] Num frames 6300... [2025-02-16 01:57:29,772][01307] Num frames 6400... [2025-02-16 01:57:29,898][01307] Num frames 6500... [2025-02-16 01:57:30,032][01307] Num frames 6600... [2025-02-16 01:57:30,161][01307] Num frames 6700... [2025-02-16 01:57:30,291][01307] Num frames 6800... [2025-02-16 01:57:30,418][01307] Num frames 6900... [2025-02-16 01:57:30,545][01307] Num frames 7000... [2025-02-16 01:57:30,670][01307] Num frames 7100... 
[2025-02-16 01:57:30,819][01307] Avg episode rewards: #0: 35.531, true rewards: #0: 14.332 [2025-02-16 01:57:30,820][01307] Avg episode reward: 35.531, avg true_objective: 14.332 [2025-02-16 01:57:30,864][01307] Num frames 7200... [2025-02-16 01:57:30,989][01307] Num frames 7300... [2025-02-16 01:57:31,121][01307] Num frames 7400... [2025-02-16 01:57:31,247][01307] Num frames 7500... [2025-02-16 01:57:31,374][01307] Num frames 7600... [2025-02-16 01:57:31,539][01307] Avg episode rewards: #0: 30.976, true rewards: #0: 12.810 [2025-02-16 01:57:31,541][01307] Avg episode reward: 30.976, avg true_objective: 12.810 [2025-02-16 01:57:31,563][01307] Num frames 7700... [2025-02-16 01:57:31,694][01307] Num frames 7800... [2025-02-16 01:57:31,824][01307] Num frames 7900... [2025-02-16 01:57:31,950][01307] Num frames 8000... [2025-02-16 01:57:32,084][01307] Num frames 8100... [2025-02-16 01:57:32,215][01307] Num frames 8200... [2025-02-16 01:57:32,343][01307] Num frames 8300... [2025-02-16 01:57:32,470][01307] Num frames 8400... [2025-02-16 01:57:32,597][01307] Num frames 8500... [2025-02-16 01:57:32,730][01307] Num frames 8600... [2025-02-16 01:57:32,854][01307] Num frames 8700... [2025-02-16 01:57:32,985][01307] Num frames 8800... [2025-02-16 01:57:33,118][01307] Num frames 8900... [2025-02-16 01:57:33,246][01307] Num frames 9000... [2025-02-16 01:57:33,375][01307] Num frames 9100... [2025-02-16 01:57:33,504][01307] Num frames 9200... [2025-02-16 01:57:33,594][01307] Avg episode rewards: #0: 31.464, true rewards: #0: 13.179 [2025-02-16 01:57:33,596][01307] Avg episode reward: 31.464, avg true_objective: 13.179 [2025-02-16 01:57:33,700][01307] Num frames 9300... [2025-02-16 01:57:33,825][01307] Num frames 9400... [2025-02-16 01:57:33,950][01307] Num frames 9500... 
[2025-02-16 01:57:34,025][01307] Avg episode rewards: #0: 28.143, true rewards: #0: 11.894 [2025-02-16 01:57:34,026][01307] Avg episode reward: 28.143, avg true_objective: 11.894 [2025-02-16 01:57:34,141][01307] Num frames 9600... [2025-02-16 01:57:34,267][01307] Num frames 9700... [2025-02-16 01:57:34,401][01307] Num frames 9800... [2025-02-16 01:57:34,529][01307] Num frames 9900... [2025-02-16 01:57:34,656][01307] Num frames 10000... [2025-02-16 01:57:34,787][01307] Num frames 10100... [2025-02-16 01:57:34,915][01307] Num frames 10200... [2025-02-16 01:57:35,042][01307] Num frames 10300... [2025-02-16 01:57:35,175][01307] Num frames 10400... [2025-02-16 01:57:35,304][01307] Num frames 10500... [2025-02-16 01:57:35,428][01307] Num frames 10600... [2025-02-16 01:57:35,553][01307] Num frames 10700... [2025-02-16 01:57:35,681][01307] Num frames 10800... [2025-02-16 01:57:35,811][01307] Num frames 10900... [2025-02-16 01:57:35,936][01307] Num frames 11000... [2025-02-16 01:57:36,087][01307] Avg episode rewards: #0: 29.305, true rewards: #0: 12.306 [2025-02-16 01:57:36,088][01307] Avg episode reward: 29.305, avg true_objective: 12.306 [2025-02-16 01:57:36,124][01307] Num frames 11100... [2025-02-16 01:57:36,263][01307] Num frames 11200... [2025-02-16 01:57:36,391][01307] Num frames 11300... [2025-02-16 01:57:36,520][01307] Num frames 11400... [2025-02-16 01:57:36,706][01307] Num frames 11500... [2025-02-16 01:57:36,876][01307] Num frames 11600... [2025-02-16 01:57:37,049][01307] Num frames 11700... [2025-02-16 01:57:37,220][01307] Num frames 11800... [2025-02-16 01:57:37,392][01307] Num frames 11900... [2025-02-16 01:57:37,572][01307] Avg episode rewards: #0: 28.573, true rewards: #0: 11.973 [2025-02-16 01:57:37,576][01307] Avg episode reward: 28.573, avg true_objective: 11.973 [2025-02-16 01:58:43,031][01307] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
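The run summary earlier in the log ("main_loop: 2423.3856", "Collected {0: 10006528}, FPS: 4129.2") can be cross-checked by hand: the overall FPS should simply be total frames divided by the main-loop wall time. A quick sanity check using those two logged values:

```python
# Values taken directly from the Runner summary in the log above.
total_frames = 10_006_528     # Collected {0: 10006528}
main_loop_seconds = 2423.3856  # main_loop: 2423.3856

# Overall throughput = frames collected / wall-clock time of the main loop.
fps = total_frames / main_loop_seconds
print(round(fps, 1))  # -> 4129.2, matching the reported "FPS: 4129.2"
```

This agrees with the logged figure, so the reported FPS is the plain frames-per-wall-second average over the whole run, not one of the rolling (10/60/300 s) windows.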