[2025-01-12 21:09:41,755][01010] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-01-12 21:09:41,757][01010] Rollout worker 0 uses device cpu
[2025-01-12 21:09:41,759][01010] Rollout worker 1 uses device cpu
[2025-01-12 21:09:41,760][01010] Rollout worker 2 uses device cpu
[2025-01-12 21:09:41,762][01010] Rollout worker 3 uses device cpu
[2025-01-12 21:09:41,766][01010] Rollout worker 4 uses device cpu
[2025-01-12 21:09:41,767][01010] Rollout worker 5 uses device cpu
[2025-01-12 21:09:41,768][01010] Rollout worker 6 uses device cpu
[2025-01-12 21:09:41,770][01010] Rollout worker 7 uses device cpu
[2025-01-12 21:09:41,922][01010] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-12 21:09:41,924][01010] InferenceWorker_p0-w0: min num requests: 2
[2025-01-12 21:09:41,956][01010] Starting all processes...
[2025-01-12 21:09:41,958][01010] Starting process learner_proc0
[2025-01-12 21:09:42,011][01010] Starting all processes...
[2025-01-12 21:09:42,019][01010] Starting process inference_proc0-0
[2025-01-12 21:09:42,019][01010] Starting process rollout_proc0
[2025-01-12 21:09:42,019][01010] Starting process rollout_proc1
[2025-01-12 21:09:42,019][01010] Starting process rollout_proc2
[2025-01-12 21:09:42,019][01010] Starting process rollout_proc3
[2025-01-12 21:09:42,019][01010] Starting process rollout_proc4
[2025-01-12 21:09:42,019][01010] Starting process rollout_proc5
[2025-01-12 21:09:42,019][01010] Starting process rollout_proc6
[2025-01-12 21:09:42,019][01010] Starting process rollout_proc7
[2025-01-12 21:09:57,761][02866] Worker 6 uses CPU cores [0]
[2025-01-12 21:09:58,350][02864] Worker 2 uses CPU cores [0]
[2025-01-12 21:09:58,352][02863] Worker 1 uses CPU cores [1]
[2025-01-12 21:09:58,846][02848] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-12 21:09:58,847][02848] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-01-12 21:09:58,902][02868] Worker 5 uses CPU cores [1]
[2025-01-12 21:09:58,932][02848] Num visible devices: 1
[2025-01-12 21:09:58,959][02848] Starting seed is not provided
[2025-01-12 21:09:58,960][02848] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-12 21:09:58,960][02848] Initializing actor-critic model on device cuda:0
[2025-01-12 21:09:58,961][02848] RunningMeanStd input shape: (3, 72, 128)
[2025-01-12 21:09:58,974][02848] RunningMeanStd input shape: (1,)
[2025-01-12 21:09:59,056][02848] ConvEncoder: input_channels=3
[2025-01-12 21:09:59,188][02865] Worker 3 uses CPU cores [1]
[2025-01-12 21:09:59,189][02862] Worker 0 uses CPU cores [0]
[2025-01-12 21:09:59,200][02869] Worker 7 uses CPU cores [1]
[2025-01-12 21:09:59,206][02861] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-12 21:09:59,207][02861] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-01-12 21:09:59,249][02861] Num visible devices: 1
[2025-01-12 21:09:59,306][02867] Worker 4 uses CPU cores [0]
[2025-01-12 21:09:59,441][02848] Conv encoder output size: 512
[2025-01-12 21:09:59,442][02848] Policy head output size: 512
[2025-01-12 21:09:59,498][02848] Created Actor Critic model with architecture:
[2025-01-12 21:09:59,499][02848] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-01-12 21:09:59,841][02848] Using optimizer
[2025-01-12 21:10:01,915][01010] Heartbeat connected on Batcher_0
[2025-01-12 21:10:01,923][01010] Heartbeat connected on InferenceWorker_p0-w0
[2025-01-12 21:10:01,931][01010] Heartbeat connected on RolloutWorker_w0
[2025-01-12 21:10:01,934][01010] Heartbeat connected on RolloutWorker_w1
[2025-01-12 21:10:01,940][01010] Heartbeat connected on RolloutWorker_w2
[2025-01-12 21:10:01,943][01010] Heartbeat connected on RolloutWorker_w3
[2025-01-12 21:10:01,946][01010] Heartbeat connected on RolloutWorker_w4
[2025-01-12 21:10:01,949][01010] Heartbeat connected on RolloutWorker_w5
[2025-01-12 21:10:01,952][01010] Heartbeat connected on RolloutWorker_w6
[2025-01-12 21:10:01,957][01010] Heartbeat connected on RolloutWorker_w7
[2025-01-12 21:10:03,411][02848] No checkpoints found
[2025-01-12 21:10:03,411][02848] Did not load from checkpoint, starting from scratch!
[2025-01-12 21:10:03,411][02848] Initialized policy 0 weights for model version 0
[2025-01-12 21:10:03,415][02848] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-12 21:10:03,422][02848] LearnerWorker_p0 finished initialization!
[2025-01-12 21:10:03,423][01010] Heartbeat connected on LearnerWorker_p0
[2025-01-12 21:10:03,509][02861] RunningMeanStd input shape: (3, 72, 128)
[2025-01-12 21:10:03,510][02861] RunningMeanStd input shape: (1,)
[2025-01-12 21:10:03,523][02861] ConvEncoder: input_channels=3
[2025-01-12 21:10:03,626][02861] Conv encoder output size: 512
[2025-01-12 21:10:03,626][02861] Policy head output size: 512
[2025-01-12 21:10:03,677][01010] Inference worker 0-0 is ready!
[2025-01-12 21:10:03,678][01010] All inference workers are ready! Signal rollout workers to start!
[2025-01-12 21:10:03,875][02864] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-12 21:10:03,876][02862] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-12 21:10:03,878][02866] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-12 21:10:03,871][02867] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-12 21:10:03,883][02868] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-12 21:10:03,873][02865] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-12 21:10:03,877][02863] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-12 21:10:03,880][02869] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-12 21:10:05,210][02864] Decorrelating experience for 0 frames...
[2025-01-12 21:10:05,212][02862] Decorrelating experience for 0 frames...
[2025-01-12 21:10:05,203][02867] Decorrelating experience for 0 frames...
[2025-01-12 21:10:05,218][02868] Decorrelating experience for 0 frames...
[2025-01-12 21:10:05,223][02869] Decorrelating experience for 0 frames...
[2025-01-12 21:10:05,243][02865] Decorrelating experience for 0 frames...
[2025-01-12 21:10:06,336][02863] Decorrelating experience for 0 frames...
[2025-01-12 21:10:06,344][02864] Decorrelating experience for 32 frames...
[2025-01-12 21:10:06,347][02862] Decorrelating experience for 32 frames...
[2025-01-12 21:10:06,342][02867] Decorrelating experience for 32 frames...
[2025-01-12 21:10:06,349][02868] Decorrelating experience for 32 frames...
[2025-01-12 21:10:06,358][02865] Decorrelating experience for 32 frames...
[2025-01-12 21:10:07,208][01010] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-12 21:10:07,235][02869] Decorrelating experience for 32 frames...
[2025-01-12 21:10:07,562][02865] Decorrelating experience for 64 frames...
[2025-01-12 21:10:07,641][02866] Decorrelating experience for 0 frames...
[2025-01-12 21:10:07,901][02867] Decorrelating experience for 64 frames...
[2025-01-12 21:10:07,905][02864] Decorrelating experience for 64 frames...
[2025-01-12 21:10:08,133][02868] Decorrelating experience for 64 frames...
[2025-01-12 21:10:08,436][02865] Decorrelating experience for 96 frames...
[2025-01-12 21:10:08,586][02866] Decorrelating experience for 32 frames...
[2025-01-12 21:10:08,859][02862] Decorrelating experience for 64 frames...
[2025-01-12 21:10:08,984][02869] Decorrelating experience for 64 frames...
[2025-01-12 21:10:09,226][02864] Decorrelating experience for 96 frames...
[2025-01-12 21:10:10,293][02866] Decorrelating experience for 64 frames...
[2025-01-12 21:10:10,509][02868] Decorrelating experience for 96 frames...
[2025-01-12 21:10:10,669][02869] Decorrelating experience for 96 frames...
[2025-01-12 21:10:12,208][01010] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-12 21:10:12,212][01010] Avg episode reward: [(0, '0.960')]
[2025-01-12 21:10:12,906][02867] Decorrelating experience for 96 frames...
[2025-01-12 21:10:13,145][02866] Decorrelating experience for 96 frames...
[2025-01-12 21:10:13,410][02862] Decorrelating experience for 96 frames...
[2025-01-12 21:10:16,501][02848] Signal inference workers to stop experience collection...
[2025-01-12 21:10:16,536][02861] InferenceWorker_p0-w0: stopping experience collection
[2025-01-12 21:10:16,651][02863] Decorrelating experience for 32 frames...
[2025-01-12 21:10:17,208][01010] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 202.0. Samples: 2020. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-12 21:10:17,210][01010] Avg episode reward: [(0, '2.566')]
[2025-01-12 21:10:17,724][02863] Decorrelating experience for 64 frames...
[2025-01-12 21:10:18,092][02863] Decorrelating experience for 96 frames...
[2025-01-12 21:10:19,628][02848] Signal inference workers to resume experience collection...
[2025-01-12 21:10:19,630][02861] InferenceWorker_p0-w0: resuming experience collection
[2025-01-12 21:10:22,208][01010] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 16384. Throughput: 0: 294.1. Samples: 4412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:10:22,213][01010] Avg episode reward: [(0, '3.185')]
[2025-01-12 21:10:27,025][02861] Updated weights for policy 0, policy_version 10 (0.0150)
[2025-01-12 21:10:27,208][01010] Fps is (10 sec: 4096.0, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 40960. Throughput: 0: 398.0. Samples: 7960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:10:27,214][01010] Avg episode reward: [(0, '3.891')]
[2025-01-12 21:10:32,208][01010] Fps is (10 sec: 3686.4, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 547.9. Samples: 13698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:10:32,211][01010] Avg episode reward: [(0, '4.298')]
[2025-01-12 21:10:37,208][01010] Fps is (10 sec: 3276.6, 60 sec: 2457.5, 300 sec: 2457.5). Total num frames: 73728. Throughput: 0: 644.9. Samples: 19346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:10:37,213][01010] Avg episode reward: [(0, '4.460')]
[2025-01-12 21:10:38,157][02861] Updated weights for policy 0, policy_version 20 (0.0047)
[2025-01-12 21:10:42,208][01010] Fps is (10 sec: 4505.3, 60 sec: 2808.6, 300 sec: 2808.6). Total num frames: 98304. Throughput: 0: 657.1. Samples: 23000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-12 21:10:42,211][01010] Avg episode reward: [(0, '4.345')]
[2025-01-12 21:10:47,208][01010] Fps is (10 sec: 4505.9, 60 sec: 2969.6, 300 sec: 2969.6). Total num frames: 118784. Throughput: 0: 735.2. Samples: 29408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-12 21:10:47,210][01010] Avg episode reward: [(0, '4.441')]
[2025-01-12 21:10:47,213][02848] Saving new best policy, reward=4.441!
[2025-01-12 21:10:48,953][02861] Updated weights for policy 0, policy_version 30 (0.0052)
[2025-01-12 21:10:52,208][01010] Fps is (10 sec: 3277.0, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 131072. Throughput: 0: 742.9. Samples: 33432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-12 21:10:52,210][01010] Avg episode reward: [(0, '4.417')]
[2025-01-12 21:10:57,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3113.0, 300 sec: 3113.0). Total num frames: 155648. Throughput: 0: 824.3. Samples: 37104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:10:57,210][01010] Avg episode reward: [(0, '4.260')]
[2025-01-12 21:10:58,372][02861] Updated weights for policy 0, policy_version 40 (0.0032)
[2025-01-12 21:11:02,208][01010] Fps is (10 sec: 4915.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 180224. Throughput: 0: 943.5. Samples: 44476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:11:02,213][01010] Avg episode reward: [(0, '4.423')]
[2025-01-12 21:11:07,211][01010] Fps is (10 sec: 3685.1, 60 sec: 3208.3, 300 sec: 3208.3). Total num frames: 192512. Throughput: 0: 994.6. Samples: 49174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-12 21:11:07,219][01010] Avg episode reward: [(0, '4.411')]
[2025-01-12 21:11:09,314][02861] Updated weights for policy 0, policy_version 50 (0.0032)
[2025-01-12 21:11:12,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3339.8). Total num frames: 217088. Throughput: 0: 983.5. Samples: 52216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:11:12,214][01010] Avg episode reward: [(0, '4.393')]
[2025-01-12 21:11:17,208][01010] Fps is (10 sec: 4917.0, 60 sec: 4027.7, 300 sec: 3452.3). Total num frames: 241664. Throughput: 0: 1015.2. Samples: 59380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:11:17,210][01010] Avg episode reward: [(0, '4.400')]
[2025-01-12 21:11:17,943][02861] Updated weights for policy 0, policy_version 60 (0.0018)
[2025-01-12 21:11:22,208][01010] Fps is (10 sec: 4095.7, 60 sec: 4027.7, 300 sec: 3440.6). Total num frames: 258048. Throughput: 0: 1013.1. Samples: 64936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-12 21:11:22,215][01010] Avg episode reward: [(0, '4.369')]
[2025-01-12 21:11:27,208][01010] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3430.4). Total num frames: 274432. Throughput: 0: 982.3. Samples: 67204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:11:27,213][01010] Avg episode reward: [(0, '4.429')]
[2025-01-12 21:11:29,001][02861] Updated weights for policy 0, policy_version 70 (0.0022)
[2025-01-12 21:11:32,208][01010] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 3517.7). Total num frames: 299008. Throughput: 0: 999.9. Samples: 74404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:11:32,213][01010] Avg episode reward: [(0, '4.598')]
[2025-01-12 21:11:32,222][02848] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2025-01-12 21:11:32,371][02848] Saving new best policy, reward=4.598!
[2025-01-12 21:11:37,208][01010] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3549.9). Total num frames: 319488. Throughput: 0: 1052.2. Samples: 80782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-01-12 21:11:37,214][01010] Avg episode reward: [(0, '4.530')]
[2025-01-12 21:11:39,101][02861] Updated weights for policy 0, policy_version 80 (0.0038)
[2025-01-12 21:11:42,208][01010] Fps is (10 sec: 3686.2, 60 sec: 3959.5, 300 sec: 3535.5). Total num frames: 335872. Throughput: 0: 1019.0. Samples: 82958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-12 21:11:42,212][01010] Avg episode reward: [(0, '4.357')]
[2025-01-12 21:11:47,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3604.5). Total num frames: 360448. Throughput: 0: 987.9. Samples: 88930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:11:47,213][01010] Avg episode reward: [(0, '4.283')]
[2025-01-12 21:11:48,842][02861] Updated weights for policy 0, policy_version 90 (0.0027)
[2025-01-12 21:11:52,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.2, 300 sec: 3627.9). Total num frames: 380928. Throughput: 0: 1046.8. Samples: 96276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:11:52,212][01010] Avg episode reward: [(0, '4.521')]
[2025-01-12 21:11:57,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3611.9). Total num frames: 397312. Throughput: 0: 1029.7. Samples: 98554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:11:57,210][01010] Avg episode reward: [(0, '4.619')]
[2025-01-12 21:11:57,216][02848] Saving new best policy, reward=4.619!
[2025-01-12 21:12:00,049][02861] Updated weights for policy 0, policy_version 100 (0.0017)
[2025-01-12 21:12:02,208][01010] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 3633.0). Total num frames: 417792. Throughput: 0: 988.2. Samples: 103850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-12 21:12:02,215][01010] Avg episode reward: [(0, '4.644')]
[2025-01-12 21:12:02,222][02848] Saving new best policy, reward=4.644!
[2025-01-12 21:12:07,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.5, 300 sec: 3686.4). Total num frames: 442368. Throughput: 0: 1026.2. Samples: 111116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:12:07,212][01010] Avg episode reward: [(0, '4.488')]
[2025-01-12 21:12:08,288][02861] Updated weights for policy 0, policy_version 110 (0.0025)
[2025-01-12 21:12:12,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3702.8). Total num frames: 462848. Throughput: 0: 1048.0. Samples: 114366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-12 21:12:12,213][01010] Avg episode reward: [(0, '4.660')]
[2025-01-12 21:12:12,221][02848] Saving new best policy, reward=4.660!
[2025-01-12 21:12:17,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3686.4). Total num frames: 479232. Throughput: 0: 986.4. Samples: 118794. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-12 21:12:17,209][01010] Avg episode reward: [(0, '4.713')]
[2025-01-12 21:12:17,220][02848] Saving new best policy, reward=4.713!
[2025-01-12 21:12:19,675][02861] Updated weights for policy 0, policy_version 120 (0.0020)
[2025-01-12 21:12:22,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3731.9). Total num frames: 503808. Throughput: 0: 1001.1. Samples: 125830. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-12 21:12:22,210][01010] Avg episode reward: [(0, '4.703')]
[2025-01-12 21:12:27,208][01010] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 3744.9). Total num frames: 524288. Throughput: 0: 1034.3. Samples: 129502. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-12 21:12:27,212][01010] Avg episode reward: [(0, '4.759')]
[2025-01-12 21:12:27,216][02848] Saving new best policy, reward=4.759!
[2025-01-12 21:12:29,359][02861] Updated weights for policy 0, policy_version 130 (0.0027)
[2025-01-12 21:12:32,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3728.8). Total num frames: 540672. Throughput: 0: 1012.3. Samples: 134484. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-12 21:12:32,212][01010] Avg episode reward: [(0, '4.897')]
[2025-01-12 21:12:32,223][02848] Saving new best policy, reward=4.897!
[2025-01-12 21:12:37,215][01010] Fps is (10 sec: 3683.9, 60 sec: 4027.3, 300 sec: 3740.8). Total num frames: 561152. Throughput: 0: 989.4. Samples: 140806. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-12 21:12:37,217][01010] Avg episode reward: [(0, '5.039')]
[2025-01-12 21:12:37,223][02848] Saving new best policy, reward=5.039!
[2025-01-12 21:12:39,247][02861] Updated weights for policy 0, policy_version 140 (0.0020)
[2025-01-12 21:12:42,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3778.9). Total num frames: 585728. Throughput: 0: 1018.6. Samples: 144392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-12 21:12:42,210][01010] Avg episode reward: [(0, '4.614')]
[2025-01-12 21:12:47,210][01010] Fps is (10 sec: 4097.9, 60 sec: 4027.6, 300 sec: 3763.1). Total num frames: 602112. Throughput: 0: 1028.4. Samples: 150132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-12 21:12:47,213][01010] Avg episode reward: [(0, '4.541')]
[2025-01-12 21:12:50,677][02861] Updated weights for policy 0, policy_version 150 (0.0017)
[2025-01-12 21:12:52,208][01010] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3748.5). Total num frames: 618496. Throughput: 0: 986.2. Samples: 155494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:12:52,215][01010] Avg episode reward: [(0, '4.614')]
[2025-01-12 21:12:57,208][01010] Fps is (10 sec: 4097.0, 60 sec: 4096.0, 300 sec: 3782.8). Total num frames: 643072. Throughput: 0: 995.7. Samples: 159172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:12:57,210][01010] Avg episode reward: [(0, '4.781')]
[2025-01-12 21:12:58,997][02861] Updated weights for policy 0, policy_version 160 (0.0019)
[2025-01-12 21:13:02,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3791.7). Total num frames: 663552. Throughput: 0: 1053.9. Samples: 166218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:13:02,214][01010] Avg episode reward: [(0, '4.848')]
[2025-01-12 21:13:07,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3777.4). Total num frames: 679936. Throughput: 0: 996.3. Samples: 170664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-12 21:13:07,215][01010] Avg episode reward: [(0, '5.065')]
[2025-01-12 21:13:07,218][02848] Saving new best policy, reward=5.065!
[2025-01-12 21:13:10,152][02861] Updated weights for policy 0, policy_version 170 (0.0028)
[2025-01-12 21:13:12,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3808.2). Total num frames: 704512. Throughput: 0: 991.9. Samples: 174138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:13:12,215][01010] Avg episode reward: [(0, '5.050')]
[2025-01-12 21:13:17,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3837.3). Total num frames: 729088. Throughput: 0: 1041.7. Samples: 181360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-12 21:13:17,214][01010] Avg episode reward: [(0, '4.848')]
[2025-01-12 21:13:19,559][02861] Updated weights for policy 0, policy_version 180 (0.0016)
[2025-01-12 21:13:22,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3801.9). Total num frames: 741376. Throughput: 0: 1011.0. Samples: 186296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:13:22,211][01010] Avg episode reward: [(0, '4.965')]
[2025-01-12 21:13:27,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3829.8). Total num frames: 765952. Throughput: 0: 993.1. Samples: 189080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:13:27,210][01010] Avg episode reward: [(0, '5.075')]
[2025-01-12 21:13:27,215][02848] Saving new best policy, reward=5.075!
[2025-01-12 21:13:29,706][02861] Updated weights for policy 0, policy_version 190 (0.0035)
[2025-01-12 21:13:32,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3836.3). Total num frames: 786432. Throughput: 0: 1025.4. Samples: 196274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:13:32,209][01010] Avg episode reward: [(0, '5.298')]
[2025-01-12 21:13:32,222][02848] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth...
[2025-01-12 21:13:32,345][02848] Saving new best policy, reward=5.298!
[2025-01-12 21:13:37,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.5, 300 sec: 3842.4). Total num frames: 806912. Throughput: 0: 1035.1. Samples: 202072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:13:37,213][01010] Avg episode reward: [(0, '5.384')]
[2025-01-12 21:13:37,216][02848] Saving new best policy, reward=5.384!
[2025-01-12 21:13:41,069][02861] Updated weights for policy 0, policy_version 200 (0.0025)
[2025-01-12 21:13:42,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3829.3). Total num frames: 823296. Throughput: 0: 1003.3. Samples: 204320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:13:42,215][01010] Avg episode reward: [(0, '5.542')]
[2025-01-12 21:13:42,223][02848] Saving new best policy, reward=5.542!
[2025-01-12 21:13:47,208][01010] Fps is (10 sec: 4095.9, 60 sec: 4096.1, 300 sec: 3854.0). Total num frames: 847872. Throughput: 0: 994.5. Samples: 210972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:13:47,215][01010] Avg episode reward: [(0, '5.280')]
[2025-01-12 21:13:49,513][02861] Updated weights for policy 0, policy_version 210 (0.0027)
[2025-01-12 21:13:52,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3859.3). Total num frames: 868352. Throughput: 0: 1048.7. Samples: 217854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-12 21:13:52,221][01010] Avg episode reward: [(0, '4.900')]
[2025-01-12 21:13:57,208][01010] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3846.7). Total num frames: 884736. Throughput: 0: 1020.8. Samples: 220074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:13:57,212][01010] Avg episode reward: [(0, '5.267')]
[2025-01-12 21:14:00,550][02861] Updated weights for policy 0, policy_version 220 (0.0015)
[2025-01-12 21:14:02,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3869.4). Total num frames: 909312. Throughput: 0: 995.1. Samples: 226138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:14:02,209][01010] Avg episode reward: [(0, '5.528')]
[2025-01-12 21:14:07,208][01010] Fps is (10 sec: 4915.1, 60 sec: 4232.5, 300 sec: 3891.2). Total num frames: 933888. Throughput: 0: 1049.0. Samples: 233502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:14:07,212][01010] Avg episode reward: [(0, '5.458')]
[2025-01-12 21:14:09,574][02861] Updated weights for policy 0, policy_version 230 (0.0039)
[2025-01-12 21:14:12,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3861.9). Total num frames: 946176. Throughput: 0: 1046.6. Samples: 236178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:14:12,217][01010] Avg episode reward: [(0, '5.723')]
[2025-01-12 21:14:12,229][02848] Saving new best policy, reward=5.723!
[2025-01-12 21:14:17,208][01010] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3866.6). Total num frames: 966656. Throughput: 0: 993.3. Samples: 240974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:14:17,214][01010] Avg episode reward: [(0, '5.846')]
[2025-01-12 21:14:17,223][02848] Saving new best policy, reward=5.846!
[2025-01-12 21:14:20,093][02861] Updated weights for policy 0, policy_version 240 (0.0024)
[2025-01-12 21:14:22,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3887.2). Total num frames: 991232. Throughput: 0: 1027.2. Samples: 248298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:14:22,209][01010] Avg episode reward: [(0, '5.904')]
[2025-01-12 21:14:22,216][02848] Saving new best policy, reward=5.904!
[2025-01-12 21:14:27,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3891.2). Total num frames: 1011712. Throughput: 0: 1058.3. Samples: 251944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:14:27,215][01010] Avg episode reward: [(0, '5.914')]
[2025-01-12 21:14:27,217][02848] Saving new best policy, reward=5.914!
[2025-01-12 21:14:30,896][02861] Updated weights for policy 0, policy_version 250 (0.0050)
[2025-01-12 21:14:32,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3879.6). Total num frames: 1028096. Throughput: 0: 1010.0. Samples: 256420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:14:32,210][01010] Avg episode reward: [(0, '5.999')]
[2025-01-12 21:14:32,220][02848] Saving new best policy, reward=5.999!
[2025-01-12 21:14:37,208][01010] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 3898.8). Total num frames: 1052672. Throughput: 0: 1010.7. Samples: 263334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:14:37,210][01010] Avg episode reward: [(0, '5.979')]
[2025-01-12 21:14:39,602][02861] Updated weights for policy 0, policy_version 260 (0.0030)
[2025-01-12 21:14:42,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3902.4). Total num frames: 1073152. Throughput: 0: 1041.5. Samples: 266942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-12 21:14:42,210][01010] Avg episode reward: [(0, '5.920')]
[2025-01-12 21:14:47,208][01010] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3891.2). Total num frames: 1089536. Throughput: 0: 1025.7. Samples: 272294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:14:47,212][01010] Avg episode reward: [(0, '6.298')]
[2025-01-12 21:14:47,217][02848] Saving new best policy, reward=6.298!
[2025-01-12 21:14:50,758][02861] Updated weights for policy 0, policy_version 270 (0.0024)
[2025-01-12 21:14:52,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3894.8). Total num frames: 1110016. Throughput: 0: 991.8. Samples: 278132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-12 21:14:52,214][01010] Avg episode reward: [(0, '6.485')]
[2025-01-12 21:14:52,222][02848] Saving new best policy, reward=6.485!
[2025-01-12 21:14:57,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3912.4). Total num frames: 1134592. Throughput: 0: 1013.1. Samples: 281766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-12 21:14:57,210][01010] Avg episode reward: [(0, '6.956')]
[2025-01-12 21:14:57,216][02848] Saving new best policy, reward=6.956!
[2025-01-12 21:14:59,368][02861] Updated weights for policy 0, policy_version 280 (0.0028)
[2025-01-12 21:15:02,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 1155072. Throughput: 0: 1050.5. Samples: 288246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-12 21:15:02,210][01010] Avg episode reward: [(0, '7.251')]
[2025-01-12 21:15:02,222][02848] Saving new best policy, reward=7.251!
[2025-01-12 21:15:07,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1171456. Throughput: 0: 994.0. Samples: 293026. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-12 21:15:07,210][01010] Avg episode reward: [(0, '7.432')]
[2025-01-12 21:15:07,215][02848] Saving new best policy, reward=7.432!
[2025-01-12 21:15:10,440][02861] Updated weights for policy 0, policy_version 290 (0.0023)
[2025-01-12 21:15:12,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 1196032. Throughput: 0: 990.9. Samples: 296534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:15:12,209][01010] Avg episode reward: [(0, '7.325')]
[2025-01-12 21:15:17,212][01010] Fps is (10 sec: 4503.8, 60 sec: 4164.0, 300 sec: 4068.2). Total num frames: 1216512. Throughput: 0: 1051.1. Samples: 303722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:15:17,214][01010] Avg episode reward: [(0, '6.927')]
[2025-01-12 21:15:21,092][02861] Updated weights for policy 0, policy_version 300 (0.0032)
[2025-01-12 21:15:22,209][01010] Fps is (10 sec: 3276.4, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 1228800. Throughput: 0: 996.5. Samples: 308178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:15:22,211][01010] Avg episode reward: [(0, '7.081')]
[2025-01-12 21:15:27,208][01010] Fps is (10 sec: 3687.8, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1253376. Throughput: 0: 987.9. Samples: 311396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:15:27,210][01010] Avg episode reward: [(0, '7.428')]
[2025-01-12 21:15:30,064][02861] Updated weights for policy 0, policy_version 310 (0.0016)
[2025-01-12 21:15:32,208][01010] Fps is (10 sec: 4915.7, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1277952. Throughput: 0: 1033.9. Samples: 318820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:15:32,213][01010] Avg episode reward: [(0, '7.721')]
[2025-01-12 21:15:32,222][02848] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000312_1277952.pth...
[2025-01-12 21:15:32,339][02848] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth
[2025-01-12 21:15:32,350][02848] Saving new best policy, reward=7.721!
[2025-01-12 21:15:37,210][01010] Fps is (10 sec: 4094.9, 60 sec: 4027.6, 300 sec: 4054.3). Total num frames: 1294336. Throughput: 0: 1019.5. Samples: 324012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:15:37,212][01010] Avg episode reward: [(0, '7.803')]
[2025-01-12 21:15:37,220][02848] Saving new best policy, reward=7.803!
[2025-01-12 21:15:41,320][02861] Updated weights for policy 0, policy_version 320 (0.0018)
[2025-01-12 21:15:42,208][01010] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1314816. Throughput: 0: 987.6. Samples: 326208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-12 21:15:42,211][01010] Avg episode reward: [(0, '7.962')]
[2025-01-12 21:15:42,231][02848] Saving new best policy, reward=7.962!
[2025-01-12 21:15:47,208][01010] Fps is (10 sec: 4097.1, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1335296. Throughput: 0: 1001.1. Samples: 333294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:15:47,215][01010] Avg episode reward: [(0, '9.201')]
[2025-01-12 21:15:47,237][02848] Saving new best policy, reward=9.201!
[2025-01-12 21:15:50,215][02861] Updated weights for policy 0, policy_version 330 (0.0020)
[2025-01-12 21:15:52,208][01010] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1355776. Throughput: 0: 1034.4. Samples: 339572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:15:52,214][01010] Avg episode reward: [(0, '9.550')]
[2025-01-12 21:15:52,224][02848] Saving new best policy, reward=9.550!
[2025-01-12 21:15:57,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1372160. Throughput: 0: 1004.0. Samples: 341714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:15:57,215][01010] Avg episode reward: [(0, '9.677')]
[2025-01-12 21:15:57,219][02848] Saving new best policy, reward=9.677!
[2025-01-12 21:16:00,967][02861] Updated weights for policy 0, policy_version 340 (0.0017)
[2025-01-12 21:16:02,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4082.2). Total num frames: 1396736. Throughput: 0: 989.3. Samples: 348236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:16:02,209][01010] Avg episode reward: [(0, '9.488')]
[2025-01-12 21:16:07,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1421312. Throughput: 0: 1051.5. Samples: 355492. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-12 21:16:07,213][01010] Avg episode reward: [(0, '9.271')]
[2025-01-12 21:16:11,348][02861] Updated weights for policy 0, policy_version 350 (0.0034)
[2025-01-12 21:16:12,213][01010] Fps is (10 sec: 3684.4, 60 sec: 3959.1, 300 sec: 4040.4). Total num frames: 1433600. Throughput: 0: 1029.4. Samples: 357724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:16:12,221][01010] Avg episode reward: [(0, '9.707')]
[2025-01-12 21:16:12,238][02848] Saving new best policy, reward=9.707!
[2025-01-12 21:16:17,208][01010] Fps is (10 sec: 3276.8, 60 sec: 3959.7, 300 sec: 4054.4). Total num frames: 1454080. Throughput: 0: 983.8. Samples: 363092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-12 21:16:17,210][01010] Avg episode reward: [(0, '9.218')]
[2025-01-12 21:16:20,665][02861] Updated weights for policy 0, policy_version 360 (0.0026)
[2025-01-12 21:16:22,208][01010] Fps is (10 sec: 4508.0, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1478656. Throughput: 0: 1032.8. Samples: 370486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:16:22,215][01010] Avg episode reward: [(0, '9.774')]
[2025-01-12 21:16:22,222][02848] Saving new best policy, reward=9.774!
[2025-01-12 21:16:27,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1495040. Throughput: 0: 1050.1. Samples: 373462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:16:27,216][01010] Avg episode reward: [(0, '9.647')]
[2025-01-12 21:16:31,878][02861] Updated weights for policy 0, policy_version 370 (0.0031)
[2025-01-12 21:16:32,209][01010] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 4054.3).
Total num frames: 1515520. Throughput: 0: 996.4. Samples: 378134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:16:32,214][01010] Avg episode reward: [(0, '9.965')] [2025-01-12 21:16:32,222][02848] Saving new best policy, reward=9.965! [2025-01-12 21:16:37,208][01010] Fps is (10 sec: 4505.5, 60 sec: 4096.2, 300 sec: 4082.1). Total num frames: 1540096. Throughput: 0: 1020.4. Samples: 385490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:16:37,215][01010] Avg episode reward: [(0, '10.092')] [2025-01-12 21:16:37,220][02848] Saving new best policy, reward=10.092! [2025-01-12 21:16:40,256][02861] Updated weights for policy 0, policy_version 380 (0.0020) [2025-01-12 21:16:42,208][01010] Fps is (10 sec: 4506.3, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1560576. Throughput: 0: 1054.2. Samples: 389154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:16:42,215][01010] Avg episode reward: [(0, '10.950')] [2025-01-12 21:16:42,223][02848] Saving new best policy, reward=10.950! [2025-01-12 21:16:47,208][01010] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 1576960. Throughput: 0: 1012.0. Samples: 393778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:16:47,210][01010] Avg episode reward: [(0, '11.260')] [2025-01-12 21:16:47,212][02848] Saving new best policy, reward=11.260! [2025-01-12 21:16:51,470][02861] Updated weights for policy 0, policy_version 390 (0.0018) [2025-01-12 21:16:52,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1597440. Throughput: 0: 993.1. Samples: 400180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-12 21:16:52,210][01010] Avg episode reward: [(0, '12.961')] [2025-01-12 21:16:52,303][02848] Saving new best policy, reward=12.961! [2025-01-12 21:16:57,209][01010] Fps is (10 sec: 4504.9, 60 sec: 4164.2, 300 sec: 4082.1). Total num frames: 1622016. Throughput: 0: 1025.3. Samples: 403858. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:16:57,212][01010] Avg episode reward: [(0, '12.616')] [2025-01-12 21:17:01,275][02861] Updated weights for policy 0, policy_version 400 (0.0019) [2025-01-12 21:17:02,209][01010] Fps is (10 sec: 4095.3, 60 sec: 4027.6, 300 sec: 4054.3). Total num frames: 1638400. Throughput: 0: 1035.4. Samples: 409686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:17:02,214][01010] Avg episode reward: [(0, '12.446')] [2025-01-12 21:17:07,208][01010] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 1658880. Throughput: 0: 995.1. Samples: 415266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-12 21:17:07,212][01010] Avg episode reward: [(0, '11.441')] [2025-01-12 21:17:11,063][02861] Updated weights for policy 0, policy_version 410 (0.0036) [2025-01-12 21:17:12,208][01010] Fps is (10 sec: 4506.3, 60 sec: 4164.6, 300 sec: 4082.1). Total num frames: 1683456. Throughput: 0: 1009.6. Samples: 418892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-12 21:17:12,215][01010] Avg episode reward: [(0, '11.427')] [2025-01-12 21:17:17,210][01010] Fps is (10 sec: 4504.4, 60 sec: 4164.1, 300 sec: 4068.2). Total num frames: 1703936. Throughput: 0: 1056.4. Samples: 425672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-12 21:17:17,213][01010] Avg episode reward: [(0, '10.653')] [2025-01-12 21:17:22,100][02861] Updated weights for policy 0, policy_version 420 (0.0020) [2025-01-12 21:17:22,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1720320. Throughput: 0: 993.5. Samples: 430198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:17:22,213][01010] Avg episode reward: [(0, '11.322')] [2025-01-12 21:17:27,208][01010] Fps is (10 sec: 4097.1, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1744896. Throughput: 0: 993.1. Samples: 433844. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:17:27,209][01010] Avg episode reward: [(0, '10.834')] [2025-01-12 21:17:30,434][02861] Updated weights for policy 0, policy_version 430 (0.0025) [2025-01-12 21:17:32,209][01010] Fps is (10 sec: 4505.1, 60 sec: 4164.3, 300 sec: 4082.2). Total num frames: 1765376. Throughput: 0: 1056.0. Samples: 441300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:17:32,212][01010] Avg episode reward: [(0, '11.798')] [2025-01-12 21:17:32,229][02848] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000431_1765376.pth... [2025-01-12 21:17:32,403][02848] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth [2025-01-12 21:17:37,213][01010] Fps is (10 sec: 3684.3, 60 sec: 4027.4, 300 sec: 4054.3). Total num frames: 1781760. Throughput: 0: 1018.3. Samples: 446008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:17:37,216][01010] Avg episode reward: [(0, '11.876')] [2025-01-12 21:17:41,462][02861] Updated weights for policy 0, policy_version 440 (0.0029) [2025-01-12 21:17:42,208][01010] Fps is (10 sec: 3686.8, 60 sec: 4027.7, 300 sec: 4068.3). Total num frames: 1802240. Throughput: 0: 1002.3. Samples: 448962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:17:42,214][01010] Avg episode reward: [(0, '12.019')] [2025-01-12 21:17:47,208][01010] Fps is (10 sec: 4508.1, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 1826816. Throughput: 0: 1035.3. Samples: 456272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-12 21:17:47,212][01010] Avg episode reward: [(0, '12.054')] [2025-01-12 21:17:51,053][02861] Updated weights for policy 0, policy_version 450 (0.0038) [2025-01-12 21:17:52,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1843200. Throughput: 0: 1035.0. Samples: 461842. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-12 21:17:52,213][01010] Avg episode reward: [(0, '12.578')] [2025-01-12 21:17:57,208][01010] Fps is (10 sec: 3686.3, 60 sec: 4027.8, 300 sec: 4068.2). Total num frames: 1863680. Throughput: 0: 1005.7. Samples: 464148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-12 21:17:57,210][01010] Avg episode reward: [(0, '13.161')] [2025-01-12 21:17:57,218][02848] Saving new best policy, reward=13.161! [2025-01-12 21:18:01,105][02861] Updated weights for policy 0, policy_version 460 (0.0035) [2025-01-12 21:18:02,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 4096.0). Total num frames: 1888256. Throughput: 0: 1013.1. Samples: 471258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-12 21:18:02,219][01010] Avg episode reward: [(0, '12.817')] [2025-01-12 21:18:07,209][01010] Fps is (10 sec: 4505.3, 60 sec: 4164.2, 300 sec: 4082.1). Total num frames: 1908736. Throughput: 0: 1055.3. Samples: 477686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:18:07,211][01010] Avg episode reward: [(0, '12.625')] [2025-01-12 21:18:12,123][02861] Updated weights for policy 0, policy_version 470 (0.0037) [2025-01-12 21:18:12,208][01010] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1925120. Throughput: 0: 1023.0. Samples: 479880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-12 21:18:12,210][01010] Avg episode reward: [(0, '12.818')] [2025-01-12 21:18:17,208][01010] Fps is (10 sec: 3686.7, 60 sec: 4027.9, 300 sec: 4082.1). Total num frames: 1945600. Throughput: 0: 996.1. Samples: 486124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:18:17,214][01010] Avg episode reward: [(0, '13.435')] [2025-01-12 21:18:17,218][02848] Saving new best policy, reward=13.435! [2025-01-12 21:18:20,664][02861] Updated weights for policy 0, policy_version 480 (0.0022) [2025-01-12 21:18:22,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). 
Total num frames: 1970176. Throughput: 0: 1049.5. Samples: 493230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-01-12 21:18:22,210][01010] Avg episode reward: [(0, '14.869')] [2025-01-12 21:18:22,224][02848] Saving new best policy, reward=14.869! [2025-01-12 21:18:27,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1986560. Throughput: 0: 1036.6. Samples: 495608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-12 21:18:27,215][01010] Avg episode reward: [(0, '14.986')] [2025-01-12 21:18:27,217][02848] Saving new best policy, reward=14.986! [2025-01-12 21:18:31,799][02861] Updated weights for policy 0, policy_version 490 (0.0033) [2025-01-12 21:18:32,208][01010] Fps is (10 sec: 3686.3, 60 sec: 4027.8, 300 sec: 4068.2). Total num frames: 2007040. Throughput: 0: 991.2. Samples: 500878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:18:32,210][01010] Avg episode reward: [(0, '14.535')] [2025-01-12 21:18:37,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.7, 300 sec: 4096.0). Total num frames: 2031616. Throughput: 0: 1030.4. Samples: 508210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:18:37,214][01010] Avg episode reward: [(0, '13.689')] [2025-01-12 21:18:41,284][02861] Updated weights for policy 0, policy_version 500 (0.0020) [2025-01-12 21:18:42,208][01010] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2048000. Throughput: 0: 1052.5. Samples: 511512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-12 21:18:42,212][01010] Avg episode reward: [(0, '12.879')] [2025-01-12 21:18:47,208][01010] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2064384. Throughput: 0: 992.6. Samples: 515926. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:18:47,213][01010] Avg episode reward: [(0, '12.143')] [2025-01-12 21:18:51,603][02861] Updated weights for policy 0, policy_version 510 (0.0037) [2025-01-12 21:18:52,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2088960. Throughput: 0: 1009.0. Samples: 523092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-12 21:18:52,209][01010] Avg episode reward: [(0, '14.343')] [2025-01-12 21:18:57,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 2113536. Throughput: 0: 1041.9. Samples: 526766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-12 21:18:57,211][01010] Avg episode reward: [(0, '15.650')] [2025-01-12 21:18:57,214][02848] Saving new best policy, reward=15.650! [2025-01-12 21:19:02,208][01010] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2125824. Throughput: 0: 1013.3. Samples: 531724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-12 21:19:02,210][01010] Avg episode reward: [(0, '16.154')] [2025-01-12 21:19:02,221][02848] Saving new best policy, reward=16.154! [2025-01-12 21:19:02,496][02861] Updated weights for policy 0, policy_version 520 (0.0023) [2025-01-12 21:19:07,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4082.1). Total num frames: 2150400. Throughput: 0: 997.5. Samples: 538116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:19:07,210][01010] Avg episode reward: [(0, '18.954')] [2025-01-12 21:19:07,213][02848] Saving new best policy, reward=18.954! [2025-01-12 21:19:10,997][02861] Updated weights for policy 0, policy_version 530 (0.0021) [2025-01-12 21:19:12,208][01010] Fps is (10 sec: 4915.3, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2174976. Throughput: 0: 1024.2. Samples: 541696. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-12 21:19:12,210][01010] Avg episode reward: [(0, '17.989')] [2025-01-12 21:19:17,215][01010] Fps is (10 sec: 4093.1, 60 sec: 4095.5, 300 sec: 4068.1). Total num frames: 2191360. Throughput: 0: 1036.7. Samples: 547538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:19:17,217][01010] Avg episode reward: [(0, '18.463')] [2025-01-12 21:19:22,152][02861] Updated weights for policy 0, policy_version 540 (0.0029) [2025-01-12 21:19:22,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2211840. Throughput: 0: 996.3. Samples: 553044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:19:22,215][01010] Avg episode reward: [(0, '18.373')] [2025-01-12 21:19:27,208][01010] Fps is (10 sec: 4508.8, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2236416. Throughput: 0: 1003.5. Samples: 556670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:19:27,213][01010] Avg episode reward: [(0, '16.761')] [2025-01-12 21:19:30,935][02861] Updated weights for policy 0, policy_version 550 (0.0020) [2025-01-12 21:19:32,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2252800. Throughput: 0: 1059.5. Samples: 563604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:19:32,215][01010] Avg episode reward: [(0, '17.343')] [2025-01-12 21:19:32,224][02848] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth... [2025-01-12 21:19:32,412][02848] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000312_1277952.pth [2025-01-12 21:19:37,208][01010] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2269184. Throughput: 0: 1000.6. Samples: 568118. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:19:37,210][01010] Avg episode reward: [(0, '16.976')] [2025-01-12 21:19:41,475][02861] Updated weights for policy 0, policy_version 560 (0.0018) [2025-01-12 21:19:42,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2293760. Throughput: 0: 1000.7. Samples: 571796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:19:42,214][01010] Avg episode reward: [(0, '17.230')] [2025-01-12 21:19:47,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 2318336. Throughput: 0: 1052.8. Samples: 579098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-12 21:19:47,216][01010] Avg episode reward: [(0, '18.515')] [2025-01-12 21:19:52,026][02861] Updated weights for policy 0, policy_version 570 (0.0043) [2025-01-12 21:19:52,211][01010] Fps is (10 sec: 4094.5, 60 sec: 4095.8, 300 sec: 4068.2). Total num frames: 2334720. Throughput: 0: 1016.7. Samples: 583870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-12 21:19:52,216][01010] Avg episode reward: [(0, '19.510')] [2025-01-12 21:19:52,227][02848] Saving new best policy, reward=19.510! [2025-01-12 21:19:57,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2355200. Throughput: 0: 997.8. Samples: 586598. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-01-12 21:19:57,216][01010] Avg episode reward: [(0, '18.515')] [2025-01-12 21:20:01,104][02861] Updated weights for policy 0, policy_version 580 (0.0033) [2025-01-12 21:20:02,208][01010] Fps is (10 sec: 4507.2, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 2379776. Throughput: 0: 1033.3. Samples: 594030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:20:02,216][01010] Avg episode reward: [(0, '17.891')] [2025-01-12 21:20:07,212][01010] Fps is (10 sec: 4094.4, 60 sec: 4095.7, 300 sec: 4068.2). Total num frames: 2396160. Throughput: 0: 1039.9. Samples: 599842. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-12 21:20:07,218][01010] Avg episode reward: [(0, '17.494')] [2025-01-12 21:20:12,180][02861] Updated weights for policy 0, policy_version 590 (0.0018) [2025-01-12 21:20:12,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.3). Total num frames: 2416640. Throughput: 0: 1008.1. Samples: 602034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-12 21:20:12,212][01010] Avg episode reward: [(0, '17.345')] [2025-01-12 21:20:17,208][01010] Fps is (10 sec: 4097.6, 60 sec: 4096.5, 300 sec: 4096.0). Total num frames: 2437120. Throughput: 0: 1007.8. Samples: 608956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:20:17,213][01010] Avg episode reward: [(0, '17.098')] [2025-01-12 21:20:20,961][02861] Updated weights for policy 0, policy_version 600 (0.0021) [2025-01-12 21:20:22,210][01010] Fps is (10 sec: 4094.9, 60 sec: 4095.8, 300 sec: 4082.1). Total num frames: 2457600. Throughput: 0: 1056.1. Samples: 615646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:20:22,214][01010] Avg episode reward: [(0, '17.039')] [2025-01-12 21:20:27,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2473984. Throughput: 0: 1023.4. Samples: 617850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:20:27,210][01010] Avg episode reward: [(0, '17.552')] [2025-01-12 21:20:31,624][02861] Updated weights for policy 0, policy_version 610 (0.0029) [2025-01-12 21:20:32,208][01010] Fps is (10 sec: 4097.1, 60 sec: 4096.0, 300 sec: 4082.2). Total num frames: 2498560. Throughput: 0: 998.2. Samples: 624018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-12 21:20:32,211][01010] Avg episode reward: [(0, '17.304')] [2025-01-12 21:20:37,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 2523136. Throughput: 0: 1055.5. Samples: 631364. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:20:37,210][01010] Avg episode reward: [(0, '18.144')] [2025-01-12 21:20:41,738][02861] Updated weights for policy 0, policy_version 620 (0.0022) [2025-01-12 21:20:42,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2539520. Throughput: 0: 1051.8. Samples: 633930. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-12 21:20:42,210][01010] Avg episode reward: [(0, '19.754')] [2025-01-12 21:20:42,220][02848] Saving new best policy, reward=19.754! [2025-01-12 21:20:47,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2560000. Throughput: 0: 996.4. Samples: 638866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:20:47,214][01010] Avg episode reward: [(0, '21.447')] [2025-01-12 21:20:47,219][02848] Saving new best policy, reward=21.447! [2025-01-12 21:20:51,536][02861] Updated weights for policy 0, policy_version 630 (0.0041) [2025-01-12 21:20:52,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4096.0). Total num frames: 2580480. Throughput: 0: 1024.3. Samples: 645930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:20:52,210][01010] Avg episode reward: [(0, '21.216')] [2025-01-12 21:20:57,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2600960. Throughput: 0: 1055.7. Samples: 649540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:20:57,212][01010] Avg episode reward: [(0, '20.925')] [2025-01-12 21:21:02,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2617344. Throughput: 0: 1000.4. Samples: 653976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:21:02,213][01010] Avg episode reward: [(0, '20.725')] [2025-01-12 21:21:02,624][02861] Updated weights for policy 0, policy_version 640 (0.0028) [2025-01-12 21:21:07,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.3, 300 sec: 4096.1). 
Total num frames: 2641920. Throughput: 0: 1007.9. Samples: 660998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-12 21:21:07,209][01010] Avg episode reward: [(0, '20.735')] [2025-01-12 21:21:10,871][02861] Updated weights for policy 0, policy_version 650 (0.0025) [2025-01-12 21:21:12,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 2666496. Throughput: 0: 1041.3. Samples: 664708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:21:12,213][01010] Avg episode reward: [(0, '20.581')] [2025-01-12 21:21:17,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2678784. Throughput: 0: 1025.2. Samples: 670150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:21:17,210][01010] Avg episode reward: [(0, '22.155')] [2025-01-12 21:21:17,212][02848] Saving new best policy, reward=22.155! [2025-01-12 21:21:22,189][02861] Updated weights for policy 0, policy_version 660 (0.0045) [2025-01-12 21:21:22,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4096.2, 300 sec: 4096.0). Total num frames: 2703360. Throughput: 0: 989.7. Samples: 675902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:21:22,209][01010] Avg episode reward: [(0, '21.861')] [2025-01-12 21:21:27,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 2727936. Throughput: 0: 1014.1. Samples: 679564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:21:27,209][01010] Avg episode reward: [(0, '20.865')] [2025-01-12 21:21:31,377][02861] Updated weights for policy 0, policy_version 670 (0.0040) [2025-01-12 21:21:32,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2744320. Throughput: 0: 1048.4. Samples: 686044. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-12 21:21:32,210][01010] Avg episode reward: [(0, '19.675')] [2025-01-12 21:21:32,223][02848] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000670_2744320.pth... [2025-01-12 21:21:32,385][02848] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000431_1765376.pth [2025-01-12 21:21:37,208][01010] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 2760704. Throughput: 0: 1002.0. Samples: 691022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-12 21:21:37,215][01010] Avg episode reward: [(0, '18.642')] [2025-01-12 21:21:41,491][02861] Updated weights for policy 0, policy_version 680 (0.0045) [2025-01-12 21:21:42,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2785280. Throughput: 0: 1004.9. Samples: 694762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-12 21:21:42,213][01010] Avg episode reward: [(0, '18.695')] [2025-01-12 21:21:47,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 2809856. Throughput: 0: 1067.6. Samples: 702020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:21:47,214][01010] Avg episode reward: [(0, '17.979')] [2025-01-12 21:21:52,210][01010] Fps is (10 sec: 3685.4, 60 sec: 4027.6, 300 sec: 4068.2). Total num frames: 2822144. Throughput: 0: 1009.2. Samples: 706416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:21:52,213][01010] Avg episode reward: [(0, '18.346')] [2025-01-12 21:21:52,614][02861] Updated weights for policy 0, policy_version 690 (0.0028) [2025-01-12 21:21:57,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2846720. Throughput: 0: 1000.8. Samples: 709746. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:21:57,210][01010] Avg episode reward: [(0, '18.979')] [2025-01-12 21:22:00,896][02861] Updated weights for policy 0, policy_version 700 (0.0024) [2025-01-12 21:22:02,208][01010] Fps is (10 sec: 4916.5, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 2871296. Throughput: 0: 1044.7. Samples: 717160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:22:02,214][01010] Avg episode reward: [(0, '18.835')] [2025-01-12 21:22:07,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2887680. Throughput: 0: 1036.7. Samples: 722552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:22:07,215][01010] Avg episode reward: [(0, '18.670')] [2025-01-12 21:22:11,913][02861] Updated weights for policy 0, policy_version 710 (0.0029) [2025-01-12 21:22:12,208][01010] Fps is (10 sec: 3686.1, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2908160. Throughput: 0: 1010.3. Samples: 725030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-12 21:22:12,211][01010] Avg episode reward: [(0, '20.386')] [2025-01-12 21:22:17,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 2932736. Throughput: 0: 1028.9. Samples: 732344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-12 21:22:17,212][01010] Avg episode reward: [(0, '20.504')] [2025-01-12 21:22:20,797][02861] Updated weights for policy 0, policy_version 720 (0.0024) [2025-01-12 21:22:22,208][01010] Fps is (10 sec: 4096.3, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2949120. Throughput: 0: 1056.4. Samples: 738560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-12 21:22:22,212][01010] Avg episode reward: [(0, '20.982')] [2025-01-12 21:22:27,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2969600. Throughput: 0: 1023.3. Samples: 740810. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:22:27,213][01010] Avg episode reward: [(0, '20.942')]
[2025-01-12 21:22:31,237][02861] Updated weights for policy 0, policy_version 730 (0.0037)
[2025-01-12 21:22:32,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4110.0). Total num frames: 2994176. Throughput: 0: 1008.2. Samples: 747390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-12 21:22:32,212][01010] Avg episode reward: [(0, '21.836')]
[2025-01-12 21:22:37,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 3014656. Throughput: 0: 1072.0. Samples: 754654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:22:37,212][01010] Avg episode reward: [(0, '21.519')]
[2025-01-12 21:22:41,633][02861] Updated weights for policy 0, policy_version 740 (0.0026)
[2025-01-12 21:22:42,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3031040. Throughput: 0: 1047.1. Samples: 756864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:22:42,221][01010] Avg episode reward: [(0, '20.929')]
[2025-01-12 21:22:47,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3051520. Throughput: 0: 1007.1. Samples: 762478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-12 21:22:47,210][01010] Avg episode reward: [(0, '19.663')]
[2025-01-12 21:22:50,705][02861] Updated weights for policy 0, policy_version 750 (0.0039)
[2025-01-12 21:22:52,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4232.7, 300 sec: 4109.9). Total num frames: 3076096. Throughput: 0: 1049.0. Samples: 769756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:22:52,210][01010] Avg episode reward: [(0, '17.232')]
[2025-01-12 21:22:57,208][01010] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3092480. Throughput: 0: 1060.3. Samples: 772744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-12 21:22:57,211][01010] Avg episode reward: [(0, '18.355')]
[2025-01-12 21:23:01,753][02861] Updated weights for policy 0, policy_version 760 (0.0031)
[2025-01-12 21:23:02,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3112960. Throughput: 0: 1004.9. Samples: 777564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:23:02,210][01010] Avg episode reward: [(0, '19.018')]
[2025-01-12 21:23:07,208][01010] Fps is (10 sec: 4505.8, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3137536. Throughput: 0: 1030.0. Samples: 784908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:23:07,210][01010] Avg episode reward: [(0, '19.913')]
[2025-01-12 21:23:10,060][02861] Updated weights for policy 0, policy_version 770 (0.0028)
[2025-01-12 21:23:12,208][01010] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3158016. Throughput: 0: 1062.4. Samples: 788616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:23:12,210][01010] Avg episode reward: [(0, '20.698')]
[2025-01-12 21:23:17,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3174400. Throughput: 0: 1019.8. Samples: 793280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:23:17,213][01010] Avg episode reward: [(0, '21.940')]
[2025-01-12 21:23:21,289][02861] Updated weights for policy 0, policy_version 780 (0.0025)
[2025-01-12 21:23:22,208][01010] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3198976. Throughput: 0: 1006.4. Samples: 799944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-12 21:23:22,214][01010] Avg episode reward: [(0, '20.962')]
[2025-01-12 21:23:27,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 3223552. Throughput: 0: 1038.2. Samples: 803582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-12 21:23:27,217][01010] Avg episode reward: [(0, '20.732')]
[2025-01-12 21:23:30,929][02861] Updated weights for policy 0, policy_version 790 (0.0027)
[2025-01-12 21:23:32,208][01010] Fps is (10 sec: 3686.2, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3235840. Throughput: 0: 1039.7. Samples: 809266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:23:32,210][01010] Avg episode reward: [(0, '21.296')]
[2025-01-12 21:23:32,294][02848] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000791_3239936.pth...
[2025-01-12 21:23:32,459][02848] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth
[2025-01-12 21:23:37,208][01010] Fps is (10 sec: 3276.7, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3256320. Throughput: 0: 1005.1. Samples: 814986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:23:37,214][01010] Avg episode reward: [(0, '21.516')]
[2025-01-12 21:23:40,663][02861] Updated weights for policy 0, policy_version 800 (0.0014)
[2025-01-12 21:23:42,208][01010] Fps is (10 sec: 4505.9, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3280896. Throughput: 0: 1020.5. Samples: 818664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-12 21:23:42,212][01010] Avg episode reward: [(0, '21.553')]
[2025-01-12 21:23:47,208][01010] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3301376. Throughput: 0: 1061.2. Samples: 825316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:23:47,213][01010] Avg episode reward: [(0, '22.403')]
[2025-01-12 21:23:47,218][02848] Saving new best policy, reward=22.403!
[2025-01-12 21:23:51,834][02861] Updated weights for policy 0, policy_version 810 (0.0027)
[2025-01-12 21:23:52,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3317760. Throughput: 0: 1000.6. Samples: 829936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-12 21:23:52,213][01010] Avg episode reward: [(0, '23.030')]
[2025-01-12 21:23:52,230][02848] Saving new best policy, reward=23.030!
[2025-01-12 21:23:57,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3342336. Throughput: 0: 997.7. Samples: 833512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:23:57,210][01010] Avg episode reward: [(0, '22.206')]
[2025-01-12 21:24:00,130][02861] Updated weights for policy 0, policy_version 820 (0.0029)
[2025-01-12 21:24:02,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3362816. Throughput: 0: 1058.9. Samples: 840932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:24:02,213][01010] Avg episode reward: [(0, '22.319')]
[2025-01-12 21:24:07,210][01010] Fps is (10 sec: 3685.4, 60 sec: 4027.6, 300 sec: 4082.1). Total num frames: 3379200. Throughput: 0: 1012.3. Samples: 845500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:24:07,213][01010] Avg episode reward: [(0, '23.418')]
[2025-01-12 21:24:07,217][02848] Saving new best policy, reward=23.418!
[2025-01-12 21:24:11,364][02861] Updated weights for policy 0, policy_version 830 (0.0027)
[2025-01-12 21:24:12,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4110.0). Total num frames: 3403776. Throughput: 0: 997.0. Samples: 848448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-12 21:24:12,214][01010] Avg episode reward: [(0, '23.498')]
[2025-01-12 21:24:12,232][02848] Saving new best policy, reward=23.498!
[2025-01-12 21:24:17,208][01010] Fps is (10 sec: 4506.8, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3424256. Throughput: 0: 1031.2. Samples: 855668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:24:17,212][01010] Avg episode reward: [(0, '23.307')]
[2025-01-12 21:24:21,093][02861] Updated weights for policy 0, policy_version 840 (0.0028)
[2025-01-12 21:24:22,208][01010] Fps is (10 sec: 3686.2, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3440640. Throughput: 0: 1027.1. Samples: 861204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:24:22,210][01010] Avg episode reward: [(0, '23.774')]
[2025-01-12 21:24:22,224][02848] Saving new best policy, reward=23.774!
[2025-01-12 21:24:27,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4096.0). Total num frames: 3461120. Throughput: 0: 994.5. Samples: 863416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-12 21:24:27,210][01010] Avg episode reward: [(0, '25.567')]
[2025-01-12 21:24:27,213][02848] Saving new best policy, reward=25.567!
[2025-01-12 21:24:31,103][02861] Updated weights for policy 0, policy_version 850 (0.0033)
[2025-01-12 21:24:32,208][01010] Fps is (10 sec: 4505.8, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3485696. Throughput: 0: 1005.4. Samples: 870558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:24:32,210][01010] Avg episode reward: [(0, '24.729')]
[2025-01-12 21:24:37,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3506176. Throughput: 0: 1049.1. Samples: 877146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-12 21:24:37,217][01010] Avg episode reward: [(0, '24.474')]
[2025-01-12 21:24:42,178][02861] Updated weights for policy 0, policy_version 860 (0.0023)
[2025-01-12 21:24:42,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3522560. Throughput: 0: 1017.7. Samples: 879308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:24:42,214][01010] Avg episode reward: [(0, '24.756')]
[2025-01-12 21:24:47,208][01010] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3547136. Throughput: 0: 993.0. Samples: 885616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-12 21:24:47,215][01010] Avg episode reward: [(0, '25.184')]
[2025-01-12 21:24:50,488][02861] Updated weights for policy 0, policy_version 870 (0.0017)
[2025-01-12 21:24:52,208][01010] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3567616. Throughput: 0: 1052.2. Samples: 892846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-12 21:24:52,210][01010] Avg episode reward: [(0, '24.868')]
[2025-01-12 21:24:57,211][01010] Fps is (10 sec: 3685.0, 60 sec: 4027.5, 300 sec: 4082.1). Total num frames: 3584000. Throughput: 0: 1038.8. Samples: 895200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:24:57,214][01010] Avg episode reward: [(0, '24.791')]
[2025-01-12 21:25:01,772][02861] Updated weights for policy 0, policy_version 880 (0.0031)
[2025-01-12 21:25:02,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.1). Total num frames: 3604480. Throughput: 0: 996.2. Samples: 900496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:25:02,211][01010] Avg episode reward: [(0, '25.368')]
[2025-01-12 21:25:07,208][01010] Fps is (10 sec: 4507.3, 60 sec: 4164.4, 300 sec: 4109.9). Total num frames: 3629056. Throughput: 0: 1030.2. Samples: 907564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:25:07,211][01010] Avg episode reward: [(0, '23.945')]
[2025-01-12 21:25:11,295][02861] Updated weights for policy 0, policy_version 890 (0.0021)
[2025-01-12 21:25:12,209][01010] Fps is (10 sec: 4095.4, 60 sec: 4027.6, 300 sec: 4096.0). Total num frames: 3645440. Throughput: 0: 1051.1. Samples: 910718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:25:12,211][01010] Avg episode reward: [(0, '23.428')]
[2025-01-12 21:25:17,208][01010] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4082.2). Total num frames: 3661824. Throughput: 0: 988.8. Samples: 915056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-12 21:25:17,210][01010] Avg episode reward: [(0, '22.972')]
[2025-01-12 21:25:21,609][02861] Updated weights for policy 0, policy_version 900 (0.0014)
[2025-01-12 21:25:22,208][01010] Fps is (10 sec: 4096.7, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3686400. Throughput: 0: 1003.5. Samples: 922304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:25:22,212][01010] Avg episode reward: [(0, '22.029')]
[2025-01-12 21:25:27,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3710976. Throughput: 0: 1037.5. Samples: 925996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:25:27,211][01010] Avg episode reward: [(0, '22.152')]
[2025-01-12 21:25:32,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 3723264. Throughput: 0: 1008.6. Samples: 931004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:25:32,212][01010] Avg episode reward: [(0, '22.680')]
[2025-01-12 21:25:32,223][02848] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000909_3723264.pth...
[2025-01-12 21:25:32,390][02848] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000670_2744320.pth
[2025-01-12 21:25:32,673][02861] Updated weights for policy 0, policy_version 910 (0.0038)
[2025-01-12 21:25:37,208][01010] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3747840. Throughput: 0: 991.0. Samples: 937440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:25:37,215][01010] Avg episode reward: [(0, '23.393')]
[2025-01-12 21:25:40,945][02861] Updated weights for policy 0, policy_version 920 (0.0022)
[2025-01-12 21:25:42,208][01010] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3772416. Throughput: 0: 1021.5. Samples: 941162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:25:42,210][01010] Avg episode reward: [(0, '23.666')]
[2025-01-12 21:25:47,209][01010] Fps is (10 sec: 4095.5, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3788800. Throughput: 0: 1031.7. Samples: 946922. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-12 21:25:47,211][01010] Avg episode reward: [(0, '23.913')]
[2025-01-12 21:25:52,208][01010] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 3805184. Throughput: 0: 995.4. Samples: 952356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-12 21:25:52,210][01010] Avg episode reward: [(0, '24.365')]
[2025-01-12 21:25:52,267][02861] Updated weights for policy 0, policy_version 930 (0.0020)
[2025-01-12 21:25:57,208][01010] Fps is (10 sec: 4096.4, 60 sec: 4096.2, 300 sec: 4109.9). Total num frames: 3829760. Throughput: 0: 1005.4. Samples: 955960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:25:57,210][01010] Avg episode reward: [(0, '24.980')]
[2025-01-12 21:26:01,027][02861] Updated weights for policy 0, policy_version 940 (0.0020)
[2025-01-12 21:26:02,208][01010] Fps is (10 sec: 4505.3, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3850240. Throughput: 0: 1060.3. Samples: 962772. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-12 21:26:02,211][01010] Avg episode reward: [(0, '26.162')]
[2025-01-12 21:26:02,217][02848] Saving new best policy, reward=26.162!
[2025-01-12 21:26:07,208][01010] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 3866624. Throughput: 0: 999.4. Samples: 967276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:26:07,212][01010] Avg episode reward: [(0, '25.998')]
[2025-01-12 21:26:11,748][02861] Updated weights for policy 0, policy_version 950 (0.0014)
[2025-01-12 21:26:12,208][01010] Fps is (10 sec: 4096.3, 60 sec: 4096.1, 300 sec: 4109.9). Total num frames: 3891200. Throughput: 0: 999.2. Samples: 970958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:26:12,211][01010] Avg episode reward: [(0, '25.580')]
[2025-01-12 21:26:17,208][01010] Fps is (10 sec: 4915.3, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 3915776. Throughput: 0: 1051.8. Samples: 978336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-12 21:26:17,213][01010] Avg episode reward: [(0, '24.142')]
[2025-01-12 21:26:22,211][01010] Fps is (10 sec: 3685.0, 60 sec: 4027.5, 300 sec: 4068.2). Total num frames: 3928064. Throughput: 0: 1014.3. Samples: 983086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-12 21:26:22,218][01010] Avg episode reward: [(0, '23.788')]
[2025-01-12 21:26:22,600][02861] Updated weights for policy 0, policy_version 960 (0.0021)
[2025-01-12 21:26:27,208][01010] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3952640. Throughput: 0: 992.2. Samples: 985812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-12 21:26:27,212][01010] Avg episode reward: [(0, '24.460')]
[2025-01-12 21:26:31,176][02861] Updated weights for policy 0, policy_version 970 (0.0015)
[2025-01-12 21:26:32,208][01010] Fps is (10 sec: 4917.0, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 3977216. Throughput: 0: 1028.9. Samples: 993222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-12 21:26:32,216][01010] Avg episode reward: [(0, '24.574')]
[2025-01-12 21:26:37,208][01010] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3993600. Throughput: 0: 1037.1. Samples: 999024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-12 21:26:37,214][01010] Avg episode reward: [(0, '24.730')]
[2025-01-12 21:26:40,564][02848] Stopping Batcher_0...
[2025-01-12 21:26:40,565][02848] Loop batcher_evt_loop terminating...
[2025-01-12 21:26:40,567][02848] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-12 21:26:40,571][01010] Component Batcher_0 stopped!
[2025-01-12 21:26:40,620][02861] Weights refcount: 2 0
[2025-01-12 21:26:40,623][02861] Stopping InferenceWorker_p0-w0...
[2025-01-12 21:26:40,625][02861] Loop inference_proc0-0_evt_loop terminating...
[2025-01-12 21:26:40,623][01010] Component InferenceWorker_p0-w0 stopped!
[2025-01-12 21:26:40,697][02848] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000791_3239936.pth
[2025-01-12 21:26:40,720][02848] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-12 21:26:40,870][01010] Component RolloutWorker_w5 stopped!
[2025-01-12 21:26:40,874][02868] Stopping RolloutWorker_w5...
[2025-01-12 21:26:40,879][02868] Loop rollout_proc5_evt_loop terminating...
[2025-01-12 21:26:40,909][01010] Component RolloutWorker_w1 stopped!
[2025-01-12 21:26:40,911][02863] Stopping RolloutWorker_w1...
[2025-01-12 21:26:40,912][02863] Loop rollout_proc1_evt_loop terminating...
[2025-01-12 21:26:40,925][02848] Stopping LearnerWorker_p0...
[2025-01-12 21:26:40,926][02848] Loop learner_proc0_evt_loop terminating...
[2025-01-12 21:26:40,925][01010] Component LearnerWorker_p0 stopped!
[2025-01-12 21:26:40,938][01010] Component RolloutWorker_w3 stopped!
[2025-01-12 21:26:40,942][02865] Stopping RolloutWorker_w3...
[2025-01-12 21:26:40,944][01010] Component RolloutWorker_w7 stopped!
[2025-01-12 21:26:40,948][02869] Stopping RolloutWorker_w7...
[2025-01-12 21:26:40,943][02865] Loop rollout_proc3_evt_loop terminating...
[2025-01-12 21:26:40,950][02869] Loop rollout_proc7_evt_loop terminating...
[2025-01-12 21:26:41,009][02864] Stopping RolloutWorker_w2...
[2025-01-12 21:26:41,009][01010] Component RolloutWorker_w2 stopped!
[2025-01-12 21:26:41,009][02864] Loop rollout_proc2_evt_loop terminating...
[2025-01-12 21:26:41,038][02862] Stopping RolloutWorker_w0...
[2025-01-12 21:26:41,038][01010] Component RolloutWorker_w0 stopped!
[2025-01-12 21:26:41,050][02862] Loop rollout_proc0_evt_loop terminating...
[2025-01-12 21:26:41,059][02866] Stopping RolloutWorker_w6...
[2025-01-12 21:26:41,059][01010] Component RolloutWorker_w6 stopped!
[2025-01-12 21:26:41,069][02867] Stopping RolloutWorker_w4...
[2025-01-12 21:26:41,069][01010] Component RolloutWorker_w4 stopped!
[2025-01-12 21:26:41,072][02866] Loop rollout_proc6_evt_loop terminating...
[2025-01-12 21:26:41,071][01010] Waiting for process learner_proc0 to stop...
[2025-01-12 21:26:41,075][02867] Loop rollout_proc4_evt_loop terminating...
[2025-01-12 21:26:42,540][01010] Waiting for process inference_proc0-0 to join...
[2025-01-12 21:26:42,547][01010] Waiting for process rollout_proc0 to join...
[2025-01-12 21:26:44,379][01010] Waiting for process rollout_proc1 to join...
[2025-01-12 21:26:44,384][01010] Waiting for process rollout_proc2 to join...
[2025-01-12 21:26:44,389][01010] Waiting for process rollout_proc3 to join...
[2025-01-12 21:26:44,392][01010] Waiting for process rollout_proc4 to join...
[2025-01-12 21:26:44,395][01010] Waiting for process rollout_proc5 to join...
[2025-01-12 21:26:44,400][01010] Waiting for process rollout_proc6 to join...
[2025-01-12 21:26:44,403][01010] Waiting for process rollout_proc7 to join...
[2025-01-12 21:26:44,408][01010] Batcher 0 profile tree view:
batching: 26.1060, releasing_batches: 0.0250
[2025-01-12 21:26:44,409][01010] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 362.1032
update_model: 8.9616
  weight_update: 0.0023
one_step: 0.0055
  handle_policy_step: 581.1722
    deserialize: 14.4151, stack: 3.2648, obs_to_device_normalize: 125.1614, forward: 286.7229, send_messages: 29.7036
    prepare_outputs: 91.6522
      to_cpu: 55.6678
[2025-01-12 21:26:44,411][01010] Learner 0 profile tree view:
misc: 0.0049, prepare_batch: 13.7389
train: 74.0967
  epoch_init: 0.0096, minibatch_init: 0.0131, losses_postprocess: 0.6108, kl_divergence: 0.6175, after_optimizer: 34.4121
  calculate_losses: 26.1215
    losses_init: 0.0076, forward_head: 1.3366, bptt_initial: 17.5636, tail: 1.0975, advantages_returns: 0.2641, losses: 3.7769
    bptt: 1.7480
      bptt_forward_core: 1.6756
  update: 11.6702
    clip: 0.8795
[2025-01-12 21:26:44,413][01010] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2506, enqueue_policy_requests: 82.5651, env_step: 779.6548, overhead: 12.0015, complete_rollouts: 7.3145
save_policy_outputs: 20.8328
  split_output_tensors: 8.0273
[2025-01-12 21:26:44,414][01010] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3758, enqueue_policy_requests: 87.9305, env_step: 777.6872, overhead: 12.1325, complete_rollouts: 6.5892
save_policy_outputs: 20.2589
  split_output_tensors: 8.0206
[2025-01-12 21:26:44,415][01010] Loop Runner_EvtLoop terminating...
[2025-01-12 21:26:44,416][01010] Runner profile tree view:
main_loop: 1022.4605
[2025-01-12 21:26:44,417][01010] Collected {0: 4005888}, FPS: 3917.9
[2025-01-12 21:26:44,817][01010] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-01-12 21:26:44,818][01010] Overriding arg 'num_workers' with value 1 passed from command line
[2025-01-12 21:26:44,823][01010] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-01-12 21:26:44,824][01010] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-01-12 21:26:44,826][01010] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-01-12 21:26:44,828][01010] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-01-12 21:26:44,829][01010] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-01-12 21:26:44,831][01010] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-01-12 21:26:44,832][01010] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-01-12 21:26:44,833][01010] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-01-12 21:26:44,834][01010] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-01-12 21:26:44,835][01010] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-01-12 21:26:44,837][01010] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-01-12 21:26:44,838][01010] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-01-12 21:26:44,839][01010] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-01-12 21:26:44,875][01010] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-12 21:26:44,879][01010] RunningMeanStd input shape: (3, 72, 128)
[2025-01-12 21:26:44,881][01010] RunningMeanStd input shape: (1,)
[2025-01-12 21:26:44,898][01010] ConvEncoder: input_channels=3
[2025-01-12 21:26:45,019][01010] Conv encoder output size: 512
[2025-01-12 21:26:45,021][01010] Policy head output size: 512
[2025-01-12 21:26:45,208][01010] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-12 21:26:45,985][01010] Num frames 100...
[2025-01-12 21:26:46,105][01010] Num frames 200...
[2025-01-12 21:26:46,227][01010] Num frames 300...
[2025-01-12 21:26:46,349][01010] Num frames 400...
[2025-01-12 21:26:46,469][01010] Num frames 500...
[2025-01-12 21:26:46,591][01010] Num frames 600...
[2025-01-12 21:26:46,719][01010] Num frames 700...
[2025-01-12 21:26:46,848][01010] Num frames 800...
[2025-01-12 21:26:46,974][01010] Num frames 900...
[2025-01-12 21:26:47,098][01010] Num frames 1000...
[2025-01-12 21:26:47,219][01010] Num frames 1100...
[2025-01-12 21:26:47,339][01010] Num frames 1200...
[2025-01-12 21:26:47,463][01010] Num frames 1300...
[2025-01-12 21:26:47,587][01010] Num frames 1400...
[2025-01-12 21:26:47,718][01010] Num frames 1500...
[2025-01-12 21:26:47,840][01010] Num frames 1600...
[2025-01-12 21:26:47,962][01010] Num frames 1700...
[2025-01-12 21:26:48,089][01010] Num frames 1800...
[2025-01-12 21:26:48,207][01010] Num frames 1900...
[2025-01-12 21:26:48,364][01010] Avg episode rewards: #0: 59.839, true rewards: #0: 19.840
[2025-01-12 21:26:48,366][01010] Avg episode reward: 59.839, avg true_objective: 19.840
[2025-01-12 21:26:48,388][01010] Num frames 2000...
[2025-01-12 21:26:48,507][01010] Num frames 2100...
[2025-01-12 21:26:48,630][01010] Num frames 2200...
[2025-01-12 21:26:48,755][01010] Num frames 2300...
[2025-01-12 21:26:48,873][01010] Num frames 2400...
[2025-01-12 21:26:49,002][01010] Num frames 2500...
[2025-01-12 21:26:49,124][01010] Num frames 2600...
[2025-01-12 21:26:49,252][01010] Num frames 2700...
[2025-01-12 21:26:49,371][01010] Num frames 2800...
[2025-01-12 21:26:49,492][01010] Num frames 2900...
[2025-01-12 21:26:49,564][01010] Avg episode rewards: #0: 40.559, true rewards: #0: 14.560
[2025-01-12 21:26:49,566][01010] Avg episode reward: 40.559, avg true_objective: 14.560
[2025-01-12 21:26:49,674][01010] Num frames 3000...
[2025-01-12 21:26:49,803][01010] Num frames 3100...
[2025-01-12 21:26:49,924][01010] Num frames 3200...
[2025-01-12 21:26:50,051][01010] Num frames 3300...
[2025-01-12 21:26:50,174][01010] Num frames 3400...
[2025-01-12 21:26:50,296][01010] Num frames 3500...
[2025-01-12 21:26:50,465][01010] Num frames 3600...
[2025-01-12 21:26:50,638][01010] Num frames 3700...
[2025-01-12 21:26:50,810][01010] Num frames 3800...
[2025-01-12 21:26:50,984][01010] Num frames 3900...
[2025-01-12 21:26:51,151][01010] Num frames 4000...
[2025-01-12 21:26:51,360][01010] Avg episode rewards: #0: 35.320, true rewards: #0: 13.653
[2025-01-12 21:26:51,362][01010] Avg episode reward: 35.320, avg true_objective: 13.653
[2025-01-12 21:26:51,371][01010] Num frames 4100...
[2025-01-12 21:26:51,527][01010] Num frames 4200...
[2025-01-12 21:26:51,700][01010] Num frames 4300...
[2025-01-12 21:26:51,880][01010] Num frames 4400...
[2025-01-12 21:26:52,046][01010] Num frames 4500...
[2025-01-12 21:26:52,224][01010] Num frames 4600...
[2025-01-12 21:26:52,418][01010] Num frames 4700...
[2025-01-12 21:26:52,594][01010] Num frames 4800...
[2025-01-12 21:26:52,768][01010] Num frames 4900...
[2025-01-12 21:26:52,925][01010] Num frames 5000...
[2025-01-12 21:26:53,050][01010] Num frames 5100...
[2025-01-12 21:26:53,170][01010] Num frames 5200...
[2025-01-12 21:26:53,293][01010] Num frames 5300...
[2025-01-12 21:26:53,413][01010] Num frames 5400...
[2025-01-12 21:26:53,534][01010] Num frames 5500...
[2025-01-12 21:26:53,656][01010] Num frames 5600...
[2025-01-12 21:26:53,777][01010] Num frames 5700...
[2025-01-12 21:26:53,910][01010] Num frames 5800...
[2025-01-12 21:26:54,038][01010] Num frames 5900...
[2025-01-12 21:26:54,157][01010] Num frames 6000...
[2025-01-12 21:26:54,281][01010] Num frames 6100...
[2025-01-12 21:26:54,356][01010] Avg episode rewards: #0: 41.289, true rewards: #0: 15.290
[2025-01-12 21:26:54,357][01010] Avg episode reward: 41.289, avg true_objective: 15.290
[2025-01-12 21:26:54,459][01010] Num frames 6200...
[2025-01-12 21:26:54,576][01010] Num frames 6300...
[2025-01-12 21:26:54,698][01010] Num frames 6400...
[2025-01-12 21:26:54,816][01010] Num frames 6500...
[2025-01-12 21:26:54,940][01010] Num frames 6600...
[2025-01-12 21:26:55,068][01010] Num frames 6700...
[2025-01-12 21:26:55,190][01010] Num frames 6800...
[2025-01-12 21:26:55,312][01010] Num frames 6900...
[2025-01-12 21:26:55,432][01010] Num frames 7000...
[2025-01-12 21:26:55,552][01010] Num frames 7100...
[2025-01-12 21:26:55,673][01010] Num frames 7200...
[2025-01-12 21:26:55,797][01010] Num frames 7300...
[2025-01-12 21:26:55,857][01010] Avg episode rewards: #0: 39.604, true rewards: #0: 14.604
[2025-01-12 21:26:55,858][01010] Avg episode reward: 39.604, avg true_objective: 14.604
[2025-01-12 21:26:55,992][01010] Num frames 7400...
[2025-01-12 21:26:56,115][01010] Num frames 7500...
[2025-01-12 21:26:56,235][01010] Num frames 7600...
[2025-01-12 21:26:56,362][01010] Num frames 7700...
[2025-01-12 21:26:56,485][01010] Num frames 7800...
[2025-01-12 21:26:56,614][01010] Num frames 7900...
[2025-01-12 21:26:56,733][01010] Num frames 8000...
[2025-01-12 21:26:56,856][01010] Num frames 8100...
[2025-01-12 21:26:56,993][01010] Num frames 8200...
[2025-01-12 21:26:57,084][01010] Avg episode rewards: #0: 36.216, true rewards: #0: 13.717
[2025-01-12 21:26:57,085][01010] Avg episode reward: 36.216, avg true_objective: 13.717
[2025-01-12 21:26:57,167][01010] Num frames 8300...
[2025-01-12 21:26:57,288][01010] Num frames 8400...
[2025-01-12 21:26:57,404][01010] Num frames 8500...
[2025-01-12 21:26:57,525][01010] Num frames 8600...
[2025-01-12 21:26:57,643][01010] Num frames 8700...
[2025-01-12 21:26:57,759][01010] Num frames 8800...
[2025-01-12 21:26:57,880][01010] Num frames 8900...
[2025-01-12 21:26:58,014][01010] Num frames 9000...
[2025-01-12 21:26:58,138][01010] Num frames 9100...
[2025-01-12 21:26:58,257][01010] Num frames 9200...
[2025-01-12 21:26:58,377][01010] Num frames 9300...
[2025-01-12 21:26:58,493][01010] Num frames 9400...
[2025-01-12 21:26:58,612][01010] Num frames 9500...
[2025-01-12 21:26:58,732][01010] Num frames 9600...
[2025-01-12 21:26:58,851][01010] Num frames 9700...
[2025-01-12 21:26:58,984][01010] Num frames 9800...
[2025-01-12 21:26:59,109][01010] Num frames 9900...
[2025-01-12 21:26:59,228][01010] Num frames 10000...
[2025-01-12 21:26:59,355][01010] Num frames 10100...
[2025-01-12 21:26:59,478][01010] Num frames 10200...
[2025-01-12 21:26:59,600][01010] Num frames 10300...
[2025-01-12 21:26:59,694][01010] Avg episode rewards: #0: 39.471, true rewards: #0: 14.757
[2025-01-12 21:26:59,695][01010] Avg episode reward: 39.471, avg true_objective: 14.757
[2025-01-12 21:26:59,778][01010] Num frames 10400...
[2025-01-12 21:26:59,900][01010] Num frames 10500...
[2025-01-12 21:27:00,034][01010] Num frames 10600...
[2025-01-12 21:27:00,156][01010] Num frames 10700...
[2025-01-12 21:27:00,275][01010] Num frames 10800...
[2025-01-12 21:27:00,402][01010] Num frames 10900...
[2025-01-12 21:27:00,525][01010] Num frames 11000...
[2025-01-12 21:27:00,647][01010] Num frames 11100...
[2025-01-12 21:27:00,767][01010] Num frames 11200...
[2025-01-12 21:27:00,885][01010] Num frames 11300...
[2025-01-12 21:27:01,018][01010] Num frames 11400...
[2025-01-12 21:27:01,139][01010] Num frames 11500...
[2025-01-12 21:27:01,227][01010] Avg episode rewards: #0: 38.285, true rewards: #0: 14.410
[2025-01-12 21:27:01,229][01010] Avg episode reward: 38.285, avg true_objective: 14.410
[2025-01-12 21:27:01,317][01010] Num frames 11600...
[2025-01-12 21:27:01,435][01010] Num frames 11700...
[2025-01-12 21:27:01,556][01010] Num frames 11800...
[2025-01-12 21:27:01,675][01010] Num frames 11900...
[2025-01-12 21:27:01,818][01010] Avg episode rewards: #0: 34.640, true rewards: #0: 13.307
[2025-01-12 21:27:01,819][01010] Avg episode reward: 34.640, avg true_objective: 13.307
[2025-01-12 21:27:01,854][01010] Num frames 12000...
[2025-01-12 21:27:01,975][01010] Num frames 12100...
[2025-01-12 21:27:02,110][01010] Num frames 12200...
[2025-01-12 21:27:02,229][01010] Num frames 12300...
[2025-01-12 21:27:02,348][01010] Num frames 12400...
[2025-01-12 21:27:02,470][01010] Num frames 12500...
[2025-01-12 21:27:02,589][01010] Avg episode rewards: #0: 32.152, true rewards: #0: 12.552
[2025-01-12 21:27:02,591][01010] Avg episode reward: 32.152, avg true_objective: 12.552
[2025-01-12 21:28:17,255][01010] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-01-12 21:34:40,584][01010] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-01-12 21:34:40,585][01010] Overriding arg 'num_workers' with value 1 passed from command line
[2025-01-12 21:34:40,587][01010] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-01-12 21:34:40,589][01010] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-01-12 21:34:40,591][01010] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-01-12 21:34:40,593][01010] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-01-12 21:34:40,595][01010] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-01-12 21:34:40,596][01010] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-01-12 21:34:40,597][01010] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-01-12 21:34:40,598][01010] Adding new argument 'hf_repository'='federicobecona/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-01-12 21:34:40,599][01010] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-01-12 21:34:40,600][01010] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-01-12 21:34:40,601][01010] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-01-12 21:34:40,602][01010] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-01-12 21:34:40,603][01010] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-01-12 21:34:40,647][01010] RunningMeanStd input shape: (3, 72, 128)
[2025-01-12 21:34:40,649][01010] RunningMeanStd input shape: (1,)
[2025-01-12 21:34:40,667][01010] ConvEncoder: input_channels=3
[2025-01-12 21:34:40,726][01010] Conv encoder output size: 512
[2025-01-12 21:34:40,728][01010] Policy head output size: 512
[2025-01-12 21:34:40,754][01010] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-12 21:34:41,389][01010] Num frames 100...
[2025-01-12 21:34:41,563][01010] Num frames 200...
[2025-01-12 21:34:41,697][01010] Num frames 300...
[2025-01-12 21:34:41,818][01010] Num frames 400...
[2025-01-12 21:34:41,935][01010] Num frames 500...
[2025-01-12 21:34:42,060][01010] Num frames 600...
[2025-01-12 21:34:42,186][01010] Num frames 700...
[2025-01-12 21:34:42,307][01010] Num frames 800...
[2025-01-12 21:34:42,426][01010] Num frames 900...
[2025-01-12 21:34:42,544][01010] Num frames 1000...
[2025-01-12 21:34:42,664][01010] Avg episode rewards: #0: 25.560, true rewards: #0: 10.560
[2025-01-12 21:34:42,666][01010] Avg episode reward: 25.560, avg true_objective: 10.560
[2025-01-12 21:34:42,726][01010] Num frames 1100...
[2025-01-12 21:34:42,847][01010] Num frames 1200...
[2025-01-12 21:34:42,968][01010] Num frames 1300...
[2025-01-12 21:34:43,092][01010] Num frames 1400...
[2025-01-12 21:34:43,220][01010] Num frames 1500...
[2025-01-12 21:34:43,340][01010] Num frames 1600...
[2025-01-12 21:34:43,461][01010] Num frames 1700...
[2025-01-12 21:34:43,580][01010] Num frames 1800...
[2025-01-12 21:34:43,699][01010] Num frames 1900...
[2025-01-12 21:34:43,822][01010] Num frames 2000...
[2025-01-12 21:34:43,943][01010] Num frames 2100...
[2025-01-12 21:34:44,057][01010] Avg episode rewards: #0: 24.220, true rewards: #0: 10.720
[2025-01-12 21:34:44,059][01010] Avg episode reward: 24.220, avg true_objective: 10.720
[2025-01-12 21:34:44,130][01010] Num frames 2200...
[2025-01-12 21:34:44,257][01010] Num frames 2300...
[2025-01-12 21:34:44,378][01010] Num frames 2400...
[2025-01-12 21:34:44,496][01010] Num frames 2500...
[2025-01-12 21:34:44,617][01010] Num frames 2600...
[2025-01-12 21:34:44,735][01010] Num frames 2700...
[2025-01-12 21:34:44,853][01010] Num frames 2800...
[2025-01-12 21:34:44,979][01010] Num frames 2900...
[2025-01-12 21:34:45,099][01010] Num frames 3000...
[2025-01-12 21:34:45,227][01010] Num frames 3100...
[2025-01-12 21:34:45,351][01010] Num frames 3200...
[2025-01-12 21:34:45,470][01010] Num frames 3300...
[2025-01-12 21:34:45,538][01010] Avg episode rewards: #0: 25.367, true rewards: #0: 11.033
[2025-01-12 21:34:45,539][01010] Avg episode reward: 25.367, avg true_objective: 11.033
[2025-01-12 21:34:45,645][01010] Num frames 3400...
[2025-01-12 21:34:45,763][01010] Num frames 3500...
[2025-01-12 21:34:45,882][01010] Num frames 3600...
[2025-01-12 21:34:46,010][01010] Num frames 3700...
[2025-01-12 21:34:46,129][01010] Num frames 3800...
[2025-01-12 21:34:46,258][01010] Num frames 3900...
[2025-01-12 21:34:46,382][01010] Num frames 4000...
[2025-01-12 21:34:46,499][01010] Num frames 4100...
[2025-01-12 21:34:46,617][01010] Num frames 4200...
[2025-01-12 21:34:46,735][01010] Num frames 4300...
[2025-01-12 21:34:46,794][01010] Avg episode rewards: #0: 24.005, true rewards: #0: 10.755
[2025-01-12 21:34:46,796][01010] Avg episode reward: 24.005, avg true_objective: 10.755
[2025-01-12 21:34:46,912][01010] Num frames 4400...
[2025-01-12 21:34:47,041][01010] Num frames 4500...
[2025-01-12 21:34:47,161][01010] Num frames 4600...
[2025-01-12 21:34:47,288][01010] Num frames 4700...
[2025-01-12 21:34:47,412][01010] Num frames 4800...
[2025-01-12 21:34:47,535][01010] Num frames 4900...
[2025-01-12 21:34:47,658][01010] Num frames 5000...
[2025-01-12 21:34:47,761][01010] Avg episode rewards: #0: 22.076, true rewards: #0: 10.076
[2025-01-12 21:34:47,762][01010] Avg episode reward: 22.076, avg true_objective: 10.076
[2025-01-12 21:34:47,836][01010] Num frames 5100...
[2025-01-12 21:34:47,954][01010] Num frames 5200...
[2025-01-12 21:34:48,082][01010] Num frames 5300...
[2025-01-12 21:34:48,203][01010] Num frames 5400...
[2025-01-12 21:34:48,336][01010] Num frames 5500...
[2025-01-12 21:34:48,459][01010] Num frames 5600...
[2025-01-12 21:34:48,581][01010] Num frames 5700...
[2025-01-12 21:34:48,650][01010] Avg episode rewards: #0: 21.017, true rewards: #0: 9.517
[2025-01-12 21:34:48,651][01010] Avg episode reward: 21.017, avg true_objective: 9.517
[2025-01-12 21:34:48,765][01010] Num frames 5800...
[2025-01-12 21:34:48,886][01010] Num frames 5900...
[2025-01-12 21:34:49,013][01010] Num frames 6000...
[2025-01-12 21:34:49,134][01010] Num frames 6100...
[2025-01-12 21:34:49,256][01010] Num frames 6200...
[2025-01-12 21:34:49,382][01010] Num frames 6300...
[2025-01-12 21:34:49,502][01010] Num frames 6400...
[2025-01-12 21:34:49,622][01010] Num frames 6500...
[2025-01-12 21:34:49,742][01010] Num frames 6600...
[2025-01-12 21:34:49,863][01010] Num frames 6700...
[2025-01-12 21:34:49,989][01010] Num frames 6800...
[2025-01-12 21:34:50,114][01010] Num frames 6900...
[2025-01-12 21:34:50,237][01010] Num frames 7000...
[2025-01-12 21:34:50,364][01010] Num frames 7100...
[2025-01-12 21:34:50,482][01010] Num frames 7200...
[2025-01-12 21:34:50,594][01010] Avg episode rewards: #0: 23.637, true rewards: #0: 10.351
[2025-01-12 21:34:50,595][01010] Avg episode reward: 23.637, avg true_objective: 10.351
[2025-01-12 21:34:50,661][01010] Num frames 7300...
[2025-01-12 21:34:50,779][01010] Num frames 7400...
[2025-01-12 21:34:50,898][01010] Num frames 7500...
[2025-01-12 21:34:51,027][01010] Num frames 7600...
[2025-01-12 21:34:51,148][01010] Num frames 7700...
[2025-01-12 21:34:51,273][01010] Num frames 7800...
[2025-01-12 21:34:51,400][01010] Num frames 7900...
[2025-01-12 21:34:51,523][01010] Num frames 8000...
[2025-01-12 21:34:51,663][01010] Num frames 8100...
[2025-01-12 21:34:51,829][01010] Num frames 8200...
[2025-01-12 21:34:52,008][01010] Num frames 8300...
[2025-01-12 21:34:52,177][01010] Num frames 8400...
[2025-01-12 21:34:52,344][01010] Num frames 8500...
[2025-01-12 21:34:52,522][01010] Num frames 8600...
[2025-01-12 21:34:52,681][01010] Num frames 8700...
[2025-01-12 21:34:52,840][01010] Num frames 8800...
[2025-01-12 21:34:53,014][01010] Num frames 8900...
[2025-01-12 21:34:53,181][01010] Num frames 9000...
[2025-01-12 21:34:53,300][01010] Avg episode rewards: #0: 26.048, true rewards: #0: 11.297
[2025-01-12 21:34:53,302][01010] Avg episode reward: 26.048, avg true_objective: 11.297
[2025-01-12 21:34:53,411][01010] Num frames 9100...
[2025-01-12 21:34:53,587][01010] Num frames 9200...
[2025-01-12 21:34:53,754][01010] Num frames 9300...
[2025-01-12 21:34:53,930][01010] Num frames 9400...
[2025-01-12 21:34:54,118][01010] Num frames 9500...
[2025-01-12 21:34:54,254][01010] Num frames 9600...
[2025-01-12 21:34:54,375][01010] Num frames 9700...
[2025-01-12 21:34:54,507][01010] Num frames 9800...
[2025-01-12 21:34:54,664][01010] Avg episode rewards: #0: 25.426, true rewards: #0: 10.981
[2025-01-12 21:34:54,665][01010] Avg episode reward: 25.426, avg true_objective: 10.981
[2025-01-12 21:34:54,687][01010] Num frames 9900...
[2025-01-12 21:34:54,807][01010] Num frames 10000...
[2025-01-12 21:34:54,930][01010] Num frames 10100...
[2025-01-12 21:34:55,054][01010] Num frames 10200...
[2025-01-12 21:34:55,171][01010] Num frames 10300...
[2025-01-12 21:34:55,288][01010] Num frames 10400...
[2025-01-12 21:34:55,407][01010] Num frames 10500...
[2025-01-12 21:34:55,532][01010] Num frames 10600...
[2025-01-12 21:34:55,654][01010] Num frames 10700...
[2025-01-12 21:34:55,775][01010] Num frames 10800...
[2025-01-12 21:34:55,921][01010] Avg episode rewards: #0: 25.075, true rewards: #0: 10.875
[2025-01-12 21:34:55,923][01010] Avg episode reward: 25.075, avg true_objective: 10.875
[2025-01-12 21:35:58,760][01010] Replay video saved to /content/train_dir/default_experiment/replay.mp4!