[2025-01-07 16:53:40,003][00600] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-01-07 16:53:40,006][00600] Rollout worker 0 uses device cpu
[2025-01-07 16:53:40,007][00600] Rollout worker 1 uses device cpu
[2025-01-07 16:53:40,009][00600] Rollout worker 2 uses device cpu
[2025-01-07 16:53:40,010][00600] Rollout worker 3 uses device cpu
[2025-01-07 16:53:40,012][00600] Rollout worker 4 uses device cpu
[2025-01-07 16:53:40,013][00600] Rollout worker 5 uses device cpu
[2025-01-07 16:53:40,014][00600] Rollout worker 6 uses device cpu
[2025-01-07 16:53:40,015][00600] Rollout worker 7 uses device cpu
[2025-01-07 16:53:40,188][00600] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-07 16:53:40,190][00600] InferenceWorker_p0-w0: min num requests: 2
[2025-01-07 16:53:40,224][00600] Starting all processes...
[2025-01-07 16:53:40,226][00600] Starting process learner_proc0
[2025-01-07 16:53:40,274][00600] Starting all processes...
[2025-01-07 16:53:40,293][00600] Starting process inference_proc0-0
[2025-01-07 16:53:40,298][00600] Starting process rollout_proc0
[2025-01-07 16:53:40,298][00600] Starting process rollout_proc1
[2025-01-07 16:53:40,298][00600] Starting process rollout_proc2
[2025-01-07 16:53:40,298][00600] Starting process rollout_proc3
[2025-01-07 16:53:40,298][00600] Starting process rollout_proc4
[2025-01-07 16:53:40,298][00600] Starting process rollout_proc5
[2025-01-07 16:53:40,298][00600] Starting process rollout_proc6
[2025-01-07 16:53:40,298][00600] Starting process rollout_proc7
[2025-01-07 16:53:57,658][02732] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-07 16:53:57,659][02732] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-01-07 16:53:57,681][02751] Worker 5 uses CPU cores [1]
[2025-01-07 16:53:57,756][02732] Num visible devices: 1
[2025-01-07 16:53:57,790][02753] Worker 7 uses CPU cores [1]
[2025-01-07 16:53:57,791][02732] Starting seed is not provided
[2025-01-07 16:53:57,791][02732] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-07 16:53:57,791][02732] Initializing actor-critic model on device cuda:0
[2025-01-07 16:53:57,792][02732] RunningMeanStd input shape: (3, 72, 128)
[2025-01-07 16:53:57,793][02748] Worker 2 uses CPU cores [0]
[2025-01-07 16:53:57,795][02732] RunningMeanStd input shape: (1,)
[2025-01-07 16:53:57,807][02752] Worker 6 uses CPU cores [0]
[2025-01-07 16:53:57,858][02746] Worker 0 uses CPU cores [0]
[2025-01-07 16:53:57,870][02732] ConvEncoder: input_channels=3
[2025-01-07 16:53:57,888][02745] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-07 16:53:57,888][02745] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-01-07 16:53:57,909][02750] Worker 4 uses CPU cores [0]
[2025-01-07 16:53:57,914][02747] Worker 1 uses CPU cores [1]
[2025-01-07 16:53:57,928][02745] Num visible devices: 1
[2025-01-07 16:53:57,951][02749] Worker 3 uses CPU cores [1]
[2025-01-07 16:53:58,175][02732] Conv encoder output size: 512
[2025-01-07 16:53:58,175][02732] Policy head output size: 512
[2025-01-07 16:53:58,234][02732] Created Actor Critic model with architecture:
[2025-01-07 16:53:58,234][02732] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
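
Read top to bottom, the dump above is: observation and returns normalizers, a three-Conv2d+ELU head with a Linear+ELU projection to a 512-dim embedding, a GRU core with 512 hidden units, an identity decoder, a scalar critic head, and a 5-way discrete action head. For orientation only, a minimal PyTorch sketch with the same overall shape (the kernel sizes and strides are assumptions; the log names only the layer types):

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Rough stand-in for the ActorCriticSharedWeights printed above."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        # conv_head: three Conv2d+ELU pairs over 3x72x128 observations (sizes assumed)
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
        )
        # mlp_layers: Linear+ELU down to the 512-dim "Conv encoder output size"
        self.mlp_layers = nn.Sequential(nn.LazyLinear(512), nn.ELU())
        self.core = nn.GRU(512, 512)                       # ModelCoreRNN
        self.critic_linear = nn.Linear(512, 1)             # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # 5 action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # add a time dim for the GRU
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

model = ActorCriticSketch()
logits, value, h = model(torch.zeros(4, 3, 72, 128))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```
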
[2025-01-07 16:53:58,666][02732] Using optimizer
[2025-01-07 16:54:00,186][00600] Heartbeat connected on Batcher_0
[2025-01-07 16:54:00,193][00600] Heartbeat connected on InferenceWorker_p0-w0
[2025-01-07 16:54:00,198][00600] Heartbeat connected on RolloutWorker_w0
[2025-01-07 16:54:00,201][00600] Heartbeat connected on RolloutWorker_w1
[2025-01-07 16:54:00,205][00600] Heartbeat connected on RolloutWorker_w2
[2025-01-07 16:54:00,210][00600] Heartbeat connected on RolloutWorker_w3
[2025-01-07 16:54:00,213][00600] Heartbeat connected on RolloutWorker_w4
[2025-01-07 16:54:00,217][00600] Heartbeat connected on RolloutWorker_w5
[2025-01-07 16:54:00,220][00600] Heartbeat connected on RolloutWorker_w6
[2025-01-07 16:54:00,224][00600] Heartbeat connected on RolloutWorker_w7
[2025-01-07 16:54:03,088][02732] No checkpoints found
[2025-01-07 16:54:03,088][02732] Did not load from checkpoint, starting from scratch!
[2025-01-07 16:54:03,088][02732] Initialized policy 0 weights for model version 0
[2025-01-07 16:54:03,093][02732] LearnerWorker_p0 finished initialization!
[2025-01-07 16:54:03,096][02732] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-07 16:54:03,093][00600] Heartbeat connected on LearnerWorker_p0
[2025-01-07 16:54:03,352][02745] RunningMeanStd input shape: (3, 72, 128)
[2025-01-07 16:54:03,353][02745] RunningMeanStd input shape: (1,)
[2025-01-07 16:54:03,375][02745] ConvEncoder: input_channels=3
[2025-01-07 16:54:03,552][02745] Conv encoder output size: 512
[2025-01-07 16:54:03,554][02745] Policy head output size: 512
[2025-01-07 16:54:03,636][00600] Inference worker 0-0 is ready!
[2025-01-07 16:54:03,642][00600] All inference workers are ready! Signal rollout workers to start!
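
Everything up to this point is Sample Factory's standard asynchronous PPO startup: one GPU learner process (PID 02732 above), one GPU inference worker (02745), and eight CPU rollout workers (02746-02753). A run with this layout is typically launched roughly as follows; this is a sketch assuming Sample Factory 2.x and its sf_examples VizDoom helpers, and the env name and extra flags are assumptions, since the log does not state them:

```python
from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.train import run_rl
from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components

def main():
    register_vizdoom_components()  # registers the Doom envs and the VizdoomEncoder seen above
    argv = [
        "--env=doom_health_gathering_supreme",  # assumed; any 160x120 Doom scenario resized to 128x72 fits
        "--num_workers=8",                      # matches rollout workers 0..7 above
        "--num_envs_per_worker=4",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
    ]
    parser, _ = parse_sf_args(argv)
    cfg = parse_full_cfg(parser, argv)
    return run_rl(cfg)

if __name__ == "__main__":
    main()
```
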
[2025-01-07 16:54:03,853][00600] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-07 16:54:03,863][02753] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 16:54:03,867][02747] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 16:54:03,862][02749] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 16:54:03,870][02751] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 16:54:03,949][02748] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 16:54:03,955][02746] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 16:54:03,951][02752] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 16:54:03,960][02750] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 16:54:04,942][02746] Decorrelating experience for 0 frames...
[2025-01-07 16:54:04,949][02752] Decorrelating experience for 0 frames...
[2025-01-07 16:54:05,246][02753] Decorrelating experience for 0 frames...
[2025-01-07 16:54:05,248][02751] Decorrelating experience for 0 frames...
[2025-01-07 16:54:05,255][02747] Decorrelating experience for 0 frames...
[2025-01-07 16:54:05,975][02753] Decorrelating experience for 32 frames...
[2025-01-07 16:54:06,124][02751] Decorrelating experience for 32 frames...
[2025-01-07 16:54:06,385][02746] Decorrelating experience for 32 frames...
[2025-01-07 16:54:06,399][02752] Decorrelating experience for 32 frames...
[2025-01-07 16:54:06,448][02748] Decorrelating experience for 0 frames...
[2025-01-07 16:54:06,868][02750] Decorrelating experience for 0 frames...
[2025-01-07 16:54:07,122][02751] Decorrelating experience for 64 frames...
[2025-01-07 16:54:07,624][02753] Decorrelating experience for 64 frames...
[2025-01-07 16:54:07,724][02747] Decorrelating experience for 32 frames...
[2025-01-07 16:54:07,732][02748] Decorrelating experience for 32 frames...
[2025-01-07 16:54:08,124][02746] Decorrelating experience for 64 frames...
[2025-01-07 16:54:08,146][02750] Decorrelating experience for 32 frames...
[2025-01-07 16:54:08,390][02751] Decorrelating experience for 96 frames...
[2025-01-07 16:54:08,853][00600] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-07 16:54:08,985][02753] Decorrelating experience for 96 frames...
[2025-01-07 16:54:09,462][02749] Decorrelating experience for 0 frames...
[2025-01-07 16:54:09,578][02747] Decorrelating experience for 64 frames...
[2025-01-07 16:54:09,651][02752] Decorrelating experience for 64 frames...
[2025-01-07 16:54:09,989][02748] Decorrelating experience for 64 frames...
[2025-01-07 16:54:10,056][02746] Decorrelating experience for 96 frames...
[2025-01-07 16:54:10,268][02750] Decorrelating experience for 64 frames...
[2025-01-07 16:54:10,725][02752] Decorrelating experience for 96 frames...
[2025-01-07 16:54:11,397][02749] Decorrelating experience for 32 frames...
[2025-01-07 16:54:11,980][02750] Decorrelating experience for 96 frames...
[2025-01-07 16:54:13,854][02747] Decorrelating experience for 96 frames...
[2025-01-07 16:54:13,858][00600] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 140.7. Samples: 1408. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-07 16:54:13,861][00600] Avg episode reward: [(0, '2.377')]
[2025-01-07 16:54:14,907][02749] Decorrelating experience for 64 frames...
[2025-01-07 16:54:15,038][02732] Signal inference workers to stop experience collection...
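
The "Decorrelating experience for N frames..." lines show each rollout worker warming up its envs by stepping them for staggered frame counts (0, 32, 64, 96 here) so that the parallel envs do not enter real collection in lockstep. A simplified sketch of the idea, not Sample Factory's actual code, assuming offsets scale with the env index as the 0/32/64/96 pattern suggests:

```python
def decorrelate(env, env_index: int, frames_per_stage: int = 32, frame_skip: int = 4):
    """Advance env #env_index by env_index * frames_per_stage frames before collection."""
    offset_frames = env_index * frames_per_stage  # 0, 32, 64, 96 for four envs
    obs = env.reset()
    for _ in range(offset_frames // frame_skip):  # each step consumes frame_skip frames
        obs, *_ = env.step(env.action_space.sample())  # random warm-up actions
    return obs
```
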
[2025-01-07 16:54:15,081][02745] InferenceWorker_p0-w0: stopping experience collection
[2025-01-07 16:54:16,139][02748] Decorrelating experience for 96 frames...
[2025-01-07 16:54:16,686][02749] Decorrelating experience for 96 frames...
[2025-01-07 16:54:18,853][00600] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 170.0. Samples: 2550. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-07 16:54:18,856][00600] Avg episode reward: [(0, '3.402')]
[2025-01-07 16:54:18,891][02732] Signal inference workers to resume experience collection...
[2025-01-07 16:54:18,892][02745] InferenceWorker_p0-w0: resuming experience collection
[2025-01-07 16:54:23,853][00600] Fps is (10 sec: 2458.8, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 248.7. Samples: 4974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 16:54:23,861][00600] Avg episode reward: [(0, '3.639')]
[2025-01-07 16:54:27,634][02745] Updated weights for policy 0, policy_version 10 (0.0026)
[2025-01-07 16:54:28,853][00600] Fps is (10 sec: 4505.6, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 435.0. Samples: 10876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:54:28,856][00600] Avg episode reward: [(0, '4.091')]
[2025-01-07 16:54:33,853][00600] Fps is (10 sec: 3276.8, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 526.3. Samples: 15790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:54:33,857][00600] Avg episode reward: [(0, '4.448')]
[2025-01-07 16:54:38,853][00600] Fps is (10 sec: 3276.8, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 511.9. Samples: 17916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 16:54:38,858][00600] Avg episode reward: [(0, '4.459')]
[2025-01-07 16:54:39,385][02745] Updated weights for policy 0, policy_version 20 (0.0035)
[2025-01-07 16:54:43,858][00600] Fps is (10 sec: 4094.0, 60 sec: 2457.3, 300 sec: 2457.3). Total num frames: 98304. Throughput: 0: 617.4. Samples: 24698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 16:54:43,860][00600] Avg episode reward: [(0, '4.501')]
[2025-01-07 16:54:48,853][00600] Fps is (10 sec: 3686.4, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 114688. Throughput: 0: 662.0. Samples: 29790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:54:48,859][00600] Avg episode reward: [(0, '4.419')]
[2025-01-07 16:54:48,866][02732] Saving new best policy, reward=4.419!
[2025-01-07 16:54:51,006][02745] Updated weights for policy 0, policy_version 30 (0.0015)
[2025-01-07 16:54:53,853][00600] Fps is (10 sec: 3278.3, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 131072. Throughput: 0: 705.6. Samples: 31752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:54:53,859][00600] Avg episode reward: [(0, '4.396')]
[2025-01-07 16:54:58,853][00600] Fps is (10 sec: 3686.4, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 151552. Throughput: 0: 804.9. Samples: 37626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 16:54:58,855][00600] Avg episode reward: [(0, '4.563')]
[2025-01-07 16:54:58,863][02732] Saving new best policy, reward=4.563!
[2025-01-07 16:55:01,194][02745] Updated weights for policy 0, policy_version 40 (0.0031)
[2025-01-07 16:55:03,853][00600] Fps is (10 sec: 4096.1, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 926.7. Samples: 44252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
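
The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report environment frames per second over three sliding wall-clock windows; "nan" just means a window does not yet hold two measurements. A sketch of the statistic (not Sample Factory's implementation):

```python
import time
from collections import deque

class FpsMeter:
    """Sliding-window frames-per-second, mirroring the 10/60/300 s readout above."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # (wall_clock_seconds, total_env_frames)

    def record(self, total_frames: int) -> None:
        now = time.monotonic()
        self.history.append((now, total_frames))
        while now - self.history[0][0] > max(self.windows):  # drop stale points
            self.history.popleft()

    def fps(self, window: int) -> float:
        now = time.monotonic()
        points = [(t, f) for t, f in self.history if now - t <= window]
        if len(points) < 2:
            return float("nan")  # the "nan" readings right after startup
        (t0, f0), (t1, f1) = points[0], points[-1]
        return (f1 - f0) / max(t1 - t0, 1e-9)
```
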
[2025-01-07 16:55:03,855][00600] Avg episode reward: [(0, '4.559')]
[2025-01-07 16:55:08,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2898.7). Total num frames: 188416. Throughput: 0: 918.4. Samples: 46304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:55:08,855][00600] Avg episode reward: [(0, '4.430')]
[2025-01-07 16:55:13,090][02745] Updated weights for policy 0, policy_version 50 (0.0020)
[2025-01-07 16:55:13,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3413.6, 300 sec: 2925.7). Total num frames: 204800. Throughput: 0: 899.2. Samples: 51340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 16:55:13,856][00600] Avg episode reward: [(0, '4.374')]
[2025-01-07 16:55:18,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3058.3). Total num frames: 229376. Throughput: 0: 940.3. Samples: 58104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:55:18,856][00600] Avg episode reward: [(0, '4.393')]
[2025-01-07 16:55:22,818][02745] Updated weights for policy 0, policy_version 60 (0.0019)
[2025-01-07 16:55:23,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 960.9. Samples: 61158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:55:23,856][00600] Avg episode reward: [(0, '4.310')]
[2025-01-07 16:55:28,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3084.0). Total num frames: 262144. Throughput: 0: 897.4. Samples: 65076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:55:28,856][00600] Avg episode reward: [(0, '4.411')]
[2025-01-07 16:55:33,855][00600] Fps is (10 sec: 3685.8, 60 sec: 3754.6, 300 sec: 3140.2). Total num frames: 282624. Throughput: 0: 930.3. Samples: 71656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:55:33,859][00600] Avg episode reward: [(0, '4.366')]
[2025-01-07 16:55:33,866][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth...
[2025-01-07 16:55:34,245][02745] Updated weights for policy 0, policy_version 70 (0.0015)
[2025-01-07 16:55:38,857][00600] Fps is (10 sec: 4094.3, 60 sec: 3754.4, 300 sec: 3190.4). Total num frames: 303104. Throughput: 0: 961.9. Samples: 75042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 16:55:38,860][00600] Avg episode reward: [(0, '4.365')]
[2025-01-07 16:55:43,859][00600] Fps is (10 sec: 3684.9, 60 sec: 3686.3, 300 sec: 3194.7). Total num frames: 319488. Throughput: 0: 937.3. Samples: 79808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:55:43,861][00600] Avg episode reward: [(0, '4.360')]
[2025-01-07 16:55:46,054][02745] Updated weights for policy 0, policy_version 80 (0.0015)
[2025-01-07 16:55:48,853][00600] Fps is (10 sec: 3687.9, 60 sec: 3754.7, 300 sec: 3237.8). Total num frames: 339968. Throughput: 0: 916.0. Samples: 85472. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 16:55:48,858][00600] Avg episode reward: [(0, '4.201')]
[2025-01-07 16:55:53,853][00600] Fps is (10 sec: 4098.4, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 360448. Throughput: 0: 944.0. Samples: 88784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:55:53,861][00600] Avg episode reward: [(0, '4.214')]
[2025-01-07 16:55:55,154][02745] Updated weights for policy 0, policy_version 90 (0.0017)
[2025-01-07 16:55:58,853][00600] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3276.8). Total num frames: 376832. Throughput: 0: 958.9. Samples: 94492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
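
The checkpoint filename saved above encodes both the policy version (69) and the env-frame count (282624). It is a regular PyTorch checkpoint and can be inspected directly; a sketch, with the exact keys depending on the Sample Factory version:

```python
import torch

path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth"
ckpt = torch.load(path, map_location="cpu")
print(sorted(ckpt.keys()))    # expect model/optimizer state plus training progress counters
print(ckpt.get("env_steps"))  # assumed key; should line up with the 282624 in the filename
```
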
[2025-01-07 16:55:58,860][00600] Avg episode reward: [(0, '4.355')]
[2025-01-07 16:56:03,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 393216. Throughput: 0: 910.1. Samples: 99058. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 16:56:03,860][00600] Avg episode reward: [(0, '4.562')]
[2025-01-07 16:56:07,017][02745] Updated weights for policy 0, policy_version 100 (0.0031)
[2025-01-07 16:56:08,853][00600] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3342.3). Total num frames: 417792. Throughput: 0: 918.5. Samples: 102492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:56:08,855][00600] Avg episode reward: [(0, '4.488')]
[2025-01-07 16:56:13,853][00600] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3371.3). Total num frames: 438272. Throughput: 0: 983.9. Samples: 109354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 16:56:13,856][00600] Avg episode reward: [(0, '4.245')]
[2025-01-07 16:56:18,277][02745] Updated weights for policy 0, policy_version 110 (0.0043)
[2025-01-07 16:56:18,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3337.5). Total num frames: 450560. Throughput: 0: 929.2. Samples: 113470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:56:18,858][00600] Avg episode reward: [(0, '4.312')]
[2025-01-07 16:56:23,853][00600] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3364.6). Total num frames: 471040. Throughput: 0: 918.7. Samples: 116380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 16:56:23,857][00600] Avg episode reward: [(0, '4.429')]
[2025-01-07 16:56:27,798][02745] Updated weights for policy 0, policy_version 120 (0.0024)
[2025-01-07 16:56:28,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3389.8). Total num frames: 491520. Throughput: 0: 966.8. Samples: 123310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:56:28,858][00600] Avg episode reward: [(0, '4.592')]
[2025-01-07 16:56:28,870][02732] Saving new best policy, reward=4.592!
[2025-01-07 16:56:33,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3386.0). Total num frames: 507904. Throughput: 0: 948.0. Samples: 128132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 16:56:33,858][00600] Avg episode reward: [(0, '4.568')]
[2025-01-07 16:56:38,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3382.5). Total num frames: 524288. Throughput: 0: 920.1. Samples: 130188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 16:56:38,856][00600] Avg episode reward: [(0, '4.604')]
[2025-01-07 16:56:38,923][02732] Saving new best policy, reward=4.604!
[2025-01-07 16:56:40,046][02745] Updated weights for policy 0, policy_version 130 (0.0026)
[2025-01-07 16:56:43,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3823.3, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 938.7. Samples: 136734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:56:43,856][00600] Avg episode reward: [(0, '4.607')]
[2025-01-07 16:56:43,864][02732] Saving new best policy, reward=4.607!
[2025-01-07 16:56:48,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3425.7). Total num frames: 565248. Throughput: 0: 968.0. Samples: 142620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:56:48,857][00600] Avg episode reward: [(0, '4.405')]
[2025-01-07 16:56:50,639][02745] Updated weights for policy 0, policy_version 140 (0.0029)
[2025-01-07 16:56:53,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3421.4). Total num frames: 581632. Throughput: 0: 936.0. Samples: 144612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:56:53,858][00600] Avg episode reward: [(0, '4.372')]
[2025-01-07 16:56:58,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3440.6). Total num frames: 602112. Throughput: 0: 910.2. Samples: 150314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 16:56:58,859][00600] Avg episode reward: [(0, '4.315')]
[2025-01-07 16:57:01,161][02745] Updated weights for policy 0, policy_version 150 (0.0017)
[2025-01-07 16:57:03,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3458.8). Total num frames: 622592. Throughput: 0: 966.4. Samples: 156960. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 16:57:03,858][00600] Avg episode reward: [(0, '4.346')]
[2025-01-07 16:57:08,856][00600] Fps is (10 sec: 3685.3, 60 sec: 3686.2, 300 sec: 3453.9). Total num frames: 638976. Throughput: 0: 950.1. Samples: 159138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:57:08,863][00600] Avg episode reward: [(0, '4.383')]
[2025-01-07 16:57:13,130][02745] Updated weights for policy 0, policy_version 160 (0.0014)
[2025-01-07 16:57:13,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3449.3). Total num frames: 655360. Throughput: 0: 901.4. Samples: 163872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 16:57:13,860][00600] Avg episode reward: [(0, '4.551')]
[2025-01-07 16:57:18,853][00600] Fps is (10 sec: 4097.3, 60 sec: 3822.9, 300 sec: 3486.9). Total num frames: 679936. Throughput: 0: 946.0. Samples: 170704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 16:57:18,860][00600] Avg episode reward: [(0, '4.723')]
[2025-01-07 16:57:18,863][02732] Saving new best policy, reward=4.723!
[2025-01-07 16:57:22,717][02745] Updated weights for policy 0, policy_version 170 (0.0016)
[2025-01-07 16:57:23,855][00600] Fps is (10 sec: 4095.3, 60 sec: 3754.5, 300 sec: 3481.6). Total num frames: 696320. Throughput: 0: 969.5. Samples: 173818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 16:57:23,864][00600] Avg episode reward: [(0, '4.582')]
[2025-01-07 16:57:28,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3476.6). Total num frames: 712704. Throughput: 0: 915.6. Samples: 177936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 16:57:28,860][00600] Avg episode reward: [(0, '4.634')]
[2025-01-07 16:57:33,853][00600] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3491.4). Total num frames: 733184. Throughput: 0: 926.2. Samples: 184300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:57:33,856][00600] Avg episode reward: [(0, '4.488')]
[2025-01-07 16:57:33,865][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth...
[2025-01-07 16:57:34,314][02745] Updated weights for policy 0, policy_version 180 (0.0022)
[2025-01-07 16:57:38,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3505.4). Total num frames: 753664. Throughput: 0: 957.0. Samples: 187676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
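
The "Policy #0 lag" triple reports how old the samples in the current batch are, measured in policy versions (learner version minus the version that collected each sample); small lags like the 0-2 range here are normal for asynchronous PPO, and -1.0 is the placeholder before the first weight update. A sketch of the statistic:

```python
def policy_lag(learner_version: int, sample_versions: list[int]) -> tuple[int, float, int]:
    """min/avg/max age of the collected samples, in policy versions."""
    lags = [learner_version - v for v in sample_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

# e.g. learner at version 180, batch collected by versions 178-180:
print(policy_lag(180, [180, 179, 180, 178]))  # (0, 0.75, 2), cf. the log lines above
```
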
[2025-01-07 16:57:38,856][00600] Avg episode reward: [(0, '4.524')]
[2025-01-07 16:57:43,854][00600] Fps is (10 sec: 3686.0, 60 sec: 3686.3, 300 sec: 3500.2). Total num frames: 770048. Throughput: 0: 935.6. Samples: 192416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:57:43,859][00600] Avg episode reward: [(0, '4.460')]
[2025-01-07 16:57:46,120][02745] Updated weights for policy 0, policy_version 190 (0.0024)
[2025-01-07 16:57:48,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3513.5). Total num frames: 790528. Throughput: 0: 914.1. Samples: 198096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:57:48,856][00600] Avg episode reward: [(0, '4.666')]
[2025-01-07 16:57:53,858][00600] Fps is (10 sec: 4094.5, 60 sec: 3822.6, 300 sec: 3526.0). Total num frames: 811008. Throughput: 0: 940.9. Samples: 201478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:57:53,860][00600] Avg episode reward: [(0, '4.756')]
[2025-01-07 16:57:53,875][02732] Saving new best policy, reward=4.756!
[2025-01-07 16:57:55,154][02745] Updated weights for policy 0, policy_version 200 (0.0022)
[2025-01-07 16:57:58,853][00600] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3520.8). Total num frames: 827392. Throughput: 0: 961.2. Samples: 207128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:57:58,857][00600] Avg episode reward: [(0, '4.583')]
[2025-01-07 16:58:03,853][00600] Fps is (10 sec: 3278.4, 60 sec: 3686.4, 300 sec: 3515.7). Total num frames: 843776. Throughput: 0: 907.5. Samples: 211542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 16:58:03,856][00600] Avg episode reward: [(0, '4.551')]
[2025-01-07 16:58:07,281][02745] Updated weights for policy 0, policy_version 210 (0.0027)
[2025-01-07 16:58:08,853][00600] Fps is (10 sec: 3686.5, 60 sec: 3754.9, 300 sec: 3527.6). Total num frames: 864256. Throughput: 0: 912.5. Samples: 214880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:58:08,856][00600] Avg episode reward: [(0, '4.696')]
[2025-01-07 16:58:13,861][00600] Fps is (10 sec: 4092.7, 60 sec: 3822.4, 300 sec: 3538.8). Total num frames: 884736. Throughput: 0: 970.2. Samples: 221602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:58:13,864][00600] Avg episode reward: [(0, '4.859')]
[2025-01-07 16:58:13,874][02732] Saving new best policy, reward=4.859!
[2025-01-07 16:58:18,855][00600] Fps is (10 sec: 3276.3, 60 sec: 3618.0, 300 sec: 3517.7). Total num frames: 897024. Throughput: 0: 916.8. Samples: 225556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 16:58:18,857][00600] Avg episode reward: [(0, '4.929')]
[2025-01-07 16:58:18,862][02732] Saving new best policy, reward=4.929!
[2025-01-07 16:58:18,871][02745] Updated weights for policy 0, policy_version 220 (0.0022)
[2025-01-07 16:58:23,853][00600] Fps is (10 sec: 3279.4, 60 sec: 3686.5, 300 sec: 3528.9). Total num frames: 917504. Throughput: 0: 901.7. Samples: 228254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:58:23,856][00600] Avg episode reward: [(0, '4.598')]
[2025-01-07 16:58:28,724][02745] Updated weights for policy 0, policy_version 230 (0.0018)
[2025-01-07 16:58:28,853][00600] Fps is (10 sec: 4506.3, 60 sec: 3822.9, 300 sec: 3555.0). Total num frames: 942080. Throughput: 0: 947.4. Samples: 235046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 16:58:28,860][00600] Avg episode reward: [(0, '4.421')]
[2025-01-07 16:58:33,853][00600] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 958464. Throughput: 0: 932.8. Samples: 240070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:58:33,856][00600] Avg episode reward: [(0, '4.763')]
[2025-01-07 16:58:38,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3544.9). Total num frames: 974848. Throughput: 0: 903.7. Samples: 242142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:58:38,858][00600] Avg episode reward: [(0, '4.927')]
[2025-01-07 16:58:40,633][02745] Updated weights for policy 0, policy_version 240 (0.0014)
[2025-01-07 16:58:43,853][00600] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3554.7). Total num frames: 995328. Throughput: 0: 919.7. Samples: 248514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 16:58:43,856][00600] Avg episode reward: [(0, '4.832')]
[2025-01-07 16:58:48,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3564.2). Total num frames: 1015808. Throughput: 0: 959.6. Samples: 254724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:58:48,855][00600] Avg episode reward: [(0, '4.894')]
[2025-01-07 16:58:51,416][02745] Updated weights for policy 0, policy_version 250 (0.0017)
[2025-01-07 16:58:53,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3545.2). Total num frames: 1028096. Throughput: 0: 930.1. Samples: 256734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:58:53,857][00600] Avg episode reward: [(0, '4.837')]
[2025-01-07 16:58:58,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 901.8. Samples: 262178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:58:58,861][00600] Avg episode reward: [(0, '4.685')]
[2025-01-07 16:59:01,854][02745] Updated weights for policy 0, policy_version 260 (0.0020)
[2025-01-07 16:59:03,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3637.8). Total num frames: 1073152. Throughput: 0: 963.1. Samples: 268892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:59:03,855][00600] Avg episode reward: [(0, '4.566')]
[2025-01-07 16:59:08,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 956.7. Samples: 271306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:59:08,856][00600] Avg episode reward: [(0, '4.641')]
[2025-01-07 16:59:13,786][02745] Updated weights for policy 0, policy_version 270 (0.0033)
[2025-01-07 16:59:13,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.9, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 905.5. Samples: 275792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:59:13,859][00600] Avg episode reward: [(0, '4.700')]
[2025-01-07 16:59:18,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 1126400. Throughput: 0: 943.3. Samples: 282518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:59:18,855][00600] Avg episode reward: [(0, '4.743')]
[2025-01-07 16:59:23,355][02745] Updated weights for policy 0, policy_version 280 (0.0021)
[2025-01-07 16:59:23,853][00600] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1146880. Throughput: 0: 971.9. Samples: 285876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 16:59:23,858][00600] Avg episode reward: [(0, '4.604')]
[2025-01-07 16:59:28,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 1159168. Throughput: 0: 924.0. Samples: 290094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:59:28,856][00600] Avg episode reward: [(0, '4.732')]
[2025-01-07 16:59:33,856][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1179648. Throughput: 0: 919.2. Samples: 296088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 16:59:33,859][00600] Avg episode reward: [(0, '4.707')]
[2025-01-07 16:59:33,867][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000288_1179648.pth...
[2025-01-07 16:59:33,993][02732] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth
[2025-01-07 16:59:34,983][02745] Updated weights for policy 0, policy_version 290 (0.0027)
[2025-01-07 16:59:38,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1204224. Throughput: 0: 946.7. Samples: 299336. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-07 16:59:38,855][00600] Avg episode reward: [(0, '4.428')]
[2025-01-07 16:59:43,853][00600] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1216512. Throughput: 0: 938.1. Samples: 304394. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-07 16:59:43,856][00600] Avg episode reward: [(0, '4.488')]
[2025-01-07 16:59:46,972][02745] Updated weights for policy 0, policy_version 300 (0.0017)
[2025-01-07 16:59:48,853][00600] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 1232896. Throughput: 0: 903.6. Samples: 309552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 16:59:48,860][00600] Avg episode reward: [(0, '4.705')]
[2025-01-07 16:59:53,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1257472. Throughput: 0: 924.2. Samples: 312894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 16:59:53,861][00600] Avg episode reward: [(0, '4.908')]
[2025-01-07 16:59:56,228][02745] Updated weights for policy 0, policy_version 310 (0.0025)
[2025-01-07 16:59:58,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1273856. Throughput: 0: 962.1. Samples: 319086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 16:59:58,856][00600] Avg episode reward: [(0, '4.894')]
[2025-01-07 17:00:03,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 1290240. Throughput: 0: 903.1. Samples: 323156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:00:03,859][00600] Avg episode reward: [(0, '4.820')]
[2025-01-07 17:00:08,303][02745] Updated weights for policy 0, policy_version 320 (0.0017)
[2025-01-07 17:00:08,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1310720. Throughput: 0: 900.4. Samples: 326392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:00:08,856][00600] Avg episode reward: [(0, '4.631')]
[2025-01-07 17:00:13,853][00600] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1335296. Throughput: 0: 955.9. Samples: 333108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:00:13,860][00600] Avg episode reward: [(0, '4.568')]
[2025-01-07 17:00:18,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1347584. Throughput: 0: 924.4. Samples: 337686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
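
The paired "Saving .../checkpoint_000000288_1179648.pth" and "Removing .../checkpoint_000000069_282624.pth" lines above are checkpoint rotation: only the newest few checkpoints are retained (two, judging by the save/remove pattern throughout this log). Equivalent pruning logic, as a sketch:

```python
from pathlib import Path

def prune_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
    # zero-padded version numbers make lexicographic order match chronological order
    ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
    for stale in ckpts[:-keep]:
        print(f"Removing {stale}")
        stale.unlink()

prune_checkpoints("/content/train_dir/default_experiment/checkpoint_p0", keep=2)
```
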
[2025-01-07 17:00:18,863][00600] Avg episode reward: [(0, '4.754')]
[2025-01-07 17:00:19,678][02745] Updated weights for policy 0, policy_version 330 (0.0034)
[2025-01-07 17:00:23,853][00600] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1368064. Throughput: 0: 906.4. Samples: 340122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:00:23,855][00600] Avg episode reward: [(0, '4.924')]
[2025-01-07 17:00:28,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1388544. Throughput: 0: 944.3. Samples: 346886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:00:28,864][00600] Avg episode reward: [(0, '4.586')]
[2025-01-07 17:00:29,285][02745] Updated weights for policy 0, policy_version 340 (0.0021)
[2025-01-07 17:00:33,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1404928. Throughput: 0: 950.8. Samples: 352340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:00:33,856][00600] Avg episode reward: [(0, '4.447')]
[2025-01-07 17:00:38,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.1). Total num frames: 1421312. Throughput: 0: 921.4. Samples: 354358. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:00:38,858][00600] Avg episode reward: [(0, '4.573')]
[2025-01-07 17:00:41,356][02745] Updated weights for policy 0, policy_version 350 (0.0020)
[2025-01-07 17:00:43,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1441792. Throughput: 0: 918.4. Samples: 360416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:00:43,860][00600] Avg episode reward: [(0, '4.516')]
[2025-01-07 17:00:48,855][00600] Fps is (10 sec: 4504.9, 60 sec: 3891.1, 300 sec: 3748.9). Total num frames: 1466368. Throughput: 0: 974.2. Samples: 366996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:00:48,858][00600] Avg episode reward: [(0, '4.407')]
[2025-01-07 17:00:51,889][02745] Updated weights for policy 0, policy_version 360 (0.0015)
[2025-01-07 17:00:53,856][00600] Fps is (10 sec: 3685.3, 60 sec: 3686.2, 300 sec: 3735.0). Total num frames: 1478656. Throughput: 0: 948.2. Samples: 369064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:00:53,861][00600] Avg episode reward: [(0, '4.608')]
[2025-01-07 17:00:58,853][00600] Fps is (10 sec: 3277.3, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1499136. Throughput: 0: 911.5. Samples: 374126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:00:58,855][00600] Avg episode reward: [(0, '4.828')]
[2025-01-07 17:01:02,476][02745] Updated weights for policy 0, policy_version 370 (0.0038)
[2025-01-07 17:01:03,853][00600] Fps is (10 sec: 4097.2, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1519616. Throughput: 0: 956.2. Samples: 380714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:01:03,857][00600] Avg episode reward: [(0, '4.891')]
[2025-01-07 17:01:08,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1536000. Throughput: 0: 966.8. Samples: 383630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:01:08,864][00600] Avg episode reward: [(0, '4.652')]
[2025-01-07 17:01:13,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 1552384. Throughput: 0: 907.1. Samples: 387704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:01:13,856][00600] Avg episode reward: [(0, '4.754')]
[2025-01-07 17:01:14,422][02745] Updated weights for policy 0, policy_version 380 (0.0015)
[2025-01-07 17:01:18,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1576960. Throughput: 0: 938.2. Samples: 394558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:01:18,856][00600] Avg episode reward: [(0, '4.602')]
[2025-01-07 17:01:23,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1593344. Throughput: 0: 968.0. Samples: 397918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:01:23,856][00600] Avg episode reward: [(0, '4.419')]
[2025-01-07 17:01:23,929][02745] Updated weights for policy 0, policy_version 390 (0.0025)
[2025-01-07 17:01:28,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1609728. Throughput: 0: 933.0. Samples: 402400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:01:28,856][00600] Avg episode reward: [(0, '4.426')]
[2025-01-07 17:01:33,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1630208. Throughput: 0: 913.9. Samples: 408122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:01:33,856][00600] Avg episode reward: [(0, '4.498')]
[2025-01-07 17:01:33,867][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000398_1630208.pth...
[2025-01-07 17:01:33,998][02732] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth
[2025-01-07 17:01:35,539][02745] Updated weights for policy 0, policy_version 400 (0.0029)
[2025-01-07 17:01:38,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1650688. Throughput: 0: 940.7. Samples: 411392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:01:38,856][00600] Avg episode reward: [(0, '4.434')]
[2025-01-07 17:01:43,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1667072. Throughput: 0: 949.2. Samples: 416840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:01:43,856][00600] Avg episode reward: [(0, '4.467')]
[2025-01-07 17:01:47,482][02745] Updated weights for policy 0, policy_version 410 (0.0041)
[2025-01-07 17:01:48,853][00600] Fps is (10 sec: 3276.7, 60 sec: 3618.2, 300 sec: 3735.0). Total num frames: 1683456. Throughput: 0: 909.7. Samples: 421650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:01:48,861][00600] Avg episode reward: [(0, '4.595')]
[2025-01-07 17:01:53,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 1703936. Throughput: 0: 919.8. Samples: 425022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:01:53,860][00600] Avg episode reward: [(0, '4.650')]
[2025-01-07 17:01:56,681][02745] Updated weights for policy 0, policy_version 420 (0.0020)
[2025-01-07 17:01:58,854][00600] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 1724416. Throughput: 0: 969.5. Samples: 431330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:01:58,860][00600] Avg episode reward: [(0, '4.885')]
[2025-01-07 17:02:03,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1736704. Throughput: 0: 908.5. Samples: 435442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:02:03,858][00600] Avg episode reward: [(0, '4.798')]
[2025-01-07 17:02:08,671][02745] Updated weights for policy 0, policy_version 430 (0.0048)
[2025-01-07 17:02:08,853][00600] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1761280. Throughput: 0: 906.2. Samples: 438696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:02:08,858][00600] Avg episode reward: [(0, '4.860')]
[2025-01-07 17:02:13,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1781760. Throughput: 0: 955.3. Samples: 445388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:02:13,860][00600] Avg episode reward: [(0, '5.046')]
[2025-01-07 17:02:13,868][02732] Saving new best policy, reward=5.046!
[2025-01-07 17:02:18,854][00600] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1794048. Throughput: 0: 928.2. Samples: 449890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:02:18,856][00600] Avg episode reward: [(0, '4.993')]
[2025-01-07 17:02:20,537][02745] Updated weights for policy 0, policy_version 440 (0.0039)
[2025-01-07 17:02:23,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1814528. Throughput: 0: 908.3. Samples: 452266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:02:23,856][00600] Avg episode reward: [(0, '4.864')]
[2025-01-07 17:02:28,853][00600] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1835008. Throughput: 0: 934.1. Samples: 458874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:02:28,856][00600] Avg episode reward: [(0, '4.652')]
[2025-01-07 17:02:29,947][02745] Updated weights for policy 0, policy_version 450 (0.0023)
[2025-01-07 17:02:33,853][00600] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1851392. Throughput: 0: 947.5. Samples: 464288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:02:33,857][00600] Avg episode reward: [(0, '4.568')]
[2025-01-07 17:02:38,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1867776. Throughput: 0: 915.6. Samples: 466224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:02:38,856][00600] Avg episode reward: [(0, '4.564')]
[2025-01-07 17:02:41,963][02745] Updated weights for policy 0, policy_version 460 (0.0025)
[2025-01-07 17:02:43,853][00600] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1892352. Throughput: 0: 912.4. Samples: 472388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:02:43,858][00600] Avg episode reward: [(0, '4.580')]
[2025-01-07 17:02:48,855][00600] Fps is (10 sec: 4504.6, 60 sec: 3822.8, 300 sec: 3735.0). Total num frames: 1912832. Throughput: 0: 960.6. Samples: 478670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:02:48,858][00600] Avg episode reward: [(0, '4.852')]
[2025-01-07 17:02:53,212][02745] Updated weights for policy 0, policy_version 470 (0.0022)
[2025-01-07 17:02:53,854][00600] Fps is (10 sec: 3276.5, 60 sec: 3686.3, 300 sec: 3721.1). Total num frames: 1925120. Throughput: 0: 932.9. Samples: 480678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:02:53,856][00600] Avg episode reward: [(0, '4.824')]
[2025-01-07 17:02:58,853][00600] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1945600. Throughput: 0: 904.0. Samples: 486066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
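
The "Saving new best policy, reward=5.046!" lines above come from simple running-best tracking: whenever the average episode reward exceeds the best value seen so far, an extra "best" checkpoint is written alongside the rotating ones. As a sketch (the best-checkpoint filename format is an assumption):

```python
class BestPolicyTracker:
    """Running maximum over the reported average episode reward."""

    def __init__(self):
        self.best_reward = float("-inf")

    def update(self, avg_episode_reward: float, save_fn) -> None:
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            save_fn()  # e.g. write a best_..._reward_5.046.pth file (name format assumed)
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
```
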
[2025-01-07 17:02:58,855][00600] Avg episode reward: [(0, '4.666')]
[2025-01-07 17:03:03,072][02745] Updated weights for policy 0, policy_version 480 (0.0020)
[2025-01-07 17:03:03,853][00600] Fps is (10 sec: 4096.3, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1966080. Throughput: 0: 949.6. Samples: 492622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:03:03,855][00600] Avg episode reward: [(0, '4.830')]
[2025-01-07 17:03:08,855][00600] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3721.2). Total num frames: 1982464. Throughput: 0: 948.1. Samples: 494932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:03:08,865][00600] Avg episode reward: [(0, '4.744')]
[2025-01-07 17:03:13,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 1998848. Throughput: 0: 900.3. Samples: 499388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:03:13,860][00600] Avg episode reward: [(0, '4.785')]
[2025-01-07 17:03:15,437][02745] Updated weights for policy 0, policy_version 490 (0.0042)
[2025-01-07 17:03:18,853][00600] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2019328. Throughput: 0: 927.7. Samples: 506034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:03:18,863][00600] Avg episode reward: [(0, '4.705')]
[2025-01-07 17:03:23,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2039808. Throughput: 0: 959.3. Samples: 509392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:03:23,861][00600] Avg episode reward: [(0, '5.010')]
[2025-01-07 17:03:26,374][02745] Updated weights for policy 0, policy_version 500 (0.0021)
[2025-01-07 17:03:28,853][00600] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 2052096. Throughput: 0: 912.5. Samples: 513452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:03:28,856][00600] Avg episode reward: [(0, '4.961')]
[2025-01-07 17:03:33,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2076672. Throughput: 0: 911.6. Samples: 519688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:03:33,865][00600] Avg episode reward: [(0, '4.736')]
[2025-01-07 17:03:33,882][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000507_2076672.pth...
[2025-01-07 17:03:34,008][02732] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000288_1179648.pth
[2025-01-07 17:03:36,671][02745] Updated weights for policy 0, policy_version 510 (0.0024)
[2025-01-07 17:03:38,853][00600] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2097152. Throughput: 0: 935.8. Samples: 522786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:03:38,855][00600] Avg episode reward: [(0, '4.667')]
[2025-01-07 17:03:43,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 2109440. Throughput: 0: 929.2. Samples: 527878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:03:43,858][00600] Avg episode reward: [(0, '4.561')]
[2025-01-07 17:03:48,467][02745] Updated weights for policy 0, policy_version 520 (0.0023)
[2025-01-07 17:03:48,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3735.0). Total num frames: 2129920. Throughput: 0: 903.5. Samples: 533280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:03:48,856][00600] Avg episode reward: [(0, '4.496')]
[2025-01-07 17:03:53,853][00600] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2150400. Throughput: 0: 926.1. Samples: 536604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:03:53,858][00600] Avg episode reward: [(0, '4.568')]
[2025-01-07 17:03:58,853][00600] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2166784. Throughput: 0: 958.0. Samples: 542500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:03:58,858][00600] Avg episode reward: [(0, '4.594')]
[2025-01-07 17:03:58,886][02745] Updated weights for policy 0, policy_version 530 (0.0018)
[2025-01-07 17:04:03,853][00600] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 2183168. Throughput: 0: 910.4. Samples: 547002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:04:03,858][00600] Avg episode reward: [(0, '4.632')]
[2025-01-07 17:04:08,855][00600] Fps is (10 sec: 4095.2, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 2207744. Throughput: 0: 906.8. Samples: 550200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:04:08,861][00600] Avg episode reward: [(0, '4.677')]
[2025-01-07 17:04:09,523][02745] Updated weights for policy 0, policy_version 540 (0.0013)
[2025-01-07 17:04:13,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2228224. Throughput: 0: 970.0. Samples: 557100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:04:13,858][00600] Avg episode reward: [(0, '4.810')]
[2025-01-07 17:04:18,861][00600] Fps is (10 sec: 3274.9, 60 sec: 3685.9, 300 sec: 3707.1). Total num frames: 2240512. Throughput: 0: 924.1. Samples: 561282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:04:18,863][00600] Avg episode reward: [(0, '4.961')]
[2025-01-07 17:04:21,241][02745] Updated weights for policy 0, policy_version 550 (0.0017)
[2025-01-07 17:04:23,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2260992. Throughput: 0: 922.5. Samples: 564298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:04:23,855][00600] Avg episode reward: [(0, '4.957')]
[2025-01-07 17:04:28,853][00600] Fps is (10 sec: 4509.1, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2285568. Throughput: 0: 963.5. Samples: 571234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:04:28,860][00600] Avg episode reward: [(0, '4.795')]
[2025-01-07 17:04:30,299][02745] Updated weights for policy 0, policy_version 560 (0.0029)
[2025-01-07 17:04:33,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2301952. Throughput: 0: 954.8. Samples: 576246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:04:33,856][00600] Avg episode reward: [(0, '4.710')]
[2025-01-07 17:04:38,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2318336. Throughput: 0: 929.6. Samples: 578434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:04:38,860][00600] Avg episode reward: [(0, '4.814')]
[2025-01-07 17:04:41,852][02745] Updated weights for policy 0, policy_version 570 (0.0043)
[2025-01-07 17:04:43,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2342912. Throughput: 0: 950.4. Samples: 585266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:04:43,855][00600] Avg episode reward: [(0, '4.842')]
[2025-01-07 17:04:48,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2363392. Throughput: 0: 986.5. Samples: 591396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:04:48,856][00600] Avg episode reward: [(0, '4.921')]
[2025-01-07 17:04:52,952][02745] Updated weights for policy 0, policy_version 580 (0.0023)
[2025-01-07 17:04:53,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2375680. Throughput: 0: 962.5. Samples: 593512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:04:53,855][00600] Avg episode reward: [(0, '4.827')]
[2025-01-07 17:04:58,853][00600] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2400256. Throughput: 0: 945.3. Samples: 599640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:04:58,856][00600] Avg episode reward: [(0, '4.972')]
[2025-01-07 17:05:01,983][02745] Updated weights for policy 0, policy_version 590 (0.0017)
[2025-01-07 17:05:03,856][00600] Fps is (10 sec: 4913.8, 60 sec: 4027.5, 300 sec: 3776.6). Total num frames: 2424832. Throughput: 0: 1008.5. Samples: 606658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:05:03,859][00600] Avg episode reward: [(0, '5.079')]
[2025-01-07 17:05:03,871][02732] Saving new best policy, reward=5.079!
[2025-01-07 17:05:08,856][00600] Fps is (10 sec: 3685.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2437120. Throughput: 0: 984.6. Samples: 608610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:05:08,862][00600] Avg episode reward: [(0, '4.923')]
[2025-01-07 17:05:13,664][02745] Updated weights for policy 0, policy_version 600 (0.0014)
[2025-01-07 17:05:13,853][00600] Fps is (10 sec: 3277.7, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2457600. Throughput: 0: 943.0. Samples: 613670. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:05:13,855][00600] Avg episode reward: [(0, '4.965')]
[2025-01-07 17:05:18,853][00600] Fps is (10 sec: 4097.2, 60 sec: 3960.0, 300 sec: 3762.8). Total num frames: 2478080. Throughput: 0: 987.7. Samples: 620692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:05:18,860][00600] Avg episode reward: [(0, '5.337')]
[2025-01-07 17:05:18,867][02732] Saving new best policy, reward=5.337!
[2025-01-07 17:05:23,664][02745] Updated weights for policy 0, policy_version 610 (0.0030)
[2025-01-07 17:05:23,860][00600] Fps is (10 sec: 4093.2, 60 sec: 3959.0, 300 sec: 3762.7). Total num frames: 2498560. Throughput: 0: 1006.5. Samples: 623734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:05:23,862][00600] Avg episode reward: [(0, '5.204')]
[2025-01-07 17:05:28,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2510848. Throughput: 0: 950.3. Samples: 628028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:05:28,856][00600] Avg episode reward: [(0, '5.206')]
[2025-01-07 17:05:33,853][00600] Fps is (10 sec: 3688.9, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2535424. Throughput: 0: 965.6. Samples: 634850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:05:33,856][00600] Avg episode reward: [(0, '5.169')]
[2025-01-07 17:05:33,864][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000619_2535424.pth...
[2025-01-07 17:05:34,022][02732] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000398_1630208.pth
[2025-01-07 17:05:34,120][02745] Updated weights for policy 0, policy_version 620 (0.0019)
[2025-01-07 17:05:38,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2555904. Throughput: 0: 994.9. Samples: 638282. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:05:38,859][00600] Avg episode reward: [(0, '5.064')]
[2025-01-07 17:05:43,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2568192. Throughput: 0: 959.2. Samples: 642804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:05:43,858][00600] Avg episode reward: [(0, '5.201')]
[2025-01-07 17:05:46,057][02745] Updated weights for policy 0, policy_version 630 (0.0024)
[2025-01-07 17:05:48,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2592768. Throughput: 0: 935.3. Samples: 648744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:05:48,855][00600] Avg episode reward: [(0, '5.321')]
[2025-01-07 17:05:53,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2613248. Throughput: 0: 971.7. Samples: 652334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:05:53,855][00600] Avg episode reward: [(0, '5.299')]
[2025-01-07 17:05:54,738][02745] Updated weights for policy 0, policy_version 640 (0.0033)
[2025-01-07 17:05:58,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2629632. Throughput: 0: 981.4. Samples: 657834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:05:58,856][00600] Avg episode reward: [(0, '5.553')]
[2025-01-07 17:05:58,858][02732] Saving new best policy, reward=5.553!
[2025-01-07 17:06:03,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3776.6). Total num frames: 2650112. Throughput: 0: 939.6. Samples: 662976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:06:03,860][00600] Avg episode reward: [(0, '6.053')]
[2025-01-07 17:06:03,867][02732] Saving new best policy, reward=6.053!
[2025-01-07 17:06:06,490][02745] Updated weights for policy 0, policy_version 650 (0.0014)
[2025-01-07 17:06:08,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3790.5). Total num frames: 2670592. Throughput: 0: 946.1. Samples: 666304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:06:08,856][00600] Avg episode reward: [(0, '5.681')]
[2025-01-07 17:06:13,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2691072. Throughput: 0: 994.1. Samples: 672764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:06:13,857][00600] Avg episode reward: [(0, '5.147')]
[2025-01-07 17:06:17,751][02745] Updated weights for policy 0, policy_version 660 (0.0024)
[2025-01-07 17:06:18,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2703360. Throughput: 0: 938.3. Samples: 677074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:06:18,857][00600] Avg episode reward: [(0, '5.005')]
[2025-01-07 17:06:23,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3823.4, 300 sec: 3790.5). Total num frames: 2727936. Throughput: 0: 938.0. Samples: 680492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:06:23,856][00600] Avg episode reward: [(0, '5.038')] [2025-01-07 17:06:26,909][02745] Updated weights for policy 0, policy_version 670 (0.0022) [2025-01-07 17:06:28,853][00600] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3804.4). Total num frames: 2752512. Throughput: 0: 996.0. Samples: 687626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:06:28,858][00600] Avg episode reward: [(0, '4.964')] [2025-01-07 17:06:33,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2764800. Throughput: 0: 967.3. Samples: 692274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:06:33,860][00600] Avg episode reward: [(0, '5.229')] [2025-01-07 17:06:38,411][02745] Updated weights for policy 0, policy_version 680 (0.0027) [2025-01-07 17:06:38,853][00600] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2785280. Throughput: 0: 948.2. Samples: 695002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-07 17:06:38,856][00600] Avg episode reward: [(0, '5.372')] [2025-01-07 17:06:43,853][00600] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 2809856. Throughput: 0: 979.9. Samples: 701930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:06:43,861][00600] Avg episode reward: [(0, '4.966')] [2025-01-07 17:06:47,982][02745] Updated weights for policy 0, policy_version 690 (0.0024) [2025-01-07 17:06:48,854][00600] Fps is (10 sec: 4095.7, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 2826240. Throughput: 0: 988.3. Samples: 707452. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-07 17:06:48,856][00600] Avg episode reward: [(0, '5.029')] [2025-01-07 17:06:53,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2842624. Throughput: 0: 963.1. Samples: 709644. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-07 17:06:53,857][00600] Avg episode reward: [(0, '4.825')] [2025-01-07 17:06:58,564][02745] Updated weights for policy 0, policy_version 700 (0.0025) [2025-01-07 17:06:58,853][00600] Fps is (10 sec: 4096.4, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2867200. Throughput: 0: 969.6. Samples: 716394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-07 17:06:58,859][00600] Avg episode reward: [(0, '4.861')] [2025-01-07 17:07:03,859][00600] Fps is (10 sec: 4503.1, 60 sec: 3959.1, 300 sec: 3818.2). Total num frames: 2887680. Throughput: 0: 1016.9. Samples: 722842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:07:03,864][00600] Avg episode reward: [(0, '4.975')] [2025-01-07 17:07:08,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2899968. Throughput: 0: 987.9. Samples: 724948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:07:08,859][00600] Avg episode reward: [(0, '4.796')] [2025-01-07 17:07:09,863][02745] Updated weights for policy 0, policy_version 710 (0.0036) [2025-01-07 17:07:13,853][00600] Fps is (10 sec: 3688.5, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2924544. Throughput: 0: 958.8. Samples: 730772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:07:13,858][00600] Avg episode reward: [(0, '4.797')] [2025-01-07 17:07:18,665][02745] Updated weights for policy 0, policy_version 720 (0.0022) [2025-01-07 17:07:18,853][00600] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3846.1). Total num frames: 2949120. Throughput: 0: 1014.1. Samples: 737908. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:07:18,858][00600] Avg episode reward: [(0, '4.799')] [2025-01-07 17:07:23,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2961408. Throughput: 0: 1006.5. Samples: 740294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:07:23,859][00600] Avg episode reward: [(0, '5.001')] [2025-01-07 17:07:28,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2981888. Throughput: 0: 966.3. Samples: 745414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:07:28,857][00600] Avg episode reward: [(0, '5.153')] [2025-01-07 17:07:29,848][02745] Updated weights for policy 0, policy_version 730 (0.0013) [2025-01-07 17:07:33,853][00600] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3006464. Throughput: 0: 1001.0. Samples: 752496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:07:33,860][00600] Avg episode reward: [(0, '4.955')] [2025-01-07 17:07:33,872][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000734_3006464.pth... [2025-01-07 17:07:33,993][02732] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000507_2076672.pth [2025-01-07 17:07:38,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 3022848. Throughput: 0: 1024.1. Samples: 755728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:07:38,862][00600] Avg episode reward: [(0, '5.457')] [2025-01-07 17:07:40,470][02745] Updated weights for policy 0, policy_version 740 (0.0043) [2025-01-07 17:07:43,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3039232. Throughput: 0: 967.5. Samples: 759932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:07:43,857][00600] Avg episode reward: [(0, '5.708')] [2025-01-07 17:07:48,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3063808. Throughput: 0: 975.2. Samples: 766720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:07:48,857][00600] Avg episode reward: [(0, '6.145')] [2025-01-07 17:07:48,860][02732] Saving new best policy, reward=6.145! [2025-01-07 17:07:50,265][02745] Updated weights for policy 0, policy_version 750 (0.0013) [2025-01-07 17:07:53,853][00600] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3084288. Throughput: 0: 1006.8. Samples: 770256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:07:53,856][00600] Avg episode reward: [(0, '6.166')] [2025-01-07 17:07:53,864][02732] Saving new best policy, reward=6.166! [2025-01-07 17:07:58,855][00600] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 3100672. Throughput: 0: 982.8. Samples: 775000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:07:58,858][00600] Avg episode reward: [(0, '6.290')] [2025-01-07 17:07:58,861][02732] Saving new best policy, reward=6.290! [2025-01-07 17:08:01,694][02745] Updated weights for policy 0, policy_version 760 (0.0016) [2025-01-07 17:08:03,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3891.6, 300 sec: 3860.0). Total num frames: 3121152. Throughput: 0: 958.2. Samples: 781028. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-07 17:08:03,856][00600] Avg episode reward: [(0, '6.787')] [2025-01-07 17:08:03,864][02732] Saving new best policy, reward=6.787! [2025-01-07 17:08:08,853][00600] Fps is (10 sec: 4096.7, 60 sec: 4027.7, 300 sec: 3873.8). 
Total num frames: 3141632. Throughput: 0: 980.7. Samples: 784424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:08:08,856][00600] Avg episode reward: [(0, '7.090')] [2025-01-07 17:08:08,939][02732] Saving new best policy, reward=7.090! [2025-01-07 17:08:11,516][02745] Updated weights for policy 0, policy_version 770 (0.0021) [2025-01-07 17:08:13,855][00600] Fps is (10 sec: 3685.7, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 3158016. Throughput: 0: 987.4. Samples: 789850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-07 17:08:13,859][00600] Avg episode reward: [(0, '6.783')] [2025-01-07 17:08:18,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3174400. Throughput: 0: 944.5. Samples: 794998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:08:18,860][00600] Avg episode reward: [(0, '6.412')] [2025-01-07 17:08:22,473][02745] Updated weights for policy 0, policy_version 780 (0.0014) [2025-01-07 17:08:23,853][00600] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3198976. Throughput: 0: 948.9. Samples: 798430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:08:23,857][00600] Avg episode reward: [(0, '7.014')] [2025-01-07 17:08:28,855][00600] Fps is (10 sec: 4504.8, 60 sec: 3959.3, 300 sec: 3873.8). Total num frames: 3219456. Throughput: 0: 1001.6. Samples: 805004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:08:28,857][00600] Avg episode reward: [(0, '7.632')] [2025-01-07 17:08:28,863][02732] Saving new best policy, reward=7.632! [2025-01-07 17:08:33,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3231744. Throughput: 0: 943.6. Samples: 809180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:08:33,862][00600] Avg episode reward: [(0, '7.699')] [2025-01-07 17:08:33,875][02732] Saving new best policy, reward=7.699! [2025-01-07 17:08:34,127][02745] Updated weights for policy 0, policy_version 790 (0.0042) [2025-01-07 17:08:38,854][00600] Fps is (10 sec: 3686.9, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3256320. Throughput: 0: 938.7. Samples: 812496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-07 17:08:38,859][00600] Avg episode reward: [(0, '7.495')] [2025-01-07 17:08:43,009][02745] Updated weights for policy 0, policy_version 800 (0.0017) [2025-01-07 17:08:43,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3276800. Throughput: 0: 987.7. Samples: 819446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-07 17:08:43,855][00600] Avg episode reward: [(0, '7.542')] [2025-01-07 17:08:48,857][00600] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3873.8). Total num frames: 3293184. Throughput: 0: 955.7. Samples: 824040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:08:48,862][00600] Avg episode reward: [(0, '7.291')] [2025-01-07 17:08:53,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3313664. Throughput: 0: 940.8. Samples: 826762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-07 17:08:53,855][00600] Avg episode reward: [(0, '8.121')] [2025-01-07 17:08:53,871][02732] Saving new best policy, reward=8.121! [2025-01-07 17:08:54,663][02745] Updated weights for policy 0, policy_version 810 (0.0014) [2025-01-07 17:08:58,853][00600] Fps is (10 sec: 4097.7, 60 sec: 3891.3, 300 sec: 3901.6). Total num frames: 3334144. Throughput: 0: 973.3. Samples: 833648. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-07 17:08:58,860][00600] Avg episode reward: [(0, '8.054')] [2025-01-07 17:09:03,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.8). Total num frames: 3354624. Throughput: 0: 979.6. Samples: 839082. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-07 17:09:03,859][00600] Avg episode reward: [(0, '8.422')] [2025-01-07 17:09:03,870][02732] Saving new best policy, reward=8.422! [2025-01-07 17:09:05,290][02745] Updated weights for policy 0, policy_version 820 (0.0015) [2025-01-07 17:09:08,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3371008. Throughput: 0: 948.7. Samples: 841120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-07 17:09:08,860][00600] Avg episode reward: [(0, '8.392')] [2025-01-07 17:09:13,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3901.7). Total num frames: 3391488. Throughput: 0: 949.4. Samples: 847726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:09:13,861][00600] Avg episode reward: [(0, '8.637')] [2025-01-07 17:09:13,869][02732] Saving new best policy, reward=8.637! [2025-01-07 17:09:15,105][02745] Updated weights for policy 0, policy_version 830 (0.0025) [2025-01-07 17:09:18,853][00600] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 3411968. Throughput: 0: 994.1. Samples: 853914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:09:18,858][00600] Avg episode reward: [(0, '9.007')] [2025-01-07 17:09:18,860][02732] Saving new best policy, reward=9.007! [2025-01-07 17:09:23,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3424256. Throughput: 0: 966.0. Samples: 855964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:09:23,859][00600] Avg episode reward: [(0, '8.675')] [2025-01-07 17:09:26,713][02745] Updated weights for policy 0, policy_version 840 (0.0025) [2025-01-07 17:09:28,853][00600] Fps is (10 sec: 3686.5, 60 sec: 3823.1, 300 sec: 3887.7). Total num frames: 3448832. Throughput: 0: 939.5. Samples: 861724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:09:28,859][00600] Avg episode reward: [(0, '7.789')] [2025-01-07 17:09:33,853][00600] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 3473408. Throughput: 0: 994.9. Samples: 868804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:09:33,859][00600] Avg episode reward: [(0, '8.687')] [2025-01-07 17:09:33,868][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000848_3473408.pth... [2025-01-07 17:09:33,998][02732] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000619_2535424.pth [2025-01-07 17:09:36,802][02745] Updated weights for policy 0, policy_version 850 (0.0029) [2025-01-07 17:09:38,857][00600] Fps is (10 sec: 3684.8, 60 sec: 3822.7, 300 sec: 3873.8). Total num frames: 3485696. Throughput: 0: 980.2. Samples: 870876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:09:38,859][00600] Avg episode reward: [(0, '8.709')] [2025-01-07 17:09:43,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3506176. Throughput: 0: 935.5. Samples: 875744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:09:43,858][00600] Avg episode reward: [(0, '10.101')] [2025-01-07 17:09:43,867][02732] Saving new best policy, reward=10.101! 
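Two kinds of checkpointing are visible above: periodic Saving/Removing pairs (e.g. checkpoint_000000848_3473408.pth saved, checkpoint_000000619_2535424.pth removed) rotate milestone checkpoints so only the most recent few are kept, while "Saving new best policy, reward=...!" fires whenever the reported average reward beats the previous best. A rough Python sketch of both behaviours follows; it is not Sample Factory's actual code, and the keep-count and all helper names are assumptions.

    import os
    from pathlib import Path

    import torch

    KEEP_CHECKPOINTS = 2  # assumption; the log shows only a few kept at a time

    def save_rotating_checkpoint(state, ckpt_dir, train_step, env_steps):
        """Save checkpoint_<train_step>_<env_steps>.pth and prune the oldest."""
        ckpt_dir = Path(ckpt_dir)
        ckpt_dir.mkdir(parents=True, exist_ok=True)
        path = ckpt_dir / f"checkpoint_{train_step:09d}_{env_steps}.pth"
        torch.save(state, path)
        # zero-padded step makes lexicographic order chronological
        checkpoints = sorted(ckpt_dir.glob("checkpoint_*.pth"))
        for old in checkpoints[:-KEEP_CHECKPOINTS]:
            os.remove(old)  # mirrors the 'Removing .../checkpoint_...' entries

    best_reward = float("-inf")

    def maybe_save_best(state, avg_reward, best_path):
        """Mirror 'Saving new best policy, reward=X!' when the mean improves."""
        global best_reward
        if avg_reward > best_reward:
            best_reward = avg_reward
            torch.save(state, best_path)
            print(f"Saving new best policy, reward={avg_reward:.3f}!")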
[2025-01-07 17:09:47,647][02745] Updated weights for policy 0, policy_version 860 (0.0020)
[2025-01-07 17:09:48,853][00600] Fps is (10 sec: 4097.7, 60 sec: 3891.5, 300 sec: 3901.6). Total num frames: 3526656. Throughput: 0: 963.2. Samples: 882426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:09:48,856][00600] Avg episode reward: [(0, '10.356')]
[2025-01-07 17:09:48,859][02732] Saving new best policy, reward=10.356!
[2025-01-07 17:09:53,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3543040. Throughput: 0: 986.9. Samples: 885530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:09:53,860][00600] Avg episode reward: [(0, '9.979')]
[2025-01-07 17:09:58,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3559424. Throughput: 0: 931.3. Samples: 889634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:09:58,856][00600] Avg episode reward: [(0, '9.441')]
[2025-01-07 17:09:59,315][02745] Updated weights for policy 0, policy_version 870 (0.0023)
[2025-01-07 17:10:03,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3887.8). Total num frames: 3584000. Throughput: 0: 946.2. Samples: 896494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:10:03,859][00600] Avg episode reward: [(0, '9.802')]
[2025-01-07 17:10:08,603][02745] Updated weights for policy 0, policy_version 880 (0.0015)
[2025-01-07 17:10:08,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3604480. Throughput: 0: 977.7. Samples: 899962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:10:08,857][00600] Avg episode reward: [(0, '10.047')]
[2025-01-07 17:10:13,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3616768. Throughput: 0: 951.9. Samples: 904560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:10:13,857][00600] Avg episode reward: [(0, '10.957')]
[2025-01-07 17:10:13,865][02732] Saving new best policy, reward=10.957!
[2025-01-07 17:10:18,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3637248. Throughput: 0: 927.7. Samples: 910552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:10:18,861][00600] Avg episode reward: [(0, '11.781')]
[2025-01-07 17:10:18,931][02732] Saving new best policy, reward=11.781!
[2025-01-07 17:10:19,881][02745] Updated weights for policy 0, policy_version 890 (0.0040)
[2025-01-07 17:10:23,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3661824. Throughput: 0: 956.3. Samples: 913906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:10:23,859][00600] Avg episode reward: [(0, '12.662')]
[2025-01-07 17:10:23,870][02732] Saving new best policy, reward=12.662!
[2025-01-07 17:10:28,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3678208. Throughput: 0: 968.1. Samples: 919310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:10:28,857][00600] Avg episode reward: [(0, '12.406')]
[2025-01-07 17:10:31,579][02745] Updated weights for policy 0, policy_version 900 (0.0026)
[2025-01-07 17:10:33,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 3694592. Throughput: 0: 936.2. Samples: 924554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:10:33,859][00600] Avg episode reward: [(0, '11.665')]
[2025-01-07 17:10:38,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3891.5, 300 sec: 3901.6). Total num frames: 3719168. Throughput: 0: 942.7. Samples: 927950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:10:38,857][00600] Avg episode reward: [(0, '11.702')]
[2025-01-07 17:10:40,584][02745] Updated weights for policy 0, policy_version 910 (0.0028)
[2025-01-07 17:10:43,853][00600] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3735552. Throughput: 0: 991.2. Samples: 934240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:10:43,857][00600] Avg episode reward: [(0, '11.652')]
[2025-01-07 17:10:48,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3751936. Throughput: 0: 933.1. Samples: 938484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:10:48,855][00600] Avg episode reward: [(0, '11.611')]
[2025-01-07 17:10:52,265][02745] Updated weights for policy 0, policy_version 920 (0.0030)
[2025-01-07 17:10:53,853][00600] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3772416. Throughput: 0: 932.5. Samples: 941926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:10:53,856][00600] Avg episode reward: [(0, '12.934')]
[2025-01-07 17:10:53,864][02732] Saving new best policy, reward=12.934!
[2025-01-07 17:10:58,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3796992. Throughput: 0: 981.7. Samples: 948738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:10:58,858][00600] Avg episode reward: [(0, '13.471')]
[2025-01-07 17:10:58,866][02732] Saving new best policy, reward=13.471!
[2025-01-07 17:11:03,179][02745] Updated weights for policy 0, policy_version 930 (0.0019)
[2025-01-07 17:11:03,853][00600] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3809280. Throughput: 0: 944.4. Samples: 953048. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:11:03,859][00600] Avg episode reward: [(0, '14.473')]
[2025-01-07 17:11:03,868][02732] Saving new best policy, reward=14.473!
[2025-01-07 17:11:08,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3829760. Throughput: 0: 930.7. Samples: 955788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:11:08,856][00600] Avg episode reward: [(0, '14.696')]
[2025-01-07 17:11:08,858][02732] Saving new best policy, reward=14.696!
[2025-01-07 17:11:13,107][02745] Updated weights for policy 0, policy_version 940 (0.0018)
[2025-01-07 17:11:13,853][00600] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3850240. Throughput: 0: 961.6. Samples: 962580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:11:13,861][00600] Avg episode reward: [(0, '15.275')]
[2025-01-07 17:11:13,871][02732] Saving new best policy, reward=15.275!
[2025-01-07 17:11:18,855][00600] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 3866624. Throughput: 0: 958.4. Samples: 967682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:11:18,857][00600] Avg episode reward: [(0, '15.526')]
[2025-01-07 17:11:18,859][02732] Saving new best policy, reward=15.526!
[2025-01-07 17:11:23,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3883008. Throughput: 0: 927.5. Samples: 969688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:11:23,856][00600] Avg episode reward: [(0, '14.976')]
[2025-01-07 17:11:25,108][02745] Updated weights for policy 0, policy_version 950 (0.0034)
[2025-01-07 17:11:28,857][00600] Fps is (10 sec: 4095.2, 60 sec: 3822.7, 300 sec: 3873.8). Total num frames: 3907584. Throughput: 0: 934.2. Samples: 976282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:11:28,859][00600] Avg episode reward: [(0, '15.502')]
[2025-01-07 17:11:33,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3928064. Throughput: 0: 979.4. Samples: 982556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:11:33,857][00600] Avg episode reward: [(0, '15.364')]
[2025-01-07 17:11:33,868][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000959_3928064.pth...
[2025-01-07 17:11:34,027][02732] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000734_3006464.pth
[2025-01-07 17:11:35,321][02745] Updated weights for policy 0, policy_version 960 (0.0021)
[2025-01-07 17:11:38,853][00600] Fps is (10 sec: 3278.1, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3940352. Throughput: 0: 948.2. Samples: 984594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:11:38,857][00600] Avg episode reward: [(0, '16.419')]
[2025-01-07 17:11:38,871][02732] Saving new best policy, reward=16.419!
[2025-01-07 17:11:43,853][00600] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3960832. Throughput: 0: 922.4. Samples: 990248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:11:43,862][00600] Avg episode reward: [(0, '16.046')]
[2025-01-07 17:11:45,895][02745] Updated weights for policy 0, policy_version 970 (0.0022)
[2025-01-07 17:11:48,853][00600] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3985408. Throughput: 0: 975.6. Samples: 996948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:11:48,855][00600] Avg episode reward: [(0, '15.518')]
[2025-01-07 17:11:53,857][00600] Fps is (10 sec: 3684.8, 60 sec: 3754.4, 300 sec: 3832.1). Total num frames: 3997696. Throughput: 0: 964.4. Samples: 999192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:11:53,860][00600] Avg episode reward: [(0, '15.074')]
[2025-01-07 17:11:55,671][02732] Stopping Batcher_0...
[2025-01-07 17:11:55,672][02732] Loop batcher_evt_loop terminating...
[2025-01-07 17:11:55,673][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-07 17:11:55,687][00600] Component Batcher_0 stopped!
[2025-01-07 17:11:55,786][02745] Weights refcount: 2 0
[2025-01-07 17:11:55,799][02745] Stopping InferenceWorker_p0-w0...
[2025-01-07 17:11:55,800][02745] Loop inference_proc0-0_evt_loop terminating...
[2025-01-07 17:11:55,805][00600] Component InferenceWorker_p0-w0 stopped!
[2025-01-07 17:11:55,839][02732] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000848_3473408.pth
[2025-01-07 17:11:55,855][02732] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-07 17:11:56,069][02732] Stopping LearnerWorker_p0...
[2025-01-07 17:11:56,075][02732] Loop learner_proc0_evt_loop terminating...
[2025-01-07 17:11:56,069][00600] Component LearnerWorker_p0 stopped!
[2025-01-07 17:11:56,205][02749] Stopping RolloutWorker_w3...
[2025-01-07 17:11:56,205][00600] Component RolloutWorker_w3 stopped!
[2025-01-07 17:11:56,208][02749] Loop rollout_proc3_evt_loop terminating...
[2025-01-07 17:11:56,224][00600] Component RolloutWorker_w5 stopped!
[2025-01-07 17:11:56,226][02751] Stopping RolloutWorker_w5...
[2025-01-07 17:11:56,230][00600] Component RolloutWorker_w7 stopped!
[2025-01-07 17:11:56,228][02751] Loop rollout_proc5_evt_loop terminating...
[2025-01-07 17:11:56,238][00600] Component RolloutWorker_w1 stopped!
[2025-01-07 17:11:56,241][02747] Stopping RolloutWorker_w1...
[2025-01-07 17:11:56,235][02753] Stopping RolloutWorker_w7...
[2025-01-07 17:11:56,244][02747] Loop rollout_proc1_evt_loop terminating...
[2025-01-07 17:11:56,243][02753] Loop rollout_proc7_evt_loop terminating...
[2025-01-07 17:11:56,272][02750] Stopping RolloutWorker_w4...
[2025-01-07 17:11:56,272][02750] Loop rollout_proc4_evt_loop terminating...
[2025-01-07 17:11:56,274][00600] Component RolloutWorker_w4 stopped!
[2025-01-07 17:11:56,309][02746] Stopping RolloutWorker_w0...
[2025-01-07 17:11:56,309][02746] Loop rollout_proc0_evt_loop terminating...
[2025-01-07 17:11:56,308][00600] Component RolloutWorker_w0 stopped!
[2025-01-07 17:11:56,336][02748] Stopping RolloutWorker_w2...
[2025-01-07 17:11:56,336][02748] Loop rollout_proc2_evt_loop terminating...
[2025-01-07 17:11:56,335][00600] Component RolloutWorker_w2 stopped!
[2025-01-07 17:11:56,424][02752] Stopping RolloutWorker_w6...
[2025-01-07 17:11:56,425][02752] Loop rollout_proc6_evt_loop terminating...
[2025-01-07 17:11:56,424][00600] Component RolloutWorker_w6 stopped!
[2025-01-07 17:11:56,428][00600] Waiting for process learner_proc0 to stop...
[2025-01-07 17:11:57,863][00600] Waiting for process inference_proc0-0 to join...
[2025-01-07 17:11:57,873][00600] Waiting for process rollout_proc0 to join...
[2025-01-07 17:11:59,880][00600] Waiting for process rollout_proc1 to join...
[2025-01-07 17:11:59,882][00600] Waiting for process rollout_proc2 to join...
[2025-01-07 17:11:59,888][00600] Waiting for process rollout_proc3 to join...
[2025-01-07 17:11:59,891][00600] Waiting for process rollout_proc4 to join...
[2025-01-07 17:11:59,895][00600] Waiting for process rollout_proc5 to join...
[2025-01-07 17:11:59,898][00600] Waiting for process rollout_proc6 to join...
[2025-01-07 17:11:59,902][00600] Waiting for process rollout_proc7 to join...
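The shutdown above is two-phased: each component's event loop is asked to stop first ("Stopping ... / Loop ... terminating"), and only then does the runner block on the underlying OS processes ("Waiting for process ... to join"). A minimal multiprocessing sketch of that stop-then-join pattern; the queue-based stop signal is an assumption for illustration, not Sample Factory's actual mechanism.

    import multiprocessing as mp

    def worker(name, inbox):
        # event loop: handle messages until a stop command arrives
        while True:
            msg = inbox.get()
            if msg == "stop":
                print(f"Loop {name}_evt_loop terminating...")
                break

    if __name__ == "__main__":
        procs = []
        for i in range(8):
            inbox = mp.Queue()
            p = mp.Process(target=worker, args=(f"rollout_proc{i}", inbox))
            p.start()
            procs.append((f"rollout_proc{i}", p, inbox))

        # phase 1: ask every event loop to stop
        for _, _, inbox in procs:
            inbox.put("stop")

        # phase 2: block until the OS processes actually exit
        for name, p, _ in procs:
            print(f"Waiting for process {name} to join...")
            p.join()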
[2025-01-07 17:11:59,906][00600] Batcher 0 profile tree view:
batching: 25.9903, releasing_batches: 0.0322
[2025-01-07 17:11:59,908][00600] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0208
  wait_policy_total: 428.7128
update_model: 8.6145
  weight_update: 0.0023
one_step: 0.0030
  handle_policy_step: 588.2331
    deserialize: 15.0185, stack: 3.3229, obs_to_device_normalize: 124.4648, forward: 295.8447, send_messages: 29.2273
    prepare_outputs: 90.4324
      to_cpu: 54.3817
[2025-01-07 17:11:59,910][00600] Learner 0 profile tree view:
misc: 0.0067, prepare_batch: 14.1132
train: 74.8774
  epoch_init: 0.0056, minibatch_init: 0.0138, losses_postprocess: 0.6348, kl_divergence: 0.6363, after_optimizer: 33.6250
  calculate_losses: 27.0190
    losses_init: 0.0064, forward_head: 1.3171, bptt_initial: 18.2204, tail: 1.0377, advantages_returns: 0.2660, losses: 3.8130
    bptt: 1.9977
      bptt_forward_core: 1.9183
  update: 12.2975
    clip: 0.9360
[2025-01-07 17:11:59,912][00600] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3291, enqueue_policy_requests: 106.2644, env_step: 832.9791, overhead: 14.2014, complete_rollouts: 7.9145
save_policy_outputs: 21.9259
  split_output_tensors: 8.4777
[2025-01-07 17:11:59,916][00600] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3148, enqueue_policy_requests: 109.0362, env_step: 832.7990, overhead: 14.0383, complete_rollouts: 6.5195
save_policy_outputs: 21.9675
  split_output_tensors: 8.5268
[2025-01-07 17:11:59,917][00600] Loop Runner_EvtLoop terminating...
[2025-01-07 17:11:59,918][00600] Runner profile tree view:
main_loop: 1099.6946
[2025-01-07 17:11:59,920][00600] Collected {0: 4005888}, FPS: 3642.7
[2025-01-07 17:12:16,438][00600] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-01-07 17:12:16,441][00600] Overriding arg 'num_workers' with value 1 passed from command line
[2025-01-07 17:12:16,444][00600] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-01-07 17:12:16,445][00600] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-01-07 17:12:16,447][00600] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-01-07 17:12:16,449][00600] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-01-07 17:12:16,450][00600] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-01-07 17:12:16,452][00600] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-01-07 17:12:16,453][00600] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-01-07 17:12:16,454][00600] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-01-07 17:12:16,455][00600] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-01-07 17:12:16,456][00600] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-01-07 17:12:16,457][00600] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-01-07 17:12:16,458][00600] Adding new argument 'enjoy_script'=None that is not in the saved config file!
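The final summary is internally consistent: 4005888 frames over the 1099.6946 s main loop gives 4005888 / 1099.6946, which is approximately 3642.7 FPS, exactly as reported. For the evaluation run, the saved config.json is reloaded and then patched: 'num_workers' is overridden from the command line, and evaluation-only flags that never existed in the training config ('no_render', 'save_video', 'max_num_episodes', ...) are added with a warning. A sketch of that merge, assuming a plain-dict config; the helper name is made up.

    import json

    def load_config_with_overrides(config_path, cli_args):
        """Reload a saved experiment config and apply CLI overrides on top."""
        with open(config_path) as f:
            cfg = json.load(f)
        for key, value in cli_args.items():
            if key in cfg:
                print(f"Overriding arg {key!r} with value {value!r} "
                      f"passed from command line")
            else:
                print(f"Adding new argument {key!r}={value!r} that is not "
                      f"in the saved config file!")
            cfg[key] = value
        return cfg

    # e.g. some of the overrides visible in the log above (values as logged):
    cli = {"num_workers": 1, "no_render": True, "save_video": True,
           "max_num_episodes": 10, "push_to_hub": False}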
[2025-01-07 17:12:16,459][00600] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-01-07 17:12:16,489][00600] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-07 17:12:16,494][00600] RunningMeanStd input shape: (3, 72, 128)
[2025-01-07 17:12:16,496][00600] RunningMeanStd input shape: (1,)
[2025-01-07 17:12:16,511][00600] ConvEncoder: input_channels=3
[2025-01-07 17:12:16,615][00600] Conv encoder output size: 512
[2025-01-07 17:12:16,616][00600] Policy head output size: 512
[2025-01-07 17:12:16,879][00600] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-07 17:12:17,813][00600] Num frames 100...
[2025-01-07 17:12:17,983][00600] Num frames 200...
[2025-01-07 17:12:18,156][00600] Num frames 300...
[2025-01-07 17:12:18,326][00600] Num frames 400...
[2025-01-07 17:12:18,496][00600] Num frames 500...
[2025-01-07 17:12:18,663][00600] Num frames 600...
[2025-01-07 17:12:18,835][00600] Num frames 700...
[2025-01-07 17:12:19,007][00600] Num frames 800...
[2025-01-07 17:12:19,130][00600] Avg episode rewards: #0: 15.320, true rewards: #0: 8.320
[2025-01-07 17:12:19,132][00600] Avg episode reward: 15.320, avg true_objective: 8.320
[2025-01-07 17:12:19,257][00600] Num frames 900...
[2025-01-07 17:12:19,440][00600] Num frames 1000...
[2025-01-07 17:12:19,632][00600] Num frames 1100...
[2025-01-07 17:12:19,795][00600] Num frames 1200...
[2025-01-07 17:12:19,916][00600] Num frames 1300...
[2025-01-07 17:12:20,036][00600] Num frames 1400...
[2025-01-07 17:12:20,170][00600] Num frames 1500...
[2025-01-07 17:12:20,247][00600] Avg episode rewards: #0: 13.580, true rewards: #0: 7.580
[2025-01-07 17:12:20,248][00600] Avg episode reward: 13.580, avg true_objective: 7.580
[2025-01-07 17:12:20,361][00600] Num frames 1600...
[2025-01-07 17:12:20,483][00600] Num frames 1700...
[2025-01-07 17:12:20,605][00600] Num frames 1800...
[2025-01-07 17:12:20,727][00600] Num frames 1900...
[2025-01-07 17:12:20,780][00600] Avg episode rewards: #0: 10.333, true rewards: #0: 6.333
[2025-01-07 17:12:20,782][00600] Avg episode reward: 10.333, avg true_objective: 6.333
[2025-01-07 17:12:20,904][00600] Num frames 2000...
[2025-01-07 17:12:21,026][00600] Num frames 2100...
[2025-01-07 17:12:21,154][00600] Num frames 2200...
[2025-01-07 17:12:21,283][00600] Num frames 2300...
[2025-01-07 17:12:21,410][00600] Num frames 2400...
[2025-01-07 17:12:21,532][00600] Num frames 2500...
[2025-01-07 17:12:21,598][00600] Avg episode rewards: #0: 10.020, true rewards: #0: 6.270
[2025-01-07 17:12:21,600][00600] Avg episode reward: 10.020, avg true_objective: 6.270
[2025-01-07 17:12:21,714][00600] Num frames 2600...
[2025-01-07 17:12:21,837][00600] Num frames 2700...
[2025-01-07 17:12:21,955][00600] Num frames 2800...
[2025-01-07 17:12:22,080][00600] Num frames 2900...
[2025-01-07 17:12:22,212][00600] Num frames 3000...
[2025-01-07 17:12:22,343][00600] Num frames 3100...
[2025-01-07 17:12:22,465][00600] Num frames 3200...
[2025-01-07 17:12:22,587][00600] Num frames 3300...
[2025-01-07 17:12:22,713][00600] Num frames 3400...
[2025-01-07 17:12:22,835][00600] Num frames 3500...
[2025-01-07 17:12:22,957][00600] Num frames 3600...
[2025-01-07 17:12:23,084][00600] Avg episode rewards: #0: 12.520, true rewards: #0: 7.320
[2025-01-07 17:12:23,085][00600] Avg episode reward: 12.520, avg true_objective: 7.320
[2025-01-07 17:12:23,141][00600] Num frames 3700...
[2025-01-07 17:12:23,268][00600] Num frames 3800...
[2025-01-07 17:12:23,406][00600] Num frames 3900...
[2025-01-07 17:12:23,530][00600] Num frames 4000...
[2025-01-07 17:12:23,703][00600] Avg episode rewards: #0: 11.832, true rewards: #0: 6.832
[2025-01-07 17:12:23,704][00600] Avg episode reward: 11.832, avg true_objective: 6.832
[2025-01-07 17:12:23,709][00600] Num frames 4100...
[2025-01-07 17:12:23,834][00600] Num frames 4200...
[2025-01-07 17:12:23,962][00600] Num frames 4300...
[2025-01-07 17:12:24,085][00600] Num frames 4400...
[2025-01-07 17:12:24,214][00600] Num frames 4500...
[2025-01-07 17:12:24,373][00600] Avg episode rewards: #0: 11.256, true rewards: #0: 6.541
[2025-01-07 17:12:24,375][00600] Avg episode reward: 11.256, avg true_objective: 6.541
[2025-01-07 17:12:24,405][00600] Num frames 4600...
[2025-01-07 17:12:24,530][00600] Num frames 4700...
[2025-01-07 17:12:24,651][00600] Num frames 4800...
[2025-01-07 17:12:24,773][00600] Num frames 4900...
[2025-01-07 17:12:24,943][00600] Avg episode rewards: #0: 10.619, true rewards: #0: 6.244
[2025-01-07 17:12:24,944][00600] Avg episode reward: 10.619, avg true_objective: 6.244
[2025-01-07 17:12:24,955][00600] Num frames 5000...
[2025-01-07 17:12:25,077][00600] Num frames 5100...
[2025-01-07 17:12:25,204][00600] Num frames 5200...
[2025-01-07 17:12:25,338][00600] Num frames 5300...
[2025-01-07 17:12:25,465][00600] Num frames 5400...
[2025-01-07 17:12:25,588][00600] Num frames 5500...
[2025-01-07 17:12:25,713][00600] Num frames 5600...
[2025-01-07 17:12:25,774][00600] Avg episode rewards: #0: 10.448, true rewards: #0: 6.226
[2025-01-07 17:12:25,776][00600] Avg episode reward: 10.448, avg true_objective: 6.226
[2025-01-07 17:12:25,893][00600] Num frames 5700...
[2025-01-07 17:12:26,016][00600] Num frames 5800...
[2025-01-07 17:12:26,138][00600] Num frames 5900...
[2025-01-07 17:12:26,270][00600] Num frames 6000...
[2025-01-07 17:12:26,402][00600] Num frames 6100...
[2025-01-07 17:12:26,523][00600] Num frames 6200...
[2025-01-07 17:12:26,642][00600] Num frames 6300...
[2025-01-07 17:12:26,708][00600] Avg episode rewards: #0: 10.507, true rewards: #0: 6.307
[2025-01-07 17:12:26,710][00600] Avg episode reward: 10.507, avg true_objective: 6.307
[2025-01-07 17:13:03,291][00600] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-01-07 17:20:19,635][11933] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-01-07 17:20:19,639][11933] Rollout worker 0 uses device cpu
[2025-01-07 17:20:19,640][11933] Rollout worker 1 uses device cpu
[2025-01-07 17:20:19,642][11933] Rollout worker 2 uses device cpu
[2025-01-07 17:20:19,643][11933] Rollout worker 3 uses device cpu
[2025-01-07 17:20:19,644][11933] Rollout worker 4 uses device cpu
[2025-01-07 17:20:19,645][11933] Rollout worker 5 uses device cpu
[2025-01-07 17:20:19,647][11933] Rollout worker 6 uses device cpu
[2025-01-07 17:20:19,648][11933] Rollout worker 7 uses device cpu
[2025-01-07 17:20:19,778][11933] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-07 17:20:19,781][11933] InferenceWorker_p0-w0: min num requests: 2
[2025-01-07 17:20:19,828][11933] Starting all processes...
[2025-01-07 17:20:19,832][11933] Starting process learner_proc0
[2025-01-07 17:20:19,889][11933] Starting all processes...
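In the evaluation run that just finished, each "Avg episode rewards: #0: ..., true rewards: #0: ..." line is a running mean over the episodes completed so far: 15.320 after one episode, then 13.580 after two, which implies the second episode scored about 11.84. A small Python sketch of that bookkeeping, assuming scalar per-episode returns; the class and method names are illustrative, not Sample Factory's API.

    class EpisodeRewardAverager:
        """Running mean of per-episode returns, printed after each episode."""

        def __init__(self):
            self.episodes = 0
            self.total_reward = 0.0
            self.total_true = 0.0  # shaped vs. 'true' objective, tracked separately

        def finish_episode(self, reward, true_reward):
            self.episodes += 1
            self.total_reward += reward
            self.total_true += true_reward
            print(f"Avg episode rewards: #0: {self.total_reward / self.episodes:.3f}, "
                  f"true rewards: #0: {self.total_true / self.episodes:.3f}")

    # the first two episodes of the log reproduce 15.320/8.320 then 13.580/7.580
    avg = EpisodeRewardAverager()
    avg.finish_episode(15.32, 8.32)
    avg.finish_episode(11.84, 6.84)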
[2025-01-07 17:20:19,968][11933] Starting process inference_proc0-0 [2025-01-07 17:20:19,969][11933] Starting process rollout_proc0 [2025-01-07 17:20:19,970][11933] Starting process rollout_proc1 [2025-01-07 17:20:19,971][11933] Starting process rollout_proc2 [2025-01-07 17:20:19,971][11933] Starting process rollout_proc3 [2025-01-07 17:20:19,972][11933] Starting process rollout_proc4 [2025-01-07 17:20:19,972][11933] Starting process rollout_proc5 [2025-01-07 17:20:19,972][11933] Starting process rollout_proc6 [2025-01-07 17:20:19,973][11933] Starting process rollout_proc7 [2025-01-07 17:20:37,147][13105] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-07 17:20:37,148][13105] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-07 17:20:37,201][13105] Num visible devices: 1 [2025-01-07 17:20:37,236][13105] Starting seed is not provided [2025-01-07 17:20:37,237][13105] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-07 17:20:37,238][13105] Initializing actor-critic model on device cuda:0 [2025-01-07 17:20:37,239][13105] RunningMeanStd input shape: (3, 72, 128) [2025-01-07 17:20:37,241][13105] RunningMeanStd input shape: (1,) [2025-01-07 17:20:37,312][13105] ConvEncoder: input_channels=3 [2025-01-07 17:20:37,441][13123] Worker 4 uses CPU cores [0] [2025-01-07 17:20:37,567][13124] Worker 5 uses CPU cores [1] [2025-01-07 17:20:37,592][13121] Worker 2 uses CPU cores [0] [2025-01-07 17:20:37,610][13118] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-07 17:20:37,615][13122] Worker 3 uses CPU cores [1] [2025-01-07 17:20:37,615][13118] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-07 17:20:37,682][13118] Num visible devices: 1 [2025-01-07 17:20:37,688][13119] Worker 0 uses CPU cores [0] [2025-01-07 17:20:37,701][13125] Worker 6 uses CPU cores [0] [2025-01-07 17:20:37,757][13120] Worker 1 uses CPU cores [1] [2025-01-07 17:20:37,755][13126] Worker 7 uses CPU cores [1] [2025-01-07 17:20:37,766][13105] Conv encoder output size: 512 [2025-01-07 17:20:37,767][13105] Policy head output size: 512 [2025-01-07 17:20:37,782][13105] Created Actor Critic model with architecture: [2025-01-07 17:20:37,783][13105] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-07 17:20:38,024][13105] Using optimizer [2025-01-07 17:20:38,863][13105] 
Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-01-07 17:20:38,904][13105] Loading model from checkpoint [2025-01-07 17:20:38,906][13105] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2025-01-07 17:20:38,907][13105] Initialized policy 0 weights for model version 978 [2025-01-07 17:20:38,910][13105] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-07 17:20:38,917][13105] LearnerWorker_p0 finished initialization! [2025-01-07 17:20:39,099][13118] RunningMeanStd input shape: (3, 72, 128) [2025-01-07 17:20:39,100][13118] RunningMeanStd input shape: (1,) [2025-01-07 17:20:39,113][13118] ConvEncoder: input_channels=3 [2025-01-07 17:20:39,232][13118] Conv encoder output size: 512 [2025-01-07 17:20:39,232][13118] Policy head output size: 512 [2025-01-07 17:20:39,290][11933] Inference worker 0-0 is ready! [2025-01-07 17:20:39,292][11933] All inference workers are ready! Signal rollout workers to start! [2025-01-07 17:20:39,499][13120] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-07 17:20:39,500][13122] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-07 17:20:39,502][13124] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-07 17:20:39,504][13126] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-07 17:20:39,498][13123] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-07 17:20:39,500][13119] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-07 17:20:39,506][13125] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-07 17:20:39,504][13121] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-07 17:20:39,531][11933] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-07 17:20:39,766][11933] Heartbeat connected on Batcher_0 [2025-01-07 17:20:39,771][11933] Heartbeat connected on LearnerWorker_p0 [2025-01-07 17:20:39,817][11933] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-07 17:20:40,113][13125] Decorrelating experience for 0 frames... [2025-01-07 17:20:40,811][13124] Decorrelating experience for 0 frames... [2025-01-07 17:20:40,814][13126] Decorrelating experience for 0 frames... [2025-01-07 17:20:40,821][13122] Decorrelating experience for 0 frames... [2025-01-07 17:20:40,817][13120] Decorrelating experience for 0 frames... [2025-01-07 17:20:41,598][13125] Decorrelating experience for 32 frames... [2025-01-07 17:20:41,664][13123] Decorrelating experience for 0 frames... [2025-01-07 17:20:41,930][13122] Decorrelating experience for 32 frames... [2025-01-07 17:20:41,933][13120] Decorrelating experience for 32 frames... [2025-01-07 17:20:41,935][13126] Decorrelating experience for 32 frames... [2025-01-07 17:20:43,510][13124] Decorrelating experience for 32 frames... [2025-01-07 17:20:43,538][13119] Decorrelating experience for 0 frames... [2025-01-07 17:20:43,592][13123] Decorrelating experience for 32 frames... [2025-01-07 17:20:43,657][13121] Decorrelating experience for 0 frames... [2025-01-07 17:20:44,017][13125] Decorrelating experience for 64 frames... [2025-01-07 17:20:44,197][13126] Decorrelating experience for 64 frames... [2025-01-07 17:20:44,531][11933] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-07 17:20:45,065][13119] Decorrelating experience for 32 frames... [2025-01-07 17:20:45,156][13122] Decorrelating experience for 64 frames... [2025-01-07 17:20:45,241][13121] Decorrelating experience for 32 frames... [2025-01-07 17:20:45,738][13123] Decorrelating experience for 64 frames... [2025-01-07 17:20:46,668][13126] Decorrelating experience for 96 frames... [2025-01-07 17:20:46,889][13120] Decorrelating experience for 64 frames... [2025-01-07 17:20:46,901][13121] Decorrelating experience for 64 frames... [2025-01-07 17:20:47,037][11933] Heartbeat connected on RolloutWorker_w7 [2025-01-07 17:20:47,231][13122] Decorrelating experience for 96 frames... [2025-01-07 17:20:47,498][11933] Heartbeat connected on RolloutWorker_w3 [2025-01-07 17:20:47,691][13124] Decorrelating experience for 64 frames... [2025-01-07 17:20:48,913][13125] Decorrelating experience for 96 frames... [2025-01-07 17:20:49,135][11933] Heartbeat connected on RolloutWorker_w6 [2025-01-07 17:20:49,280][13123] Decorrelating experience for 96 frames... [2025-01-07 17:20:49,468][13121] Decorrelating experience for 96 frames... [2025-01-07 17:20:49,531][11933] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 1.2. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-07 17:20:49,535][11933] Avg episode reward: [(0, '1.280')] [2025-01-07 17:20:49,706][11933] Heartbeat connected on RolloutWorker_w4 [2025-01-07 17:20:49,828][11933] Heartbeat connected on RolloutWorker_w2 [2025-01-07 17:20:50,061][13120] Decorrelating experience for 96 frames... [2025-01-07 17:20:50,300][13124] Decorrelating experience for 96 frames... [2025-01-07 17:20:50,385][11933] Heartbeat connected on RolloutWorker_w1 [2025-01-07 17:20:50,654][11933] Heartbeat connected on RolloutWorker_w5 [2025-01-07 17:20:51,750][13119] Decorrelating experience for 64 frames... [2025-01-07 17:20:54,531][11933] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 197.1. Samples: 2956. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-07 17:20:54,535][11933] Avg episode reward: [(0, '4.452')] [2025-01-07 17:20:54,828][13105] Signal inference workers to stop experience collection... [2025-01-07 17:20:54,840][13118] InferenceWorker_p0-w0: stopping experience collection [2025-01-07 17:20:54,901][13119] Decorrelating experience for 96 frames... [2025-01-07 17:20:54,991][11933] Heartbeat connected on RolloutWorker_w0 [2025-01-07 17:20:56,752][13105] Signal inference workers to resume experience collection... [2025-01-07 17:20:56,753][13118] InferenceWorker_p0-w0: resuming experience collection [2025-01-07 17:20:59,531][11933] Fps is (10 sec: 1228.8, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 4018176. Throughput: 0: 161.8. Samples: 3236. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-07 17:20:59,535][11933] Avg episode reward: [(0, '4.774')] [2025-01-07 17:21:04,531][11933] Fps is (10 sec: 2048.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 4026368. Throughput: 0: 216.9. Samples: 5422. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2025-01-07 17:21:04,536][11933] Avg episode reward: [(0, '6.089')] [2025-01-07 17:21:09,038][13118] Updated weights for policy 0, policy_version 988 (0.0024) [2025-01-07 17:21:09,531][11933] Fps is (10 sec: 2867.2, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 4046848. Throughput: 0: 360.6. Samples: 10818. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-01-07 17:21:09,536][11933] Avg episode reward: [(0, '9.071')] [2025-01-07 17:21:14,531][11933] Fps is (10 sec: 4096.0, 60 sec: 1755.4, 300 sec: 1755.4). Total num frames: 4067328. Throughput: 0: 387.2. Samples: 13552. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-07 17:21:14,535][11933] Avg episode reward: [(0, '11.582')] [2025-01-07 17:21:19,531][11933] Fps is (10 sec: 3686.3, 60 sec: 1945.6, 300 sec: 1945.6). Total num frames: 4083712. Throughput: 0: 480.6. Samples: 19224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-01-07 17:21:19,537][11933] Avg episode reward: [(0, '12.741')] [2025-01-07 17:21:20,823][13118] Updated weights for policy 0, policy_version 998 (0.0021) [2025-01-07 17:21:24,531][11933] Fps is (10 sec: 2867.2, 60 sec: 2002.5, 300 sec: 2002.5). Total num frames: 4096000. Throughput: 0: 512.1. Samples: 23044. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-01-07 17:21:24,535][11933] Avg episode reward: [(0, '13.761')] [2025-01-07 17:21:29,531][11933] Fps is (10 sec: 3276.8, 60 sec: 2211.8, 300 sec: 2211.8). Total num frames: 4116480. Throughput: 0: 582.6. Samples: 26216. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-01-07 17:21:29,538][11933] Avg episode reward: [(0, '16.584')] [2025-01-07 17:21:29,541][13105] Saving new best policy, reward=16.584! [2025-01-07 17:21:31,740][13118] Updated weights for policy 0, policy_version 1008 (0.0015) [2025-01-07 17:21:34,531][11933] Fps is (10 sec: 4096.0, 60 sec: 2383.1, 300 sec: 2383.1). Total num frames: 4136960. Throughput: 0: 728.8. Samples: 32806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-07 17:21:34,533][11933] Avg episode reward: [(0, '16.909')] [2025-01-07 17:21:34,538][13105] Saving new best policy, reward=16.909! [2025-01-07 17:21:39,539][11933] Fps is (10 sec: 3274.2, 60 sec: 2389.0, 300 sec: 2389.0). Total num frames: 4149248. Throughput: 0: 756.8. Samples: 37018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:21:39,542][11933] Avg episode reward: [(0, '15.947')] [2025-01-07 17:21:44,488][13118] Updated weights for policy 0, policy_version 1018 (0.0032) [2025-01-07 17:21:44,531][11933] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2520.6). Total num frames: 4169728. Throughput: 0: 800.7. Samples: 39266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-07 17:21:44,535][11933] Avg episode reward: [(0, '15.686')] [2025-01-07 17:21:49,531][11933] Fps is (10 sec: 4099.3, 60 sec: 3072.0, 300 sec: 2633.1). Total num frames: 4190208. Throughput: 0: 892.2. Samples: 45570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:21:49,538][11933] Avg episode reward: [(0, '14.068')] [2025-01-07 17:21:54,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2676.1). Total num frames: 4206592. Throughput: 0: 883.5. Samples: 50574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:21:54,535][11933] Avg episode reward: [(0, '13.959')] [2025-01-07 17:21:56,050][13118] Updated weights for policy 0, policy_version 1028 (0.0014) [2025-01-07 17:21:59,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2662.4). Total num frames: 4218880. Throughput: 0: 865.0. Samples: 52476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:21:59,534][11933] Avg episode reward: [(0, '15.541')] [2025-01-07 17:22:04,531][11933] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 2746.7). Total num frames: 4239360. Throughput: 0: 862.8. Samples: 58052. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:22:04,533][11933] Avg episode reward: [(0, '16.563')] [2025-01-07 17:22:06,833][13118] Updated weights for policy 0, policy_version 1038 (0.0032) [2025-01-07 17:22:09,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 2821.7). Total num frames: 4259840. Throughput: 0: 929.4. Samples: 64868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-07 17:22:09,534][11933] Avg episode reward: [(0, '17.029')] [2025-01-07 17:22:09,538][13105] Saving new best policy, reward=17.029! [2025-01-07 17:22:14,531][11933] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 2802.5). Total num frames: 4272128. Throughput: 0: 902.4. Samples: 66826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-01-07 17:22:14,539][11933] Avg episode reward: [(0, '17.850')] [2025-01-07 17:22:14,603][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001044_4276224.pth... [2025-01-07 17:22:14,791][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000959_3928064.pth [2025-01-07 17:22:14,814][13105] Saving new best policy, reward=17.850! [2025-01-07 17:22:19,072][13118] Updated weights for policy 0, policy_version 1048 (0.0021) [2025-01-07 17:22:19,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2867.2). Total num frames: 4292608. Throughput: 0: 860.7. Samples: 71536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-07 17:22:19,536][11933] Avg episode reward: [(0, '19.260')] [2025-01-07 17:22:19,539][13105] Saving new best policy, reward=19.260! [2025-01-07 17:22:24,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 2964.7). Total num frames: 4317184. Throughput: 0: 915.6. Samples: 78212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-07 17:22:24,536][11933] Avg episode reward: [(0, '18.827')] [2025-01-07 17:22:29,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 2941.7). Total num frames: 4329472. Throughput: 0: 930.2. Samples: 81126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:22:29,533][11933] Avg episode reward: [(0, '19.111')] [2025-01-07 17:22:29,568][13118] Updated weights for policy 0, policy_version 1058 (0.0026) [2025-01-07 17:22:34,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 2956.2). Total num frames: 4345856. Throughput: 0: 879.5. Samples: 85148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-07 17:22:34,533][11933] Avg episode reward: [(0, '18.724')] [2025-01-07 17:22:39,531][11933] Fps is (10 sec: 4096.1, 60 sec: 3686.9, 300 sec: 3037.9). Total num frames: 4370432. Throughput: 0: 911.0. Samples: 91568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:22:39,535][11933] Avg episode reward: [(0, '19.383')] [2025-01-07 17:22:39,538][13105] Saving new best policy, reward=19.383! [2025-01-07 17:22:40,649][13118] Updated weights for policy 0, policy_version 1068 (0.0015) [2025-01-07 17:22:44,533][11933] Fps is (10 sec: 4095.2, 60 sec: 3618.0, 300 sec: 3047.4). Total num frames: 4386816. Throughput: 0: 932.7. Samples: 94450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:22:44,535][11933] Avg episode reward: [(0, '18.802')] [2025-01-07 17:22:49,536][11933] Fps is (10 sec: 3274.9, 60 sec: 3549.5, 300 sec: 3056.1). Total num frames: 4403200. Throughput: 0: 913.8. Samples: 99176. 
[2025-01-07 17:22:49,547][11933] Avg episode reward: [(0, '18.608')]
[2025-01-07 17:22:52,836][13118] Updated weights for policy 0, policy_version 1078 (0.0018)
[2025-01-07 17:22:54,531][11933] Fps is (10 sec: 3277.5, 60 sec: 3549.9, 300 sec: 3064.4). Total num frames: 4419584. Throughput: 0: 882.5. Samples: 104580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:22:54,537][11933] Avg episode reward: [(0, '19.333')]
[2025-01-07 17:22:59,531][11933] Fps is (10 sec: 4098.4, 60 sec: 3754.7, 300 sec: 3130.5). Total num frames: 4444160. Throughput: 0: 909.8. Samples: 107768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:22:59,533][11933] Avg episode reward: [(0, '20.457')]
[2025-01-07 17:22:59,540][13105] Saving new best policy, reward=20.457!
[2025-01-07 17:23:02,854][13118] Updated weights for policy 0, policy_version 1088 (0.0019)
[2025-01-07 17:23:04,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3135.6). Total num frames: 4460544. Throughput: 0: 935.5. Samples: 113632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:23:04,533][11933] Avg episode reward: [(0, '20.591')]
[2025-01-07 17:23:04,543][13105] Saving new best policy, reward=20.591!
[2025-01-07 17:23:09,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3113.0). Total num frames: 4472832. Throughput: 0: 879.8. Samples: 117802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:23:09,533][11933] Avg episode reward: [(0, '20.360')]
[2025-01-07 17:23:14,106][13118] Updated weights for policy 0, policy_version 1098 (0.0036)
[2025-01-07 17:23:14,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3171.1). Total num frames: 4497408. Throughput: 0: 891.3. Samples: 121234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:23:14,535][11933] Avg episode reward: [(0, '19.105')]
[2025-01-07 17:23:19,537][11933] Fps is (10 sec: 4502.6, 60 sec: 3754.3, 300 sec: 3199.9). Total num frames: 4517888. Throughput: 0: 954.6. Samples: 128112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:23:19,542][11933] Avg episode reward: [(0, '19.510')]
[2025-01-07 17:23:24,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3177.5). Total num frames: 4530176. Throughput: 0: 905.5. Samples: 132316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:23:24,534][11933] Avg episode reward: [(0, '19.021')]
[2025-01-07 17:23:26,274][13118] Updated weights for policy 0, policy_version 1108 (0.0014)
[2025-01-07 17:23:29,531][11933] Fps is (10 sec: 3279.0, 60 sec: 3686.4, 300 sec: 3204.5). Total num frames: 4550656. Throughput: 0: 898.4. Samples: 134876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:23:29,534][11933] Avg episode reward: [(0, '18.703')]
[2025-01-07 17:23:34,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3253.4). Total num frames: 4575232. Throughput: 0: 944.8. Samples: 141686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:23:34,533][11933] Avg episode reward: [(0, '18.469')]
[2025-01-07 17:23:35,344][13118] Updated weights for policy 0, policy_version 1118 (0.0023)
[2025-01-07 17:23:39,531][11933] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3254.0). Total num frames: 4591616. Throughput: 0: 944.0. Samples: 147058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:23:39,536][11933] Avg episode reward: [(0, '18.235')]
[2025-01-07 17:23:44,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3618.3, 300 sec: 3232.5). Total num frames: 4603904. Throughput: 0: 917.2. Samples: 149042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:23:44,533][11933] Avg episode reward: [(0, '17.814')]
[2025-01-07 17:23:47,198][13118] Updated weights for policy 0, policy_version 1128 (0.0033)
[2025-01-07 17:23:49,531][11933] Fps is (10 sec: 3686.5, 60 sec: 3755.0, 300 sec: 3276.8). Total num frames: 4628480. Throughput: 0: 929.1. Samples: 155442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:23:49,534][11933] Avg episode reward: [(0, '18.016')]
[2025-01-07 17:23:54,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3297.8). Total num frames: 4648960. Throughput: 0: 979.7. Samples: 161888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:23:54,537][11933] Avg episode reward: [(0, '18.316')]
[2025-01-07 17:23:58,038][13118] Updated weights for policy 0, policy_version 1138 (0.0024)
[2025-01-07 17:23:59,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 4661248. Throughput: 0: 947.3. Samples: 163864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:23:59,537][11933] Avg episode reward: [(0, '17.929')]
[2025-01-07 17:24:04,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3296.8). Total num frames: 4681728. Throughput: 0: 908.7. Samples: 168998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:24:04,536][11933] Avg episode reward: [(0, '18.245')]
[2025-01-07 17:24:08,401][13118] Updated weights for policy 0, policy_version 1148 (0.0022)
[2025-01-07 17:24:09,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3335.3). Total num frames: 4706304. Throughput: 0: 963.6. Samples: 175680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:24:09,538][11933] Avg episode reward: [(0, '17.760')]
[2025-01-07 17:24:14,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3334.0). Total num frames: 4722688. Throughput: 0: 970.5. Samples: 178550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:24:14,533][11933] Avg episode reward: [(0, '17.871')]
[2025-01-07 17:24:14,542][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001153_4722688.pth...
[2025-01-07 17:24:14,717][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
[2025-01-07 17:24:19,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.8, 300 sec: 3332.7). Total num frames: 4739072. Throughput: 0: 909.8. Samples: 182628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:24:19,538][11933] Avg episode reward: [(0, '17.773')]
[2025-01-07 17:24:20,385][13118] Updated weights for policy 0, policy_version 1158 (0.0024)
[2025-01-07 17:24:24,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3349.6). Total num frames: 4759552. Throughput: 0: 942.3. Samples: 189462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:24:24,532][11933] Avg episode reward: [(0, '18.520')]
[2025-01-07 17:24:29,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3365.8). Total num frames: 4780032. Throughput: 0: 971.4. Samples: 192754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:24:29,537][11933] Avg episode reward: [(0, '19.612')]
[2025-01-07 17:24:30,121][13118] Updated weights for policy 0, policy_version 1168 (0.0029)
[2025-01-07 17:24:34,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3346.5). Total num frames: 4792320. Throughput: 0: 928.3. Samples: 197216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
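The checkpoint filenames above encode the policy version and the env-frame count as checkpoint_<version:09d>_<frames>.pth, and the two numbers are consistent with 4096 frames per policy version (1153 * 4096 = 4,722,688, matching checkpoint_000001153_4722688.pth). Each "Saving ..." line is paired with a "Removing ..." line for an older file, so only the most recent few numbered checkpoints stay on disk. A hedged sketch of that save-then-prune pattern follows; the helper names are hypothetical, not the library's API.

import glob
import os

def save_and_prune(ckpt_dir, state_bytes, policy_version, env_frames, keep=2):
    """Write checkpoint_<version>_<frames>.pth, then delete the oldest
    checkpoints so at most `keep` remain (mirroring the Saving/Removing
    pairs in the log)."""
    name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    path = os.path.join(ckpt_dir, name)
    with open(path, "wb") as f:
        f.write(state_bytes)  # stand-in for torch.save(...)

    # zero-padded versions make lexicographic order == numeric order
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for old in ckpts[:-keep]:
        os.remove(old)
    return path

Note the "best policy" snapshots are tracked separately from this rotation, which is why a checkpoint can be removed while an older best-reward model survives.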
[2025-01-07 17:24:34,533][11933] Avg episode reward: [(0, '19.516')]
[2025-01-07 17:24:39,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3362.1). Total num frames: 4812800. Throughput: 0: 912.3. Samples: 202940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:24:39,533][11933] Avg episode reward: [(0, '18.475')]
[2025-01-07 17:24:41,507][13118] Updated weights for policy 0, policy_version 1178 (0.0029)
[2025-01-07 17:24:44,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3393.8). Total num frames: 4837376. Throughput: 0: 944.6. Samples: 206372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:24:44,533][11933] Avg episode reward: [(0, '18.724')]
[2025-01-07 17:24:49,536][11933] Fps is (10 sec: 4093.8, 60 sec: 3754.3, 300 sec: 3391.4). Total num frames: 4853760. Throughput: 0: 954.0. Samples: 211932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:24:49,546][11933] Avg episode reward: [(0, '19.442')]
[2025-01-07 17:24:53,296][13118] Updated weights for policy 0, policy_version 1188 (0.0016)
[2025-01-07 17:24:54,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3389.2). Total num frames: 4870144. Throughput: 0: 910.0. Samples: 216630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:24:54,538][11933] Avg episode reward: [(0, '18.169')]
[2025-01-07 17:24:59,531][11933] Fps is (10 sec: 3688.3, 60 sec: 3822.9, 300 sec: 3402.8). Total num frames: 4890624. Throughput: 0: 921.1. Samples: 219998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:24:59,533][11933] Avg episode reward: [(0, '19.368')]
[2025-01-07 17:25:02,677][13118] Updated weights for policy 0, policy_version 1198 (0.0014)
[2025-01-07 17:25:04,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3415.9). Total num frames: 4911104. Throughput: 0: 973.0. Samples: 226412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:25:04,539][11933] Avg episode reward: [(0, '19.952')]
[2025-01-07 17:25:09,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3398.2). Total num frames: 4923392. Throughput: 0: 911.6. Samples: 230484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:25:09,534][11933] Avg episode reward: [(0, '20.486')]
[2025-01-07 17:25:14,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3410.9). Total num frames: 4943872. Throughput: 0: 904.1. Samples: 233438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:25:14,535][11933] Avg episode reward: [(0, '19.795')]
[2025-01-07 17:25:14,681][13118] Updated weights for policy 0, policy_version 1208 (0.0014)
[2025-01-07 17:25:19,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3437.7). Total num frames: 4968448. Throughput: 0: 957.1. Samples: 240284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:25:19,534][11933] Avg episode reward: [(0, '21.462')]
[2025-01-07 17:25:19,538][13105] Saving new best policy, reward=21.462!
[2025-01-07 17:25:24,532][11933] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3420.5). Total num frames: 4980736. Throughput: 0: 936.4. Samples: 245078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:25:24,535][11933] Avg episode reward: [(0, '22.601')]
[2025-01-07 17:25:24,547][13105] Saving new best policy, reward=22.601!
[2025-01-07 17:25:26,377][13118] Updated weights for policy 0, policy_version 1218 (0.0021)
[2025-01-07 17:25:29,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3418.0). Total num frames: 4997120. Throughput: 0: 905.0. Samples: 247096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:25:29,537][11933] Avg episode reward: [(0, '22.720')]
[2025-01-07 17:25:29,542][13105] Saving new best policy, reward=22.720!
[2025-01-07 17:25:34,531][11933] Fps is (10 sec: 4096.5, 60 sec: 3822.9, 300 sec: 3443.4). Total num frames: 5021696. Throughput: 0: 925.2. Samples: 253562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:25:34,533][11933] Avg episode reward: [(0, '23.305')]
[2025-01-07 17:25:34,539][13105] Saving new best policy, reward=23.305!
[2025-01-07 17:25:36,149][13118] Updated weights for policy 0, policy_version 1228 (0.0023)
[2025-01-07 17:25:39,531][11933] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3499.0). Total num frames: 5038080. Throughput: 0: 952.0. Samples: 259472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:25:39,535][11933] Avg episode reward: [(0, '22.973')]
[2025-01-07 17:25:44,535][11933] Fps is (10 sec: 3275.4, 60 sec: 3617.9, 300 sec: 3554.4). Total num frames: 5054464. Throughput: 0: 921.6. Samples: 261472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:25:44,537][11933] Avg episode reward: [(0, '22.012')]
[2025-01-07 17:25:48,121][13118] Updated weights for policy 0, policy_version 1238 (0.0024)
[2025-01-07 17:25:49,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3623.9). Total num frames: 5074944. Throughput: 0: 901.0. Samples: 266956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:25:49,534][11933] Avg episode reward: [(0, '21.080')]
[2025-01-07 17:25:54,531][11933] Fps is (10 sec: 4507.5, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 5099520. Throughput: 0: 962.9. Samples: 273814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:25:54,535][11933] Avg episode reward: [(0, '19.719')]
[2025-01-07 17:25:58,584][13118] Updated weights for policy 0, policy_version 1248 (0.0016)
[2025-01-07 17:25:59,532][11933] Fps is (10 sec: 3686.0, 60 sec: 3686.3, 300 sec: 3679.4). Total num frames: 5111808. Throughput: 0: 948.3. Samples: 276114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:25:59,535][11933] Avg episode reward: [(0, '19.846')]
[2025-01-07 17:26:04,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 5128192. Throughput: 0: 894.2. Samples: 280522. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:26:04,533][11933] Avg episode reward: [(0, '19.637')]
[2025-01-07 17:26:09,477][13118] Updated weights for policy 0, policy_version 1258 (0.0019)
[2025-01-07 17:26:09,531][11933] Fps is (10 sec: 4096.6, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 5152768. Throughput: 0: 936.3. Samples: 287210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:26:09,533][11933] Avg episode reward: [(0, '19.684')]
[2025-01-07 17:26:14,533][11933] Fps is (10 sec: 4095.1, 60 sec: 3754.5, 300 sec: 3679.4). Total num frames: 5169152. Throughput: 0: 967.2. Samples: 290622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:26:14,538][11933] Avg episode reward: [(0, '19.188')]
[2025-01-07 17:26:14,549][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001262_5169152.pth...
[2025-01-07 17:26:14,737][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001044_4276224.pth
[2025-01-07 17:26:19,533][11933] Fps is (10 sec: 3276.0, 60 sec: 3618.0, 300 sec: 3693.3). Total num frames: 5185536. Throughput: 0: 914.2. Samples: 294704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:26:19,543][11933] Avg episode reward: [(0, '20.503')]
[2025-01-07 17:26:21,586][13118] Updated weights for policy 0, policy_version 1268 (0.0030)
[2025-01-07 17:26:24,531][11933] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 5206016. Throughput: 0: 915.3. Samples: 300660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:26:24,534][11933] Avg episode reward: [(0, '20.206')]
[2025-01-07 17:26:29,531][11933] Fps is (10 sec: 4096.9, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 5226496. Throughput: 0: 945.9. Samples: 304034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:26:29,537][11933] Avg episode reward: [(0, '19.759')]
[2025-01-07 17:26:31,303][13118] Updated weights for policy 0, policy_version 1278 (0.0022)
[2025-01-07 17:26:34,531][11933] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3707.3). Total num frames: 5242880. Throughput: 0: 937.1. Samples: 309126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:26:34,539][11933] Avg episode reward: [(0, '21.234')]
[2025-01-07 17:26:39,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 5259264. Throughput: 0: 900.3. Samples: 314328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:26:39,535][11933] Avg episode reward: [(0, '21.271')]
[2025-01-07 17:26:42,686][13118] Updated weights for policy 0, policy_version 1288 (0.0020)
[2025-01-07 17:26:44,531][11933] Fps is (10 sec: 4096.1, 60 sec: 3823.2, 300 sec: 3707.2). Total num frames: 5283840. Throughput: 0: 922.9. Samples: 317642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:26:44,532][11933] Avg episode reward: [(0, '21.850')]
[2025-01-07 17:26:49,532][11933] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 5300224. Throughput: 0: 962.0. Samples: 323812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:26:49,537][11933] Avg episode reward: [(0, '22.465')]
[2025-01-07 17:26:54,384][13118] Updated weights for policy 0, policy_version 1298 (0.0022)
[2025-01-07 17:26:54,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 5316608. Throughput: 0: 908.2. Samples: 328078. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:26:54,535][11933] Avg episode reward: [(0, '23.624')]
[2025-01-07 17:26:54,543][13105] Saving new best policy, reward=23.624!
[2025-01-07 17:26:59,531][11933] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5337088. Throughput: 0: 904.3. Samples: 331312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:26:59,536][11933] Avg episode reward: [(0, '24.662')]
[2025-01-07 17:26:59,538][13105] Saving new best policy, reward=24.662!
[2025-01-07 17:27:03,765][13118] Updated weights for policy 0, policy_version 1308 (0.0029)
[2025-01-07 17:27:04,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 5357568. Throughput: 0: 959.2. Samples: 337864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:27:04,537][11933] Avg episode reward: [(0, '23.703')]
[2025-01-07 17:27:09,532][11933] Fps is (10 sec: 3276.5, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 5369856. Throughput: 0: 920.7. Samples: 342092. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:27:09,539][11933] Avg episode reward: [(0, '24.356')]
[2025-01-07 17:27:14,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 5390336. Throughput: 0: 905.1. Samples: 344764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
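"Policy #0 lag" summarizes how stale the experience in each reporting window is: the learner's current policy version minus the version each rollout fragment was collected with, reported as min/avg/max. Values hovering around 0-2 here mean the asynchronous rollout workers stay at most a couple of policy updates behind the learner. A small sketch of the statistic under that reading (hypothetical function, not the library's code):

def policy_lag_stats(current_version, rollout_versions):
    """Summarize how stale a batch of rollouts is, in policy updates.

    current_version:  version the learner is about to train on (int)
    rollout_versions: version each rollout fragment was collected with
    """
    lags = [current_version - v for v in rollout_versions]
    return {
        "min": float(min(lags)),
        "avg": sum(lags) / len(lags),
        "max": float(max(lags)),
    }

# Example consistent with the log: mostly fresh data, a little staleness.
print(policy_lag_stats(1308, [1308, 1308, 1307, 1306]))
# {'min': 0.0, 'avg': 0.75, 'max': 2.0}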
[2025-01-07 17:27:14,538][11933] Avg episode reward: [(0, '24.252')]
[2025-01-07 17:27:15,765][13118] Updated weights for policy 0, policy_version 1318 (0.0031)
[2025-01-07 17:27:19,531][11933] Fps is (10 sec: 4506.1, 60 sec: 3823.1, 300 sec: 3721.1). Total num frames: 5414912. Throughput: 0: 943.4. Samples: 351580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:27:19,534][11933] Avg episode reward: [(0, '22.655')]
[2025-01-07 17:27:24,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 5431296. Throughput: 0: 943.2. Samples: 356772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:27:24,533][11933] Avg episode reward: [(0, '22.782')]
[2025-01-07 17:27:27,364][13118] Updated weights for policy 0, policy_version 1328 (0.0014)
[2025-01-07 17:27:29,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 5447680. Throughput: 0: 914.5. Samples: 358796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:27:29,538][11933] Avg episode reward: [(0, '21.859')]
[2025-01-07 17:27:34,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5468160. Throughput: 0: 920.3. Samples: 365226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:27:34,533][11933] Avg episode reward: [(0, '22.345')]
[2025-01-07 17:27:36,835][13118] Updated weights for policy 0, policy_version 1338 (0.0018)
[2025-01-07 17:27:39,531][11933] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 5488640. Throughput: 0: 961.9. Samples: 371362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:27:39,538][11933] Avg episode reward: [(0, '20.962')]
[2025-01-07 17:27:44,532][11933] Fps is (10 sec: 3276.4, 60 sec: 3618.1, 300 sec: 3721.2). Total num frames: 5500928. Throughput: 0: 934.6. Samples: 373372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:27:44,537][11933] Avg episode reward: [(0, '20.816')]
[2025-01-07 17:27:48,627][13118] Updated weights for policy 0, policy_version 1348 (0.0022)
[2025-01-07 17:27:49,531][11933] Fps is (10 sec: 3276.9, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 5521408. Throughput: 0: 919.0. Samples: 379220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:27:49,540][11933] Avg episode reward: [(0, '20.932')]
[2025-01-07 17:27:54,531][11933] Fps is (10 sec: 4506.2, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 5545984. Throughput: 0: 975.2. Samples: 385974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:27:54,535][11933] Avg episode reward: [(0, '20.167')]
[2025-01-07 17:27:59,536][11933] Fps is (10 sec: 3684.6, 60 sec: 3686.1, 300 sec: 3721.1). Total num frames: 5558272. Throughput: 0: 963.6. Samples: 388130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:27:59,538][11933] Avg episode reward: [(0, '19.873')]
[2025-01-07 17:27:59,767][13118] Updated weights for policy 0, policy_version 1358 (0.0031)
[2025-01-07 17:28:04,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 5578752. Throughput: 0: 917.1. Samples: 392848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:28:04,533][11933] Avg episode reward: [(0, '21.760')]
[2025-01-07 17:28:09,531][11933] Fps is (10 sec: 4098.0, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 5599232. Throughput: 0: 952.3. Samples: 399626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:28:09,537][11933] Avg episode reward: [(0, '20.914')]
[2025-01-07 17:28:09,622][13118] Updated weights for policy 0, policy_version 1368 (0.0019)
[2025-01-07 17:28:14,533][11933] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3735.1). Total num frames: 5619712. Throughput: 0: 977.1. Samples: 402768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:28:14,538][11933] Avg episode reward: [(0, '20.582')]
[2025-01-07 17:28:14,550][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001372_5619712.pth...
[2025-01-07 17:28:14,715][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001153_4722688.pth
[2025-01-07 17:28:19,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 5632000. Throughput: 0: 923.2. Samples: 406772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:28:19,538][11933] Avg episode reward: [(0, '21.474')]
[2025-01-07 17:28:21,443][13118] Updated weights for policy 0, policy_version 1378 (0.0020)
[2025-01-07 17:28:24,531][11933] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 5656576. Throughput: 0: 931.1. Samples: 413262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:28:24,533][11933] Avg episode reward: [(0, '22.118')]
[2025-01-07 17:28:29,532][11933] Fps is (10 sec: 4505.2, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 5677056. Throughput: 0: 962.7. Samples: 416694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:28:29,534][11933] Avg episode reward: [(0, '21.477')]
[2025-01-07 17:28:31,938][13118] Updated weights for policy 0, policy_version 1388 (0.0013)
[2025-01-07 17:28:34,532][11933] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 5689344. Throughput: 0: 938.5. Samples: 421452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:28:34,540][11933] Avg episode reward: [(0, '21.816')]
[2025-01-07 17:28:39,531][11933] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 5709824. Throughput: 0: 913.2. Samples: 427068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:28:39,533][11933] Avg episode reward: [(0, '24.687')]
[2025-01-07 17:28:39,535][13105] Saving new best policy, reward=24.687!
[2025-01-07 17:28:42,490][13118] Updated weights for policy 0, policy_version 1398 (0.0014)
[2025-01-07 17:28:44,531][11933] Fps is (10 sec: 4505.7, 60 sec: 3891.3, 300 sec: 3748.9). Total num frames: 5734400. Throughput: 0: 939.4. Samples: 430398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:28:44,538][11933] Avg episode reward: [(0, '22.850')]
[2025-01-07 17:28:49,534][11933] Fps is (10 sec: 4094.6, 60 sec: 3822.7, 300 sec: 3735.0). Total num frames: 5750784. Throughput: 0: 961.1. Samples: 436102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:28:49,536][11933] Avg episode reward: [(0, '23.770')]
[2025-01-07 17:28:54,424][13118] Updated weights for policy 0, policy_version 1408 (0.0014)
[2025-01-07 17:28:54,531][11933] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 5767168. Throughput: 0: 914.7. Samples: 440788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:28:54,533][11933] Avg episode reward: [(0, '23.936')]
[2025-01-07 17:28:59,531][11933] Fps is (10 sec: 3687.6, 60 sec: 3823.2, 300 sec: 3748.9). Total num frames: 5787648. Throughput: 0: 917.6. Samples: 444060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:28:59,533][11933] Avg episode reward: [(0, '24.429')]
[2025-01-07 17:29:03,881][13118] Updated weights for policy 0, policy_version 1418 (0.0015)
[2025-01-07 17:29:04,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 5808128. Throughput: 0: 975.9. Samples: 450688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:29:04,532][11933] Avg episode reward: [(0, '23.300')]
[2025-01-07 17:29:09,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 5820416. Throughput: 0: 920.8. Samples: 454700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:29:09,535][11933] Avg episode reward: [(0, '23.010')]
[2025-01-07 17:29:14,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 5840896. Throughput: 0: 912.2. Samples: 457742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:29:14,534][11933] Avg episode reward: [(0, '24.123')]
[2025-01-07 17:29:15,566][13118] Updated weights for policy 0, policy_version 1428 (0.0036)
[2025-01-07 17:29:19,531][11933] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 5865472. Throughput: 0: 954.6. Samples: 464408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:29:19,533][11933] Avg episode reward: [(0, '23.731')]
[2025-01-07 17:29:24,534][11933] Fps is (10 sec: 3685.2, 60 sec: 3686.2, 300 sec: 3721.1). Total num frames: 5877760. Throughput: 0: 938.6. Samples: 469310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:29:24,536][11933] Avg episode reward: [(0, '23.407')]
[2025-01-07 17:29:27,481][13118] Updated weights for policy 0, policy_version 1438 (0.0019)
[2025-01-07 17:29:29,533][11933] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3748.9). Total num frames: 5898240. Throughput: 0: 911.8. Samples: 471430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:29:29,535][11933] Avg episode reward: [(0, '23.825')]
[2025-01-07 17:29:34,531][11933] Fps is (10 sec: 4097.3, 60 sec: 3823.0, 300 sec: 3748.9). Total num frames: 5918720. Throughput: 0: 933.0. Samples: 478084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:29:34,533][11933] Avg episode reward: [(0, '22.954')]
[2025-01-07 17:29:36,725][13118] Updated weights for policy 0, policy_version 1448 (0.0025)
[2025-01-07 17:29:39,531][11933] Fps is (10 sec: 4097.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 5939200. Throughput: 0: 957.3. Samples: 483866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:29:39,535][11933] Avg episode reward: [(0, '22.473')]
[2025-01-07 17:29:44,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3721.2). Total num frames: 5951488. Throughput: 0: 929.9. Samples: 485904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:29:44,535][11933] Avg episode reward: [(0, '22.347')]
[2025-01-07 17:29:48,533][13118] Updated weights for policy 0, policy_version 1458 (0.0023)
[2025-01-07 17:29:49,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3748.9). Total num frames: 5976064. Throughput: 0: 918.4. Samples: 492016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:29:49,533][11933] Avg episode reward: [(0, '22.179')]
[2025-01-07 17:29:54,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 5996544. Throughput: 0: 976.8. Samples: 498654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
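The "Updated weights for policy 0, policy_version N (0.00xx)" lines are emitted by the inference worker each time it picks up fresh parameters from the learner; the version advances by roughly 10 between messages here, and the parenthesized figure is plausibly the seconds the refresh took (an assumption, the log does not label it). A minimal sketch of a version-checked weight refresh, with a hypothetical shared-state layout rather than the framework's actual IPC:

import time

def maybe_refresh_weights(inference_model, shared_state, local_version):
    """Pull new parameters only when the learner has advanced the policy
    version; return the (possibly unchanged) local version."""
    latest = shared_state["policy_version"]  # written by the learner process
    if latest > local_version:
        t0 = time.time()
        inference_model.load_state_dict(shared_state["weights"])
        took = time.time() - t0
        print(f"Updated weights for policy 0, policy_version {latest} ({took:.4f})")
        local_version = latest
    return local_version

Polling a version counter instead of copying weights on every step keeps the refresh cheap, which matches the few-millisecond figures logged above.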
[2025-01-07 17:30:44,537][11933] Avg episode reward: [(0, '18.978')]
[2025-01-07 17:30:49,531][11933] Fps is (10 sec: 4097.3, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6201344. Throughput: 0: 961.0. Samples: 548912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:30:49,533][11933] Avg episode reward: [(0, '19.029')]
[2025-01-07 17:30:54,025][13118] Updated weights for policy 0, policy_version 1518 (0.0022)
[2025-01-07 17:30:54,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 6217728. Throughput: 0: 917.2. Samples: 553796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:30:54,538][11933] Avg episode reward: [(0, '19.966')]
[2025-01-07 17:30:59,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 6238208. Throughput: 0: 930.8. Samples: 557232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:30:59,537][11933] Avg episode reward: [(0, '21.098')]
[2025-01-07 17:31:03,662][13118] Updated weights for policy 0, policy_version 1528 (0.0029)
[2025-01-07 17:31:04,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 6258688. Throughput: 0: 981.6. Samples: 563638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-07 17:31:04,535][11933] Avg episode reward: [(0, '22.161')]
[2025-01-07 17:31:09,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 6270976. Throughput: 0: 922.0. Samples: 567746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:31:09,533][11933] Avg episode reward: [(0, '21.793')]
[2025-01-07 17:31:14,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 6295552. Throughput: 0: 918.7. Samples: 570944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:31:14,533][11933] Avg episode reward: [(0, '23.130')]
[2025-01-07 17:31:15,053][13118] Updated weights for policy 0, policy_version 1538 (0.0023)
[2025-01-07 17:31:19,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 6316032. Throughput: 0: 969.7. Samples: 577764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:31:19,534][11933] Avg episode reward: [(0, '22.252')]
[2025-01-07 17:31:24,533][11933] Fps is (10 sec: 3685.7, 60 sec: 3754.5, 300 sec: 3748.9). Total num frames: 6332416. Throughput: 0: 944.3. Samples: 582422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-01-07 17:31:24,548][11933] Avg episode reward: [(0, '21.608')]
[2025-01-07 17:31:27,025][13118] Updated weights for policy 0, policy_version 1548 (0.0023)
[2025-01-07 17:31:29,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 6348800. Throughput: 0: 918.3. Samples: 584662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:31:29,533][11933] Avg episode reward: [(0, '22.390')]
[2025-01-07 17:31:34,531][11933] Fps is (10 sec: 4096.7, 60 sec: 3891.4, 300 sec: 3776.7). Total num frames: 6373376. Throughput: 0: 947.4. Samples: 591546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:31:34,535][11933] Avg episode reward: [(0, '23.147')]
[2025-01-07 17:31:35,948][13118] Updated weights for policy 0, policy_version 1558 (0.0025)
[2025-01-07 17:31:39,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3748.9). Total num frames: 6389760. Throughput: 0: 961.2. Samples: 597048. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:31:39,534][11933] Avg episode reward: [(0, '22.211')]
[2025-01-07 17:31:44,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 6406144. Throughput: 0: 930.5. Samples: 599106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:31:44,538][11933] Avg episode reward: [(0, '22.317')]
[2025-01-07 17:31:47,801][13118] Updated weights for policy 0, policy_version 1568 (0.0016)
[2025-01-07 17:31:49,531][11933] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 6426624. Throughput: 0: 924.2. Samples: 605228. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:31:49,533][11933] Avg episode reward: [(0, '23.650')]
[2025-01-07 17:31:54,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 6451200. Throughput: 0: 977.9. Samples: 611750. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:31:54,533][11933] Avg episode reward: [(0, '23.669')]
[2025-01-07 17:31:58,959][13118] Updated weights for policy 0, policy_version 1578 (0.0039)
[2025-01-07 17:31:59,531][11933] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 6463488. Throughput: 0: 951.5. Samples: 613760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:31:59,537][11933] Avg episode reward: [(0, '23.392')]
[2025-01-07 17:32:04,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 6483968. Throughput: 0: 918.2. Samples: 619084. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:32:04,537][11933] Avg episode reward: [(0, '23.386')]
[2025-01-07 17:32:08,892][13118] Updated weights for policy 0, policy_version 1588 (0.0020)
[2025-01-07 17:32:09,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 6504448. Throughput: 0: 961.6. Samples: 625690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:32:09,535][11933] Avg episode reward: [(0, '23.478')]
[2025-01-07 17:32:14,531][11933] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 6520832. Throughput: 0: 970.5. Samples: 628334. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:32:14,534][11933] Avg episode reward: [(0, '21.516')]
[2025-01-07 17:32:14,553][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001592_6520832.pth...
[2025-01-07 17:32:14,737][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001372_5619712.pth
[2025-01-07 17:32:19,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 6537216. Throughput: 0: 915.6. Samples: 632746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:32:19,533][11933] Avg episode reward: [(0, '22.519')]
[2025-01-07 17:32:20,759][13118] Updated weights for policy 0, policy_version 1598 (0.0016)
[2025-01-07 17:32:24,531][11933] Fps is (10 sec: 4096.1, 60 sec: 3823.1, 300 sec: 3776.7). Total num frames: 6561792. Throughput: 0: 944.2. Samples: 639538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:32:24,537][11933] Avg episode reward: [(0, '22.494')]
[2025-01-07 17:32:29,531][11933] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 6578176. Throughput: 0: 973.5. Samples: 642916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:32:29,536][11933] Avg episode reward: [(0, '22.192')]
[2025-01-07 17:32:31,080][13118] Updated weights for policy 0, policy_version 1608 (0.0037)
[2025-01-07 17:32:34,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 6594560. Throughput: 0: 930.9. Samples: 647116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:32:34,535][11933] Avg episode reward: [(0, '22.043')]
[2025-01-07 17:32:39,531][11933] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 6615040. Throughput: 0: 922.2. Samples: 653250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:32:39,537][11933] Avg episode reward: [(0, '21.532')]
[2025-01-07 17:32:41,890][13118] Updated weights for policy 0, policy_version 1618 (0.0019)
[2025-01-07 17:32:44,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 6639616. Throughput: 0: 952.1. Samples: 656606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:32:44,533][11933] Avg episode reward: [(0, '20.805')]
[2025-01-07 17:32:49,531][11933] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 6651904. Throughput: 0: 948.4. Samples: 661762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:32:49,535][11933] Avg episode reward: [(0, '21.453')]
[2025-01-07 17:32:53,602][13118] Updated weights for policy 0, policy_version 1628 (0.0021)
[2025-01-07 17:32:54,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 6672384. Throughput: 0: 918.3. Samples: 667014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:32:54,533][11933] Avg episode reward: [(0, '22.324')]
[2025-01-07 17:32:59,531][11933] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 6692864. Throughput: 0: 932.9. Samples: 670314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:32:59,533][11933] Avg episode reward: [(0, '22.805')]
[2025-01-07 17:33:03,416][13118] Updated weights for policy 0, policy_version 1638 (0.0016)
[2025-01-07 17:33:04,532][11933] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3762.7). Total num frames: 6709248. Throughput: 0: 968.5. Samples: 676328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:33:04,534][11933] Avg episode reward: [(0, '24.472')]
[2025-01-07 17:33:09,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 6725632. Throughput: 0: 909.3. Samples: 680458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:33:09,533][11933] Avg episode reward: [(0, '25.067')]
[2025-01-07 17:33:09,541][13105] Saving new best policy, reward=25.067!
[2025-01-07 17:33:14,531][11933] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 6746112. Throughput: 0: 908.5. Samples: 683800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:33:14,533][11933] Avg episode reward: [(0, '24.638')]
[2025-01-07 17:33:14,783][13118] Updated weights for policy 0, policy_version 1648 (0.0027)
[2025-01-07 17:33:19,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 6766592. Throughput: 0: 968.0. Samples: 690674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:33:19,536][11933] Avg episode reward: [(0, '25.923')]
[2025-01-07 17:33:19,623][13105] Saving new best policy, reward=25.923!
[2025-01-07 17:33:24,531][11933] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 6782976. Throughput: 0: 926.8. Samples: 694956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
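"Saving new best policy, reward=..." fires only when the running average episode reward beats the best value seen so far, which is why it appears in bursts (17.029 -> 17.850 -> 19.260 earlier in this section, 25.067 -> 25.923 just above) while the numbered checkpoints advance on a fixed schedule regardless. A sketch of that comparison, with hypothetical names:

class BestPolicyTracker:
    """Save a separate 'best' snapshot whenever the average episode
    reward improves on everything seen so far in the run."""

    def __init__(self, save_fn, best_reward=float("-inf")):
        self.save_fn = save_fn          # callable(message: str) -> None
        self.best_reward = best_reward  # e.g. restored when resuming a run

    def update(self, avg_episode_reward):
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            self.save_fn(f"Saving new best policy, reward={avg_episode_reward:.3f}!")

Because the trigger is a running average, the "best" model on disk can be many versions older than the latest checkpoint whenever the reward curve dips, as it does repeatedly in this section.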
[2025-01-07 17:33:24,535][11933] Avg episode reward: [(0, '24.854')]
[2025-01-07 17:33:26,697][13118] Updated weights for policy 0, policy_version 1658 (0.0033)
[2025-01-07 17:33:29,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 6803456. Throughput: 0: 911.2. Samples: 697608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:33:29,535][11933] Avg episode reward: [(0, '23.925')]
[2025-01-07 17:33:34,531][11933] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 6823936. Throughput: 0: 947.7. Samples: 704410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:33:34,538][11933] Avg episode reward: [(0, '23.292')]
[2025-01-07 17:33:35,601][13118] Updated weights for policy 0, policy_version 1668 (0.0019)
[2025-01-07 17:33:39,535][11933] Fps is (10 sec: 3684.8, 60 sec: 3754.4, 300 sec: 3748.8). Total num frames: 6840320. Throughput: 0: 947.2. Samples: 709644. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-07 17:33:39,540][11933] Avg episode reward: [(0, '23.715')]
[2025-01-07 17:33:44,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 6856704. Throughput: 0: 919.6. Samples: 711696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:33:44,535][11933] Avg episode reward: [(0, '24.576')]
[2025-01-07 17:33:47,605][13118] Updated weights for policy 0, policy_version 1678 (0.0041)
[2025-01-07 17:33:49,531][11933] Fps is (10 sec: 4097.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 6881280. Throughput: 0: 929.7. Samples: 718162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:33:49,534][11933] Avg episode reward: [(0, '23.511')]
[2025-01-07 17:33:54,533][11933] Fps is (10 sec: 4504.8, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 6901760. Throughput: 0: 979.2. Samples: 724522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:33:54,539][11933] Avg episode reward: [(0, '25.949')]
[2025-01-07 17:33:54,551][13105] Saving new best policy, reward=25.949!
[2025-01-07 17:33:58,828][13118] Updated weights for policy 0, policy_version 1688 (0.0028)
[2025-01-07 17:33:59,533][11933] Fps is (10 sec: 3276.0, 60 sec: 3686.3, 300 sec: 3748.9). Total num frames: 6914048. Throughput: 0: 948.8. Samples: 726500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:33:59,535][11933] Avg episode reward: [(0, '24.621')]
[2025-01-07 17:34:04,531][11933] Fps is (10 sec: 3277.4, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 6934528. Throughput: 0: 918.6. Samples: 732010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:34:04,533][11933] Avg episode reward: [(0, '24.458')]
[2025-01-07 17:34:08,666][13118] Updated weights for policy 0, policy_version 1698 (0.0014)
[2025-01-07 17:34:09,531][11933] Fps is (10 sec: 4506.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 6959104. Throughput: 0: 969.3. Samples: 738576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:34:09,533][11933] Avg episode reward: [(0, '25.887')]
[2025-01-07 17:34:14,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 6971392. Throughput: 0: 963.5. Samples: 740966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:34:14,538][11933] Avg episode reward: [(0, '24.969')]
[2025-01-07 17:34:14,549][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001702_6971392.pth...
[2025-01-07 17:34:14,728][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001482_6070272.pth
[2025-01-07 17:34:19,531][11933] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 6987776. Throughput: 0: 912.1. Samples: 745456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:34:19,539][11933] Avg episode reward: [(0, '24.856')]
[2025-01-07 17:34:20,692][13118] Updated weights for policy 0, policy_version 1708 (0.0017)
[2025-01-07 17:34:24,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 7012352. Throughput: 0: 945.3. Samples: 752178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:34:24,536][11933] Avg episode reward: [(0, '24.356')]
[2025-01-07 17:34:29,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 7028736. Throughput: 0: 974.7. Samples: 755556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:34:29,535][11933] Avg episode reward: [(0, '25.402')]
[2025-01-07 17:34:31,355][13118] Updated weights for policy 0, policy_version 1718 (0.0020)
[2025-01-07 17:34:34,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 7045120. Throughput: 0: 922.8. Samples: 759690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:34:34,538][11933] Avg episode reward: [(0, '26.194')]
[2025-01-07 17:34:34,547][13105] Saving new best policy, reward=26.194!
[2025-01-07 17:34:39,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3776.7). Total num frames: 7065600. Throughput: 0: 915.8. Samples: 765730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:34:39,538][11933] Avg episode reward: [(0, '24.724')]
[2025-01-07 17:34:41,877][13118] Updated weights for policy 0, policy_version 1728 (0.0033)
[2025-01-07 17:34:44,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7086080. Throughput: 0: 946.9. Samples: 769108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:34:44,536][11933] Avg episode reward: [(0, '24.629')]
[2025-01-07 17:34:49,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 7102464. Throughput: 0: 934.4. Samples: 774060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:34:49,536][11933] Avg episode reward: [(0, '25.745')]
[2025-01-07 17:34:53,739][13118] Updated weights for policy 0, policy_version 1738 (0.0020)
[2025-01-07 17:34:54,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3762.8). Total num frames: 7118848. Throughput: 0: 907.6. Samples: 779418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:34:54,538][11933] Avg episode reward: [(0, '24.143')]
[2025-01-07 17:34:59,531][11933] Fps is (10 sec: 4095.9, 60 sec: 3823.1, 300 sec: 3776.6). Total num frames: 7143424. Throughput: 0: 927.6. Samples: 782706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:34:59,539][11933] Avg episode reward: [(0, '23.182')]
[2025-01-07 17:35:03,707][13118] Updated weights for policy 0, policy_version 1748 (0.0035)
[2025-01-07 17:35:04,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 7159808. Throughput: 0: 964.1. Samples: 788842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:35:04,537][11933] Avg episode reward: [(0, '22.822')]
[2025-01-07 17:35:09,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 7176192. Throughput: 0: 907.3. Samples: 793008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:35:09,537][11933] Avg episode reward: [(0, '23.778')]
[2025-01-07 17:35:14,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 7196672. Throughput: 0: 906.6. Samples: 796352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:35:14,534][11933] Avg episode reward: [(0, '25.430')]
[2025-01-07 17:35:14,961][13118] Updated weights for policy 0, policy_version 1758 (0.0020)
[2025-01-07 17:35:19,531][11933] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7217152. Throughput: 0: 966.1. Samples: 803166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:35:19,535][11933] Avg episode reward: [(0, '24.425')]
[2025-01-07 17:35:24,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 7233536. Throughput: 0: 928.3. Samples: 807502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:35:24,533][11933] Avg episode reward: [(0, '24.661')]
[2025-01-07 17:35:26,635][13118] Updated weights for policy 0, policy_version 1768 (0.0027)
[2025-01-07 17:35:29,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 7254016. Throughput: 0: 914.1. Samples: 810244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:35:29,535][11933] Avg episode reward: [(0, '25.593')]
[2025-01-07 17:35:34,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 7274496. Throughput: 0: 954.8. Samples: 817028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:35:34,535][11933] Avg episode reward: [(0, '26.483')]
[2025-01-07 17:35:34,546][13105] Saving new best policy, reward=26.483!
[2025-01-07 17:35:35,831][13118] Updated weights for policy 0, policy_version 1778 (0.0014)
[2025-01-07 17:35:39,531][11933] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 7290880. Throughput: 0: 949.3. Samples: 822138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:35:39,541][11933] Avg episode reward: [(0, '26.584')]
[2025-01-07 17:35:39,553][13105] Saving new best policy, reward=26.584!
[2025-01-07 17:35:44,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 7307264. Throughput: 0: 920.0. Samples: 824108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:35:44,537][11933] Avg episode reward: [(0, '25.755')]
[2025-01-07 17:35:47,839][13118] Updated weights for policy 0, policy_version 1788 (0.0026)
[2025-01-07 17:35:49,531][11933] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 7327744. Throughput: 0: 928.4. Samples: 830622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:35:49,533][11933] Avg episode reward: [(0, '26.779')]
[2025-01-07 17:35:49,543][13105] Saving new best policy, reward=26.779!
[2025-01-07 17:35:54,532][11933] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3762.7). Total num frames: 7348224. Throughput: 0: 972.4. Samples: 836768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:35:54,535][11933] Avg episode reward: [(0, '27.948')]
[2025-01-07 17:35:54,548][13105] Saving new best policy, reward=27.948!
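For analysis it is usually easier to turn these lines back into time series than to read them raw. The sketch below pulls the frame counter, the 10-second throughput, and the average episode reward out of a saved copy of this log; the regular expressions mirror the line formats above, while the function and variable names are my own.

import re

FPS_RE = re.compile(
    r"\[(?P<ts>[\d\- :,]+)\]\[\d+\] Fps is \(10 sec: (?P<fps10>[\d.]+), "
    r"60 sec: (?P<fps60>[\d.]+), 300 sec: (?P<fps300>[\d.]+)\). "
    r"Total num frames: (?P<frames>\d+)"
)
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '(?P<reward>[\d.]+)'\)\]")

def parse_training_log(path):
    """Return (fps_rows, rewards): throughput snapshots and the reward series."""
    fps_rows, rewards = [], []
    with open(path) as f:
        for line in f:
            if (m := FPS_RE.search(line)):
                fps_rows.append((m["ts"], int(m["frames"]), float(m["fps10"])))
            if (m := REWARD_RE.search(line)):
                rewards.append(float(m["reward"]))
    return fps_rows, rewards

Plotting the reward series from this section would show the climb from the high teens near frame 4.3M to the run's best of 27.948 just above.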
[2025-01-07 17:35:59,362][13118] Updated weights for policy 0, policy_version 1798 (0.0024)
[2025-01-07 17:35:59,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 7364608. Throughput: 0: 941.3. Samples: 838712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:35:59,533][11933] Avg episode reward: [(0, '26.956')]
[2025-01-07 17:36:04,531][11933] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 7385088. Throughput: 0: 915.8. Samples: 844376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-01-07 17:36:04,535][11933] Avg episode reward: [(0, '26.171')]
[2025-01-07 17:36:08,807][13118] Updated weights for policy 0, policy_version 1808 (0.0016)
[2025-01-07 17:36:09,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7405568. Throughput: 0: 968.0. Samples: 851060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:36:09,534][11933] Avg episode reward: [(0, '25.436')]
[2025-01-07 17:36:14,531][11933] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 7421952. Throughput: 0: 958.5. Samples: 853376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:36:14,532][11933] Avg episode reward: [(0, '25.526')]
[2025-01-07 17:36:14,550][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001812_7421952.pth...
[2025-01-07 17:36:14,708][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001592_6520832.pth
[2025-01-07 17:36:19,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 7438336. Throughput: 0: 907.8. Samples: 857878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:36:19,533][11933] Avg episode reward: [(0, '24.991')]
[2025-01-07 17:36:20,834][13118] Updated weights for policy 0, policy_version 1818 (0.0016)
[2025-01-07 17:36:24,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 7462912. Throughput: 0: 946.9. Samples: 864748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:36:24,534][11933] Avg episode reward: [(0, '25.020')]
[2025-01-07 17:36:29,535][11933] Fps is (10 sec: 4094.3, 60 sec: 3754.4, 300 sec: 3748.8). Total num frames: 7479296. Throughput: 0: 975.2. Samples: 867998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:36:29,541][11933] Avg episode reward: [(0, '23.704')]
[2025-01-07 17:36:31,691][13118] Updated weights for policy 0, policy_version 1828 (0.0017)
[2025-01-07 17:36:34,532][11933] Fps is (10 sec: 3276.4, 60 sec: 3686.3, 300 sec: 3748.9). Total num frames: 7495680. Throughput: 0: 923.1. Samples: 872162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:36:34,536][11933] Avg episode reward: [(0, '23.468')]
[2025-01-07 17:36:39,531][11933] Fps is (10 sec: 3688.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 7516160. Throughput: 0: 931.6. Samples: 878688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:36:39,534][11933] Avg episode reward: [(0, '22.015')]
[2025-01-07 17:36:41,668][13118] Updated weights for policy 0, policy_version 1838 (0.0021)
[2025-01-07 17:36:44,531][11933] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7536640. Throughput: 0: 961.3. Samples: 881970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:36:44,536][11933] Avg episode reward: [(0, '22.227')]
[2025-01-07 17:36:49,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 7553024. Throughput: 0: 943.9. Samples: 886852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:36:49,537][11933] Avg episode reward: [(0, '22.557')]
[2025-01-07 17:36:53,400][13118] Updated weights for policy 0, policy_version 1848 (0.0035)
[2025-01-07 17:36:54,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3762.8). Total num frames: 7573504. Throughput: 0: 920.3. Samples: 892472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:36:54,533][11933] Avg episode reward: [(0, '22.328')]
[2025-01-07 17:36:59,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7593984. Throughput: 0: 944.1. Samples: 895862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:36:59,533][11933] Avg episode reward: [(0, '22.945')]
[2025-01-07 17:37:03,321][13118] Updated weights for policy 0, policy_version 1858 (0.0028)
[2025-01-07 17:37:04,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 7610368. Throughput: 0: 972.2. Samples: 901626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:37:04,538][11933] Avg episode reward: [(0, '21.870')]
[2025-01-07 17:37:09,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 7626752. Throughput: 0: 924.8. Samples: 906362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:37:09,535][11933] Avg episode reward: [(0, '22.320')]
[2025-01-07 17:37:14,362][13118] Updated weights for policy 0, policy_version 1868 (0.0029)
[2025-01-07 17:37:14,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 7651328. Throughput: 0: 924.2. Samples: 909584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:37:14,535][11933] Avg episode reward: [(0, '23.358')]
[2025-01-07 17:37:19,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 7671808. Throughput: 0: 979.4. Samples: 916232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:37:19,533][11933] Avg episode reward: [(0, '23.753')]
[2025-01-07 17:37:24,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 7684096. Throughput: 0: 924.0. Samples: 920270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:37:24,533][11933] Avg episode reward: [(0, '25.126')]
[2025-01-07 17:37:26,304][13118] Updated weights for policy 0, policy_version 1878 (0.0015)
[2025-01-07 17:37:29,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3762.8). Total num frames: 7704576. Throughput: 0: 920.4. Samples: 923388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:37:29,533][11933] Avg episode reward: [(0, '25.565')]
[2025-01-07 17:37:34,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3776.6). Total num frames: 7729152. Throughput: 0: 965.6. Samples: 930304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:37:34,537][11933] Avg episode reward: [(0, '26.812')]
[2025-01-07 17:37:35,329][13118] Updated weights for policy 0, policy_version 1888 (0.0025)
[2025-01-07 17:37:39,532][11933] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 7741440. Throughput: 0: 946.3. Samples: 935058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:37:39,537][11933] Avg episode reward: [(0, '26.718')]
[2025-01-07 17:37:44,533][11933] Fps is (10 sec: 3276.1, 60 sec: 3754.5, 300 sec: 3762.7). Total num frames: 7761920. Throughput: 0: 919.9. Samples: 937260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:37:44,540][11933] Avg episode reward: [(0, '26.429')]
[2025-01-07 17:37:47,068][13118] Updated weights for policy 0, policy_version 1898 (0.0026)
[2025-01-07 17:37:49,531][11933] Fps is (10 sec: 4096.5, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7782400. Throughput: 0: 943.2. Samples: 944070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:37:49,537][11933] Avg episode reward: [(0, '24.863')]
[2025-01-07 17:37:54,534][11933] Fps is (10 sec: 4095.6, 60 sec: 3822.7, 300 sec: 3762.7). Total num frames: 7802880. Throughput: 0: 965.7. Samples: 949820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:37:54,541][11933] Avg episode reward: [(0, '25.320')]
[2025-01-07 17:37:58,768][13118] Updated weights for policy 0, policy_version 1908 (0.0022)
[2025-01-07 17:37:59,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 7815168. Throughput: 0: 939.6. Samples: 951864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:37:59,533][11933] Avg episode reward: [(0, '24.499')]
[2025-01-07 17:38:04,531][11933] Fps is (10 sec: 3687.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 7839744. Throughput: 0: 928.3. Samples: 958006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:38:04,532][11933] Avg episode reward: [(0, '25.129')]
[2025-01-07 17:38:07,883][13118] Updated weights for policy 0, policy_version 1918 (0.0019)
[2025-01-07 17:38:09,530][11933] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 7860224. Throughput: 0: 986.0. Samples: 964640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:38:09,533][11933] Avg episode reward: [(0, '25.441')]
[2025-01-07 17:38:14,531][11933] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 7872512. Throughput: 0: 959.5. Samples: 966566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:38:14,536][11933] Avg episode reward: [(0, '24.514')]
[2025-01-07 17:38:14,546][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001922_7872512.pth...
[2025-01-07 17:38:14,692][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001702_6971392.pth
[2025-01-07 17:38:19,531][11933] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 7892992. Throughput: 0: 920.6. Samples: 971730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:38:19,533][11933] Avg episode reward: [(0, '25.540')]
[2025-01-07 17:38:19,922][13118] Updated weights for policy 0, policy_version 1928 (0.0026)
[2025-01-07 17:38:24,531][11933] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 7917568. Throughput: 0: 964.8. Samples: 978474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:38:24,537][11933] Avg episode reward: [(0, '25.633')]
[2025-01-07 17:38:29,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7933952. Throughput: 0: 977.5. Samples: 981244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:38:29,543][11933] Avg episode reward: [(0, '26.143')]
[2025-01-07 17:38:30,911][13118] Updated weights for policy 0, policy_version 1938 (0.0024)
[2025-01-07 17:38:34,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 7950336. Throughput: 0: 919.8. Samples: 985462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:38:34,533][11933] Avg episode reward: [(0, '26.797')]
[2025-01-07 17:38:39,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 7970816. Throughput: 0: 945.2. Samples: 992350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:38:39,533][11933] Avg episode reward: [(0, '26.508')]
[2025-01-07 17:38:40,832][13118] Updated weights for policy 0, policy_version 1948 (0.0017)
[2025-01-07 17:38:44,531][11933] Fps is (10 sec: 4095.9, 60 sec: 3823.1, 300 sec: 3762.8). Total num frames: 7991296. Throughput: 0: 971.4. Samples: 995578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:38:44,535][11933] Avg episode reward: [(0, '25.103')]
[2025-01-07 17:38:49,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 8003584. Throughput: 0: 930.4. Samples: 999876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:38:49,536][11933] Avg episode reward: [(0, '24.623')]
[2025-01-07 17:38:52,670][13118] Updated weights for policy 0, policy_version 1958 (0.0027)
[2025-01-07 17:38:54,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3776.7). Total num frames: 8028160. Throughput: 0: 918.4. Samples: 1005970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:38:54,534][11933] Avg episode reward: [(0, '24.796')]
[2025-01-07 17:38:59,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 8048640. Throughput: 0: 949.7. Samples: 1009302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-07 17:38:59,533][11933] Avg episode reward: [(0, '23.784')]
[2025-01-07 17:39:03,153][13118] Updated weights for policy 0, policy_version 1968 (0.0014)
[2025-01-07 17:39:04,531][11933] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 8060928. Throughput: 0: 950.4. Samples: 1014496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:39:04,537][11933] Avg episode reward: [(0, '23.250')]
[2025-01-07 17:39:09,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 8081408. Throughput: 0: 912.8. Samples: 1019552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:39:09,537][11933] Avg episode reward: [(0, '22.957')]
[2025-01-07 17:39:14,238][13118] Updated weights for policy 0, policy_version 1978 (0.0028)
[2025-01-07 17:39:14,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3776.6). Total num frames: 8101888. Throughput: 0: 921.6. Samples: 1022714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:39:14,533][11933] Avg episode reward: [(0, '23.730')]
[2025-01-07 17:39:19,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 8118272. Throughput: 0: 948.8. Samples: 1028156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-01-07 17:39:19,539][11933] Avg episode reward: [(0, '24.554')]
[2025-01-07 17:39:24,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 8130560. Throughput: 0: 882.1. Samples: 1032044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-01-07 17:39:24,535][11933] Avg episode reward: [(0, '24.878')]
[2025-01-07 17:39:26,998][13118] Updated weights for policy 0, policy_version 1988 (0.0019)
[2025-01-07 17:39:29,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 8151040. Throughput: 0: 879.8. Samples: 1035170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-07 17:39:29,533][11933] Avg episode reward: [(0, '22.832')]
[2025-01-07 17:39:34,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 8175616. Throughput: 0: 933.2. Samples: 1041868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-07 17:39:34,534][11933] Avg episode reward: [(0, '22.972')]
[2025-01-07 17:39:37,650][13118] Updated weights for policy 0, policy_version 1998 (0.0019)
[2025-01-07 17:39:39,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 8187904. Throughput: 0: 889.5. Samples: 1045996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:39:39,540][11933] Avg episode reward: [(0, '22.043')]
[2025-01-07 17:39:44,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 8204288. Throughput: 0: 866.5. Samples: 1048294. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-07 17:39:44,533][11933] Avg episode reward: [(0, '22.348')]
[2025-01-07 17:39:49,072][13118] Updated weights for policy 0, policy_version 2008 (0.0026)
[2025-01-07 17:39:49,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 8224768. Throughput: 0: 887.7. Samples: 1054442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:39:49,535][11933] Avg episode reward: [(0, '21.425')]
[2025-01-07 17:39:54,535][11933] Fps is (10 sec: 3684.8, 60 sec: 3549.6, 300 sec: 3721.1). Total num frames: 8241152. Throughput: 0: 886.4. Samples: 1059442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:39:54,541][11933] Avg episode reward: [(0, '21.588')]
[2025-01-07 17:39:59,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3707.2). Total num frames: 8253440. Throughput: 0: 859.4. Samples: 1061386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-07 17:39:59,540][11933] Avg episode reward: [(0, '23.011')]
[2025-01-07 17:40:01,669][13118] Updated weights for policy 0, policy_version 2018 (0.0014)
[2025-01-07 17:40:04,531][11933] Fps is (10 sec: 3688.0, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 8278016. Throughput: 0: 871.6. Samples: 1067376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-07 17:40:04,537][11933] Avg episode reward: [(0, '22.490')]
[2025-01-07 17:40:09,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 8294400. Throughput: 0: 921.3. Samples: 1073504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:40:09,533][11933] Avg episode reward: [(0, '23.479')]
[2025-01-07 17:40:13,028][13118] Updated weights for policy 0, policy_version 2028 (0.0024)
[2025-01-07 17:40:14,538][11933] Fps is (10 sec: 2865.1, 60 sec: 3412.9, 300 sec: 3693.3). Total num frames: 8306688. Throughput: 0: 893.4. Samples: 1075380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-07 17:40:14,545][11933] Avg episode reward: [(0, '23.532')]
[2025-01-07 17:40:14,567][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002028_8306688.pth...
[2025-01-07 17:40:14,750][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001812_7421952.pth
[2025-01-07 17:40:19,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 8327168. Throughput: 0: 851.6. Samples: 1080192.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-07 17:40:19,533][11933] Avg episode reward: [(0, '23.604')] [2025-01-07 17:40:23,834][13118] Updated weights for policy 0, policy_version 2038 (0.0021) [2025-01-07 17:40:24,531][11933] Fps is (10 sec: 4098.9, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 8347648. Throughput: 0: 902.3. Samples: 1086598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:40:24,539][11933] Avg episode reward: [(0, '23.467')] [2025-01-07 17:40:29,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 8364032. Throughput: 0: 906.9. Samples: 1089106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:40:29,533][11933] Avg episode reward: [(0, '23.641')] [2025-01-07 17:40:34,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3693.3). Total num frames: 8380416. Throughput: 0: 863.3. Samples: 1093292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:40:34,535][11933] Avg episode reward: [(0, '24.755')] [2025-01-07 17:40:36,193][13118] Updated weights for policy 0, policy_version 2048 (0.0023) [2025-01-07 17:40:39,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 8400896. Throughput: 0: 900.4. Samples: 1099956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-07 17:40:39,533][11933] Avg episode reward: [(0, '27.387')] [2025-01-07 17:40:44,531][11933] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 8421376. Throughput: 0: 931.3. Samples: 1103294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-07 17:40:44,539][11933] Avg episode reward: [(0, '26.151')] [2025-01-07 17:40:46,783][13118] Updated weights for policy 0, policy_version 2058 (0.0028) [2025-01-07 17:40:49,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 8433664. Throughput: 0: 890.8. Samples: 1107462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:40:49,536][11933] Avg episode reward: [(0, '25.505')] [2025-01-07 17:40:54,531][11933] Fps is (10 sec: 3276.9, 60 sec: 3550.1, 300 sec: 3693.3). Total num frames: 8454144. Throughput: 0: 886.3. Samples: 1113386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:40:54,533][11933] Avg episode reward: [(0, '24.838')] [2025-01-07 17:40:57,429][13118] Updated weights for policy 0, policy_version 2068 (0.0014) [2025-01-07 17:40:59,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 8478720. Throughput: 0: 918.8. Samples: 1116720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:40:59,533][11933] Avg episode reward: [(0, '23.523')] [2025-01-07 17:41:04,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 8491008. Throughput: 0: 926.9. Samples: 1121902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:41:04,533][11933] Avg episode reward: [(0, '22.617')] [2025-01-07 17:41:09,378][13118] Updated weights for policy 0, policy_version 2078 (0.0030) [2025-01-07 17:41:09,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 8511488. Throughput: 0: 895.0. Samples: 1126874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:41:09,535][11933] Avg episode reward: [(0, '22.514')] [2025-01-07 17:41:14,531][11933] Fps is (10 sec: 4095.8, 60 sec: 3755.1, 300 sec: 3707.2). Total num frames: 8531968. Throughput: 0: 909.9. Samples: 1130054. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:41:14,534][11933] Avg episode reward: [(0, '24.846')] [2025-01-07 17:41:19,533][11933] Fps is (10 sec: 3685.5, 60 sec: 3686.2, 300 sec: 3679.4). Total num frames: 8548352. Throughput: 0: 951.9. Samples: 1136128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-07 17:41:19,536][11933] Avg episode reward: [(0, '25.350')] [2025-01-07 17:41:19,986][13118] Updated weights for policy 0, policy_version 2088 (0.0020) [2025-01-07 17:41:24,531][11933] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 8560640. Throughput: 0: 891.7. Samples: 1140084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:41:24,543][11933] Avg episode reward: [(0, '26.240')] [2025-01-07 17:41:29,531][11933] Fps is (10 sec: 3687.3, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 8585216. Throughput: 0: 890.6. Samples: 1143372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-01-07 17:41:29,533][11933] Avg episode reward: [(0, '26.894')] [2025-01-07 17:41:31,003][13118] Updated weights for policy 0, policy_version 2098 (0.0022) [2025-01-07 17:41:34,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 8605696. Throughput: 0: 945.5. Samples: 1150008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-07 17:41:34,538][11933] Avg episode reward: [(0, '26.326')] [2025-01-07 17:41:39,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 8617984. Throughput: 0: 911.7. Samples: 1154414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:41:39,539][11933] Avg episode reward: [(0, '26.628')] [2025-01-07 17:41:43,176][13118] Updated weights for policy 0, policy_version 2108 (0.0031) [2025-01-07 17:41:44,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 8638464. Throughput: 0: 889.6. Samples: 1156750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:41:44,533][11933] Avg episode reward: [(0, '25.724')] [2025-01-07 17:41:49,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 8658944. Throughput: 0: 918.4. Samples: 1163232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:41:49,533][11933] Avg episode reward: [(0, '26.329')] [2025-01-07 17:41:53,148][13118] Updated weights for policy 0, policy_version 2118 (0.0028) [2025-01-07 17:41:54,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 8675328. Throughput: 0: 927.8. Samples: 1168624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:41:54,534][11933] Avg episode reward: [(0, '26.692')] [2025-01-07 17:41:59,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 8691712. Throughput: 0: 901.3. Samples: 1170612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:41:59,536][11933] Avg episode reward: [(0, '27.385')] [2025-01-07 17:42:04,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 8712192. Throughput: 0: 902.5. Samples: 1176738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:42:04,533][11933] Avg episode reward: [(0, '27.551')] [2025-01-07 17:42:04,605][13118] Updated weights for policy 0, policy_version 2128 (0.0025) [2025-01-07 17:42:09,535][11933] Fps is (10 sec: 4094.3, 60 sec: 3686.1, 300 sec: 3665.5). Total num frames: 8732672. Throughput: 0: 953.0. Samples: 1182974. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:42:09,538][11933] Avg episode reward: [(0, '26.148')] [2025-01-07 17:42:14,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 8744960. Throughput: 0: 923.3. Samples: 1184922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-07 17:42:14,533][11933] Avg episode reward: [(0, '26.683')] [2025-01-07 17:42:14,565][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002136_8749056.pth... [2025-01-07 17:42:14,769][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001922_7872512.pth [2025-01-07 17:42:17,022][13118] Updated weights for policy 0, policy_version 2138 (0.0018) [2025-01-07 17:42:19,531][11933] Fps is (10 sec: 3278.2, 60 sec: 3618.3, 300 sec: 3665.6). Total num frames: 8765440. Throughput: 0: 884.8. Samples: 1189822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:42:19,538][11933] Avg episode reward: [(0, '25.960')] [2025-01-07 17:42:24,531][11933] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 8790016. Throughput: 0: 933.5. Samples: 1196420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:42:24,533][11933] Avg episode reward: [(0, '24.465')] [2025-01-07 17:42:26,562][13118] Updated weights for policy 0, policy_version 2148 (0.0015) [2025-01-07 17:42:29,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 8802304. Throughput: 0: 939.9. Samples: 1199044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:42:29,533][11933] Avg episode reward: [(0, '24.301')] [2025-01-07 17:42:34,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 8818688. Throughput: 0: 885.7. Samples: 1203090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:42:34,534][11933] Avg episode reward: [(0, '23.449')] [2025-01-07 17:42:38,660][13118] Updated weights for policy 0, policy_version 2158 (0.0035) [2025-01-07 17:42:39,531][11933] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 8839168. Throughput: 0: 911.2. Samples: 1209630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-07 17:42:39,537][11933] Avg episode reward: [(0, '24.863')] [2025-01-07 17:42:44,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 8859648. Throughput: 0: 938.8. Samples: 1212860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-07 17:42:44,537][11933] Avg episode reward: [(0, '23.411')] [2025-01-07 17:42:49,531][11933] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3624.0). Total num frames: 8871936. Throughput: 0: 893.6. Samples: 1216950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:42:49,534][11933] Avg episode reward: [(0, '22.930')] [2025-01-07 17:42:51,204][13118] Updated weights for policy 0, policy_version 2168 (0.0020) [2025-01-07 17:42:54,531][11933] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 8892416. Throughput: 0: 878.3. Samples: 1222494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:42:54,537][11933] Avg episode reward: [(0, '24.248')] [2025-01-07 17:42:59,531][11933] Fps is (10 sec: 4096.3, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 8912896. Throughput: 0: 904.0. Samples: 1225600. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:42:59,538][11933] Avg episode reward: [(0, '24.700')] [2025-01-07 17:43:01,713][13118] Updated weights for policy 0, policy_version 2178 (0.0019) [2025-01-07 17:43:04,531][11933] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 8925184. Throughput: 0: 903.2. Samples: 1230464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:43:04,535][11933] Avg episode reward: [(0, '24.118')] [2025-01-07 17:43:09,531][11933] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3637.8). Total num frames: 8945664. Throughput: 0: 865.0. Samples: 1235344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-07 17:43:09,533][11933] Avg episode reward: [(0, '24.037')] [2025-01-07 17:43:13,265][13118] Updated weights for policy 0, policy_version 2188 (0.0016) [2025-01-07 17:43:14,531][11933] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 8966144. Throughput: 0: 877.5. Samples: 1238530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:43:14,535][11933] Avg episode reward: [(0, '23.760')] [2025-01-07 17:43:19,538][11933] Fps is (10 sec: 3683.7, 60 sec: 3617.7, 300 sec: 3609.9). Total num frames: 8982528. Throughput: 0: 913.8. Samples: 1244216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:43:19,540][11933] Avg episode reward: [(0, '24.220')] [2025-01-07 17:43:24,531][11933] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3596.1). Total num frames: 8994816. Throughput: 0: 856.6. Samples: 1248178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-07 17:43:24,533][11933] Avg episode reward: [(0, '24.749')] [2025-01-07 17:43:25,803][13118] Updated weights for policy 0, policy_version 2198 (0.0046) [2025-01-07 17:43:26,698][13105] Stopping Batcher_0... [2025-01-07 17:43:26,700][13105] Loop batcher_evt_loop terminating... [2025-01-07 17:43:26,699][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002199_9007104.pth... [2025-01-07 17:43:26,699][11933] Component Batcher_0 stopped! [2025-01-07 17:43:26,767][13118] Weights refcount: 2 0 [2025-01-07 17:43:26,771][11933] Component InferenceWorker_p0-w0 stopped! [2025-01-07 17:43:26,771][13118] Stopping InferenceWorker_p0-w0... [2025-01-07 17:43:26,779][13118] Loop inference_proc0-0_evt_loop terminating... [2025-01-07 17:43:26,828][13105] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002028_8306688.pth [2025-01-07 17:43:26,840][13105] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002199_9007104.pth... [2025-01-07 17:43:27,102][13105] Stopping LearnerWorker_p0... [2025-01-07 17:43:27,101][11933] Component LearnerWorker_p0 stopped! [2025-01-07 17:43:27,103][13105] Loop learner_proc0_evt_loop terminating... [2025-01-07 17:43:27,173][11933] Component RolloutWorker_w1 stopped! [2025-01-07 17:43:27,176][13120] Stopping RolloutWorker_w1... [2025-01-07 17:43:27,176][13120] Loop rollout_proc1_evt_loop terminating... [2025-01-07 17:43:27,229][13123] Stopping RolloutWorker_w4... [2025-01-07 17:43:27,231][13125] Stopping RolloutWorker_w6... [2025-01-07 17:43:27,232][13123] Loop rollout_proc4_evt_loop terminating... [2025-01-07 17:43:27,229][11933] Component RolloutWorker_w4 stopped! [2025-01-07 17:43:27,233][11933] Component RolloutWorker_w6 stopped! [2025-01-07 17:43:27,237][13125] Loop rollout_proc6_evt_loop terminating... [2025-01-07 17:43:27,256][13121] Stopping RolloutWorker_w2... 
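A note on reading the training stream above, which ends here as the run stops just past nine million env frames and the components shut down. Each status record reports windowed throughput: the "10 sec" figure is simply the env-frame delta over the trailing ten seconds, e.g. (7610368 - 7573504) / 10 s = 3686.4 for the 17:37:04 record; "Throughput" appears to be the per-policy sample rate, and "Policy #0 lag" how many policy versions old the experience in the learner's current batch is. The "Updated weights for policy 0, policy_version N (0.0xxx)" records show the inference worker refreshing its copy of the policy, here every 10 policy versions (roughly every 10 s), with the parenthesized figure plausibly the cost of the refresh in seconds. A minimal sketch of such a windowed FPS meter, illustrative only and not Sample Factory's implementation:

    import time
    from collections import deque

    class FpsMeter:
        """Report frames-per-second over trailing windows, like the
        "(10 sec: ..., 60 sec: ..., 300 sec: ...)" readout in this log."""

        def __init__(self, max_window=300):
            self.max_window = max_window
            self.samples = deque()  # (wall_time, total_env_frames)

        def record(self, total_frames, now=None):
            now = time.time() if now is None else now
            self.samples.append((now, total_frames))
            # keep only what the largest window needs
            while now - self.samples[0][0] > self.max_window:
                self.samples.popleft()

        def fps(self, window):
            now, frames = self.samples[-1]
            old = [s for s in self.samples if now - s[0] >= window]
            if not old:
                return float("nan")
            t0, f0 = old[-1]  # sample closest to `window` seconds ago
            return (frames - f0) / (now - t0)

    # Reproduces the 17:37:04 record from the two frame totals quoted above.
    m = FpsMeter()
    m.record(7573504, now=0.0)
    m.record(7610368, now=10.0)
    print(m.fps(10))  # 3686.4

The Saving/Removing pairs (for example checkpoint_000001922_7872512.pth saved while checkpoint_000001702_6971392.pth is removed) are checkpoint rotation. The filename appears to encode the policy version and the cumulative env-frame count, and only the newest few checkpoints are kept. A sketch of such a keep-the-newest-K policy; the file pattern is taken from the log, but the helper and the keep count are assumptions, not Sample Factory internals:

    import re
    from pathlib import Path

    CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")  # policy version, env frames

    def rotate_checkpoints(ckpt_dir: Path, keep: int = 2) -> None:
        # Sort existing checkpoints by policy version, drop all but the newest `keep`.
        ckpts = sorted(
            (p for p in ckpt_dir.glob("checkpoint_*.pth") if CKPT_RE.search(p.name)),
            key=lambda p: int(CKPT_RE.search(p.name).group(1)),
        )
        for stale in ckpts[:-keep]:
            print(f"Removing {stale}")
            stale.unlink()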
[2025-01-07 17:43:27,257][13121] Loop rollout_proc2_evt_loop terminating... [2025-01-07 17:43:27,256][11933] Component RolloutWorker_w2 stopped! [2025-01-07 17:43:27,260][11933] Component RolloutWorker_w5 stopped! [2025-01-07 17:43:27,263][13124] Stopping RolloutWorker_w5... [2025-01-07 17:43:27,267][13124] Loop rollout_proc5_evt_loop terminating... [2025-01-07 17:43:27,272][13119] Stopping RolloutWorker_w0... [2025-01-07 17:43:27,272][13119] Loop rollout_proc0_evt_loop terminating... [2025-01-07 17:43:27,271][11933] Component RolloutWorker_w0 stopped! [2025-01-07 17:43:27,277][13122] Stopping RolloutWorker_w3... [2025-01-07 17:43:27,278][11933] Component RolloutWorker_w3 stopped! [2025-01-07 17:43:27,284][13122] Loop rollout_proc3_evt_loop terminating... [2025-01-07 17:43:27,299][13126] Stopping RolloutWorker_w7... [2025-01-07 17:43:27,299][11933] Component RolloutWorker_w7 stopped! [2025-01-07 17:43:27,304][11933] Waiting for process learner_proc0 to stop... [2025-01-07 17:43:27,304][13126] Loop rollout_proc7_evt_loop terminating... [2025-01-07 17:43:28,775][11933] Waiting for process inference_proc0-0 to join... [2025-01-07 17:43:28,784][11933] Waiting for process rollout_proc0 to join... [2025-01-07 17:43:30,980][11933] Waiting for process rollout_proc1 to join... [2025-01-07 17:43:30,989][11933] Waiting for process rollout_proc2 to join... [2025-01-07 17:43:30,992][11933] Waiting for process rollout_proc3 to join... [2025-01-07 17:43:30,997][11933] Waiting for process rollout_proc4 to join... [2025-01-07 17:43:31,000][11933] Waiting for process rollout_proc5 to join... [2025-01-07 17:43:31,005][11933] Waiting for process rollout_proc6 to join... [2025-01-07 17:43:31,009][11933] Waiting for process rollout_proc7 to join...
[2025-01-07 17:43:31,013][11933] Batcher 0 profile tree view:
batching: 34.9167, releasing_batches: 0.0379
[2025-01-07 17:43:31,016][11933] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 553.7358
update_model: 10.9782
  weight_update: 0.0053
one_step: 0.0026
  handle_policy_step: 744.5845
    deserialize: 19.4657, stack: 4.2643, obs_to_device_normalize: 157.0131, forward: 373.6494, send_messages: 37.6186
    prepare_outputs: 115.0135
      to_cpu: 67.9878
[2025-01-07 17:43:31,019][11933] Learner 0 profile tree view:
misc: 0.0059, prepare_batch: 16.3823
train: 92.6794
  epoch_init: 0.0149, minibatch_init: 0.0133, losses_postprocess: 0.7539, kl_divergence: 0.8329, after_optimizer: 4.0364
  calculate_losses: 32.6787
    losses_init: 0.0098, forward_head: 1.7321, bptt_initial: 21.8464, tail: 1.3190, advantages_returns: 0.3285, losses: 4.6683
    bptt: 2.4084
      bptt_forward_core: 2.2717
  update: 53.5158
    clip: 1.1703
[2025-01-07 17:43:31,020][11933] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.5005, enqueue_policy_requests: 139.7066, env_step: 1058.3784, overhead: 18.8241, complete_rollouts: 9.7427
save_policy_outputs: 27.9916
  split_output_tensors: 10.9436
[2025-01-07 17:43:31,022][11933] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3860, enqueue_policy_requests: 138.2515, env_step: 1064.7715, overhead: 18.6742, complete_rollouts: 8.3265
save_policy_outputs: 28.2887
  split_output_tensors: 11.1694
[2025-01-07 17:43:31,024][11933] Loop Runner_EvtLoop terminating...
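The profile trees above are cumulative seconds over the whole run, and they make the cost structure explicit: each rollout worker spent roughly 1060 s inside env_step, the inference worker spent 373.6 s in the policy forward pass plus 157.0 s moving observations to the GPU and normalizing them, while the learner needed only 92.7 s of actual train time. The run is rollout-bound rather than learner-bound, the usual picture for a pixel-based VizDoom setup on a single GPU. Back-of-envelope, with values copied from the trees:

    # Values copied verbatim from the profile trees above (seconds over the run).
    env_step = 1058.3784     # RolloutWorker_w0: stepping the VizDoom environment
    forward = 373.6494       # InferenceWorker_p0-w0: policy forward pass
    train = 92.6794          # Learner 0: train
    print(env_step / train)  # ~11.4x more time stepping envs than training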
[2025-01-07 17:43:31,025][11933] Runner profile tree view: main_loop: 1391.1974 [2025-01-07 17:43:31,027][11933] Collected {0: 9007104}, FPS: 3594.9 [2025-01-07 17:43:41,237][11933] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-01-07 17:43:41,239][11933] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-07 17:43:41,241][11933] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-07 17:43:41,243][11933] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-07 17:43:41,245][11933] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-07 17:43:41,247][11933] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-07 17:43:41,248][11933] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-07 17:43:41,250][11933] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-07 17:43:41,251][11933] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-07 17:43:41,252][11933] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-07 17:43:41,253][11933] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-07 17:43:41,254][11933] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-07 17:43:41,255][11933] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-07 17:43:41,256][11933] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-07 17:43:41,257][11933] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-07 17:43:41,298][11933] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-07 17:43:41,303][11933] RunningMeanStd input shape: (3, 72, 128) [2025-01-07 17:43:41,306][11933] RunningMeanStd input shape: (1,) [2025-01-07 17:43:41,328][11933] ConvEncoder: input_channels=3 [2025-01-07 17:43:41,455][11933] Conv encoder output size: 512 [2025-01-07 17:43:41,457][11933] Policy head output size: 512 [2025-01-07 17:43:41,734][11933] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002199_9007104.pth... [2025-01-07 17:43:42,569][11933] Num frames 100... [2025-01-07 17:43:42,699][11933] Num frames 200... [2025-01-07 17:43:42,833][11933] Num frames 300... [2025-01-07 17:43:42,963][11933] Num frames 400... [2025-01-07 17:43:43,100][11933] Num frames 500... [2025-01-07 17:43:43,227][11933] Num frames 600... [2025-01-07 17:43:43,369][11933] Num frames 700... [2025-01-07 17:43:43,501][11933] Num frames 800... [2025-01-07 17:43:43,630][11933] Num frames 900... [2025-01-07 17:43:43,753][11933] Num frames 1000... [2025-01-07 17:43:43,880][11933] Num frames 1100... [2025-01-07 17:43:44,006][11933] Num frames 1200... [2025-01-07 17:43:44,149][11933] Num frames 1300... [2025-01-07 17:43:44,274][11933] Num frames 1400... [2025-01-07 17:43:44,408][11933] Num frames 1500... [2025-01-07 17:43:44,533][11933] Num frames 1600... [2025-01-07 17:43:44,673][11933] Avg episode rewards: #0: 40.639, true rewards: #0: 16.640 [2025-01-07 17:43:44,674][11933] Avg episode reward: 40.639, avg true_objective: 16.640 [2025-01-07 17:43:44,723][11933] Num frames 1700... [2025-01-07 17:43:44,852][11933] Num frames 1800... [2025-01-07 17:43:44,977][11933] Num frames 1900... 
[2025-01-07 17:43:45,106][11933] Num frames 2000... [2025-01-07 17:43:45,233][11933] Num frames 2100... [2025-01-07 17:43:45,391][11933] Num frames 2200... [2025-01-07 17:43:45,516][11933] Num frames 2300... [2025-01-07 17:43:45,651][11933] Num frames 2400... [2025-01-07 17:43:45,807][11933] Avg episode rewards: #0: 28.320, true rewards: #0: 12.320 [2025-01-07 17:43:45,809][11933] Avg episode reward: 28.320, avg true_objective: 12.320 [2025-01-07 17:43:45,876][11933] Num frames 2500... [2025-01-07 17:43:46,056][11933] Num frames 2600... [2025-01-07 17:43:46,237][11933] Num frames 2700... [2025-01-07 17:43:46,415][11933] Num frames 2800... [2025-01-07 17:43:46,591][11933] Num frames 2900... [2025-01-07 17:43:46,762][11933] Num frames 3000... [2025-01-07 17:43:46,929][11933] Num frames 3100... [2025-01-07 17:43:47,104][11933] Num frames 3200... [2025-01-07 17:43:47,295][11933] Num frames 3300... [2025-01-07 17:43:47,480][11933] Num frames 3400... [2025-01-07 17:43:47,662][11933] Num frames 3500... [2025-01-07 17:43:47,761][11933] Avg episode rewards: #0: 28.070, true rewards: #0: 11.737 [2025-01-07 17:43:47,763][11933] Avg episode reward: 28.070, avg true_objective: 11.737 [2025-01-07 17:43:47,919][11933] Num frames 3600... [2025-01-07 17:43:48,105][11933] Num frames 3700... [2025-01-07 17:43:48,249][11933] Num frames 3800... [2025-01-07 17:43:48,387][11933] Num frames 3900... [2025-01-07 17:43:48,519][11933] Num frames 4000... [2025-01-07 17:43:48,645][11933] Num frames 4100... [2025-01-07 17:43:48,770][11933] Num frames 4200... [2025-01-07 17:43:48,939][11933] Avg episode rewards: #0: 25.222, true rewards: #0: 10.722 [2025-01-07 17:43:48,940][11933] Avg episode reward: 25.222, avg true_objective: 10.722 [2025-01-07 17:43:48,958][11933] Num frames 4300... [2025-01-07 17:43:49,088][11933] Num frames 4400... [2025-01-07 17:43:49,221][11933] Num frames 4500... [2025-01-07 17:43:49,357][11933] Num frames 4600... [2025-01-07 17:43:49,485][11933] Num frames 4700... [2025-01-07 17:43:49,616][11933] Num frames 4800... [2025-01-07 17:43:49,744][11933] Num frames 4900... [2025-01-07 17:43:49,872][11933] Num frames 5000... [2025-01-07 17:43:50,004][11933] Num frames 5100... [2025-01-07 17:43:50,131][11933] Num frames 5200... [2025-01-07 17:43:50,203][11933] Avg episode rewards: #0: 24.824, true rewards: #0: 10.424 [2025-01-07 17:43:50,204][11933] Avg episode reward: 24.824, avg true_objective: 10.424 [2025-01-07 17:43:50,335][11933] Num frames 5300... [2025-01-07 17:43:50,461][11933] Num frames 5400... [2025-01-07 17:43:50,592][11933] Num frames 5500... [2025-01-07 17:43:50,720][11933] Num frames 5600... [2025-01-07 17:43:50,896][11933] Avg episode rewards: #0: 22.325, true rewards: #0: 9.492 [2025-01-07 17:43:50,898][11933] Avg episode reward: 22.325, avg true_objective: 9.492 [2025-01-07 17:43:50,909][11933] Num frames 5700... [2025-01-07 17:43:51,033][11933] Num frames 5800... [2025-01-07 17:43:51,159][11933] Num frames 5900... [2025-01-07 17:43:51,298][11933] Num frames 6000... [2025-01-07 17:43:51,431][11933] Num frames 6100... [2025-01-07 17:43:51,560][11933] Num frames 6200... [2025-01-07 17:43:51,691][11933] Num frames 6300... [2025-01-07 17:43:51,823][11933] Num frames 6400... [2025-01-07 17:43:51,951][11933] Num frames 6500... [2025-01-07 17:43:52,043][11933] Avg episode rewards: #0: 21.896, true rewards: #0: 9.324 [2025-01-07 17:43:52,045][11933] Avg episode reward: 21.896, avg true_objective: 9.324 [2025-01-07 17:43:52,141][11933] Num frames 6600... 
[2025-01-07 17:43:52,270][11933] Num frames 6700... [2025-01-07 17:43:52,425][11933] Num frames 6800... [2025-01-07 17:43:52,549][11933] Num frames 6900... [2025-01-07 17:43:52,681][11933] Num frames 7000... [2025-01-07 17:43:52,812][11933] Num frames 7100... [2025-01-07 17:43:52,940][11933] Num frames 7200... [2025-01-07 17:43:53,070][11933] Num frames 7300... [2025-01-07 17:43:53,196][11933] Num frames 7400... [2025-01-07 17:43:53,334][11933] Num frames 7500... [2025-01-07 17:43:53,469][11933] Num frames 7600... [2025-01-07 17:43:53,598][11933] Num frames 7700... [2025-01-07 17:43:53,728][11933] Num frames 7800... [2025-01-07 17:43:53,919][11933] Avg episode rewards: #0: 23.624, true rewards: #0: 9.874 [2025-01-07 17:43:53,922][11933] Avg episode reward: 23.624, avg true_objective: 9.874 [2025-01-07 17:43:53,926][11933] Num frames 7900... [2025-01-07 17:43:54,053][11933] Num frames 8000... [2025-01-07 17:43:54,181][11933] Num frames 8100... [2025-01-07 17:43:54,321][11933] Num frames 8200... [2025-01-07 17:43:54,461][11933] Num frames 8300... [2025-01-07 17:43:54,538][11933] Avg episode rewards: #0: 21.683, true rewards: #0: 9.239 [2025-01-07 17:43:54,541][11933] Avg episode reward: 21.683, avg true_objective: 9.239 [2025-01-07 17:43:54,650][11933] Num frames 8400... [2025-01-07 17:43:54,784][11933] Num frames 8500... [2025-01-07 17:43:54,917][11933] Num frames 8600... [2025-01-07 17:43:55,048][11933] Num frames 8700... [2025-01-07 17:43:55,154][11933] Avg episode rewards: #0: 20.338, true rewards: #0: 8.738 [2025-01-07 17:43:55,156][11933] Avg episode reward: 20.338, avg true_objective: 8.738 [2025-01-07 17:44:47,884][11933] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-01-07 17:45:29,549][11933] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-01-07 17:45:29,551][11933] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-07 17:45:29,552][11933] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-07 17:45:29,555][11933] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-07 17:45:29,556][11933] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-07 17:45:29,558][11933] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-07 17:45:29,559][11933] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-07 17:45:29,561][11933] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-07 17:45:29,562][11933] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-01-07 17:45:29,563][11933] Adding new argument 'hf_repository'='NBKi/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-07 17:45:29,564][11933] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-07 17:45:29,565][11933] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-07 17:45:29,566][11933] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-07 17:45:29,567][11933] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-07 17:45:29,568][11933] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-07 17:45:29,598][11933] RunningMeanStd input shape: (3, 72, 128) [2025-01-07 17:45:29,601][11933] RunningMeanStd input shape: (1,) [2025-01-07 17:45:29,614][11933] ConvEncoder: input_channels=3 [2025-01-07 17:45:29,651][11933] Conv encoder output size: 512 [2025-01-07 17:45:29,653][11933] Policy head output size: 512 [2025-01-07 17:45:29,672][11933] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002199_9007104.pth... [2025-01-07 17:45:30,095][11933] Num frames 100... [2025-01-07 17:45:30,217][11933] Num frames 200... [2025-01-07 17:45:30,358][11933] Num frames 300... [2025-01-07 17:45:30,485][11933] Num frames 400... [2025-01-07 17:45:30,610][11933] Num frames 500... [2025-01-07 17:45:30,735][11933] Num frames 600... [2025-01-07 17:45:30,865][11933] Num frames 700... [2025-01-07 17:45:30,987][11933] Num frames 800... [2025-01-07 17:45:31,120][11933] Num frames 900... [2025-01-07 17:45:31,249][11933] Num frames 1000... [2025-01-07 17:45:31,381][11933] Num frames 1100... [2025-01-07 17:45:31,506][11933] Num frames 1200... [2025-01-07 17:45:31,637][11933] Num frames 1300... [2025-01-07 17:45:31,763][11933] Num frames 1400... [2025-01-07 17:45:31,895][11933] Num frames 1500... [2025-01-07 17:45:32,020][11933] Num frames 1600... [2025-01-07 17:45:32,156][11933] Num frames 1700... [2025-01-07 17:45:32,290][11933] Num frames 1800... [2025-01-07 17:45:32,420][11933] Num frames 1900... [2025-01-07 17:45:32,545][11933] Num frames 2000... [2025-01-07 17:45:32,681][11933] Num frames 2100... [2025-01-07 17:45:32,734][11933] Avg episode rewards: #0: 60.999, true rewards: #0: 21.000 [2025-01-07 17:45:32,736][11933] Avg episode reward: 60.999, avg true_objective: 21.000 [2025-01-07 17:45:32,874][11933] Num frames 2200... [2025-01-07 17:45:32,997][11933] Num frames 2300... [2025-01-07 17:45:33,124][11933] Num frames 2400... [2025-01-07 17:45:33,261][11933] Num frames 2500... [2025-01-07 17:45:33,411][11933] Num frames 2600... [2025-01-07 17:45:33,544][11933] Num frames 2700... [2025-01-07 17:45:33,672][11933] Num frames 2800... [2025-01-07 17:45:33,801][11933] Num frames 2900... [2025-01-07 17:45:33,927][11933] Num frames 3000... [2025-01-07 17:45:34,056][11933] Num frames 3100... [2025-01-07 17:45:34,191][11933] Num frames 3200... [2025-01-07 17:45:34,326][11933] Num frames 3300... [2025-01-07 17:45:34,456][11933] Num frames 3400... [2025-01-07 17:45:34,590][11933] Num frames 3500... [2025-01-07 17:45:34,717][11933] Num frames 3600... [2025-01-07 17:45:34,850][11933] Num frames 3700... [2025-01-07 17:45:34,910][11933] Avg episode rewards: #0: 49.009, true rewards: #0: 18.510 [2025-01-07 17:45:34,913][11933] Avg episode reward: 49.009, avg true_objective: 18.510 [2025-01-07 17:45:35,037][11933] Num frames 3800... [2025-01-07 17:45:35,167][11933] Num frames 3900... [2025-01-07 17:45:35,313][11933] Num frames 4000... [2025-01-07 17:45:35,483][11933] Num frames 4100... [2025-01-07 17:45:35,665][11933] Num frames 4200... [2025-01-07 17:45:35,871][11933] Num frames 4300... [2025-01-07 17:45:36,047][11933] Num frames 4400... [2025-01-07 17:45:36,226][11933] Num frames 4500... [2025-01-07 17:45:36,292][11933] Avg episode rewards: #0: 37.340, true rewards: #0: 15.007 [2025-01-07 17:45:36,295][11933] Avg episode reward: 37.340, avg true_objective: 15.007 [2025-01-07 17:45:36,467][11933] Num frames 4600... [2025-01-07 17:45:36,634][11933] Num frames 4700... 
[2025-01-07 17:45:36,806][11933] Num frames 4800... [2025-01-07 17:45:36,977][11933] Num frames 4900... [2025-01-07 17:45:37,176][11933] Avg episode rewards: #0: 29.947, true rewards: #0: 12.448 [2025-01-07 17:45:37,178][11933] Avg episode reward: 29.947, avg true_objective: 12.448 [2025-01-07 17:45:37,220][11933] Num frames 5000... [2025-01-07 17:45:37,410][11933] Num frames 5100... [2025-01-07 17:45:37,594][11933] Num frames 5200... [2025-01-07 17:45:37,782][11933] Num frames 5300... [2025-01-07 17:45:37,922][11933] Num frames 5400... [2025-01-07 17:45:38,047][11933] Num frames 5500... [2025-01-07 17:45:38,174][11933] Num frames 5600... [2025-01-07 17:45:38,303][11933] Num frames 5700... [2025-01-07 17:45:38,444][11933] Num frames 5800... [2025-01-07 17:45:38,576][11933] Num frames 5900... [2025-01-07 17:45:38,702][11933] Num frames 6000... [2025-01-07 17:45:38,829][11933] Num frames 6100... [2025-01-07 17:45:38,960][11933] Num frames 6200... [2025-01-07 17:45:39,088][11933] Num frames 6300... [2025-01-07 17:45:39,227][11933] Avg episode rewards: #0: 30.110, true rewards: #0: 12.710 [2025-01-07 17:45:39,229][11933] Avg episode reward: 30.110, avg true_objective: 12.710 [2025-01-07 17:45:39,291][11933] Num frames 6400... [2025-01-07 17:45:39,432][11933] Num frames 6500... [2025-01-07 17:45:39,557][11933] Num frames 6600... [2025-01-07 17:45:39,681][11933] Num frames 6700... [2025-01-07 17:45:39,807][11933] Num frames 6800... [2025-01-07 17:45:39,933][11933] Num frames 6900... [2025-01-07 17:45:40,056][11933] Num frames 7000... [2025-01-07 17:45:40,180][11933] Num frames 7100... [2025-01-07 17:45:40,317][11933] Num frames 7200... [2025-01-07 17:45:40,453][11933] Num frames 7300... [2025-01-07 17:45:40,580][11933] Num frames 7400... [2025-01-07 17:45:40,705][11933] Num frames 7500... [2025-01-07 17:45:40,835][11933] Num frames 7600... [2025-01-07 17:45:40,959][11933] Num frames 7700... [2025-01-07 17:45:41,084][11933] Num frames 7800... [2025-01-07 17:45:41,214][11933] Num frames 7900... [2025-01-07 17:45:41,347][11933] Num frames 8000... [2025-01-07 17:45:41,482][11933] Num frames 8100... [2025-01-07 17:45:41,615][11933] Num frames 8200... [2025-01-07 17:45:41,747][11933] Avg episode rewards: #0: 33.098, true rewards: #0: 13.765 [2025-01-07 17:45:41,749][11933] Avg episode reward: 33.098, avg true_objective: 13.765 [2025-01-07 17:45:41,806][11933] Num frames 8300... [2025-01-07 17:45:41,933][11933] Num frames 8400... [2025-01-07 17:45:42,058][11933] Num frames 8500... [2025-01-07 17:45:42,181][11933] Num frames 8600... [2025-01-07 17:45:42,314][11933] Num frames 8700... [2025-01-07 17:45:42,448][11933] Num frames 8800... [2025-01-07 17:45:42,575][11933] Num frames 8900... [2025-01-07 17:45:42,702][11933] Num frames 9000... [2025-01-07 17:45:42,828][11933] Num frames 9100... [2025-01-07 17:45:42,960][11933] Num frames 9200... [2025-01-07 17:45:43,116][11933] Avg episode rewards: #0: 31.833, true rewards: #0: 13.261 [2025-01-07 17:45:43,118][11933] Avg episode reward: 31.833, avg true_objective: 13.261 [2025-01-07 17:45:43,144][11933] Num frames 9300... [2025-01-07 17:45:43,268][11933] Num frames 9400... [2025-01-07 17:45:43,408][11933] Num frames 9500... [2025-01-07 17:45:43,545][11933] Num frames 9600... [2025-01-07 17:45:43,675][11933] Num frames 9700... [2025-01-07 17:45:43,804][11933] Num frames 9800... [2025-01-07 17:45:43,930][11933] Num frames 9900... [2025-01-07 17:45:44,060][11933] Num frames 10000... [2025-01-07 17:45:44,182][11933] Num frames 10100... 
[2025-01-07 17:45:44,316][11933] Num frames 10200... [2025-01-07 17:45:44,449][11933] Num frames 10300... [2025-01-07 17:45:44,587][11933] Num frames 10400... [2025-01-07 17:45:44,715][11933] Num frames 10500... [2025-01-07 17:45:44,842][11933] Num frames 10600... [2025-01-07 17:45:44,974][11933] Num frames 10700... [2025-01-07 17:45:45,098][11933] Num frames 10800... [2025-01-07 17:45:45,228][11933] Num frames 10900... [2025-01-07 17:45:45,365][11933] Num frames 11000... [2025-01-07 17:45:45,496][11933] Num frames 11100... [2025-01-07 17:45:45,633][11933] Num frames 11200... [2025-01-07 17:45:45,763][11933] Num frames 11300... [2025-01-07 17:45:45,925][11933] Avg episode rewards: #0: 35.103, true rewards: #0: 14.229 [2025-01-07 17:45:45,928][11933] Avg episode reward: 35.103, avg true_objective: 14.229 [2025-01-07 17:45:45,954][11933] Num frames 11400... [2025-01-07 17:45:46,085][11933] Num frames 11500... [2025-01-07 17:45:46,213][11933] Num frames 11600... [2025-01-07 17:45:46,345][11933] Num frames 11700... [2025-01-07 17:45:46,473][11933] Num frames 11800... [2025-01-07 17:45:46,616][11933] Num frames 11900... [2025-01-07 17:45:46,743][11933] Num frames 12000... [2025-01-07 17:45:46,870][11933] Num frames 12100... [2025-01-07 17:45:46,999][11933] Num frames 12200... [2025-01-07 17:45:47,128][11933] Num frames 12300... [2025-01-07 17:45:47,258][11933] Num frames 12400... [2025-01-07 17:45:47,398][11933] Num frames 12500... [2025-01-07 17:45:47,529][11933] Num frames 12600... [2025-01-07 17:45:47,664][11933] Num frames 12700... [2025-01-07 17:45:47,794][11933] Num frames 12800... [2025-01-07 17:45:47,924][11933] Avg episode rewards: #0: 35.163, true rewards: #0: 14.274 [2025-01-07 17:45:47,926][11933] Avg episode reward: 35.163, avg true_objective: 14.274 [2025-01-07 17:45:48,022][11933] Num frames 12900... [2025-01-07 17:45:48,205][11933] Num frames 13000... [2025-01-07 17:45:48,393][11933] Num frames 13100... [2025-01-07 17:45:48,563][11933] Num frames 13200... [2025-01-07 17:45:48,738][11933] Num frames 13300... [2025-01-07 17:45:48,907][11933] Num frames 13400... [2025-01-07 17:45:49,102][11933] Avg episode rewards: #0: 32.683, true rewards: #0: 13.483 [2025-01-07 17:45:49,105][11933] Avg episode reward: 32.683, avg true_objective: 13.483 [2025-01-07 17:47:10,385][11933] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
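Both evaluation passes above come from Sample Factory's enjoy entry point: it reloads config.json, applies the overridden arguments logged at 17:43:41 and 17:45:29, restores checkpoint_000002199_9007104.pth, and rolls out ten episodes. The "Avg episode rewards" records are running means over the episodes finished so far, and "true rewards" (the avg true_objective) appears to track the raw environment objective rather than the shaped training reward. The second pass additionally records replay.mp4 and uploads the experiment to NBKi/rl_course_vizdoom_health_gathering_supreme via push_to_hub. A hedged reconstruction of that second call, modeled on the Hugging Face Deep RL course notebook; the env name and the prior registration of the VizDoom environments are assumptions rather than facts from this log:

    # Hedged reconstruction of the evaluation + upload pass. Mirrors the
    # overridden args logged above; assumes the VizDoom envs were registered
    # beforehand, as in the HF Deep RL course notebook.
    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.enjoy import enjoy

    argv = [
        "--env=doom_health_gathering_supreme",  # assumed from the repo name
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=NBKi/rl_course_vizdoom_health_gathering_supreme",
    ]
    parser, _ = parse_sf_args(argv=argv, evaluation=True)
    cfg = parse_full_cfg(parser, argv)
    status = enjoy(cfg)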