[2024-11-25 02:27:38,831][00405] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-25 02:27:38,836][00405] Rollout worker 0 uses device cpu
[2024-11-25 02:27:38,838][00405] Rollout worker 1 uses device cpu
[2024-11-25 02:27:38,841][00405] Rollout worker 2 uses device cpu
[2024-11-25 02:27:38,842][00405] Rollout worker 3 uses device cpu
[2024-11-25 02:27:38,846][00405] Rollout worker 4 uses device cpu
[2024-11-25 02:27:38,847][00405] Rollout worker 5 uses device cpu
[2024-11-25 02:27:38,848][00405] Rollout worker 6 uses device cpu
[2024-11-25 02:27:38,849][00405] Rollout worker 7 uses device cpu
[2024-11-25 02:27:39,027][00405] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-25 02:27:39,029][00405] InferenceWorker_p0-w0: min num requests: 2
[2024-11-25 02:27:39,073][00405] Starting all processes...
[2024-11-25 02:27:39,075][00405] Starting process learner_proc0
[2024-11-25 02:27:39,135][00405] Starting all processes...
[2024-11-25 02:27:39,150][00405] Starting process inference_proc0-0
[2024-11-25 02:27:39,150][00405] Starting process rollout_proc0
[2024-11-25 02:27:39,152][00405] Starting process rollout_proc1
[2024-11-25 02:27:39,153][00405] Starting process rollout_proc2
[2024-11-25 02:27:39,153][00405] Starting process rollout_proc3
[2024-11-25 02:27:39,153][00405] Starting process rollout_proc4
[2024-11-25 02:27:39,153][00405] Starting process rollout_proc5
[2024-11-25 02:27:39,153][00405] Starting process rollout_proc6
[2024-11-25 02:27:39,153][00405] Starting process rollout_proc7
[2024-11-25 02:27:55,728][04091] Worker 0 uses CPU cores [0]
[2024-11-25 02:27:55,852][04092] Worker 1 uses CPU cores [1]
[2024-11-25 02:27:56,026][04096] Worker 4 uses CPU cores [0]
[2024-11-25 02:27:56,171][04086] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-25 02:27:56,173][04086] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-25 02:27:56,233][04073] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-25 02:27:56,244][04073] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-25 02:27:56,244][04086] Num visible devices: 1
[2024-11-25 02:27:56,268][04093] Worker 2 uses CPU cores [0]
[2024-11-25 02:27:56,300][04073] Num visible devices: 1
[2024-11-25 02:27:56,323][04073] Starting seed is not provided
[2024-11-25 02:27:56,324][04073] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-25 02:27:56,324][04073] Initializing actor-critic model on device cuda:0
[2024-11-25 02:27:56,325][04073] RunningMeanStd input shape: (3, 72, 128)
[2024-11-25 02:27:56,333][04073] RunningMeanStd input shape: (1,)
[2024-11-25 02:27:56,342][04095] Worker 5 uses CPU cores [1]
[2024-11-25 02:27:56,353][04097] Worker 6 uses CPU cores [0]
[2024-11-25 02:27:56,367][04073] ConvEncoder: input_channels=3
[2024-11-25 02:27:56,409][04094] Worker 3 uses CPU cores [1]
[2024-11-25 02:27:56,418][04098] Worker 7 uses CPU cores [1]
[2024-11-25 02:27:56,637][04073] Conv encoder output size: 512
[2024-11-25 02:27:56,637][04073] Policy head output size: 512
[2024-11-25 02:27:56,692][04073] Created Actor Critic model with architecture:
[2024-11-25 02:27:56,692][04073] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-25 02:27:56,977][04073] Using optimizer
[2024-11-25 02:27:59,017][00405] Heartbeat connected on Batcher_0
[2024-11-25 02:27:59,028][00405] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-25 02:27:59,040][00405] Heartbeat connected on RolloutWorker_w0
[2024-11-25 02:27:59,043][00405] Heartbeat connected on RolloutWorker_w1
[2024-11-25 02:27:59,047][00405] Heartbeat connected on RolloutWorker_w2
[2024-11-25 02:27:59,056][00405] Heartbeat connected on RolloutWorker_w3
[2024-11-25 02:27:59,058][00405] Heartbeat connected on RolloutWorker_w4
[2024-11-25 02:27:59,063][00405] Heartbeat connected on RolloutWorker_w5
[2024-11-25 02:27:59,072][00405] Heartbeat connected on RolloutWorker_w6
[2024-11-25 02:27:59,073][00405] Heartbeat connected on RolloutWorker_w7
[2024-11-25 02:28:00,746][04073] No checkpoints found
[2024-11-25 02:28:00,746][04073] Did not load from checkpoint, starting from scratch!
[2024-11-25 02:28:00,746][04073] Initialized policy 0 weights for model version 0
[2024-11-25 02:28:00,750][04073] LearnerWorker_p0 finished initialization!
[2024-11-25 02:28:00,753][04073] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-25 02:28:00,751][00405] Heartbeat connected on LearnerWorker_p0
[2024-11-25 02:28:00,846][04086] RunningMeanStd input shape: (3, 72, 128)
[2024-11-25 02:28:00,847][04086] RunningMeanStd input shape: (1,)
[2024-11-25 02:28:00,859][04086] ConvEncoder: input_channels=3
[2024-11-25 02:28:00,961][04086] Conv encoder output size: 512
[2024-11-25 02:28:00,961][04086] Policy head output size: 512
[2024-11-25 02:28:01,013][00405] Inference worker 0-0 is ready!
[2024-11-25 02:28:01,014][00405] All inference workers are ready! Signal rollout workers to start!
[2024-11-25 02:28:01,216][04092] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-25 02:28:01,217][04094] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-25 02:28:01,221][04095] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-25 02:28:01,215][04098] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-25 02:28:01,213][04096] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-25 02:28:01,224][04097] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-25 02:28:01,219][04091] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-25 02:28:01,226][04093] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-25 02:28:02,308][00405] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-25 02:28:02,692][04095] Decorrelating experience for 0 frames...
[2024-11-25 02:28:02,693][04098] Decorrelating experience for 0 frames...
[2024-11-25 02:28:02,691][04092] Decorrelating experience for 0 frames...
[2024-11-25 02:28:02,876][04093] Decorrelating experience for 0 frames...
[2024-11-25 02:28:02,872][04091] Decorrelating experience for 0 frames...
[2024-11-25 02:28:02,878][04096] Decorrelating experience for 0 frames...
[2024-11-25 02:28:02,880][04097] Decorrelating experience for 0 frames...
[2024-11-25 02:28:03,236][04095] Decorrelating experience for 32 frames...
[2024-11-25 02:28:03,686][04094] Decorrelating experience for 0 frames...
[2024-11-25 02:28:03,828][04093] Decorrelating experience for 32 frames...
[2024-11-25 02:28:03,830][04097] Decorrelating experience for 32 frames...
[2024-11-25 02:28:04,030][04095] Decorrelating experience for 64 frames...
[2024-11-25 02:28:04,605][04091] Decorrelating experience for 32 frames...
[2024-11-25 02:28:05,088][04092] Decorrelating experience for 32 frames...
[2024-11-25 02:28:05,325][04096] Decorrelating experience for 32 frames...
[2024-11-25 02:28:05,534][04094] Decorrelating experience for 32 frames...
[2024-11-25 02:28:05,664][04095] Decorrelating experience for 96 frames...
[2024-11-25 02:28:05,960][04093] Decorrelating experience for 64 frames...
[2024-11-25 02:28:05,967][04097] Decorrelating experience for 64 frames...
[2024-11-25 02:28:07,311][00405] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-25 02:28:07,624][04098] Decorrelating experience for 32 frames...
[2024-11-25 02:28:07,915][04092] Decorrelating experience for 64 frames...
[2024-11-25 02:28:08,598][04094] Decorrelating experience for 64 frames...
[2024-11-25 02:28:08,660][04096] Decorrelating experience for 64 frames...
[2024-11-25 02:28:08,876][04091] Decorrelating experience for 64 frames...
[2024-11-25 02:28:09,066][04093] Decorrelating experience for 96 frames...
[2024-11-25 02:28:11,298][04094] Decorrelating experience for 96 frames...
[2024-11-25 02:28:11,380][04092] Decorrelating experience for 96 frames...
[2024-11-25 02:28:11,524][04096] Decorrelating experience for 96 frames...
[2024-11-25 02:28:11,529][04091] Decorrelating experience for 96 frames...
[2024-11-25 02:28:12,308][00405] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 38.0. Samples: 380. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-25 02:28:12,314][00405] Avg episode reward: [(0, '2.741')]
[2024-11-25 02:28:14,319][04073] Signal inference workers to stop experience collection...
[2024-11-25 02:28:14,330][04086] InferenceWorker_p0-w0: stopping experience collection
[2024-11-25 02:28:14,373][04097] Decorrelating experience for 96 frames...
[2024-11-25 02:28:14,467][04098] Decorrelating experience for 64 frames...
[2024-11-25 02:28:14,854][04098] Decorrelating experience for 96 frames...
[2024-11-25 02:28:17,308][00405] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 165.5. Samples: 2482. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-25 02:28:17,312][00405] Avg episode reward: [(0, '2.837')]
[2024-11-25 02:28:17,677][04073] Signal inference workers to resume experience collection...
[2024-11-25 02:28:17,679][04086] InferenceWorker_p0-w0: resuming experience collection
[2024-11-25 02:28:22,308][00405] Fps is (10 sec: 2457.6, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 367.2. Samples: 7344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-25 02:28:22,310][00405] Avg episode reward: [(0, '3.415')]
[2024-11-25 02:28:26,321][04086] Updated weights for policy 0, policy_version 10 (0.0033)
[2024-11-25 02:28:27,308][00405] Fps is (10 sec: 4096.0, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 40960. Throughput: 0: 391.7. Samples: 9792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-25 02:28:27,310][00405] Avg episode reward: [(0, '3.750')]
[2024-11-25 02:28:32,308][00405] Fps is (10 sec: 3686.4, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 497.6. Samples: 14928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:28:32,310][00405] Avg episode reward: [(0, '4.472')]
[2024-11-25 02:28:36,156][04086] Updated weights for policy 0, policy_version 20 (0.0029)
[2024-11-25 02:28:37,308][00405] Fps is (10 sec: 4505.6, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 86016. Throughput: 0: 634.4. Samples: 22204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:28:37,314][00405] Avg episode reward: [(0, '4.539')]
[2024-11-25 02:28:42,308][00405] Fps is (10 sec: 4096.0, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 638.0. Samples: 25522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-25 02:28:42,313][00405] Avg episode reward: [(0, '4.252')]
[2024-11-25 02:28:42,322][04073] Saving new best policy, reward=4.252!
[2024-11-25 02:28:47,308][00405] Fps is (10 sec: 3276.8, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 118784. Throughput: 0: 662.7. Samples: 29822. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-25 02:28:47,310][00405] Avg episode reward: [(0, '4.273')]
[2024-11-25 02:28:47,320][04073] Saving new best policy, reward=4.273!
[2024-11-25 02:28:47,730][04086] Updated weights for policy 0, policy_version 30 (0.0021)
[2024-11-25 02:28:52,308][00405] Fps is (10 sec: 4096.0, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 143360. Throughput: 0: 812.1. Samples: 36542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-25 02:28:52,315][00405] Avg episode reward: [(0, '4.260')]
[2024-11-25 02:28:55,970][04086] Updated weights for policy 0, policy_version 40 (0.0026)
[2024-11-25 02:28:57,308][00405] Fps is (10 sec: 4915.2, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 886.6. Samples: 40276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-25 02:28:57,316][00405] Avg episode reward: [(0, '4.311')]
[2024-11-25 02:28:57,318][04073] Saving new best policy, reward=4.311!
[2024-11-25 02:29:02,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 180224. Throughput: 0: 949.4. Samples: 45204. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-25 02:29:02,312][00405] Avg episode reward: [(0, '4.513')]
[2024-11-25 02:29:02,322][04073] Saving new best policy, reward=4.513!
[2024-11-25 02:29:07,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3345.2, 300 sec: 3087.8). Total num frames: 200704. Throughput: 0: 972.4. Samples: 51102. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-25 02:29:07,311][00405] Avg episode reward: [(0, '4.380')]
[2024-11-25 02:29:07,461][04086] Updated weights for policy 0, policy_version 50 (0.0014)
[2024-11-25 02:29:12,308][00405] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3218.3). Total num frames: 225280. Throughput: 0: 997.5. Samples: 54678. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-25 02:29:12,312][00405] Avg episode reward: [(0, '4.328')]
[2024-11-25 02:29:17,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 1012.9. Samples: 60510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:29:17,310][00405] Avg episode reward: [(0, '4.441')]
[2024-11-25 02:29:18,216][04086] Updated weights for policy 0, policy_version 60 (0.0020)
[2024-11-25 02:29:22,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3225.6). Total num frames: 258048. Throughput: 0: 965.7. Samples: 65660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:29:22,313][00405] Avg episode reward: [(0, '4.255')]
[2024-11-25 02:29:27,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3325.0). Total num frames: 282624. Throughput: 0: 970.1. Samples: 69178. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-25 02:29:27,310][00405] Avg episode reward: [(0, '4.381')]
[2024-11-25 02:29:27,485][04086] Updated weights for policy 0, policy_version 70 (0.0025)
[2024-11-25 02:29:32,308][00405] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3413.3). Total num frames: 307200. Throughput: 0: 1031.6. Samples: 76244. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-25 02:29:32,315][00405] Avg episode reward: [(0, '4.393')]
[2024-11-25 02:29:32,320][04073] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth...
[2024-11-25 02:29:37,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 981.7. Samples: 80720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-25 02:29:37,311][00405] Avg episode reward: [(0, '4.399')]
[2024-11-25 02:29:38,459][04086] Updated weights for policy 0, policy_version 80 (0.0020)
[2024-11-25 02:29:42,308][00405] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3358.7). Total num frames: 335872. Throughput: 0: 970.5. Samples: 83948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-25 02:29:42,313][00405] Avg episode reward: [(0, '4.358')]
[2024-11-25 02:29:47,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 966.2. Samples: 88684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-25 02:29:47,310][00405] Avg episode reward: [(0, '4.286')]
[2024-11-25 02:29:51,096][04086] Updated weights for policy 0, policy_version 90 (0.0027)
[2024-11-25 02:29:52,309][00405] Fps is (10 sec: 3276.4, 60 sec: 3754.6, 300 sec: 3351.2). Total num frames: 368640. Throughput: 0: 938.5. Samples: 93334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:29:52,316][00405] Avg episode reward: [(0, '4.259')]
[2024-11-25 02:29:57,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3383.7). Total num frames: 389120. Throughput: 0: 914.7. Samples: 95840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-25 02:29:57,312][00405] Avg episode reward: [(0, '4.630')]
[2024-11-25 02:29:57,315][04073] Saving new best policy, reward=4.630!
[2024-11-25 02:30:01,217][04086] Updated weights for policy 0, policy_version 100 (0.0017)
[2024-11-25 02:30:02,308][00405] Fps is (10 sec: 4506.1, 60 sec: 3891.2, 300 sec: 3447.5). Total num frames: 413696. Throughput: 0: 938.0. Samples: 102720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:30:02,315][00405] Avg episode reward: [(0, '4.801')]
[2024-11-25 02:30:02,325][04073] Saving new best policy, reward=4.801!
[2024-11-25 02:30:07,309][00405] Fps is (10 sec: 4095.6, 60 sec: 3822.9, 300 sec: 3440.6). Total num frames: 430080. Throughput: 0: 948.7. Samples: 108352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:30:07,312][00405] Avg episode reward: [(0, '4.702')]
[2024-11-25 02:30:12,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3434.3). Total num frames: 446464. Throughput: 0: 920.7. Samples: 110608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:30:12,310][00405] Avg episode reward: [(0, '4.820')]
[2024-11-25 02:30:12,318][04073] Saving new best policy, reward=4.820!
[2024-11-25 02:30:12,704][04086] Updated weights for policy 0, policy_version 110 (0.0015)
[2024-11-25 02:30:17,308][00405] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3489.2). Total num frames: 471040. Throughput: 0: 912.8. Samples: 117320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:30:17,310][00405] Avg episode reward: [(0, '4.719')]
[2024-11-25 02:30:21,296][04086] Updated weights for policy 0, policy_version 120 (0.0033)
[2024-11-25 02:30:22,308][00405] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3510.9). Total num frames: 491520. Throughput: 0: 963.0. Samples: 124054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:30:22,310][00405] Avg episode reward: [(0, '4.513')]
[2024-11-25 02:30:27,309][00405] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3502.8). Total num frames: 507904. Throughput: 0: 939.9. Samples: 126244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:30:27,313][00405] Avg episode reward: [(0, '4.536')]
[2024-11-25 02:30:32,038][04086] Updated weights for policy 0, policy_version 130 (0.0029)
[2024-11-25 02:30:32,308][00405] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 532480. Throughput: 0: 973.2. Samples: 132480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:30:32,310][00405] Avg episode reward: [(0, '4.461')]
[2024-11-25 02:30:37,308][00405] Fps is (10 sec: 4915.6, 60 sec: 3959.5, 300 sec: 3593.9). Total num frames: 557056. Throughput: 0: 1033.4. Samples: 139836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-25 02:30:37,315][00405] Avg episode reward: [(0, '4.834')]
[2024-11-25 02:30:37,319][04073] Saving new best policy, reward=4.834!
[2024-11-25 02:30:42,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3558.4). Total num frames: 569344. Throughput: 0: 1030.0. Samples: 142188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-25 02:30:42,312][00405] Avg episode reward: [(0, '4.794')]
[2024-11-25 02:30:42,566][04086] Updated weights for policy 0, policy_version 140 (0.0014)
[2024-11-25 02:30:47,309][00405] Fps is (10 sec: 3276.5, 60 sec: 3891.1, 300 sec: 3574.7). Total num frames: 589824. Throughput: 0: 990.8. Samples: 147306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-25 02:30:47,310][00405] Avg episode reward: [(0, '4.442')]
[2024-11-25 02:30:51,972][04086] Updated weights for policy 0, policy_version 150 (0.0017)
[2024-11-25 02:30:52,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3614.1). Total num frames: 614400. Throughput: 0: 1023.6. Samples: 154414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:30:52,310][00405] Avg episode reward: [(0, '4.466')]
[2024-11-25 02:30:57,308][00405] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 3604.5). Total num frames: 630784. Throughput: 0: 1046.3. Samples: 157692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-25 02:30:57,314][00405] Avg episode reward: [(0, '4.608')]
[2024-11-25 02:31:02,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3618.1). Total num frames: 651264. Throughput: 0: 994.4. Samples: 162068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:31:02,314][00405] Avg episode reward: [(0, '4.736')]
[2024-11-25 02:31:02,956][04086] Updated weights for policy 0, policy_version 160 (0.0022)
[2024-11-25 02:31:07,308][00405] Fps is (10 sec: 4096.2, 60 sec: 4027.8, 300 sec: 3631.0). Total num frames: 671744. Throughput: 0: 1003.3. Samples: 169202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-25 02:31:07,310][00405] Avg episode reward: [(0, '4.825')]
[2024-11-25 02:31:11,894][04086] Updated weights for policy 0, policy_version 170 (0.0022)
[2024-11-25 02:31:12,309][00405] Fps is (10 sec: 4505.0, 60 sec: 4164.2, 300 sec: 3664.8). Total num frames: 696320. Throughput: 0: 1032.0. Samples: 172684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:31:12,311][00405] Avg episode reward: [(0, '4.950')]
[2024-11-25 02:31:12,318][04073] Saving new best policy, reward=4.950!
[2024-11-25 02:31:17,314][00405] Fps is (10 sec: 3684.1, 60 sec: 3959.1, 300 sec: 3633.8). Total num frames: 708608. Throughput: 0: 1003.3. Samples: 177634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:31:17,316][00405] Avg episode reward: [(0, '4.913')]
[2024-11-25 02:31:22,308][00405] Fps is (10 sec: 3686.9, 60 sec: 4027.7, 300 sec: 3665.9). Total num frames: 733184. Throughput: 0: 975.4. Samples: 183728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:31:22,311][00405] Avg episode reward: [(0, '4.840')]
[2024-11-25 02:31:23,037][04086] Updated weights for policy 0, policy_version 180 (0.0019)
[2024-11-25 02:31:27,308][00405] Fps is (10 sec: 4918.2, 60 sec: 4164.3, 300 sec: 3696.4). Total num frames: 757760. Throughput: 0: 1004.7. Samples: 187398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:31:27,311][00405] Avg episode reward: [(0, '4.786')]
[2024-11-25 02:31:32,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 774144. Throughput: 0: 1025.8. Samples: 193468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:31:32,310][00405] Avg episode reward: [(0, '4.951')]
[2024-11-25 02:31:32,323][04073] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000189_774144.pth...
[2024-11-25 02:31:32,484][04073] Saving new best policy, reward=4.951!
[2024-11-25 02:31:33,609][04086] Updated weights for policy 0, policy_version 190 (0.0018)
[2024-11-25 02:31:37,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3676.9). Total num frames: 790528. Throughput: 0: 979.3. Samples: 198482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:31:37,310][00405] Avg episode reward: [(0, '5.398')]
[2024-11-25 02:31:37,312][04073] Saving new best policy, reward=5.398!
[2024-11-25 02:31:42,311][00405] Fps is (10 sec: 4094.7, 60 sec: 4095.8, 300 sec: 3705.0). Total num frames: 815104. Throughput: 0: 983.1. Samples: 201932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-25 02:31:42,313][00405] Avg episode reward: [(0, '5.234')]
[2024-11-25 02:31:43,006][04086] Updated weights for policy 0, policy_version 200 (0.0037)
[2024-11-25 02:31:47,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3713.7). Total num frames: 835584. Throughput: 0: 1031.6. Samples: 208490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:31:47,311][00405] Avg episode reward: [(0, '5.032')]
[2024-11-25 02:31:52,308][00405] Fps is (10 sec: 3277.8, 60 sec: 3891.2, 300 sec: 3686.4). Total num frames: 847872. Throughput: 0: 970.4. Samples: 212870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:31:52,316][00405] Avg episode reward: [(0, '5.280')]
[2024-11-25 02:31:54,310][04086] Updated weights for policy 0, policy_version 210 (0.0015)
[2024-11-25 02:31:57,308][00405] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3712.5). Total num frames: 872448. Throughput: 0: 973.4. Samples: 216486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:31:57,314][00405] Avg episode reward: [(0, '5.122')]
[2024-11-25 02:32:02,309][00405] Fps is (10 sec: 4914.6, 60 sec: 4095.9, 300 sec: 3737.6). Total num frames: 897024. Throughput: 0: 1026.6. Samples: 223826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:32:02,311][00405] Avg episode reward: [(0, '4.982')]
[2024-11-25 02:32:03,114][04086] Updated weights for policy 0, policy_version 220 (0.0024)
[2024-11-25 02:32:07,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3711.5). Total num frames: 909312. Throughput: 0: 997.0. Samples: 228594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:32:07,316][00405] Avg episode reward: [(0, '4.942')]
[2024-11-25 02:32:12,308][00405] Fps is (10 sec: 3686.8, 60 sec: 3959.6, 300 sec: 3735.6). Total num frames: 933888. Throughput: 0: 976.4. Samples: 231338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:32:12,312][00405] Avg episode reward: [(0, '5.012')]
[2024-11-25 02:32:14,066][04086] Updated weights for policy 0, policy_version 230 (0.0016)
[2024-11-25 02:32:17,308][00405] Fps is (10 sec: 4505.5, 60 sec: 4096.4, 300 sec: 3742.6). Total num frames: 954368. Throughput: 0: 1000.1. Samples: 238474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:32:17,312][00405] Avg episode reward: [(0, '5.284')]
[2024-11-25 02:32:22,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3733.7). Total num frames: 970752. Throughput: 0: 1014.0. Samples: 244114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:32:22,315][00405] Avg episode reward: [(0, '5.185')]
[2024-11-25 02:32:25,068][04086] Updated weights for policy 0, policy_version 240 (0.0036)
[2024-11-25 02:32:27,308][00405] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3740.5). Total num frames: 991232. Throughput: 0: 986.2. Samples: 246306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:32:27,309][00405] Avg episode reward: [(0, '5.220')]
[2024-11-25 02:32:32,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3762.3). Total num frames: 1015808. Throughput: 0: 992.6. Samples: 253156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:32:32,314][00405] Avg episode reward: [(0, '5.260')]
[2024-11-25 02:32:33,926][04086] Updated weights for policy 0, policy_version 250 (0.0031)
[2024-11-25 02:32:37,309][00405] Fps is (10 sec: 4505.0, 60 sec: 4095.9, 300 sec: 3768.3). Total num frames: 1036288. Throughput: 0: 1044.0. Samples: 259852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:32:37,314][00405] Avg episode reward: [(0, '5.084')]
[2024-11-25 02:32:42,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3759.5). Total num frames: 1052672. Throughput: 0: 1012.8. Samples: 262060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-25 02:32:42,310][00405] Avg episode reward: [(0, '5.210')]
[2024-11-25 02:32:45,187][04086] Updated weights for policy 0, policy_version 260 (0.0023)
[2024-11-25 02:32:47,308][00405] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 3765.4). Total num frames: 1073152. Throughput: 0: 978.4. Samples: 267854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:32:47,312][00405] Avg episode reward: [(0, '5.553')]
[2024-11-25 02:32:47,317][04073] Saving new best policy, reward=5.553!
[2024-11-25 02:32:52,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3785.3). Total num frames: 1097728. Throughput: 0: 1032.5. Samples: 275056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:32:52,310][00405] Avg episode reward: [(0, '6.114')]
[2024-11-25 02:32:52,328][04073] Saving new best policy, reward=6.114!
[2024-11-25 02:32:54,414][04086] Updated weights for policy 0, policy_version 270 (0.0028)
[2024-11-25 02:32:57,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3776.7). Total num frames: 1114112. Throughput: 0: 1026.4. Samples: 277528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:32:57,310][00405] Avg episode reward: [(0, '6.126')]
[2024-11-25 02:32:57,315][04073] Saving new best policy, reward=6.126!
[2024-11-25 02:33:02,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 977.5. Samples: 282462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-25 02:33:02,314][00405] Avg episode reward: [(0, '6.200')]
[2024-11-25 02:33:02,326][04073] Saving new best policy, reward=6.200!
[2024-11-25 02:33:05,114][04086] Updated weights for policy 0, policy_version 280 (0.0024)
[2024-11-25 02:33:07,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 1155072. Throughput: 0: 1012.7. Samples: 289684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-25 02:33:07,315][00405] Avg episode reward: [(0, '6.461')]
[2024-11-25 02:33:07,321][04073] Saving new best policy, reward=6.461!
[2024-11-25 02:33:12,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1175552. Throughput: 0: 1038.7. Samples: 293046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:33:12,315][00405] Avg episode reward: [(0, '6.604')]
[2024-11-25 02:33:12,324][04073] Saving new best policy, reward=6.604!
[2024-11-25 02:33:16,443][04086] Updated weights for policy 0, policy_version 290 (0.0014)
[2024-11-25 02:33:17,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1187840. Throughput: 0: 982.2. Samples: 297356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-25 02:33:17,314][00405] Avg episode reward: [(0, '6.929')]
[2024-11-25 02:33:17,387][04073] Saving new best policy, reward=6.929!
[2024-11-25 02:33:22,308][00405] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1212416. Throughput: 0: 983.6. Samples: 304114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-25 02:33:22,310][00405] Avg episode reward: [(0, '6.709')]
[2024-11-25 02:33:25,174][04086] Updated weights for policy 0, policy_version 300 (0.0019)
[2024-11-25 02:33:27,308][00405] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 1236992. Throughput: 0: 1015.4. Samples: 307754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-11-25 02:33:27,312][00405] Avg episode reward: [(0, '6.559')]
[2024-11-25 02:33:32,309][00405] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 1253376. Throughput: 0: 1004.4. Samples: 313052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:33:32,315][00405] Avg episode reward: [(0, '6.607')]
[2024-11-25 02:33:32,331][04073] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000306_1253376.pth...
[2024-11-25 02:33:32,498][04073] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth
[2024-11-25 02:33:36,203][04086] Updated weights for policy 0, policy_version 310 (0.0032)
[2024-11-25 02:33:37,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3971.0). Total num frames: 1273856. Throughput: 0: 978.0. Samples: 319066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:33:37,315][00405] Avg episode reward: [(0, '6.512')]
[2024-11-25 02:33:42,308][00405] Fps is (10 sec: 4506.2, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1298432. Throughput: 0: 1002.6. Samples: 322644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-25 02:33:42,316][00405] Avg episode reward: [(0, '7.481')]
[2024-11-25 02:33:42,329][04073] Saving new best policy, reward=7.481!
[2024-11-25 02:33:45,733][04086] Updated weights for policy 0, policy_version 320 (0.0032)
[2024-11-25 02:33:47,310][00405] Fps is (10 sec: 4095.2, 60 sec: 4027.6, 300 sec: 3971.0). Total num frames: 1314816. Throughput: 0: 1026.0. Samples: 328634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-25 02:33:47,314][00405] Avg episode reward: [(0, '8.113')]
[2024-11-25 02:33:47,322][04073] Saving new best policy, reward=8.113!
[2024-11-25 02:33:52,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1331200. Throughput: 0: 977.4. Samples: 333666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-25 02:33:52,310][00405] Avg episode reward: [(0, '9.063')]
[2024-11-25 02:33:52,317][04073] Saving new best policy, reward=9.063!
[2024-11-25 02:33:56,087][04086] Updated weights for policy 0, policy_version 330 (0.0018)
[2024-11-25 02:33:57,308][00405] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1355776. Throughput: 0: 982.2. Samples: 337246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-25 02:33:57,310][00405] Avg episode reward: [(0, '8.497')]
[2024-11-25 02:34:02,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 1376256. Throughput: 0: 1044.6. Samples: 344362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-25 02:34:02,310][00405] Avg episode reward: [(0, '7.724')]
[2024-11-25 02:34:07,017][04086] Updated weights for policy 0, policy_version 340 (0.0019)
[2024-11-25 02:34:07,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1392640. Throughput: 0: 993.6. Samples: 348826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-25 02:34:07,310][00405] Avg episode reward: [(0, '7.940')]
[2024-11-25 02:34:12,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1417216. Throughput: 0: 986.8. Samples: 352160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:34:12,310][00405] Avg episode reward: [(0, '8.885')]
[2024-11-25 02:34:15,668][04086] Updated weights for policy 0, policy_version 350 (0.0019)
[2024-11-25 02:34:17,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 1437696. Throughput: 0: 1028.7. Samples: 359344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:34:17,311][00405] Avg episode reward: [(0, '9.139')]
[2024-11-25 02:34:17,391][04073] Saving new best policy, reward=9.139!
[2024-11-25 02:34:22,308][00405] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1454080. Throughput: 0: 1008.6. Samples: 364454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-25 02:34:22,316][00405] Avg episode reward: [(0, '9.345')]
[2024-11-25 02:34:22,335][04073] Saving new best policy, reward=9.345!
[2024-11-25 02:34:26,862][04086] Updated weights for policy 0, policy_version 360 (0.0026)
[2024-11-25 02:34:27,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1474560. Throughput: 0: 984.0. Samples: 366922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-25 02:34:27,315][00405] Avg episode reward: [(0, '10.006')]
[2024-11-25 02:34:27,319][04073] Saving new best policy, reward=10.006!
[2024-11-25 02:34:32,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3998.8). Total num frames: 1499136. Throughput: 0: 1012.1. Samples: 374176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:34:32,313][00405] Avg episode reward: [(0, '10.859')] [2024-11-25 02:34:32,324][04073] Saving new best policy, reward=10.859! [2024-11-25 02:34:36,492][04086] Updated weights for policy 0, policy_version 370 (0.0020) [2024-11-25 02:34:37,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1515520. Throughput: 0: 1030.0. Samples: 380018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:34:37,311][00405] Avg episode reward: [(0, '10.508')] [2024-11-25 02:34:42,333][00405] Fps is (10 sec: 3268.6, 60 sec: 3889.6, 300 sec: 3984.6). Total num frames: 1531904. Throughput: 0: 998.1. Samples: 382184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-25 02:34:42,338][00405] Avg episode reward: [(0, '10.217')] [2024-11-25 02:34:46,582][04086] Updated weights for policy 0, policy_version 380 (0.0031) [2024-11-25 02:34:47,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4026.6). Total num frames: 1556480. Throughput: 0: 990.3. Samples: 388926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:34:47,314][00405] Avg episode reward: [(0, '10.484')] [2024-11-25 02:34:52,308][00405] Fps is (10 sec: 4927.6, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 1581056. Throughput: 0: 1041.7. Samples: 395702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:34:52,310][00405] Avg episode reward: [(0, '10.918')] [2024-11-25 02:34:52,328][04073] Saving new best policy, reward=10.918! [2024-11-25 02:34:57,313][00405] Fps is (10 sec: 3684.5, 60 sec: 3959.1, 300 sec: 3998.7). Total num frames: 1593344. Throughput: 0: 1015.5. Samples: 397862. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-25 02:34:57,315][00405] Avg episode reward: [(0, '11.493')] [2024-11-25 02:34:57,317][04073] Saving new best policy, reward=11.493! [2024-11-25 02:34:57,852][04086] Updated weights for policy 0, policy_version 390 (0.0023) [2024-11-25 02:35:02,308][00405] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1617920. Throughput: 0: 984.0. Samples: 403624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-25 02:35:02,310][00405] Avg episode reward: [(0, '12.681')] [2024-11-25 02:35:02,318][04073] Saving new best policy, reward=12.681! [2024-11-25 02:35:06,526][04086] Updated weights for policy 0, policy_version 400 (0.0021) [2024-11-25 02:35:07,308][00405] Fps is (10 sec: 4507.9, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1638400. Throughput: 0: 1029.1. Samples: 410762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:35:07,310][00405] Avg episode reward: [(0, '13.437')] [2024-11-25 02:35:07,316][04073] Saving new best policy, reward=13.437! [2024-11-25 02:35:12,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1654784. Throughput: 0: 1030.0. Samples: 413274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:35:12,310][00405] Avg episode reward: [(0, '13.880')] [2024-11-25 02:35:12,319][04073] Saving new best policy, reward=13.880! [2024-11-25 02:35:17,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 1671168. Throughput: 0: 970.7. Samples: 417858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:35:17,310][00405] Avg episode reward: [(0, '13.093')] [2024-11-25 02:35:18,206][04086] Updated weights for policy 0, policy_version 410 (0.0035) [2024-11-25 02:35:22,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1695744. Throughput: 0: 1003.3. Samples: 425168. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:35:22,313][00405] Avg episode reward: [(0, '12.871')] [2024-11-25 02:35:27,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1716224. Throughput: 0: 1035.4. Samples: 428752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:35:27,312][00405] Avg episode reward: [(0, '12.508')] [2024-11-25 02:35:27,378][04086] Updated weights for policy 0, policy_version 420 (0.0016) [2024-11-25 02:35:32,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 1732608. Throughput: 0: 982.8. Samples: 433154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:35:32,310][00405] Avg episode reward: [(0, '13.435')] [2024-11-25 02:35:32,322][04073] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000423_1732608.pth... [2024-11-25 02:35:32,459][04073] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000189_774144.pth [2024-11-25 02:35:37,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1757184. Throughput: 0: 983.8. Samples: 439972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:35:37,311][00405] Avg episode reward: [(0, '14.465')] [2024-11-25 02:35:37,316][04073] Saving new best policy, reward=14.465! [2024-11-25 02:35:37,793][04086] Updated weights for policy 0, policy_version 430 (0.0013) [2024-11-25 02:35:42,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4097.7, 300 sec: 4026.6). Total num frames: 1777664. Throughput: 0: 1013.7. Samples: 443472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:35:42,310][00405] Avg episode reward: [(0, '14.711')] [2024-11-25 02:35:42,319][04073] Saving new best policy, reward=14.711! [2024-11-25 02:35:47,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1794048. Throughput: 0: 992.8. Samples: 448302. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:35:47,310][00405] Avg episode reward: [(0, '14.704')] [2024-11-25 02:35:49,483][04086] Updated weights for policy 0, policy_version 440 (0.0020) [2024-11-25 02:35:52,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 1814528. Throughput: 0: 965.8. Samples: 454222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:35:52,315][00405] Avg episode reward: [(0, '14.232')] [2024-11-25 02:35:57,308][00405] Fps is (10 sec: 4505.7, 60 sec: 4096.4, 300 sec: 4026.6). Total num frames: 1839104. Throughput: 0: 991.0. Samples: 457868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-25 02:35:57,309][00405] Avg episode reward: [(0, '13.605')] [2024-11-25 02:35:57,985][04086] Updated weights for policy 0, policy_version 450 (0.0027) [2024-11-25 02:36:02,308][00405] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1855488. Throughput: 0: 1025.0. Samples: 463982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-25 02:36:02,316][00405] Avg episode reward: [(0, '14.181')] [2024-11-25 02:36:07,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 1871872. Throughput: 0: 975.9. Samples: 469084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:36:07,310][00405] Avg episode reward: [(0, '14.669')] [2024-11-25 02:36:09,125][04086] Updated weights for policy 0, policy_version 460 (0.0025) [2024-11-25 02:36:12,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.7). Total num frames: 1896448. Throughput: 0: 974.8. Samples: 472620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:36:12,313][00405] Avg episode reward: [(0, '15.507')] [2024-11-25 02:36:12,322][04073] Saving new best policy, reward=15.507! [2024-11-25 02:36:17,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1916928. Throughput: 0: 1026.3. Samples: 479338. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:36:17,314][00405] Avg episode reward: [(0, '15.749')] [2024-11-25 02:36:17,320][04073] Saving new best policy, reward=15.749! [2024-11-25 02:36:19,830][04086] Updated weights for policy 0, policy_version 470 (0.0021) [2024-11-25 02:36:22,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1929216. Throughput: 0: 967.5. Samples: 483510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-25 02:36:22,315][00405] Avg episode reward: [(0, '14.800')] [2024-11-25 02:36:27,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1953792. Throughput: 0: 966.3. Samples: 486954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:36:27,315][00405] Avg episode reward: [(0, '14.981')] [2024-11-25 02:36:29,373][04086] Updated weights for policy 0, policy_version 480 (0.0022) [2024-11-25 02:36:32,309][00405] Fps is (10 sec: 4914.6, 60 sec: 4095.9, 300 sec: 4026.6). Total num frames: 1978368. Throughput: 0: 1019.6. Samples: 494186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:36:32,312][00405] Avg episode reward: [(0, '14.764')] [2024-11-25 02:36:37,311][00405] Fps is (10 sec: 4094.8, 60 sec: 3959.3, 300 sec: 3998.8). Total num frames: 1994752. Throughput: 0: 997.0. Samples: 499090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-25 02:36:37,317][00405] Avg episode reward: [(0, '14.050')] [2024-11-25 02:36:40,568][04086] Updated weights for policy 0, policy_version 490 (0.0036) [2024-11-25 02:36:42,308][00405] Fps is (10 sec: 3277.2, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 2011136. Throughput: 0: 973.7. Samples: 501684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-25 02:36:42,313][00405] Avg episode reward: [(0, '13.720')] [2024-11-25 02:36:47,308][00405] Fps is (10 sec: 4097.3, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2035712. Throughput: 0: 995.2. Samples: 508764. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:36:47,311][00405] Avg episode reward: [(0, '14.616')] [2024-11-25 02:36:49,201][04086] Updated weights for policy 0, policy_version 500 (0.0027) [2024-11-25 02:36:52,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2056192. Throughput: 0: 1010.8. Samples: 514572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:36:52,310][00405] Avg episode reward: [(0, '15.342')] [2024-11-25 02:36:57,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 2072576. Throughput: 0: 980.8. Samples: 516758. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:36:57,310][00405] Avg episode reward: [(0, '16.215')] [2024-11-25 02:36:57,312][04073] Saving new best policy, reward=16.215! [2024-11-25 02:37:00,576][04086] Updated weights for policy 0, policy_version 510 (0.0013) [2024-11-25 02:37:02,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2097152. Throughput: 0: 983.7. Samples: 523606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-25 02:37:02,314][00405] Avg episode reward: [(0, '16.958')] [2024-11-25 02:37:02,323][04073] Saving new best policy, reward=16.958! [2024-11-25 02:37:07,308][00405] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 2117632. Throughput: 0: 1041.7. Samples: 530386. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:37:07,312][00405] Avg episode reward: [(0, '17.029')] [2024-11-25 02:37:07,318][04073] Saving new best policy, reward=17.029! [2024-11-25 02:37:10,830][04086] Updated weights for policy 0, policy_version 520 (0.0030) [2024-11-25 02:37:12,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2134016. Throughput: 0: 1013.0. Samples: 532538. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:37:12,310][00405] Avg episode reward: [(0, '16.877')] [2024-11-25 02:37:17,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2154496. Throughput: 0: 985.0. Samples: 538508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:37:17,313][00405] Avg episode reward: [(0, '15.587')] [2024-11-25 02:37:20,463][04086] Updated weights for policy 0, policy_version 530 (0.0033) [2024-11-25 02:37:22,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 2179072. Throughput: 0: 1029.2. Samples: 545402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-25 02:37:22,312][00405] Avg episode reward: [(0, '14.672')] [2024-11-25 02:37:27,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2195456. Throughput: 0: 1029.0. Samples: 547988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:37:27,310][00405] Avg episode reward: [(0, '14.680')] [2024-11-25 02:37:31,456][04086] Updated weights for policy 0, policy_version 540 (0.0043) [2024-11-25 02:37:32,308][00405] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2215936. Throughput: 0: 982.8. Samples: 552990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:37:32,314][00405] Avg episode reward: [(0, '15.317')] [2024-11-25 02:37:32,323][04073] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000541_2215936.pth... [2024-11-25 02:37:32,440][04073] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000306_1253376.pth [2024-11-25 02:37:37,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4012.7). Total num frames: 2236416. Throughput: 0: 1017.4. Samples: 560354. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:37:37,310][00405] Avg episode reward: [(0, '15.842')] [2024-11-25 02:37:39,914][04086] Updated weights for policy 0, policy_version 550 (0.0024) [2024-11-25 02:37:42,310][00405] Fps is (10 sec: 4095.3, 60 sec: 4095.9, 300 sec: 4012.7). Total num frames: 2256896. Throughput: 0: 1043.7. Samples: 563728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:37:42,317][00405] Avg episode reward: [(0, '16.331')] [2024-11-25 02:37:47,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2273280. Throughput: 0: 991.8. Samples: 568236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:37:47,315][00405] Avg episode reward: [(0, '16.066')] [2024-11-25 02:37:50,889][04086] Updated weights for policy 0, policy_version 560 (0.0025) [2024-11-25 02:37:52,308][00405] Fps is (10 sec: 4096.9, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2297856. Throughput: 0: 996.4. Samples: 575226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:37:52,313][00405] Avg episode reward: [(0, '17.734')] [2024-11-25 02:37:52,321][04073] Saving new best policy, reward=17.734! [2024-11-25 02:37:57,308][00405] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 2322432. Throughput: 0: 1028.1. Samples: 578802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:37:57,312][00405] Avg episode reward: [(0, '18.098')] [2024-11-25 02:37:57,316][04073] Saving new best policy, reward=18.098! [2024-11-25 02:38:01,831][04086] Updated weights for policy 0, policy_version 570 (0.0017) [2024-11-25 02:38:02,309][00405] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 3998.8). Total num frames: 2334720. Throughput: 0: 1001.7. Samples: 583584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:38:02,312][00405] Avg episode reward: [(0, '18.306')] [2024-11-25 02:38:02,324][04073] Saving new best policy, reward=18.306! 
[2024-11-25 02:38:07,308][00405] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2359296. Throughput: 0: 990.3. Samples: 589964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:38:07,313][00405] Avg episode reward: [(0, '18.583')] [2024-11-25 02:38:07,316][04073] Saving new best policy, reward=18.583! [2024-11-25 02:38:10,912][04086] Updated weights for policy 0, policy_version 580 (0.0024) [2024-11-25 02:38:12,308][00405] Fps is (10 sec: 4506.3, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2379776. Throughput: 0: 1012.2. Samples: 593536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-25 02:38:12,311][00405] Avg episode reward: [(0, '17.828')] [2024-11-25 02:38:17,308][00405] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2396160. Throughput: 0: 1026.1. Samples: 599164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:38:17,313][00405] Avg episode reward: [(0, '18.202')] [2024-11-25 02:38:21,871][04086] Updated weights for policy 0, policy_version 590 (0.0018) [2024-11-25 02:38:22,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2416640. Throughput: 0: 988.3. Samples: 604826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:38:22,314][00405] Avg episode reward: [(0, '17.130')] [2024-11-25 02:38:27,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2441216. Throughput: 0: 994.7. Samples: 608488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:38:27,311][00405] Avg episode reward: [(0, '18.710')] [2024-11-25 02:38:27,314][04073] Saving new best policy, reward=18.710! [2024-11-25 02:38:31,157][04086] Updated weights for policy 0, policy_version 600 (0.0021) [2024-11-25 02:38:32,312][00405] Fps is (10 sec: 4094.3, 60 sec: 4027.5, 300 sec: 4012.6). Total num frames: 2457600. Throughput: 0: 1037.0. Samples: 614904. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:38:32,316][00405] Avg episode reward: [(0, '18.762')] [2024-11-25 02:38:32,325][04073] Saving new best policy, reward=18.762! [2024-11-25 02:38:37,308][00405] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2478080. Throughput: 0: 993.3. Samples: 619926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-25 02:38:37,310][00405] Avg episode reward: [(0, '18.629')] [2024-11-25 02:38:41,539][04086] Updated weights for policy 0, policy_version 610 (0.0024) [2024-11-25 02:38:42,308][00405] Fps is (10 sec: 4097.7, 60 sec: 4027.9, 300 sec: 4012.7). Total num frames: 2498560. Throughput: 0: 992.3. Samples: 623454. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-25 02:38:42,315][00405] Avg episode reward: [(0, '17.728')] [2024-11-25 02:38:47,309][00405] Fps is (10 sec: 4505.0, 60 sec: 4164.2, 300 sec: 4040.4). Total num frames: 2523136. Throughput: 0: 1044.8. Samples: 630600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:38:47,314][00405] Avg episode reward: [(0, '16.655')] [2024-11-25 02:38:52,308][00405] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3998.8). Total num frames: 2535424. Throughput: 0: 1001.3. Samples: 635024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:38:52,315][00405] Avg episode reward: [(0, '16.391')] [2024-11-25 02:38:52,692][04086] Updated weights for policy 0, policy_version 620 (0.0034) [2024-11-25 02:38:57,308][00405] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2560000. Throughput: 0: 996.3. Samples: 638370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:38:57,315][00405] Avg episode reward: [(0, '16.249')] [2024-11-25 02:39:01,277][04086] Updated weights for policy 0, policy_version 630 (0.0030) [2024-11-25 02:39:02,308][00405] Fps is (10 sec: 4915.4, 60 sec: 4164.4, 300 sec: 4040.5). Total num frames: 2584576. Throughput: 0: 1029.4. Samples: 645488. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-25 02:39:02,312][00405] Avg episode reward: [(0, '16.288')] [2024-11-25 02:39:07,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2596864. Throughput: 0: 1014.8. Samples: 650492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:39:07,310][00405] Avg episode reward: [(0, '16.357')] [2024-11-25 02:39:12,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2617344. Throughput: 0: 992.3. Samples: 653142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:39:12,314][00405] Avg episode reward: [(0, '17.347')] [2024-11-25 02:39:12,327][04086] Updated weights for policy 0, policy_version 640 (0.0026) [2024-11-25 02:39:17,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2641920. Throughput: 0: 1012.1. Samples: 660442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:39:17,314][00405] Avg episode reward: [(0, '18.258')] [2024-11-25 02:39:22,145][04086] Updated weights for policy 0, policy_version 650 (0.0017) [2024-11-25 02:39:22,308][00405] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2662400. Throughput: 0: 1027.2. Samples: 666150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:39:22,310][00405] Avg episode reward: [(0, '17.635')] [2024-11-25 02:39:27,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2678784. Throughput: 0: 996.1. Samples: 668278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:39:27,317][00405] Avg episode reward: [(0, '17.609')] [2024-11-25 02:39:32,008][04086] Updated weights for policy 0, policy_version 660 (0.0022) [2024-11-25 02:39:32,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4096.3, 300 sec: 4026.6). Total num frames: 2703360. Throughput: 0: 993.6. Samples: 675310. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:39:32,312][00405] Avg episode reward: [(0, '16.941')] [2024-11-25 02:39:32,320][04073] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000660_2703360.pth... [2024-11-25 02:39:32,439][04073] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000423_1732608.pth [2024-11-25 02:39:37,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.8). Total num frames: 2723840. Throughput: 0: 1033.4. Samples: 681528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:39:37,311][00405] Avg episode reward: [(0, '16.394')] [2024-11-25 02:39:42,308][00405] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2736128. Throughput: 0: 1006.7. Samples: 683670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:39:42,310][00405] Avg episode reward: [(0, '16.918')] [2024-11-25 02:39:43,248][04086] Updated weights for policy 0, policy_version 670 (0.0023) [2024-11-25 02:39:47,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2760704. Throughput: 0: 987.2. Samples: 689910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:39:47,310][00405] Avg episode reward: [(0, '19.088')] [2024-11-25 02:39:47,318][04073] Saving new best policy, reward=19.088! [2024-11-25 02:39:52,305][04086] Updated weights for policy 0, policy_version 680 (0.0022) [2024-11-25 02:39:52,308][00405] Fps is (10 sec: 4505.3, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2781184. Throughput: 0: 1027.7. Samples: 696740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:39:52,313][00405] Avg episode reward: [(0, '19.649')] [2024-11-25 02:39:52,324][04073] Saving new best policy, reward=19.649! [2024-11-25 02:39:57,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2797568. Throughput: 0: 1013.7. Samples: 698758. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:39:57,313][00405] Avg episode reward: [(0, '20.175')] [2024-11-25 02:39:57,319][04073] Saving new best policy, reward=20.175! [2024-11-25 02:40:02,308][00405] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2818048. Throughput: 0: 970.8. Samples: 704128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:40:02,312][00405] Avg episode reward: [(0, '19.867')] [2024-11-25 02:40:03,657][04086] Updated weights for policy 0, policy_version 690 (0.0016) [2024-11-25 02:40:07,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2842624. Throughput: 0: 1004.2. Samples: 711340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:40:07,315][00405] Avg episode reward: [(0, '18.428')] [2024-11-25 02:40:12,308][00405] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2859008. Throughput: 0: 1018.5. Samples: 714110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-25 02:40:12,310][00405] Avg episode reward: [(0, '18.723')] [2024-11-25 02:40:14,491][04086] Updated weights for policy 0, policy_version 700 (0.0037) [2024-11-25 02:40:17,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2875392. Throughput: 0: 966.6. Samples: 718806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:40:17,310][00405] Avg episode reward: [(0, '17.685')] [2024-11-25 02:40:22,308][00405] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2899968. Throughput: 0: 982.4. Samples: 725738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:40:22,310][00405] Avg episode reward: [(0, '16.678')] [2024-11-25 02:40:23,490][04086] Updated weights for policy 0, policy_version 710 (0.0021) [2024-11-25 02:40:27,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2920448. Throughput: 0: 1017.4. Samples: 729452. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-25 02:40:27,314][00405] Avg episode reward: [(0, '17.520')] [2024-11-25 02:40:32,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2936832. Throughput: 0: 979.3. Samples: 733980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:40:32,310][00405] Avg episode reward: [(0, '17.658')] [2024-11-25 02:40:34,647][04086] Updated weights for policy 0, policy_version 720 (0.0031) [2024-11-25 02:40:37,308][00405] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2961408. Throughput: 0: 980.6. Samples: 740866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-25 02:40:37,309][00405] Avg episode reward: [(0, '18.643')] [2024-11-25 02:40:42,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2981888. Throughput: 0: 1017.6. Samples: 744550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:40:42,316][00405] Avg episode reward: [(0, '18.332')] [2024-11-25 02:40:43,965][04086] Updated weights for policy 0, policy_version 730 (0.0034) [2024-11-25 02:40:47,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2998272. Throughput: 0: 1010.9. Samples: 749618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:40:47,313][00405] Avg episode reward: [(0, '20.250')] [2024-11-25 02:40:47,319][04073] Saving new best policy, reward=20.250! [2024-11-25 02:40:52,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3018752. Throughput: 0: 978.8. Samples: 755386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:40:52,310][00405] Avg episode reward: [(0, '19.770')] [2024-11-25 02:40:54,564][04086] Updated weights for policy 0, policy_version 740 (0.0021) [2024-11-25 02:40:57,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3043328. Throughput: 0: 993.4. Samples: 758814. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:40:57,316][00405] Avg episode reward: [(0, '20.114')] [2024-11-25 02:41:02,308][00405] Fps is (10 sec: 4095.9, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 3059712. Throughput: 0: 1026.9. Samples: 765018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:41:02,312][00405] Avg episode reward: [(0, '21.089')] [2024-11-25 02:41:02,323][04073] Saving new best policy, reward=21.089! [2024-11-25 02:41:05,836][04086] Updated weights for policy 0, policy_version 750 (0.0037) [2024-11-25 02:41:07,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 3076096. Throughput: 0: 983.3. Samples: 769988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:41:07,313][00405] Avg episode reward: [(0, '21.864')] [2024-11-25 02:41:07,317][04073] Saving new best policy, reward=21.864! [2024-11-25 02:41:12,308][00405] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3100672. Throughput: 0: 978.6. Samples: 773490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:41:12,312][00405] Avg episode reward: [(0, '21.559')] [2024-11-25 02:41:14,572][04086] Updated weights for policy 0, policy_version 760 (0.0027) [2024-11-25 02:41:17,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3121152. Throughput: 0: 1031.1. Samples: 780380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:41:17,311][00405] Avg episode reward: [(0, '21.725')] [2024-11-25 02:41:22,308][00405] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 3133440. Throughput: 0: 972.6. Samples: 784634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:41:22,311][00405] Avg episode reward: [(0, '20.206')] [2024-11-25 02:41:25,776][04086] Updated weights for policy 0, policy_version 770 (0.0021) [2024-11-25 02:41:27,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). 
Total num frames: 3158016. Throughput: 0: 967.6. Samples: 788092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:41:27,313][00405] Avg episode reward: [(0, '20.610')] [2024-11-25 02:41:32,308][00405] Fps is (10 sec: 4915.3, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3182592. Throughput: 0: 1013.2. Samples: 795210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:41:32,311][00405] Avg episode reward: [(0, '20.690')] [2024-11-25 02:41:32,318][04073] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000777_3182592.pth... [2024-11-25 02:41:32,472][04073] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000541_2215936.pth [2024-11-25 02:41:36,214][04086] Updated weights for policy 0, policy_version 780 (0.0021) [2024-11-25 02:41:37,309][00405] Fps is (10 sec: 3685.9, 60 sec: 3891.1, 300 sec: 4012.7). Total num frames: 3194880. Throughput: 0: 990.5. Samples: 799960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:41:37,311][00405] Avg episode reward: [(0, '20.895')] [2024-11-25 02:41:42,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3219456. Throughput: 0: 977.4. Samples: 802798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:41:42,312][00405] Avg episode reward: [(0, '21.167')] [2024-11-25 02:41:45,556][04086] Updated weights for policy 0, policy_version 790 (0.0022) [2024-11-25 02:41:47,308][00405] Fps is (10 sec: 4915.8, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3244032. Throughput: 0: 1000.5. Samples: 810042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:41:47,310][00405] Avg episode reward: [(0, '20.364')] [2024-11-25 02:41:52,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3260416. Throughput: 0: 1013.3. Samples: 815586. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:41:52,313][00405] Avg episode reward: [(0, '20.645')] [2024-11-25 02:41:56,880][04086] Updated weights for policy 0, policy_version 800 (0.0019) [2024-11-25 02:41:57,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 3276800. Throughput: 0: 984.1. Samples: 817776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:41:57,310][00405] Avg episode reward: [(0, '22.536')] [2024-11-25 02:41:57,314][04073] Saving new best policy, reward=22.536! [2024-11-25 02:42:02,308][00405] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4012.7). Total num frames: 3301376. Throughput: 0: 985.4. Samples: 824724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:42:02,310][00405] Avg episode reward: [(0, '21.340')] [2024-11-25 02:42:05,803][04086] Updated weights for policy 0, policy_version 810 (0.0026) [2024-11-25 02:42:07,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3321856. Throughput: 0: 1029.3. Samples: 830952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:42:07,316][00405] Avg episode reward: [(0, '22.106')] [2024-11-25 02:42:12,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 3334144. Throughput: 0: 999.2. Samples: 833058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:42:12,311][00405] Avg episode reward: [(0, '23.004')] [2024-11-25 02:42:12,316][04073] Saving new best policy, reward=23.004! [2024-11-25 02:42:17,095][04086] Updated weights for policy 0, policy_version 820 (0.0029) [2024-11-25 02:42:17,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3358720. Throughput: 0: 975.2. Samples: 839096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:42:17,314][00405] Avg episode reward: [(0, '22.835')] [2024-11-25 02:42:22,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). 
Total num frames: 3379200. Throughput: 0: 1023.8. Samples: 846028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:42:22,314][00405] Avg episode reward: [(0, '21.259')] [2024-11-25 02:42:27,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3395584. Throughput: 0: 1006.6. Samples: 848094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:42:27,315][00405] Avg episode reward: [(0, '21.680')] [2024-11-25 02:42:28,496][04086] Updated weights for policy 0, policy_version 830 (0.0036) [2024-11-25 02:42:32,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 3416064. Throughput: 0: 964.6. Samples: 853448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:42:32,316][00405] Avg episode reward: [(0, '19.742')] [2024-11-25 02:42:37,135][04086] Updated weights for policy 0, policy_version 840 (0.0016) [2024-11-25 02:42:37,308][00405] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 4012.7). Total num frames: 3440640. Throughput: 0: 1000.5. Samples: 860608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:42:37,312][00405] Avg episode reward: [(0, '18.582')] [2024-11-25 02:42:42,308][00405] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3457024. Throughput: 0: 1016.3. Samples: 863510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:42:42,310][00405] Avg episode reward: [(0, '19.923')] [2024-11-25 02:42:47,308][00405] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3984.9). Total num frames: 3473408. Throughput: 0: 963.3. Samples: 868072. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-25 02:42:47,317][00405] Avg episode reward: [(0, '19.946')] [2024-11-25 02:42:48,560][04086] Updated weights for policy 0, policy_version 850 (0.0013) [2024-11-25 02:42:52,308][00405] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3497984. Throughput: 0: 976.7. Samples: 874904. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:42:52,315][00405] Avg episode reward: [(0, '19.556')] [2024-11-25 02:42:57,308][00405] Fps is (10 sec: 4096.0, 60 sec: 3959.4, 300 sec: 3998.8). Total num frames: 3514368. Throughput: 0: 1006.9. Samples: 878370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:42:57,315][00405] Avg episode reward: [(0, '20.929')] [2024-11-25 02:42:58,868][04086] Updated weights for policy 0, policy_version 860 (0.0026) [2024-11-25 02:43:02,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 3530752. Throughput: 0: 969.9. Samples: 882740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:43:02,313][00405] Avg episode reward: [(0, '22.891')] [2024-11-25 02:43:07,308][00405] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 3555328. Throughput: 0: 962.2. Samples: 889328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:43:07,310][00405] Avg episode reward: [(0, '24.108')] [2024-11-25 02:43:07,312][04073] Saving new best policy, reward=24.108! [2024-11-25 02:43:08,937][04086] Updated weights for policy 0, policy_version 870 (0.0021) [2024-11-25 02:43:12,309][00405] Fps is (10 sec: 4505.2, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3575808. Throughput: 0: 989.8. Samples: 892634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:43:12,315][00405] Avg episode reward: [(0, '23.217')] [2024-11-25 02:43:17,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 3592192. Throughput: 0: 984.3. Samples: 897740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:43:17,310][00405] Avg episode reward: [(0, '23.520')] [2024-11-25 02:43:20,397][04086] Updated weights for policy 0, policy_version 880 (0.0017) [2024-11-25 02:43:22,308][00405] Fps is (10 sec: 3277.1, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 3608576. Throughput: 0: 950.0. Samples: 903360. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-25 02:43:22,315][00405] Avg episode reward: [(0, '23.220')] [2024-11-25 02:43:27,308][00405] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3985.0). Total num frames: 3633152. Throughput: 0: 961.8. Samples: 906790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:43:27,312][00405] Avg episode reward: [(0, '22.148')] [2024-11-25 02:43:29,755][04086] Updated weights for policy 0, policy_version 890 (0.0017) [2024-11-25 02:43:32,308][00405] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3649536. Throughput: 0: 992.7. Samples: 912742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:43:32,316][00405] Avg episode reward: [(0, '22.260')] [2024-11-25 02:43:32,329][04073] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000891_3649536.pth... [2024-11-25 02:43:32,531][04073] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000660_2703360.pth [2024-11-25 02:43:37,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 3670016. Throughput: 0: 949.3. Samples: 917624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-25 02:43:37,315][00405] Avg episode reward: [(0, '22.452')] [2024-11-25 02:43:40,754][04086] Updated weights for policy 0, policy_version 900 (0.0021) [2024-11-25 02:43:42,308][00405] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3690496. Throughput: 0: 950.1. Samples: 921126. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-25 02:43:42,314][00405] Avg episode reward: [(0, '23.383')] [2024-11-25 02:43:47,308][00405] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3710976. Throughput: 0: 1002.1. Samples: 927836. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-25 02:43:47,313][00405] Avg episode reward: [(0, '23.211')] [2024-11-25 02:43:52,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 3723264. Throughput: 0: 949.4. Samples: 932052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:43:52,310][00405] Avg episode reward: [(0, '22.112')] [2024-11-25 02:43:52,333][04086] Updated weights for policy 0, policy_version 910 (0.0037) [2024-11-25 02:43:57,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3747840. Throughput: 0: 952.2. Samples: 935482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-25 02:43:57,310][00405] Avg episode reward: [(0, '21.809')] [2024-11-25 02:44:01,183][04086] Updated weights for policy 0, policy_version 920 (0.0016) [2024-11-25 02:44:02,308][00405] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3772416. Throughput: 0: 994.9. Samples: 942510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:44:02,315][00405] Avg episode reward: [(0, '20.407')] [2024-11-25 02:44:07,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 3784704. Throughput: 0: 974.9. Samples: 947230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:44:07,310][00405] Avg episode reward: [(0, '20.441')] [2024-11-25 02:44:12,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3943.3). Total num frames: 3805184. Throughput: 0: 957.3. Samples: 949870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:44:12,315][00405] Avg episode reward: [(0, '20.193')] [2024-11-25 02:44:12,465][04086] Updated weights for policy 0, policy_version 930 (0.0021) [2024-11-25 02:44:17,308][00405] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3829760. Throughput: 0: 983.1. Samples: 956982. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:44:17,313][00405] Avg episode reward: [(0, '20.572')] [2024-11-25 02:44:22,309][00405] Fps is (10 sec: 4095.5, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 3846144. Throughput: 0: 994.0. Samples: 962356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:44:22,313][00405] Avg episode reward: [(0, '20.070')] [2024-11-25 02:44:23,084][04086] Updated weights for policy 0, policy_version 940 (0.0025) [2024-11-25 02:44:27,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 3862528. Throughput: 0: 965.2. Samples: 964562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-25 02:44:27,316][00405] Avg episode reward: [(0, '20.290')] [2024-11-25 02:44:32,308][00405] Fps is (10 sec: 4096.5, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3887104. Throughput: 0: 967.1. Samples: 971354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:44:32,315][00405] Avg episode reward: [(0, '20.368')] [2024-11-25 02:44:32,798][04086] Updated weights for policy 0, policy_version 950 (0.0013) [2024-11-25 02:44:37,308][00405] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3907584. Throughput: 0: 1016.8. Samples: 977810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:44:37,313][00405] Avg episode reward: [(0, '19.996')] [2024-11-25 02:44:42,308][00405] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 3919872. Throughput: 0: 987.0. Samples: 979898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:44:42,310][00405] Avg episode reward: [(0, '21.655')] [2024-11-25 02:44:44,128][04086] Updated weights for policy 0, policy_version 960 (0.0027) [2024-11-25 02:44:47,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3944448. Throughput: 0: 963.0. Samples: 985846. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-25 02:44:47,314][00405] Avg episode reward: [(0, '22.546')] [2024-11-25 02:44:52,310][00405] Fps is (10 sec: 4914.2, 60 sec: 4095.9, 300 sec: 3971.0). Total num frames: 3969024. Throughput: 0: 1010.9. Samples: 992724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:44:52,319][00405] Avg episode reward: [(0, '23.744')] [2024-11-25 02:44:53,598][04086] Updated weights for policy 0, policy_version 970 (0.0018) [2024-11-25 02:44:57,308][00405] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3981312. Throughput: 0: 998.7. Samples: 994812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-25 02:44:57,315][00405] Avg episode reward: [(0, '24.438')] [2024-11-25 02:44:57,320][04073] Saving new best policy, reward=24.438! [2024-11-25 02:45:02,308][00405] Fps is (10 sec: 3277.5, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 4001792. Throughput: 0: 958.2. Samples: 1000100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-25 02:45:02,310][00405] Avg episode reward: [(0, '23.920')] [2024-11-25 02:45:02,657][04073] Stopping Batcher_0... [2024-11-25 02:45:02,657][00405] Component Batcher_0 stopped! [2024-11-25 02:45:02,659][04073] Loop batcher_evt_loop terminating... [2024-11-25 02:45:02,663][04073] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-25 02:45:02,730][04086] Weights refcount: 2 0 [2024-11-25 02:45:02,740][00405] Component InferenceWorker_p0-w0 stopped! [2024-11-25 02:45:02,742][04086] Stopping InferenceWorker_p0-w0... [2024-11-25 02:45:02,743][04086] Loop inference_proc0-0_evt_loop terminating... [2024-11-25 02:45:02,838][04073] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000777_3182592.pth [2024-11-25 02:45:02,855][04073] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
[2024-11-25 02:45:02,865][00405] Component RolloutWorker_w6 stopped! [2024-11-25 02:45:02,872][04097] Stopping RolloutWorker_w6... [2024-11-25 02:45:02,876][00405] Component RolloutWorker_w2 stopped! [2024-11-25 02:45:02,882][04093] Stopping RolloutWorker_w2... [2024-11-25 02:45:02,883][04093] Loop rollout_proc2_evt_loop terminating... [2024-11-25 02:45:02,891][04096] Stopping RolloutWorker_w4... [2024-11-25 02:45:02,885][00405] Component RolloutWorker_w4 stopped! [2024-11-25 02:45:02,900][00405] Component RolloutWorker_w0 stopped! [2024-11-25 02:45:02,904][04091] Stopping RolloutWorker_w0... [2024-11-25 02:45:02,905][04096] Loop rollout_proc4_evt_loop terminating... [2024-11-25 02:45:02,873][04097] Loop rollout_proc6_evt_loop terminating... [2024-11-25 02:45:02,910][04091] Loop rollout_proc0_evt_loop terminating... [2024-11-25 02:45:03,044][00405] Component LearnerWorker_p0 stopped! [2024-11-25 02:45:03,044][04073] Stopping LearnerWorker_p0... [2024-11-25 02:45:03,051][04073] Loop learner_proc0_evt_loop terminating... [2024-11-25 02:45:03,155][04095] Stopping RolloutWorker_w5... [2024-11-25 02:45:03,155][00405] Component RolloutWorker_w5 stopped! [2024-11-25 02:45:03,156][04095] Loop rollout_proc5_evt_loop terminating... [2024-11-25 02:45:03,172][04098] Stopping RolloutWorker_w7... [2024-11-25 02:45:03,173][00405] Component RolloutWorker_w7 stopped! [2024-11-25 02:45:03,183][04092] Stopping RolloutWorker_w1... [2024-11-25 02:45:03,179][04098] Loop rollout_proc7_evt_loop terminating... [2024-11-25 02:45:03,183][00405] Component RolloutWorker_w1 stopped! [2024-11-25 02:45:03,184][04092] Loop rollout_proc1_evt_loop terminating... [2024-11-25 02:45:03,240][04094] Stopping RolloutWorker_w3... [2024-11-25 02:45:03,240][00405] Component RolloutWorker_w3 stopped! [2024-11-25 02:45:03,252][00405] Waiting for process learner_proc0 to stop... [2024-11-25 02:45:03,243][04094] Loop rollout_proc3_evt_loop terminating... 
[2024-11-25 02:45:04,684][00405] Waiting for process inference_proc0-0 to join... [2024-11-25 02:45:04,691][00405] Waiting for process rollout_proc0 to join... [2024-11-25 02:45:06,428][00405] Waiting for process rollout_proc1 to join... [2024-11-25 02:45:06,651][00405] Waiting for process rollout_proc2 to join... [2024-11-25 02:45:06,655][00405] Waiting for process rollout_proc3 to join... [2024-11-25 02:45:06,660][00405] Waiting for process rollout_proc4 to join... [2024-11-25 02:45:06,665][00405] Waiting for process rollout_proc5 to join... [2024-11-25 02:45:06,669][00405] Waiting for process rollout_proc6 to join... [2024-11-25 02:45:06,674][00405] Waiting for process rollout_proc7 to join... [2024-11-25 02:45:06,678][00405] Batcher 0 profile tree view: batching: 26.4909, releasing_batches: 0.0273 [2024-11-25 02:45:06,681][00405] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0055 wait_policy_total: 396.2741 update_model: 8.3070 weight_update: 0.0029 one_step: 0.0040 handle_policy_step: 572.3708 deserialize: 14.3483, stack: 3.0452, obs_to_device_normalize: 121.5536, forward: 286.1444, send_messages: 29.0273 prepare_outputs: 89.4849 to_cpu: 54.2861 [2024-11-25 02:45:06,685][00405] Learner 0 profile tree view: misc: 0.0105, prepare_batch: 13.3422 train: 73.0674 epoch_init: 0.0067, minibatch_init: 0.0138, losses_postprocess: 0.6986, kl_divergence: 0.6422, after_optimizer: 33.8166 calculate_losses: 25.4285 losses_init: 0.0036, forward_head: 1.3204, bptt_initial: 16.9924, tail: 1.0290, advantages_returns: 0.2303, losses: 3.6852 bptt: 1.8411 bptt_forward_core: 1.7758 update: 11.8284 clip: 0.9467 [2024-11-25 02:45:06,686][00405] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3464, enqueue_policy_requests: 90.7981, env_step: 801.4143, overhead: 11.9572, complete_rollouts: 7.3925 save_policy_outputs: 19.7729 split_output_tensors: 7.9406 [2024-11-25 02:45:06,687][00405] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3497, 
enqueue_policy_requests: 92.8201, env_step: 798.1528, overhead: 12.5208, complete_rollouts: 6.1458 save_policy_outputs: 19.5633 split_output_tensors: 7.6872 [2024-11-25 02:45:06,688][00405] Loop Runner_EvtLoop terminating... [2024-11-25 02:45:06,690][00405] Runner profile tree view: main_loop: 1047.6176 [2024-11-25 02:45:06,691][00405] Collected {0: 4005888}, FPS: 3823.8 [2024-11-25 02:48:09,147][00405] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-25 02:48:09,150][00405] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-25 02:48:09,154][00405] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-25 02:48:09,156][00405] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-25 02:48:09,158][00405] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-25 02:48:09,160][00405] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-25 02:48:09,161][00405] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-25 02:48:09,164][00405] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-25 02:48:09,165][00405] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-11-25 02:48:09,167][00405] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-25 02:48:09,168][00405] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-25 02:48:09,169][00405] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-25 02:48:09,170][00405] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-25 02:48:09,171][00405] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-11-25 02:48:09,172][00405] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-25 02:48:09,204][00405] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-25 02:48:09,208][00405] RunningMeanStd input shape: (3, 72, 128) [2024-11-25 02:48:09,210][00405] RunningMeanStd input shape: (1,) [2024-11-25 02:48:09,228][00405] ConvEncoder: input_channels=3 [2024-11-25 02:48:09,328][00405] Conv encoder output size: 512 [2024-11-25 02:48:09,329][00405] Policy head output size: 512 [2024-11-25 02:48:09,532][00405] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-25 02:48:10,679][00405] Num frames 100... [2024-11-25 02:48:10,849][00405] Num frames 200... [2024-11-25 02:48:11,017][00405] Num frames 300... [2024-11-25 02:48:11,187][00405] Num frames 400... [2024-11-25 02:48:11,367][00405] Num frames 500... [2024-11-25 02:48:11,552][00405] Avg episode rewards: #0: 8.760, true rewards: #0: 5.760 [2024-11-25 02:48:11,555][00405] Avg episode reward: 8.760, avg true_objective: 5.760 [2024-11-25 02:48:11,611][00405] Num frames 600... [2024-11-25 02:48:11,732][00405] Num frames 700... [2024-11-25 02:48:11,860][00405] Num frames 800... [2024-11-25 02:48:11,981][00405] Num frames 900... [2024-11-25 02:48:12,054][00405] Avg episode rewards: #0: 6.565, true rewards: #0: 4.565 [2024-11-25 02:48:12,055][00405] Avg episode reward: 6.565, avg true_objective: 4.565 [2024-11-25 02:48:12,160][00405] Num frames 1000... [2024-11-25 02:48:12,281][00405] Num frames 1100... [2024-11-25 02:48:12,399][00405] Num frames 1200... [2024-11-25 02:48:12,518][00405] Num frames 1300... [2024-11-25 02:48:12,648][00405] Num frames 1400... [2024-11-25 02:48:12,768][00405] Avg episode rewards: #0: 8.180, true rewards: #0: 4.847 [2024-11-25 02:48:12,770][00405] Avg episode reward: 8.180, avg true_objective: 4.847 [2024-11-25 02:48:12,838][00405] Num frames 1500... 
[2024-11-25 02:48:12,954][00405] Num frames 1600... [2024-11-25 02:48:13,073][00405] Num frames 1700... [2024-11-25 02:48:13,193][00405] Num frames 1800... [2024-11-25 02:48:13,310][00405] Num frames 1900... [2024-11-25 02:48:13,434][00405] Num frames 2000... [2024-11-25 02:48:13,527][00405] Avg episode rewards: #0: 9.325, true rewards: #0: 5.075 [2024-11-25 02:48:13,529][00405] Avg episode reward: 9.325, avg true_objective: 5.075 [2024-11-25 02:48:13,622][00405] Num frames 2100... [2024-11-25 02:48:13,741][00405] Num frames 2200... [2024-11-25 02:48:13,871][00405] Num frames 2300... [2024-11-25 02:48:13,990][00405] Num frames 2400... [2024-11-25 02:48:14,110][00405] Num frames 2500... [2024-11-25 02:48:14,232][00405] Num frames 2600... [2024-11-25 02:48:14,353][00405] Num frames 2700... [2024-11-25 02:48:14,473][00405] Num frames 2800... [2024-11-25 02:48:14,599][00405] Num frames 2900... [2024-11-25 02:48:14,724][00405] Num frames 3000... [2024-11-25 02:48:14,846][00405] Num frames 3100... [2024-11-25 02:48:14,975][00405] Num frames 3200... [2024-11-25 02:48:15,097][00405] Num frames 3300... [2024-11-25 02:48:15,221][00405] Num frames 3400... [2024-11-25 02:48:15,343][00405] Num frames 3500... [2024-11-25 02:48:15,466][00405] Num frames 3600... [2024-11-25 02:48:15,592][00405] Num frames 3700... [2024-11-25 02:48:15,719][00405] Num frames 3800... [2024-11-25 02:48:15,843][00405] Num frames 3900... [2024-11-25 02:48:15,974][00405] Num frames 4000... [2024-11-25 02:48:16,095][00405] Num frames 4100... [2024-11-25 02:48:16,164][00405] Avg episode rewards: #0: 18.620, true rewards: #0: 8.220 [2024-11-25 02:48:16,167][00405] Avg episode reward: 18.620, avg true_objective: 8.220 [2024-11-25 02:48:16,282][00405] Num frames 4200... [2024-11-25 02:48:16,410][00405] Num frames 4300... [2024-11-25 02:48:16,532][00405] Num frames 4400... [2024-11-25 02:48:16,664][00405] Num frames 4500... 
[2024-11-25 02:48:16,813][00405] Avg episode rewards: #0: 17.132, true rewards: #0: 7.632 [2024-11-25 02:48:16,815][00405] Avg episode reward: 17.132, avg true_objective: 7.632 [2024-11-25 02:48:16,844][00405] Num frames 4600... [2024-11-25 02:48:16,976][00405] Num frames 4700... [2024-11-25 02:48:17,097][00405] Num frames 4800... [2024-11-25 02:48:17,217][00405] Num frames 4900... [2024-11-25 02:48:17,342][00405] Num frames 5000... [2024-11-25 02:48:17,463][00405] Num frames 5100... [2024-11-25 02:48:17,584][00405] Num frames 5200... [2024-11-25 02:48:17,712][00405] Num frames 5300... [2024-11-25 02:48:17,824][00405] Avg episode rewards: #0: 16.781, true rewards: #0: 7.639 [2024-11-25 02:48:17,826][00405] Avg episode reward: 16.781, avg true_objective: 7.639 [2024-11-25 02:48:17,894][00405] Num frames 5400... [2024-11-25 02:48:18,020][00405] Num frames 5500... [2024-11-25 02:48:18,144][00405] Num frames 5600... [2024-11-25 02:48:18,264][00405] Num frames 5700... [2024-11-25 02:48:18,384][00405] Num frames 5800... [2024-11-25 02:48:18,502][00405] Num frames 5900... [2024-11-25 02:48:18,627][00405] Num frames 6000... [2024-11-25 02:48:18,746][00405] Num frames 6100... [2024-11-25 02:48:18,865][00405] Num frames 6200... [2024-11-25 02:48:18,982][00405] Avg episode rewards: #0: 16.937, true rewards: #0: 7.812 [2024-11-25 02:48:18,984][00405] Avg episode reward: 16.937, avg true_objective: 7.812 [2024-11-25 02:48:19,045][00405] Num frames 6300... [2024-11-25 02:48:19,166][00405] Num frames 6400... [2024-11-25 02:48:19,285][00405] Num frames 6500... [2024-11-25 02:48:19,409][00405] Num frames 6600... [2024-11-25 02:48:19,529][00405] Num frames 6700... [2024-11-25 02:48:19,658][00405] Num frames 6800... [2024-11-25 02:48:19,782][00405] Num frames 6900... [2024-11-25 02:48:19,901][00405] Num frames 7000... [2024-11-25 02:48:20,027][00405] Num frames 7100... [2024-11-25 02:48:20,153][00405] Num frames 7200... [2024-11-25 02:48:20,271][00405] Num frames 7300... 
[2024-11-25 02:48:20,391][00405] Num frames 7400... [2024-11-25 02:48:20,512][00405] Num frames 7500... [2024-11-25 02:48:20,636][00405] Num frames 7600... [2024-11-25 02:48:20,804][00405] Avg episode rewards: #0: 18.883, true rewards: #0: 8.550 [2024-11-25 02:48:20,806][00405] Avg episode reward: 18.883, avg true_objective: 8.550 [2024-11-25 02:48:20,816][00405] Num frames 7700... [2024-11-25 02:48:20,939][00405] Num frames 7800... [2024-11-25 02:48:21,067][00405] Num frames 7900... [2024-11-25 02:48:21,188][00405] Num frames 8000... [2024-11-25 02:48:21,309][00405] Num frames 8100... [2024-11-25 02:48:21,430][00405] Num frames 8200... [2024-11-25 02:48:21,495][00405] Avg episode rewards: #0: 17.807, true rewards: #0: 8.207 [2024-11-25 02:48:21,496][00405] Avg episode reward: 17.807, avg true_objective: 8.207 [2024-11-25 02:49:11,697][00405] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-11-25 02:51:26,140][00405] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-25 02:51:26,141][00405] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-25 02:51:26,143][00405] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-25 02:51:26,145][00405] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-25 02:51:26,147][00405] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-25 02:51:26,149][00405] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-25 02:51:26,150][00405] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-25 02:51:26,151][00405] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-25 02:51:26,152][00405] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
[2024-11-25 02:51:26,153][00405] Adding new argument 'hf_repository'='zfh1995/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-25 02:51:26,154][00405] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-25 02:51:26,155][00405] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-25 02:51:26,157][00405] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-25 02:51:26,158][00405] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-25 02:51:26,159][00405] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-25 02:51:26,198][00405] RunningMeanStd input shape: (3, 72, 128) [2024-11-25 02:51:26,200][00405] RunningMeanStd input shape: (1,) [2024-11-25 02:51:26,212][00405] ConvEncoder: input_channels=3 [2024-11-25 02:51:26,249][00405] Conv encoder output size: 512 [2024-11-25 02:51:26,251][00405] Policy head output size: 512 [2024-11-25 02:51:26,269][00405] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-25 02:51:26,687][00405] Num frames 100... [2024-11-25 02:51:26,813][00405] Num frames 200... [2024-11-25 02:51:26,931][00405] Num frames 300... [2024-11-25 02:51:27,050][00405] Num frames 400... [2024-11-25 02:51:27,172][00405] Num frames 500... [2024-11-25 02:51:27,299][00405] Num frames 600... [2024-11-25 02:51:27,422][00405] Num frames 700... [2024-11-25 02:51:27,553][00405] Num frames 800... [2024-11-25 02:51:27,680][00405] Num frames 900... [2024-11-25 02:51:27,798][00405] Num frames 1000... [2024-11-25 02:51:27,920][00405] Num frames 1100... [2024-11-25 02:51:28,093][00405] Num frames 1200... [2024-11-25 02:51:28,269][00405] Num frames 1300... [2024-11-25 02:51:28,441][00405] Num frames 1400... [2024-11-25 02:51:28,609][00405] Num frames 1500... 
[2024-11-25 02:51:28,734][00405] Avg episode rewards: #0: 33.360, true rewards: #0: 15.360 [2024-11-25 02:51:28,736][00405] Avg episode reward: 33.360, avg true_objective: 15.360 [2024-11-25 02:51:28,841][00405] Num frames 1600... [2024-11-25 02:51:28,997][00405] Num frames 1700... [2024-11-25 02:51:29,160][00405] Num frames 1800... [2024-11-25 02:51:29,334][00405] Num frames 1900... [2024-11-25 02:51:29,519][00405] Num frames 2000... [2024-11-25 02:51:29,705][00405] Num frames 2100... [2024-11-25 02:51:29,877][00405] Num frames 2200... [2024-11-25 02:51:30,046][00405] Num frames 2300... [2024-11-25 02:51:30,212][00405] Num frames 2400... [2024-11-25 02:51:30,339][00405] Num frames 2500... [2024-11-25 02:51:30,458][00405] Num frames 2600... [2024-11-25 02:51:30,577][00405] Num frames 2700... [2024-11-25 02:51:30,752][00405] Avg episode rewards: #0: 33.475, true rewards: #0: 13.975 [2024-11-25 02:51:30,753][00405] Avg episode reward: 33.475, avg true_objective: 13.975 [2024-11-25 02:51:30,762][00405] Num frames 2800... [2024-11-25 02:51:30,879][00405] Num frames 2900... [2024-11-25 02:51:30,999][00405] Num frames 3000... [2024-11-25 02:51:31,116][00405] Num frames 3100... [2024-11-25 02:51:31,232][00405] Num frames 3200... [2024-11-25 02:51:31,363][00405] Num frames 3300... [2024-11-25 02:51:31,482][00405] Num frames 3400... [2024-11-25 02:51:31,607][00405] Num frames 3500... [2024-11-25 02:51:31,659][00405] Avg episode rewards: #0: 27.667, true rewards: #0: 11.667 [2024-11-25 02:51:31,660][00405] Avg episode reward: 27.667, avg true_objective: 11.667 [2024-11-25 02:51:31,780][00405] Num frames 3600... [2024-11-25 02:51:31,897][00405] Num frames 3700... [2024-11-25 02:51:32,013][00405] Num frames 3800... [2024-11-25 02:51:32,131][00405] Num frames 3900... [2024-11-25 02:51:32,248][00405] Num frames 4000... [2024-11-25 02:51:32,370][00405] Num frames 4100... 
[2024-11-25 02:51:32,547][00405] Avg episode rewards: #0: 23.988, true rewards: #0: 10.487 [2024-11-25 02:51:32,548][00405] Avg episode reward: 23.988, avg true_objective: 10.487 [2024-11-25 02:51:32,557][00405] Num frames 4200... [2024-11-25 02:51:32,684][00405] Num frames 4300... [2024-11-25 02:51:32,801][00405] Num frames 4400... [2024-11-25 02:51:32,921][00405] Num frames 4500... [2024-11-25 02:51:33,038][00405] Num frames 4600... [2024-11-25 02:51:33,155][00405] Num frames 4700... [2024-11-25 02:51:33,256][00405] Avg episode rewards: #0: 20.678, true rewards: #0: 9.478 [2024-11-25 02:51:33,257][00405] Avg episode reward: 20.678, avg true_objective: 9.478 [2024-11-25 02:51:33,331][00405] Num frames 4800... [2024-11-25 02:51:33,458][00405] Num frames 4900... [2024-11-25 02:51:33,577][00405] Num frames 5000... [2024-11-25 02:51:33,706][00405] Num frames 5100... [2024-11-25 02:51:33,824][00405] Num frames 5200... [2024-11-25 02:51:33,952][00405] Avg episode rewards: #0: 18.770, true rewards: #0: 8.770 [2024-11-25 02:51:33,953][00405] Avg episode reward: 18.770, avg true_objective: 8.770 [2024-11-25 02:51:33,999][00405] Num frames 5300... [2024-11-25 02:51:34,118][00405] Num frames 5400... [2024-11-25 02:51:34,240][00405] Num frames 5500... [2024-11-25 02:51:34,362][00405] Num frames 5600... [2024-11-25 02:51:34,493][00405] Num frames 5700... [2024-11-25 02:51:34,624][00405] Num frames 5800... [2024-11-25 02:51:34,745][00405] Num frames 5900... [2024-11-25 02:51:34,867][00405] Num frames 6000... [2024-11-25 02:51:34,984][00405] Num frames 6100... [2024-11-25 02:51:35,152][00405] Avg episode rewards: #0: 18.843, true rewards: #0: 8.843 [2024-11-25 02:51:35,153][00405] Avg episode reward: 18.843, avg true_objective: 8.843 [2024-11-25 02:51:35,168][00405] Num frames 6200... [2024-11-25 02:51:35,287][00405] Num frames 6300... [2024-11-25 02:51:35,407][00405] Num frames 6400... [2024-11-25 02:51:35,541][00405] Num frames 6500... 
[2024-11-25 02:51:35,670][00405] Num frames 6600... [2024-11-25 02:51:35,791][00405] Num frames 6700... [2024-11-25 02:51:35,914][00405] Num frames 6800... [2024-11-25 02:51:36,032][00405] Num frames 6900... [2024-11-25 02:51:36,193][00405] Avg episode rewards: #0: 18.863, true rewards: #0: 8.737 [2024-11-25 02:51:36,195][00405] Avg episode reward: 18.863, avg true_objective: 8.737 [2024-11-25 02:51:36,210][00405] Num frames 7000... [2024-11-25 02:51:36,329][00405] Num frames 7100... [2024-11-25 02:51:36,449][00405] Num frames 7200... [2024-11-25 02:51:36,582][00405] Num frames 7300... [2024-11-25 02:51:36,708][00405] Num frames 7400... [2024-11-25 02:51:36,827][00405] Num frames 7500... [2024-11-25 02:51:36,948][00405] Num frames 7600... [2024-11-25 02:51:37,068][00405] Num frames 7700... [2024-11-25 02:51:37,188][00405] Num frames 7800... [2024-11-25 02:51:37,312][00405] Num frames 7900... [2024-11-25 02:51:37,432][00405] Num frames 8000... [2024-11-25 02:51:37,560][00405] Num frames 8100... [2024-11-25 02:51:37,693][00405] Num frames 8200... [2024-11-25 02:51:37,819][00405] Num frames 8300... [2024-11-25 02:51:37,938][00405] Num frames 8400... [2024-11-25 02:51:38,062][00405] Num frames 8500... [2024-11-25 02:51:38,187][00405] Num frames 8600... [2024-11-25 02:51:38,305][00405] Num frames 8700... [2024-11-25 02:51:38,431][00405] Num frames 8800... [2024-11-25 02:51:38,561][00405] Num frames 8900... [2024-11-25 02:51:38,696][00405] Num frames 9000... [2024-11-25 02:51:38,863][00405] Avg episode rewards: #0: 23.211, true rewards: #0: 10.100 [2024-11-25 02:51:38,865][00405] Avg episode reward: 23.211, avg true_objective: 10.100 [2024-11-25 02:51:38,879][00405] Num frames 9100... [2024-11-25 02:51:38,996][00405] Num frames 9200... [2024-11-25 02:51:39,116][00405] Num frames 9300... [2024-11-25 02:51:39,243][00405] Num frames 9400... [2024-11-25 02:51:39,363][00405] Num frames 9500... [2024-11-25 02:51:39,483][00405] Num frames 9600... 
[2024-11-25 02:51:39,679][00405] Avg episode rewards: #0: 21.998, true rewards: #0: 9.698 [2024-11-25 02:51:39,681][00405] Avg episode reward: 21.998, avg true_objective: 9.698 [2024-11-25 02:51:39,686][00405] Num frames 9700... [2024-11-25 02:52:36,479][00405] Replay video saved to /content/train_dir/default_experiment/replay.mp4!