diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1331 @@
+[2024-09-18 11:19:19,246][00268] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2024-09-18 11:19:19,248][00268] Rollout worker 0 uses device cpu
+[2024-09-18 11:19:19,250][00268] Rollout worker 1 uses device cpu
+[2024-09-18 11:19:19,252][00268] Rollout worker 2 uses device cpu
+[2024-09-18 11:19:19,253][00268] Rollout worker 3 uses device cpu
+[2024-09-18 11:19:19,254][00268] Rollout worker 4 uses device cpu
+[2024-09-18 11:19:19,259][00268] Rollout worker 5 uses device cpu
+[2024-09-18 11:19:19,260][00268] Rollout worker 6 uses device cpu
+[2024-09-18 11:19:19,261][00268] Rollout worker 7 uses device cpu
+[2024-09-18 11:19:19,404][00268] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-18 11:19:19,405][00268] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-18 11:19:19,438][00268] Starting all processes...
+[2024-09-18 11:19:19,439][00268] Starting process learner_proc0
+[2024-09-18 11:19:19,488][00268] Starting all processes...
+[2024-09-18 11:19:19,499][00268] Starting process inference_proc0-0
+[2024-09-18 11:19:19,500][00268] Starting process rollout_proc0
+[2024-09-18 11:19:19,500][00268] Starting process rollout_proc1
+[2024-09-18 11:19:19,500][00268] Starting process rollout_proc2
+[2024-09-18 11:19:19,500][00268] Starting process rollout_proc3
+[2024-09-18 11:19:19,500][00268] Starting process rollout_proc4
+[2024-09-18 11:19:19,500][00268] Starting process rollout_proc5
+[2024-09-18 11:19:19,500][00268] Starting process rollout_proc6
+[2024-09-18 11:19:19,500][00268] Starting process rollout_proc7
+[2024-09-18 11:19:30,928][03589] Worker 5 uses CPU cores [1]
+[2024-09-18 11:19:31,020][03587] Worker 3 uses CPU cores [1]
+[2024-09-18 11:19:31,023][03570] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-18 11:19:31,023][03570] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-18 11:19:31,027][03588] Worker 6 uses CPU cores [0]
+[2024-09-18 11:19:31,045][03584] Worker 0 uses CPU cores [0]
+[2024-09-18 11:19:31,050][03586] Worker 2 uses CPU cores [0]
+[2024-09-18 11:19:31,070][03591] Worker 7 uses CPU cores [1]
+[2024-09-18 11:19:31,082][03570] Num visible devices: 1
+[2024-09-18 11:19:31,116][03570] Starting seed is not provided
+[2024-09-18 11:19:31,117][03570] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-18 11:19:31,118][03570] Initializing actor-critic model on device cuda:0
+[2024-09-18 11:19:31,120][03570] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-18 11:19:31,121][03570] RunningMeanStd input shape: (1,)
+[2024-09-18 11:19:31,132][03583] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-18 11:19:31,132][03583] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-18 11:19:31,133][03590] Worker 4 uses CPU cores [0]
+[2024-09-18 11:19:31,150][03583] Num visible devices: 1
+[2024-09-18 11:19:31,154][03570] ConvEncoder: input_channels=3
+[2024-09-18 11:19:31,183][03585] Worker 1 uses CPU cores [1]
+[2024-09-18 11:19:31,328][03570] Conv encoder output size: 512
+[2024-09-18 11:19:31,328][03570] Policy head output size: 512
+[2024-09-18 11:19:31,344][03570] Created Actor Critic model with architecture:
+[2024-09-18 11:19:31,344][03570] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-18 11:19:35,560][03570] Using optimizer
+[2024-09-18 11:19:35,561][03570] No checkpoints found
+[2024-09-18 11:19:35,561][03570] Did not load from checkpoint, starting from scratch!
+[2024-09-18 11:19:35,561][03570] Initialized policy 0 weights for model version 0
+[2024-09-18 11:19:35,564][03570] LearnerWorker_p0 finished initialization!
+[2024-09-18 11:19:35,566][03570] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-18 11:19:35,698][03583] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-18 11:19:35,700][03583] RunningMeanStd input shape: (1,)
+[2024-09-18 11:19:35,715][03583] ConvEncoder: input_channels=3
+[2024-09-18 11:19:35,831][03583] Conv encoder output size: 512
+[2024-09-18 11:19:35,831][03583] Policy head output size: 512
+[2024-09-18 11:19:35,848][00268] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-18 11:19:37,362][00268] Inference worker 0-0 is ready!
+[2024-09-18 11:19:37,364][00268] All inference workers are ready! Signal rollout workers to start!
+[2024-09-18 11:19:37,488][03584] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-18 11:19:37,491][03590] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-18 11:19:37,506][03591] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-18 11:19:37,511][03589] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-18 11:19:37,510][03587] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-18 11:19:37,513][03586] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-18 11:19:37,522][03588] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-18 11:19:37,527][03585] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-18 11:19:38,851][03590] Decorrelating experience for 0 frames...
+[2024-09-18 11:19:38,853][03586] Decorrelating experience for 0 frames...
+[2024-09-18 11:19:38,851][03585] Decorrelating experience for 0 frames...
+[2024-09-18 11:19:38,853][03587] Decorrelating experience for 0 frames...
+[2024-09-18 11:19:38,854][03589] Decorrelating experience for 0 frames...
+[2024-09-18 11:19:39,396][00268] Heartbeat connected on Batcher_0
+[2024-09-18 11:19:39,403][00268] Heartbeat connected on LearnerWorker_p0
+[2024-09-18 11:19:39,432][00268] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-18 11:19:39,610][03586] Decorrelating experience for 32 frames...
+[2024-09-18 11:19:39,615][03590] Decorrelating experience for 32 frames...
+[2024-09-18 11:19:39,628][03589] Decorrelating experience for 32 frames...
+[2024-09-18 11:19:39,630][03587] Decorrelating experience for 32 frames...
+[2024-09-18 11:19:40,574][03586] Decorrelating experience for 64 frames...
+[2024-09-18 11:19:40,590][03590] Decorrelating experience for 64 frames...
+[2024-09-18 11:19:40,615][03585] Decorrelating experience for 32 frames...
+[2024-09-18 11:19:40,731][03589] Decorrelating experience for 64 frames...
+[2024-09-18 11:19:40,847][00268] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-18 11:19:41,437][03590] Decorrelating experience for 96 frames...
+[2024-09-18 11:19:41,564][00268] Heartbeat connected on RolloutWorker_w4
+[2024-09-18 11:19:41,567][03586] Decorrelating experience for 96 frames...
+[2024-09-18 11:19:41,693][00268] Heartbeat connected on RolloutWorker_w2
+[2024-09-18 11:19:41,778][03587] Decorrelating experience for 64 frames...
+[2024-09-18 11:19:41,880][03585] Decorrelating experience for 64 frames...
+[2024-09-18 11:19:41,926][03589] Decorrelating experience for 96 frames...
+[2024-09-18 11:19:42,039][00268] Heartbeat connected on RolloutWorker_w5
+[2024-09-18 11:19:42,275][03585] Decorrelating experience for 96 frames...
+[2024-09-18 11:19:42,335][00268] Heartbeat connected on RolloutWorker_w1
+[2024-09-18 11:19:42,585][03587] Decorrelating experience for 96 frames...
+[2024-09-18 11:19:42,645][00268] Heartbeat connected on RolloutWorker_w3
+[2024-09-18 11:19:45,848][00268] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 24. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-18 11:19:45,850][00268] Avg episode reward: [(0, '2.020')]
+[2024-09-18 11:19:48,873][03570] Signal inference workers to stop experience collection...
+[2024-09-18 11:19:48,882][03583] InferenceWorker_p0-w0: stopping experience collection
+[2024-09-18 11:19:50,018][03570] Signal inference workers to resume experience collection...
+[2024-09-18 11:19:50,020][03583] InferenceWorker_p0-w0: resuming experience collection
+[2024-09-18 11:19:50,848][00268] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 165.3. Samples: 2480. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2024-09-18 11:19:50,850][00268] Avg episode reward: [(0, '3.103')]
+[2024-09-18 11:19:55,848][00268] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 359.4. Samples: 7188. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:19:55,849][00268] Avg episode reward: [(0, '4.019')]
+[2024-09-18 11:19:59,169][03583] Updated weights for policy 0, policy_version 10 (0.0362)
+[2024-09-18 11:20:00,849][00268] Fps is (10 sec: 4095.3, 60 sec: 1802.1, 300 sec: 1802.1). Total num frames: 45056. Throughput: 0: 396.1. Samples: 9904. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:20:00,852][00268] Avg episode reward: [(0, '4.359')]
+[2024-09-18 11:20:05,848][00268] Fps is (10 sec: 2867.2, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 479.5. Samples: 14384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-18 11:20:05,855][00268] Avg episode reward: [(0, '4.327')]
+[2024-09-18 11:20:10,848][00268] Fps is (10 sec: 3277.4, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 579.5. Samples: 20284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:20:10,860][00268] Avg episode reward: [(0, '4.429')]
+[2024-09-18 11:20:10,933][03583] Updated weights for policy 0, policy_version 20 (0.0018)
+[2024-09-18 11:20:15,848][00268] Fps is (10 sec: 4096.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 585.0. Samples: 23400. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:20:15,852][00268] Avg episode reward: [(0, '4.569')]
+[2024-09-18 11:20:20,849][00268] Fps is (10 sec: 3685.8, 60 sec: 2548.5, 300 sec: 2548.5). Total num frames: 114688. Throughput: 0: 625.6. Samples: 28152. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:20:20,851][00268] Avg episode reward: [(0, '4.446')]
+[2024-09-18 11:20:20,858][03570] Saving new best policy, reward=4.446!
+[2024-09-18 11:20:22,873][03583] Updated weights for policy 0, policy_version 30 (0.0015)
+[2024-09-18 11:20:25,850][00268] Fps is (10 sec: 3685.4, 60 sec: 2703.2, 300 sec: 2703.2). Total num frames: 135168. Throughput: 0: 746.9. Samples: 33612. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:20:25,855][00268] Avg episode reward: [(0, '4.500')]
+[2024-09-18 11:20:25,863][03570] Saving new best policy, reward=4.500!
+[2024-09-18 11:20:30,848][00268] Fps is (10 sec: 3687.0, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 151552. Throughput: 0: 813.5. Samples: 36632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:20:30,856][00268] Avg episode reward: [(0, '4.542')]
+[2024-09-18 11:20:30,859][03570] Saving new best policy, reward=4.542!
+[2024-09-18 11:20:33,994][03583] Updated weights for policy 0, policy_version 40 (0.0014)
+[2024-09-18 11:20:35,848][00268] Fps is (10 sec: 3277.7, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 167936. Throughput: 0: 865.9. Samples: 41444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:20:35,854][00268] Avg episode reward: [(0, '4.331')]
+[2024-09-18 11:20:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 184320. Throughput: 0: 877.5. Samples: 46674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:20:40,857][00268] Avg episode reward: [(0, '4.349')]
+[2024-09-18 11:20:45,029][03583] Updated weights for policy 0, policy_version 50 (0.0014)
+[2024-09-18 11:20:45,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2925.7). Total num frames: 204800. Throughput: 0: 886.3. Samples: 49786. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:20:45,853][00268] Avg episode reward: [(0, '4.333')]
+[2024-09-18 11:20:50,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 2949.1). Total num frames: 221184. Throughput: 0: 902.4. Samples: 54994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-18 11:20:50,858][00268] Avg episode reward: [(0, '4.517')]
+[2024-09-18 11:20:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2969.6). Total num frames: 237568. Throughput: 0: 883.3. Samples: 60032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:20:55,850][00268] Avg episode reward: [(0, '4.437')]
+[2024-09-18 11:20:57,080][03583] Updated weights for policy 0, policy_version 60 (0.0017)
+[2024-09-18 11:21:00,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3035.9). Total num frames: 258048. Throughput: 0: 883.9. Samples: 63174. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:21:00,855][00268] Avg episode reward: [(0, '4.298')]
+[2024-09-18 11:21:05,851][00268] Fps is (10 sec: 3685.0, 60 sec: 3617.9, 300 sec: 3049.1). Total num frames: 274432. Throughput: 0: 897.5. Samples: 68540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:21:05,857][00268] Avg episode reward: [(0, '4.370')]
+[2024-09-18 11:21:09,049][03583] Updated weights for policy 0, policy_version 70 (0.0014)
+[2024-09-18 11:21:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3061.2). Total num frames: 290816. Throughput: 0: 882.0. Samples: 73298. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2024-09-18 11:21:10,856][00268] Avg episode reward: [(0, '4.459')]
+[2024-09-18 11:21:15,848][00268] Fps is (10 sec: 3687.8, 60 sec: 3549.9, 300 sec: 3113.0). Total num frames: 311296. Throughput: 0: 884.7. Samples: 76442. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:21:15,851][00268] Avg episode reward: [(0, '4.622')]
+[2024-09-18 11:21:15,872][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth...
+[2024-09-18 11:21:15,997][03570] Saving new best policy, reward=4.622!
+[2024-09-18 11:21:19,783][03583] Updated weights for policy 0, policy_version 80 (0.0014)
+[2024-09-18 11:21:20,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3120.8). Total num frames: 327680. Throughput: 0: 900.2. Samples: 81952. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:21:20,850][00268] Avg episode reward: [(0, '4.606')]
+[2024-09-18 11:21:25,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3127.9). Total num frames: 344064. Throughput: 0: 884.8. Samples: 86492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:21:25,850][00268] Avg episode reward: [(0, '4.602')]
+[2024-09-18 11:21:30,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3169.9). Total num frames: 364544. Throughput: 0: 884.0. Samples: 89568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:21:30,850][00268] Avg episode reward: [(0, '4.565')]
+[2024-09-18 11:21:31,357][03583] Updated weights for policy 0, policy_version 90 (0.0016)
+[2024-09-18 11:21:35,850][00268] Fps is (10 sec: 3685.6, 60 sec: 3549.7, 300 sec: 3174.3). Total num frames: 380928. Throughput: 0: 896.2. Samples: 95324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:21:35,852][00268] Avg episode reward: [(0, '4.454')]
+[2024-09-18 11:21:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3178.5). Total num frames: 397312. Throughput: 0: 875.7. Samples: 99440. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:21:40,850][00268] Avg episode reward: [(0, '4.530')]
+[2024-09-18 11:21:43,611][03583] Updated weights for policy 0, policy_version 100 (0.0018)
+[2024-09-18 11:21:45,848][00268] Fps is (10 sec: 3687.2, 60 sec: 3549.9, 300 sec: 3213.8). Total num frames: 417792. Throughput: 0: 872.4. Samples: 102432. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:21:45,854][00268] Avg episode reward: [(0, '4.457')]
+[2024-09-18 11:21:50,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3216.1). Total num frames: 434176. Throughput: 0: 891.9. Samples: 108674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:21:50,851][00268] Avg episode reward: [(0, '4.456')]
+[2024-09-18 11:21:55,771][03583] Updated weights for policy 0, policy_version 110 (0.0024)
+[2024-09-18 11:21:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3218.3). Total num frames: 450560. Throughput: 0: 873.3. Samples: 112598. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:21:55,855][00268] Avg episode reward: [(0, '4.704')]
+[2024-09-18 11:21:55,868][03570] Saving new best policy, reward=4.704!
+[2024-09-18 11:22:00,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3220.3). Total num frames: 466944. Throughput: 0: 869.3. Samples: 115560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:22:00,854][00268] Avg episode reward: [(0, '4.757')]
+[2024-09-18 11:22:00,947][03570] Saving new best policy, reward=4.757!
+[2024-09-18 11:22:05,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3249.5). Total num frames: 487424. Throughput: 0: 879.7. Samples: 121538. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:22:05,851][00268] Avg episode reward: [(0, '4.508')]
+[2024-09-18 11:22:06,489][03583] Updated weights for policy 0, policy_version 120 (0.0017)
+[2024-09-18 11:22:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3223.9). Total num frames: 499712. Throughput: 0: 876.0. Samples: 125910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:22:10,857][00268] Avg episode reward: [(0, '4.328')]
+[2024-09-18 11:22:15,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3251.2). Total num frames: 520192. Throughput: 0: 866.6. Samples: 128566. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:22:15,850][00268] Avg episode reward: [(0, '4.399')]
+[2024-09-18 11:22:18,208][03583] Updated weights for policy 0, policy_version 130 (0.0019)
+[2024-09-18 11:22:20,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 540672. Throughput: 0: 875.0. Samples: 134696. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:22:20,855][00268] Avg episode reward: [(0, '4.406')]
+[2024-09-18 11:22:25,848][00268] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 557056. Throughput: 0: 887.0. Samples: 139356. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:22:25,853][00268] Avg episode reward: [(0, '4.523')]
+[2024-09-18 11:22:30,379][03583] Updated weights for policy 0, policy_version 140 (0.0021)
+[2024-09-18 11:22:30,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 573440. Throughput: 0: 873.2. Samples: 141726. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:22:30,850][00268] Avg episode reward: [(0, '4.425')]
+[2024-09-18 11:22:35,848][00268] Fps is (10 sec: 3686.5, 60 sec: 3550.0, 300 sec: 3299.6). Total num frames: 593920. Throughput: 0: 868.0. Samples: 147734. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:22:35,850][00268] Avg episode reward: [(0, '4.410')]
+[2024-09-18 11:22:40,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3298.9). Total num frames: 610304. Throughput: 0: 890.3. Samples: 152660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:22:40,855][00268] Avg episode reward: [(0, '4.373')]
+[2024-09-18 11:22:42,334][03583] Updated weights for policy 0, policy_version 150 (0.0012)
+[2024-09-18 11:22:45,848][00268] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3298.4). Total num frames: 626688. Throughput: 0: 872.3. Samples: 154812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:22:45,850][00268] Avg episode reward: [(0, '4.299')]
+[2024-09-18 11:22:50,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3318.8). Total num frames: 647168. Throughput: 0: 874.9. Samples: 160910. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:22:50,852][00268] Avg episode reward: [(0, '4.312')]
+[2024-09-18 11:22:52,666][03583] Updated weights for policy 0, policy_version 160 (0.0012)
+[2024-09-18 11:22:55,849][00268] Fps is (10 sec: 3686.0, 60 sec: 3549.8, 300 sec: 3317.7). Total num frames: 663552. Throughput: 0: 894.5. Samples: 166162. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:22:55,854][00268] Avg episode reward: [(0, '4.478')]
+[2024-09-18 11:23:00,848][00268] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3316.8). Total num frames: 679936. Throughput: 0: 878.4. Samples: 168094. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:23:00,851][00268] Avg episode reward: [(0, '4.674')]
+[2024-09-18 11:23:04,871][03583] Updated weights for policy 0, policy_version 170 (0.0021)
+[2024-09-18 11:23:05,850][00268] Fps is (10 sec: 3686.1, 60 sec: 3549.7, 300 sec: 3335.3). Total num frames: 700416. Throughput: 0: 874.2. Samples: 174038. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:23:05,853][00268] Avg episode reward: [(0, '4.713')]
+[2024-09-18 11:23:10,848][00268] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3334.0). Total num frames: 716800. Throughput: 0: 891.6. Samples: 179478. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:23:10,850][00268] Avg episode reward: [(0, '4.564')]
+[2024-09-18 11:23:15,848][00268] Fps is (10 sec: 3277.5, 60 sec: 3549.9, 300 sec: 3332.7). Total num frames: 733184. Throughput: 0: 883.7. Samples: 181492. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:23:15,850][00268] Avg episode reward: [(0, '4.597')]
+[2024-09-18 11:23:15,859][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth...
+[2024-09-18 11:23:16,864][03583] Updated weights for policy 0, policy_version 180 (0.0014)
+[2024-09-18 11:23:20,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3349.6). Total num frames: 753664. Throughput: 0: 880.8. Samples: 187368. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:23:20,853][00268] Avg episode reward: [(0, '4.606')]
+[2024-09-18 11:23:25,849][00268] Fps is (10 sec: 3685.9, 60 sec: 3549.8, 300 sec: 3348.0). Total num frames: 770048. Throughput: 0: 898.6. Samples: 193098. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:23:25,853][00268] Avg episode reward: [(0, '4.436')]
+[2024-09-18 11:23:28,042][03583] Updated weights for policy 0, policy_version 190 (0.0016)
+[2024-09-18 11:23:30,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3329.1). Total num frames: 782336. Throughput: 0: 895.6. Samples: 195112. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:23:30,852][00268] Avg episode reward: [(0, '4.604')]
+[2024-09-18 11:23:35,848][00268] Fps is (10 sec: 3687.0, 60 sec: 3549.9, 300 sec: 3362.1). Total num frames: 806912. Throughput: 0: 880.8. Samples: 200546. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2024-09-18 11:23:35,850][00268] Avg episode reward: [(0, '4.580')]
+[2024-09-18 11:23:38,775][03583] Updated weights for policy 0, policy_version 200 (0.0018)
+[2024-09-18 11:23:40,850][00268] Fps is (10 sec: 4094.9, 60 sec: 3549.7, 300 sec: 3360.4). Total num frames: 823296. Throughput: 0: 899.0. Samples: 206620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:23:40,860][00268] Avg episode reward: [(0, '4.477')]
+[2024-09-18 11:23:45,850][00268] Fps is (10 sec: 2866.4, 60 sec: 3481.5, 300 sec: 3342.3). Total num frames: 835584. Throughput: 0: 900.3. Samples: 208610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:23:45,857][00268] Avg episode reward: [(0, '4.556')]
+[2024-09-18 11:23:50,848][00268] Fps is (10 sec: 3277.7, 60 sec: 3481.6, 300 sec: 3357.1). Total num frames: 856064. Throughput: 0: 884.4. Samples: 213832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:23:50,854][00268] Avg episode reward: [(0, '4.837')]
+[2024-09-18 11:23:50,859][03570] Saving new best policy, reward=4.837!
+[2024-09-18 11:23:51,116][03583] Updated weights for policy 0, policy_version 210 (0.0014)
+[2024-09-18 11:23:55,848][00268] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3371.3). Total num frames: 876544. Throughput: 0: 897.6. Samples: 219872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:23:55,854][00268] Avg episode reward: [(0, '5.091')]
+[2024-09-18 11:23:55,865][03570] Saving new best policy, reward=5.091!
+[2024-09-18 11:24:00,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3369.5). Total num frames: 892928. Throughput: 0: 896.1. Samples: 221818. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:24:00,854][00268] Avg episode reward: [(0, '5.288')]
+[2024-09-18 11:24:00,857][03570] Saving new best policy, reward=5.288!
+[2024-09-18 11:24:03,292][03583] Updated weights for policy 0, policy_version 220 (0.0015)
+[2024-09-18 11:24:05,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3481.7, 300 sec: 3367.8). Total num frames: 909312. Throughput: 0: 874.4. Samples: 226714. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:24:05,856][00268] Avg episode reward: [(0, '5.011')]
+[2024-09-18 11:24:10,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3381.1). Total num frames: 929792. Throughput: 0: 883.6. Samples: 232860. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-09-18 11:24:10,850][00268] Avg episode reward: [(0, '4.666')]
+[2024-09-18 11:24:14,497][03583] Updated weights for policy 0, policy_version 230 (0.0013)
+[2024-09-18 11:24:15,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3364.6). Total num frames: 942080. Throughput: 0: 891.6. Samples: 235232. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:24:15,854][00268] Avg episode reward: [(0, '4.724')]
+[2024-09-18 11:24:20,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3377.4). Total num frames: 962560. Throughput: 0: 876.4. Samples: 239984. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:24:20,852][00268] Avg episode reward: [(0, '4.824')]
+[2024-09-18 11:24:25,284][03583] Updated weights for policy 0, policy_version 240 (0.0012)
+[2024-09-18 11:24:25,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3389.8). Total num frames: 983040. Throughput: 0: 878.1. Samples: 246134. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:24:25,854][00268] Avg episode reward: [(0, '4.984')]
+[2024-09-18 11:24:30,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3387.9). Total num frames: 999424. Throughput: 0: 890.8. Samples: 248694. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:24:30,854][00268] Avg episode reward: [(0, '5.253')]
+[2024-09-18 11:24:35,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1015808. Throughput: 0: 875.2. Samples: 253218. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:24:35,855][00268] Avg episode reward: [(0, '5.111')]
+[2024-09-18 11:24:37,394][03583] Updated weights for policy 0, policy_version 250 (0.0012)
+[2024-09-18 11:24:40,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3512.8). Total num frames: 1036288. Throughput: 0: 879.7. Samples: 259458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:24:40,850][00268] Avg episode reward: [(0, '5.163')]
+[2024-09-18 11:24:45,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3554.5). Total num frames: 1052672. Throughput: 0: 902.5. Samples: 262432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:24:45,852][00268] Avg episode reward: [(0, '5.169')]
+[2024-09-18 11:24:49,232][03583] Updated weights for policy 0, policy_version 260 (0.0014)
+[2024-09-18 11:24:50,849][00268] Fps is (10 sec: 3276.2, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 1069056. Throughput: 0: 890.7. Samples: 266796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:24:50,857][00268] Avg episode reward: [(0, '5.309')]
+[2024-09-18 11:24:50,861][03570] Saving new best policy, reward=5.309!
+[2024-09-18 11:24:55,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1089536. Throughput: 0: 890.2. Samples: 272918. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:24:55,850][00268] Avg episode reward: [(0, '4.976')]
+[2024-09-18 11:24:59,387][03583] Updated weights for policy 0, policy_version 270 (0.0016)
+[2024-09-18 11:25:00,848][00268] Fps is (10 sec: 3687.1, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1105920. Throughput: 0: 906.7. Samples: 276034. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:25:00,855][00268] Avg episode reward: [(0, '5.349')]
+[2024-09-18 11:25:00,858][03570] Saving new best policy, reward=5.349!
+[2024-09-18 11:25:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1122304. Throughput: 0: 889.4. Samples: 280008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:25:05,849][00268] Avg episode reward: [(0, '5.447')]
+[2024-09-18 11:25:05,863][03570] Saving new best policy, reward=5.447!
+[2024-09-18 11:25:10,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1142784. Throughput: 0: 890.8. Samples: 286222. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:25:10,850][00268] Avg episode reward: [(0, '5.860')]
+[2024-09-18 11:25:10,856][03570] Saving new best policy, reward=5.860!
+[2024-09-18 11:25:11,208][03583] Updated weights for policy 0, policy_version 280 (0.0021)
+[2024-09-18 11:25:15,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1159168. Throughput: 0: 901.0. Samples: 289240. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:25:15,851][00268] Avg episode reward: [(0, '5.828')]
+[2024-09-18 11:25:15,891][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth...
+[2024-09-18 11:25:16,046][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth
+[2024-09-18 11:25:20,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 1175552. Throughput: 0: 891.6. Samples: 293340. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:25:20,851][00268] Avg episode reward: [(0, '5.563')]
+[2024-09-18 11:25:23,239][03583] Updated weights for policy 0, policy_version 290 (0.0017)
+[2024-09-18 11:25:25,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1196032. Throughput: 0: 888.6. Samples: 299446. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:25:25,850][00268] Avg episode reward: [(0, '5.255')]
+[2024-09-18 11:25:30,848][00268] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1216512. Throughput: 0: 891.7. Samples: 302560. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:25:30,856][00268] Avg episode reward: [(0, '5.591')]
+[2024-09-18 11:25:34,894][03583] Updated weights for policy 0, policy_version 300 (0.0018)
+[2024-09-18 11:25:35,850][00268] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3540.6). Total num frames: 1228800. Throughput: 0: 894.3. Samples: 307042. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:25:35,853][00268] Avg episode reward: [(0, '5.911')]
+[2024-09-18 11:25:35,868][03570] Saving new best policy, reward=5.911!
+[2024-09-18 11:25:40,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1249280. Throughput: 0: 885.8. Samples: 312778. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:25:40,851][00268] Avg episode reward: [(0, '6.271')]
+[2024-09-18 11:25:40,868][03570] Saving new best policy, reward=6.271!
+[2024-09-18 11:25:45,264][03583] Updated weights for policy 0, policy_version 310 (0.0012)
+[2024-09-18 11:25:45,848][00268] Fps is (10 sec: 4097.1, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1269760. Throughput: 0: 884.1. Samples: 315820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:25:45,851][00268] Avg episode reward: [(0, '6.389')]
+[2024-09-18 11:25:45,869][03570] Saving new best policy, reward=6.389!
+[2024-09-18 11:25:50,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3540.6). Total num frames: 1282048. Throughput: 0: 902.2. Samples: 320606. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:25:50,850][00268] Avg episode reward: [(0, '6.987')]
+[2024-09-18 11:25:50,859][03570] Saving new best policy, reward=6.987!
+[2024-09-18 11:25:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1302528. Throughput: 0: 884.4. Samples: 326018. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:25:55,852][00268] Avg episode reward: [(0, '7.429')]
+[2024-09-18 11:25:55,862][03570] Saving new best policy, reward=7.429!
+[2024-09-18 11:25:57,393][03583] Updated weights for policy 0, policy_version 320 (0.0013)
+[2024-09-18 11:26:00,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1323008. Throughput: 0: 884.9. Samples: 329060. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:26:00,856][00268] Avg episode reward: [(0, '7.123')]
+[2024-09-18 11:26:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1335296. Throughput: 0: 907.6. Samples: 334182. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:26:05,852][00268] Avg episode reward: [(0, '7.216')]
+[2024-09-18 11:26:09,269][03583] Updated weights for policy 0, policy_version 330 (0.0013)
+[2024-09-18 11:26:10,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1355776. Throughput: 0: 888.6. Samples: 339432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:26:10,850][00268] Avg episode reward: [(0, '7.243')]
+[2024-09-18 11:26:15,848][00268] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1376256. Throughput: 0: 888.2. Samples: 342528. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:26:15,851][00268] Avg episode reward: [(0, '8.040')]
+[2024-09-18 11:26:15,863][03570] Saving new best policy, reward=8.040!
+[2024-09-18 11:26:20,444][03583] Updated weights for policy 0, policy_version 340 (0.0013)
+[2024-09-18 11:26:20,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1392640. Throughput: 0: 905.2. Samples: 347774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:26:20,850][00268] Avg episode reward: [(0, '8.616')]
+[2024-09-18 11:26:20,854][03570] Saving new best policy, reward=8.616!
+[2024-09-18 11:26:25,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1409024. Throughput: 0: 886.6. Samples: 352674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:26:25,855][00268] Avg episode reward: [(0, '9.240')]
+[2024-09-18 11:26:25,876][03570] Saving new best policy, reward=9.240!
+[2024-09-18 11:26:30,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1429504. Throughput: 0: 885.4. Samples: 355662. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-18 11:26:30,849][00268] Avg episode reward: [(0, '8.832')]
+[2024-09-18 11:26:31,283][03583] Updated weights for policy 0, policy_version 350 (0.0016)
+[2024-09-18 11:26:35,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3554.5). Total num frames: 1445888. Throughput: 0: 900.5. Samples: 361128. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:26:35,850][00268] Avg episode reward: [(0, '7.798')]
+[2024-09-18 11:26:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1462272. Throughput: 0: 887.9. Samples: 365974. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-18 11:26:40,850][00268] Avg episode reward: [(0, '7.667')]
+[2024-09-18 11:26:43,242][03583] Updated weights for policy 0, policy_version 360 (0.0025)
+[2024-09-18 11:26:45,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1482752. Throughput: 0: 891.3. Samples: 369168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-18 11:26:45,858][00268] Avg episode reward: [(0, '8.492')]
+[2024-09-18 11:26:50,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1499136. Throughput: 0: 905.6. Samples: 374932.
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:26:50,853][00268] Avg episode reward: [(0, '8.853')] +[2024-09-18 11:26:55,177][03583] Updated weights for policy 0, policy_version 370 (0.0013) +[2024-09-18 11:26:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1515520. Throughput: 0: 889.4. Samples: 379454. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:26:55,853][00268] Avg episode reward: [(0, '9.710')] +[2024-09-18 11:26:55,862][03570] Saving new best policy, reward=9.710! +[2024-09-18 11:27:00,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1536000. Throughput: 0: 886.7. Samples: 382428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:27:00,854][00268] Avg episode reward: [(0, '10.164')] +[2024-09-18 11:27:00,862][03570] Saving new best policy, reward=10.164! +[2024-09-18 11:27:05,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1552384. Throughput: 0: 898.0. Samples: 388182. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:27:05,857][00268] Avg episode reward: [(0, '10.795')] +[2024-09-18 11:27:05,872][03570] Saving new best policy, reward=10.795! +[2024-09-18 11:27:06,323][03583] Updated weights for policy 0, policy_version 380 (0.0015) +[2024-09-18 11:27:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1568768. Throughput: 0: 881.2. Samples: 392330. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:27:10,854][00268] Avg episode reward: [(0, '10.806')] +[2024-09-18 11:27:10,857][03570] Saving new best policy, reward=10.806! +[2024-09-18 11:27:15,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1589248. Throughput: 0: 882.7. Samples: 395384. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:27:15,854][00268] Avg episode reward: [(0, '10.794')] +[2024-09-18 11:27:15,863][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000388_1589248.pth... +[2024-09-18 11:27:15,976][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth +[2024-09-18 11:27:17,520][03583] Updated weights for policy 0, policy_version 390 (0.0013) +[2024-09-18 11:27:20,850][00268] Fps is (10 sec: 3685.6, 60 sec: 3549.7, 300 sec: 3554.5). Total num frames: 1605632. Throughput: 0: 897.5. Samples: 401518. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:27:20,853][00268] Avg episode reward: [(0, '10.299')] +[2024-09-18 11:27:25,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1622016. Throughput: 0: 879.3. Samples: 405544. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:27:25,854][00268] Avg episode reward: [(0, '9.965')] +[2024-09-18 11:27:29,529][03583] Updated weights for policy 0, policy_version 400 (0.0017) +[2024-09-18 11:27:30,847][00268] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1642496. Throughput: 0: 877.2. Samples: 408642. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:27:30,850][00268] Avg episode reward: [(0, '9.577')] +[2024-09-18 11:27:35,848][00268] Fps is (10 sec: 3686.3, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 1658880. Throughput: 0: 884.9. Samples: 414752. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:27:35,855][00268] Avg episode reward: [(0, '9.911')] +[2024-09-18 11:27:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1675264. Throughput: 0: 876.9. Samples: 418916. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:27:40,850][00268] Avg episode reward: [(0, '10.083')] +[2024-09-18 11:27:41,905][03583] Updated weights for policy 0, policy_version 410 (0.0017) +[2024-09-18 11:27:45,848][00268] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1695744. Throughput: 0: 875.8. Samples: 421838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-18 11:27:45,851][00268] Avg episode reward: [(0, '10.960')] +[2024-09-18 11:27:45,860][03570] Saving new best policy, reward=10.960! +[2024-09-18 11:27:50,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1716224. Throughput: 0: 886.2. Samples: 428062. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:27:50,853][00268] Avg episode reward: [(0, '11.777')] +[2024-09-18 11:27:50,858][03570] Saving new best policy, reward=11.777! +[2024-09-18 11:27:52,255][03583] Updated weights for policy 0, policy_version 420 (0.0014) +[2024-09-18 11:27:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1728512. Throughput: 0: 891.2. Samples: 432434. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:27:55,857][00268] Avg episode reward: [(0, '11.729')] +[2024-09-18 11:28:00,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1744896. Throughput: 0: 881.7. Samples: 435062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:28:00,850][00268] Avg episode reward: [(0, '12.197')] +[2024-09-18 11:28:00,853][03570] Saving new best policy, reward=12.197! +[2024-09-18 11:28:04,037][03583] Updated weights for policy 0, policy_version 430 (0.0024) +[2024-09-18 11:28:05,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1765376. Throughput: 0: 878.9. Samples: 441066. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:28:05,849][00268] Avg episode reward: [(0, '12.195')] +[2024-09-18 11:28:10,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1781760. Throughput: 0: 897.3. Samples: 445922. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:28:10,853][00268] Avg episode reward: [(0, '12.103')] +[2024-09-18 11:28:15,849][00268] Fps is (10 sec: 3276.4, 60 sec: 3481.5, 300 sec: 3540.6). Total num frames: 1798144. Throughput: 0: 881.8. Samples: 448322. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:28:15,851][00268] Avg episode reward: [(0, '13.471')] +[2024-09-18 11:28:15,929][03570] Saving new best policy, reward=13.471! +[2024-09-18 11:28:15,930][03583] Updated weights for policy 0, policy_version 440 (0.0017) +[2024-09-18 11:28:20,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3554.5). Total num frames: 1818624. Throughput: 0: 882.8. Samples: 454478. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:28:20,855][00268] Avg episode reward: [(0, '14.444')] +[2024-09-18 11:28:20,879][03570] Saving new best policy, reward=14.444! +[2024-09-18 11:28:25,848][00268] Fps is (10 sec: 3686.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 1835008. Throughput: 0: 901.2. Samples: 459468. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:28:25,852][00268] Avg episode reward: [(0, '14.270')] +[2024-09-18 11:28:28,064][03583] Updated weights for policy 0, policy_version 450 (0.0017) +[2024-09-18 11:28:30,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1851392. Throughput: 0: 882.2. Samples: 461536. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:28:30,853][00268] Avg episode reward: [(0, '14.892')] +[2024-09-18 11:28:30,856][03570] Saving new best policy, reward=14.892! +[2024-09-18 11:28:35,847][00268] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3554.5). 
Total num frames: 1871872. Throughput: 0: 878.8. Samples: 467610. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:28:35,850][00268] Avg episode reward: [(0, '13.119')] +[2024-09-18 11:28:37,997][03583] Updated weights for policy 0, policy_version 460 (0.0018) +[2024-09-18 11:28:40,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 1888256. Throughput: 0: 899.7. Samples: 472920. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:28:40,853][00268] Avg episode reward: [(0, '12.122')] +[2024-09-18 11:28:45,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 1904640. Throughput: 0: 884.1. Samples: 474848. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:28:45,850][00268] Avg episode reward: [(0, '12.922')] +[2024-09-18 11:28:49,840][03583] Updated weights for policy 0, policy_version 470 (0.0015) +[2024-09-18 11:28:50,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 1925120. Throughput: 0: 889.7. Samples: 481104. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:28:50,851][00268] Avg episode reward: [(0, '12.592')] +[2024-09-18 11:28:55,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1945600. Throughput: 0: 903.8. Samples: 486592. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:28:55,853][00268] Avg episode reward: [(0, '13.595')] +[2024-09-18 11:29:00,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1961984. Throughput: 0: 895.8. Samples: 488632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:29:00,854][00268] Avg episode reward: [(0, '14.362')] +[2024-09-18 11:29:01,721][03583] Updated weights for policy 0, policy_version 480 (0.0021) +[2024-09-18 11:29:05,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1982464. Throughput: 0: 888.2. Samples: 494448. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:29:05,850][00268] Avg episode reward: [(0, '14.819')] +[2024-09-18 11:29:10,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1998848. Throughput: 0: 904.7. Samples: 500180. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:29:10,852][00268] Avg episode reward: [(0, '15.540')] +[2024-09-18 11:29:10,857][03570] Saving new best policy, reward=15.540! +[2024-09-18 11:29:13,166][03583] Updated weights for policy 0, policy_version 490 (0.0017) +[2024-09-18 11:29:15,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2011136. Throughput: 0: 901.7. Samples: 502112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:29:15,850][00268] Avg episode reward: [(0, '16.066')] +[2024-09-18 11:29:15,859][00268] Components not started: RolloutWorker_w0, RolloutWorker_w6, RolloutWorker_w7, wait_time=600.0 seconds +[2024-09-18 11:29:15,911][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000492_2015232.pth... +[2024-09-18 11:29:16,010][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth +[2024-09-18 11:29:16,024][03570] Saving new best policy, reward=16.066! +[2024-09-18 11:29:20,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2031616. Throughput: 0: 891.4. Samples: 507724. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:29:20,850][00268] Avg episode reward: [(0, '16.859')] +[2024-09-18 11:29:20,930][03570] Saving new best policy, reward=16.859! +[2024-09-18 11:29:23,907][03583] Updated weights for policy 0, policy_version 500 (0.0018) +[2024-09-18 11:29:25,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2052096. Throughput: 0: 903.8. Samples: 513592. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:29:25,855][00268] Avg episode reward: [(0, '15.709')] +[2024-09-18 11:29:30,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2064384. Throughput: 0: 905.2. Samples: 515584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:29:30,849][00268] Avg episode reward: [(0, '15.991')] +[2024-09-18 11:29:35,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2084864. Throughput: 0: 886.4. Samples: 520990. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:29:35,856][00268] Avg episode reward: [(0, '16.446')] +[2024-09-18 11:29:35,893][03583] Updated weights for policy 0, policy_version 510 (0.0015) +[2024-09-18 11:29:40,847][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2105344. Throughput: 0: 901.0. Samples: 527138. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:29:40,855][00268] Avg episode reward: [(0, '16.243')] +[2024-09-18 11:29:45,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2117632. Throughput: 0: 899.7. Samples: 529118. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:29:45,850][00268] Avg episode reward: [(0, '16.109')] +[2024-09-18 11:29:47,893][03583] Updated weights for policy 0, policy_version 520 (0.0013) +[2024-09-18 11:29:50,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2138112. Throughput: 0: 887.4. Samples: 534380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:29:50,850][00268] Avg episode reward: [(0, '16.686')] +[2024-09-18 11:29:55,849][00268] Fps is (10 sec: 4504.9, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 2162688. Throughput: 0: 899.2. Samples: 540646. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:29:55,852][00268] Avg episode reward: [(0, '17.365')] +[2024-09-18 11:29:55,862][03570] Saving new best policy, reward=17.365! +[2024-09-18 11:29:58,641][03583] Updated weights for policy 0, policy_version 530 (0.0014) +[2024-09-18 11:30:00,850][00268] Fps is (10 sec: 3685.4, 60 sec: 3549.7, 300 sec: 3568.3). Total num frames: 2174976. Throughput: 0: 903.5. Samples: 542774. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:30:00,855][00268] Avg episode reward: [(0, '17.931')] +[2024-09-18 11:30:00,861][03570] Saving new best policy, reward=17.931! +[2024-09-18 11:30:05,848][00268] Fps is (10 sec: 2867.6, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 2191360. Throughput: 0: 887.5. Samples: 547662. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:30:05,850][00268] Avg episode reward: [(0, '17.475')] +[2024-09-18 11:30:09,802][03583] Updated weights for policy 0, policy_version 540 (0.0013) +[2024-09-18 11:30:10,848][00268] Fps is (10 sec: 4097.1, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 2215936. Throughput: 0: 896.0. Samples: 553912. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:30:10,849][00268] Avg episode reward: [(0, '17.394')] +[2024-09-18 11:30:15,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2228224. Throughput: 0: 904.4. Samples: 556284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:30:15,854][00268] Avg episode reward: [(0, '16.805')] +[2024-09-18 11:30:20,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2248704. Throughput: 0: 891.7. Samples: 561116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-18 11:30:20,851][00268] Avg episode reward: [(0, '15.713')] +[2024-09-18 11:30:21,740][03583] Updated weights for policy 0, policy_version 550 (0.0022) +[2024-09-18 11:30:25,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). 
Total num frames: 2269184. Throughput: 0: 895.4. Samples: 567432. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:30:25,854][00268] Avg episode reward: [(0, '15.488')] +[2024-09-18 11:30:30,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2281472. Throughput: 0: 909.5. Samples: 570046. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:30:30,854][00268] Avg episode reward: [(0, '16.843')] +[2024-09-18 11:30:33,611][03583] Updated weights for policy 0, policy_version 560 (0.0012) +[2024-09-18 11:30:35,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2301952. Throughput: 0: 892.7. Samples: 574550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:30:35,855][00268] Avg episode reward: [(0, '17.624')] +[2024-09-18 11:30:40,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2322432. Throughput: 0: 892.0. Samples: 580784. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:30:40,851][00268] Avg episode reward: [(0, '17.272')] +[2024-09-18 11:30:43,867][03583] Updated weights for policy 0, policy_version 570 (0.0021) +[2024-09-18 11:30:45,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2338816. Throughput: 0: 910.4. Samples: 583740. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:30:45,853][00268] Avg episode reward: [(0, '17.620')] +[2024-09-18 11:30:50,848][00268] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2355200. Throughput: 0: 898.4. Samples: 588090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:30:50,850][00268] Avg episode reward: [(0, '18.075')] +[2024-09-18 11:30:50,854][03570] Saving new best policy, reward=18.075! +[2024-09-18 11:30:55,663][03583] Updated weights for policy 0, policy_version 580 (0.0024) +[2024-09-18 11:30:55,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3568.4). 
Total num frames: 2375680. Throughput: 0: 892.8. Samples: 594090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:30:55,852][00268] Avg episode reward: [(0, '17.052')] +[2024-09-18 11:31:00,848][00268] Fps is (10 sec: 3686.5, 60 sec: 3618.3, 300 sec: 3582.3). Total num frames: 2392064. Throughput: 0: 907.0. Samples: 597100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:31:00,851][00268] Avg episode reward: [(0, '16.923')] +[2024-09-18 11:31:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2408448. Throughput: 0: 888.0. Samples: 601076. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:31:05,850][00268] Avg episode reward: [(0, '16.540')] +[2024-09-18 11:31:07,736][03583] Updated weights for policy 0, policy_version 590 (0.0013) +[2024-09-18 11:31:10,849][00268] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3554.5). Total num frames: 2424832. Throughput: 0: 884.0. Samples: 607214. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:31:10,851][00268] Avg episode reward: [(0, '17.007')] +[2024-09-18 11:31:15,849][00268] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3568.4). Total num frames: 2445312. Throughput: 0: 896.5. Samples: 610388. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:31:15,852][00268] Avg episode reward: [(0, '17.184')] +[2024-09-18 11:31:15,865][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000597_2445312.pth... +[2024-09-18 11:31:15,995][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000388_1589248.pth +[2024-09-18 11:31:19,595][03583] Updated weights for policy 0, policy_version 600 (0.0013) +[2024-09-18 11:31:20,848][00268] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2461696. Throughput: 0: 891.3. Samples: 614658. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:31:20,854][00268] Avg episode reward: [(0, '17.517')] +[2024-09-18 11:31:25,848][00268] Fps is (10 sec: 3687.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2482176. Throughput: 0: 887.4. Samples: 620718. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:31:25,856][00268] Avg episode reward: [(0, '17.887')] +[2024-09-18 11:31:29,589][03583] Updated weights for policy 0, policy_version 610 (0.0018) +[2024-09-18 11:31:30,850][00268] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3582.2). Total num frames: 2502656. Throughput: 0: 891.9. Samples: 623876. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:31:30,854][00268] Avg episode reward: [(0, '18.923')] +[2024-09-18 11:31:30,856][03570] Saving new best policy, reward=18.923! +[2024-09-18 11:31:35,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2514944. Throughput: 0: 894.1. Samples: 628324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:31:35,854][00268] Avg episode reward: [(0, '19.366')] +[2024-09-18 11:31:35,866][03570] Saving new best policy, reward=19.366! +[2024-09-18 11:31:40,847][00268] Fps is (10 sec: 3277.5, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2535424. Throughput: 0: 886.7. Samples: 633990. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:31:40,850][00268] Avg episode reward: [(0, '17.481')] +[2024-09-18 11:31:41,493][03583] Updated weights for policy 0, policy_version 620 (0.0012) +[2024-09-18 11:31:45,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 2555904. Throughput: 0: 890.4. Samples: 637170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-18 11:31:45,853][00268] Avg episode reward: [(0, '18.259')] +[2024-09-18 11:31:50,851][00268] Fps is (10 sec: 3275.6, 60 sec: 3549.7, 300 sec: 3568.3). Total num frames: 2568192. Throughput: 0: 908.9. Samples: 641978. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:31:50,856][00268] Avg episode reward: [(0, '17.725')] +[2024-09-18 11:31:53,648][03583] Updated weights for policy 0, policy_version 630 (0.0015) +[2024-09-18 11:31:55,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2588672. Throughput: 0: 892.3. Samples: 647368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:31:55,850][00268] Avg episode reward: [(0, '17.782')] +[2024-09-18 11:32:00,848][00268] Fps is (10 sec: 4097.5, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 2609152. Throughput: 0: 890.1. Samples: 650442. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:32:00,850][00268] Avg episode reward: [(0, '17.894')] +[2024-09-18 11:32:04,653][03583] Updated weights for policy 0, policy_version 640 (0.0016) +[2024-09-18 11:32:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2621440. Throughput: 0: 907.6. Samples: 655500. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:32:05,853][00268] Avg episode reward: [(0, '19.030')] +[2024-09-18 11:32:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 2641920. Throughput: 0: 885.4. Samples: 660562. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:32:10,850][00268] Avg episode reward: [(0, '19.598')] +[2024-09-18 11:32:10,852][03570] Saving new best policy, reward=19.598! +[2024-09-18 11:32:15,693][03583] Updated weights for policy 0, policy_version 650 (0.0019) +[2024-09-18 11:32:15,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 2662400. Throughput: 0: 881.7. Samples: 663552. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:32:15,850][00268] Avg episode reward: [(0, '20.710')] +[2024-09-18 11:32:15,857][03570] Saving new best policy, reward=20.710! +[2024-09-18 11:32:20,850][00268] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3568.3). 
Total num frames: 2674688. Throughput: 0: 900.9. Samples: 668866. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:32:20,852][00268] Avg episode reward: [(0, '20.401')] +[2024-09-18 11:32:25,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2695168. Throughput: 0: 883.5. Samples: 673748. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:32:25,853][00268] Avg episode reward: [(0, '19.838')] +[2024-09-18 11:32:27,773][03583] Updated weights for policy 0, policy_version 660 (0.0018) +[2024-09-18 11:32:30,848][00268] Fps is (10 sec: 4097.1, 60 sec: 3550.0, 300 sec: 3582.3). Total num frames: 2715648. Throughput: 0: 880.7. Samples: 676800. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:32:30,854][00268] Avg episode reward: [(0, '19.531')] +[2024-09-18 11:32:35,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 2732032. Throughput: 0: 898.8. Samples: 682422. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:32:35,850][00268] Avg episode reward: [(0, '19.685')] +[2024-09-18 11:32:39,840][03583] Updated weights for policy 0, policy_version 670 (0.0015) +[2024-09-18 11:32:40,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 2744320. Throughput: 0: 881.2. Samples: 687022. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:32:40,852][00268] Avg episode reward: [(0, '19.965')] +[2024-09-18 11:32:45,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2768896. Throughput: 0: 881.8. Samples: 690122. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:32:45,850][00268] Avg episode reward: [(0, '21.358')] +[2024-09-18 11:32:45,861][03570] Saving new best policy, reward=21.358! +[2024-09-18 11:32:50,622][03583] Updated weights for policy 0, policy_version 680 (0.0017) +[2024-09-18 11:32:50,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.4, 300 sec: 3582.3). 
Total num frames: 2785280. Throughput: 0: 899.6. Samples: 695984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:32:50,852][00268] Avg episode reward: [(0, '21.131')] +[2024-09-18 11:32:55,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2797568. Throughput: 0: 882.0. Samples: 700250. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2024-09-18 11:32:55,851][00268] Avg episode reward: [(0, '21.886')] +[2024-09-18 11:32:55,914][03570] Saving new best policy, reward=21.886! +[2024-09-18 11:33:00,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2818048. Throughput: 0: 882.1. Samples: 703246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:33:00,849][00268] Avg episode reward: [(0, '22.067')] +[2024-09-18 11:33:00,933][03570] Saving new best policy, reward=22.067! +[2024-09-18 11:33:02,074][03583] Updated weights for policy 0, policy_version 690 (0.0014) +[2024-09-18 11:33:05,849][00268] Fps is (10 sec: 4095.5, 60 sec: 3618.1, 300 sec: 3582.2). Total num frames: 2838528. Throughput: 0: 900.7. Samples: 709398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:33:05,856][00268] Avg episode reward: [(0, '21.232')] +[2024-09-18 11:33:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2850816. Throughput: 0: 880.1. Samples: 713352. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:33:10,850][00268] Avg episode reward: [(0, '20.798')] +[2024-09-18 11:33:14,017][03583] Updated weights for policy 0, policy_version 700 (0.0019) +[2024-09-18 11:33:15,848][00268] Fps is (10 sec: 3277.2, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2871296. Throughput: 0: 881.5. Samples: 716468. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:33:15,850][00268] Avg episode reward: [(0, '21.792')] +[2024-09-18 11:33:15,858][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000701_2871296.pth... +[2024-09-18 11:33:15,957][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000492_2015232.pth +[2024-09-18 11:33:20,848][00268] Fps is (10 sec: 4095.9, 60 sec: 3618.3, 300 sec: 3582.3). Total num frames: 2891776. Throughput: 0: 892.4. Samples: 722582. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:33:20,856][00268] Avg episode reward: [(0, '21.790')] +[2024-09-18 11:33:25,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2904064. Throughput: 0: 883.4. Samples: 726774. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:33:25,854][00268] Avg episode reward: [(0, '21.234')] +[2024-09-18 11:33:25,936][03583] Updated weights for policy 0, policy_version 710 (0.0012) +[2024-09-18 11:33:30,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2924544. Throughput: 0: 881.8. Samples: 729802. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:33:30,854][00268] Avg episode reward: [(0, '22.302')] +[2024-09-18 11:33:30,857][03570] Saving new best policy, reward=22.302! +[2024-09-18 11:33:35,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 2945024. Throughput: 0: 888.0. Samples: 735944. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:33:35,857][00268] Avg episode reward: [(0, '22.404')] +[2024-09-18 11:33:35,869][03570] Saving new best policy, reward=22.404! +[2024-09-18 11:33:36,405][03583] Updated weights for policy 0, policy_version 720 (0.0015) +[2024-09-18 11:33:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2957312. Throughput: 0: 887.0. Samples: 740164. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:33:40,855][00268] Avg episode reward: [(0, '23.180')] +[2024-09-18 11:33:40,857][03570] Saving new best policy, reward=23.180! +[2024-09-18 11:33:45,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2977792. Throughput: 0: 880.3. Samples: 742858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:33:45,849][00268] Avg episode reward: [(0, '21.975')] +[2024-09-18 11:33:48,257][03583] Updated weights for policy 0, policy_version 730 (0.0016) +[2024-09-18 11:33:50,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2998272. Throughput: 0: 881.0. Samples: 749044. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:33:50,850][00268] Avg episode reward: [(0, '22.104')] +[2024-09-18 11:33:55,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3014656. Throughput: 0: 897.0. Samples: 753716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:33:55,849][00268] Avg episode reward: [(0, '22.080')] +[2024-09-18 11:34:00,226][03583] Updated weights for policy 0, policy_version 740 (0.0018) +[2024-09-18 11:34:00,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3031040. Throughput: 0: 883.1. Samples: 756208. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:34:00,849][00268] Avg episode reward: [(0, '22.730')] +[2024-09-18 11:34:05,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3051520. Throughput: 0: 884.7. Samples: 762392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:34:05,850][00268] Avg episode reward: [(0, '21.880')] +[2024-09-18 11:34:10,849][00268] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 3067904. Throughput: 0: 899.4. Samples: 767250. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:34:10,853][00268] Avg episode reward: [(0, '21.076')] +[2024-09-18 11:34:11,956][03583] Updated weights for policy 0, policy_version 750 (0.0012) +[2024-09-18 11:34:15,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3084288. Throughput: 0: 881.8. Samples: 769484. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:34:15,857][00268] Avg episode reward: [(0, '20.580')] +[2024-09-18 11:34:20,848][00268] Fps is (10 sec: 3687.1, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3104768. Throughput: 0: 880.6. Samples: 775572. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:34:20,850][00268] Avg episode reward: [(0, '20.713')] +[2024-09-18 11:34:22,357][03583] Updated weights for policy 0, policy_version 760 (0.0013) +[2024-09-18 11:34:25,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3121152. Throughput: 0: 902.5. Samples: 780776. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:34:25,850][00268] Avg episode reward: [(0, '21.570')] +[2024-09-18 11:34:30,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3137536. Throughput: 0: 886.6. Samples: 782754. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:34:30,850][00268] Avg episode reward: [(0, '21.134')] +[2024-09-18 11:34:34,200][03583] Updated weights for policy 0, policy_version 770 (0.0014) +[2024-09-18 11:34:35,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3158016. Throughput: 0: 886.8. Samples: 788948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:34:35,850][00268] Avg episode reward: [(0, '21.014')] +[2024-09-18 11:34:40,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3174400. Throughput: 0: 898.6. Samples: 794154. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:34:40,850][00268] Avg episode reward: [(0, '20.660')] +[2024-09-18 11:34:45,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3190784. Throughput: 0: 886.4. Samples: 796098. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:34:45,850][00268] Avg episode reward: [(0, '19.505')] +[2024-09-18 11:34:46,463][03583] Updated weights for policy 0, policy_version 780 (0.0015) +[2024-09-18 11:34:50,847][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3211264. Throughput: 0: 882.7. Samples: 802112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:34:50,850][00268] Avg episode reward: [(0, '18.187')] +[2024-09-18 11:34:55,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3227648. Throughput: 0: 899.2. Samples: 807710. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:34:55,850][00268] Avg episode reward: [(0, '18.231')] +[2024-09-18 11:34:57,915][03583] Updated weights for policy 0, policy_version 790 (0.0013) +[2024-09-18 11:35:00,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3244032. Throughput: 0: 893.3. Samples: 809682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:35:00,852][00268] Avg episode reward: [(0, '20.332')] +[2024-09-18 11:35:05,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3264512. Throughput: 0: 884.2. Samples: 815360. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:35:05,854][00268] Avg episode reward: [(0, '21.054')] +[2024-09-18 11:35:08,426][03583] Updated weights for policy 0, policy_version 800 (0.0015) +[2024-09-18 11:35:10,850][00268] Fps is (10 sec: 3685.4, 60 sec: 3549.8, 300 sec: 3568.3). Total num frames: 3280896. Throughput: 0: 898.9. Samples: 821228. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:35:10,853][00268] Avg episode reward: [(0, '20.805')] +[2024-09-18 11:35:15,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3297280. Throughput: 0: 898.4. Samples: 823182. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2024-09-18 11:35:15,853][00268] Avg episode reward: [(0, '21.634')] +[2024-09-18 11:35:15,868][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000805_3297280.pth... +[2024-09-18 11:35:15,980][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000597_2445312.pth +[2024-09-18 11:35:20,400][03583] Updated weights for policy 0, policy_version 810 (0.0013) +[2024-09-18 11:35:20,848][00268] Fps is (10 sec: 3687.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3317760. Throughput: 0: 881.2. Samples: 828604. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:35:20,855][00268] Avg episode reward: [(0, '22.731')] +[2024-09-18 11:35:25,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3338240. Throughput: 0: 901.0. Samples: 834700. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:35:25,850][00268] Avg episode reward: [(0, '22.013')] +[2024-09-18 11:35:30,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3350528. Throughput: 0: 901.7. Samples: 836676. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:35:30,850][00268] Avg episode reward: [(0, '21.434')] +[2024-09-18 11:35:32,478][03583] Updated weights for policy 0, policy_version 820 (0.0012) +[2024-09-18 11:35:35,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3371008. Throughput: 0: 886.2. Samples: 841992. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:35:35,850][00268] Avg episode reward: [(0, '20.932')] +[2024-09-18 11:35:40,848][00268] Fps is (10 sec: 4095.6, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3391488. Throughput: 0: 896.7. Samples: 848062. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:35:40,850][00268] Avg episode reward: [(0, '20.247')] +[2024-09-18 11:35:43,472][03583] Updated weights for policy 0, policy_version 830 (0.0017) +[2024-09-18 11:35:45,849][00268] Fps is (10 sec: 3276.3, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 3403776. Throughput: 0: 900.9. Samples: 850224. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2024-09-18 11:35:45,856][00268] Avg episode reward: [(0, '19.602')] +[2024-09-18 11:35:50,848][00268] Fps is (10 sec: 3277.1, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3424256. Throughput: 0: 886.4. Samples: 855246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:35:50,853][00268] Avg episode reward: [(0, '19.900')] +[2024-09-18 11:35:54,437][03583] Updated weights for policy 0, policy_version 840 (0.0021) +[2024-09-18 11:35:55,848][00268] Fps is (10 sec: 4096.6, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3444736. Throughput: 0: 895.1. Samples: 861504. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:35:55,852][00268] Avg episode reward: [(0, '20.187')] +[2024-09-18 11:36:00,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3457024. Throughput: 0: 900.6. Samples: 863710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:36:00,855][00268] Avg episode reward: [(0, '19.620')] +[2024-09-18 11:36:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3477504. Throughput: 0: 888.2. Samples: 868572. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:36:05,855][00268] Avg episode reward: [(0, '20.147')] +[2024-09-18 11:36:06,490][03583] Updated weights for policy 0, policy_version 850 (0.0019) +[2024-09-18 11:36:10,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3568.4). Total num frames: 3497984. Throughput: 0: 886.3. Samples: 874584. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:36:10,850][00268] Avg episode reward: [(0, '20.240')] +[2024-09-18 11:36:15,851][00268] Fps is (10 sec: 3275.5, 60 sec: 3549.6, 300 sec: 3554.4). Total num frames: 3510272. Throughput: 0: 897.3. Samples: 877058. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:36:15,854][00268] Avg episode reward: [(0, '21.439')] +[2024-09-18 11:36:18,803][03583] Updated weights for policy 0, policy_version 860 (0.0014) +[2024-09-18 11:36:20,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3526656. Throughput: 0: 879.5. Samples: 881570. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:36:20,850][00268] Avg episode reward: [(0, '20.837')] +[2024-09-18 11:36:25,848][00268] Fps is (10 sec: 4097.6, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3551232. Throughput: 0: 880.8. Samples: 887696. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:36:25,855][00268] Avg episode reward: [(0, '21.172')] +[2024-09-18 11:36:29,501][03583] Updated weights for policy 0, policy_version 870 (0.0014) +[2024-09-18 11:36:30,849][00268] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 3563520. Throughput: 0: 893.9. Samples: 890448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:36:30,851][00268] Avg episode reward: [(0, '23.472')] +[2024-09-18 11:36:30,857][03570] Saving new best policy, reward=23.472! +[2024-09-18 11:36:35,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3579904. Throughput: 0: 877.4. Samples: 894728. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:36:35,850][00268] Avg episode reward: [(0, '23.187')] +[2024-09-18 11:36:40,848][00268] Fps is (10 sec: 3687.0, 60 sec: 3481.7, 300 sec: 3540.6). Total num frames: 3600384. Throughput: 0: 871.7. Samples: 900732. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:36:40,849][00268] Avg episode reward: [(0, '23.378')] +[2024-09-18 11:36:41,157][03583] Updated weights for policy 0, policy_version 880 (0.0012) +[2024-09-18 11:36:45,848][00268] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3616768. Throughput: 0: 888.1. Samples: 903676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:36:45,856][00268] Avg episode reward: [(0, '23.717')] +[2024-09-18 11:36:45,869][03570] Saving new best policy, reward=23.717! +[2024-09-18 11:36:50,847][00268] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3633152. Throughput: 0: 869.3. Samples: 907690. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:36:50,850][00268] Avg episode reward: [(0, '24.130')] +[2024-09-18 11:36:50,853][03570] Saving new best policy, reward=24.130! +[2024-09-18 11:36:53,308][03583] Updated weights for policy 0, policy_version 890 (0.0018) +[2024-09-18 11:36:55,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3653632. Throughput: 0: 871.2. Samples: 913790. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:36:55,853][00268] Avg episode reward: [(0, '26.798')] +[2024-09-18 11:36:55,873][03570] Saving new best policy, reward=26.798! +[2024-09-18 11:37:00,854][00268] Fps is (10 sec: 3683.9, 60 sec: 3549.5, 300 sec: 3554.4). Total num frames: 3670016. Throughput: 0: 882.7. Samples: 916780. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:37:00,857][00268] Avg episode reward: [(0, '25.346')] +[2024-09-18 11:37:05,438][03583] Updated weights for policy 0, policy_version 900 (0.0014) +[2024-09-18 11:37:05,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3686400. Throughput: 0: 874.0. Samples: 920898. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:37:05,850][00268] Avg episode reward: [(0, '25.623')] +[2024-09-18 11:37:10,848][00268] Fps is (10 sec: 3688.9, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3706880. Throughput: 0: 874.0. Samples: 927024. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:37:10,855][00268] Avg episode reward: [(0, '26.154')] +[2024-09-18 11:37:15,805][03583] Updated weights for policy 0, policy_version 910 (0.0012) +[2024-09-18 11:37:15,848][00268] Fps is (10 sec: 4095.9, 60 sec: 3618.4, 300 sec: 3568.4). Total num frames: 3727360. Throughput: 0: 881.4. Samples: 930108. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:37:15,850][00268] Avg episode reward: [(0, '24.689')] +[2024-09-18 11:37:15,862][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000910_3727360.pth... +[2024-09-18 11:37:16,006][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000701_2871296.pth +[2024-09-18 11:37:20,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3739648. Throughput: 0: 880.9. Samples: 934370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:37:20,850][00268] Avg episode reward: [(0, '24.609')] +[2024-09-18 11:37:25,848][00268] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3760128. Throughput: 0: 878.1. Samples: 940248. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:37:25,849][00268] Avg episode reward: [(0, '22.402')] +[2024-09-18 11:37:27,559][03583] Updated weights for policy 0, policy_version 920 (0.0013) +[2024-09-18 11:37:30,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 3780608. Throughput: 0: 880.2. Samples: 943284. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:37:30,853][00268] Avg episode reward: [(0, '20.913')] +[2024-09-18 11:37:35,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3792896. Throughput: 0: 894.9. Samples: 947960. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:37:35,854][00268] Avg episode reward: [(0, '19.927')] +[2024-09-18 11:37:39,713][03583] Updated weights for policy 0, policy_version 930 (0.0016) +[2024-09-18 11:37:40,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3813376. Throughput: 0: 881.6. Samples: 953460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:37:40,850][00268] Avg episode reward: [(0, '21.082')] +[2024-09-18 11:37:45,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 3833856. Throughput: 0: 883.7. Samples: 956540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:37:45,850][00268] Avg episode reward: [(0, '22.413')] +[2024-09-18 11:37:50,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3846144. Throughput: 0: 902.4. Samples: 961504. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:37:50,850][00268] Avg episode reward: [(0, '23.016')] +[2024-09-18 11:37:51,514][03583] Updated weights for policy 0, policy_version 940 (0.0013) +[2024-09-18 11:37:55,848][00268] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3866624. Throughput: 0: 885.1. Samples: 966854. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:37:55,851][00268] Avg episode reward: [(0, '23.986')] +[2024-09-18 11:38:00,848][00268] Fps is (10 sec: 4096.0, 60 sec: 3618.5, 300 sec: 3554.5). Total num frames: 3887104. Throughput: 0: 884.0. Samples: 969890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:38:00,849][00268] Avg episode reward: [(0, '24.067')] +[2024-09-18 11:38:01,531][03583] Updated weights for policy 0, policy_version 950 (0.0015) +[2024-09-18 11:38:05,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 3899392. Throughput: 0: 903.1. Samples: 975012. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:38:05,852][00268] Avg episode reward: [(0, '25.298')] +[2024-09-18 11:38:10,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3919872. Throughput: 0: 883.1. Samples: 979988. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-18 11:38:10,850][00268] Avg episode reward: [(0, '25.019')] +[2024-09-18 11:38:13,865][03583] Updated weights for policy 0, policy_version 960 (0.0023) +[2024-09-18 11:38:15,848][00268] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3940352. Throughput: 0: 882.7. Samples: 983004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:38:15,859][00268] Avg episode reward: [(0, '24.346')] +[2024-09-18 11:38:20,848][00268] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3952640. Throughput: 0: 897.3. Samples: 988338. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:38:20,852][00268] Avg episode reward: [(0, '23.663')] +[2024-09-18 11:38:25,848][00268] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3969024. Throughput: 0: 880.9. Samples: 993100. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-18 11:38:25,855][00268] Avg episode reward: [(0, '23.525')] +[2024-09-18 11:38:25,922][03583] Updated weights for policy 0, policy_version 970 (0.0018) +[2024-09-18 11:38:30,848][00268] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3989504. Throughput: 0: 880.3. Samples: 996154. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-18 11:38:30,849][00268] Avg episode reward: [(0, '22.843')] +[2024-09-18 11:38:34,754][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-18 11:38:34,773][00268] Component Batcher_0 stopped! +[2024-09-18 11:38:34,781][00268] Component RolloutWorker_w0 process died already! Don't wait for it. +[2024-09-18 11:38:34,783][00268] Component RolloutWorker_w6 process died already! Don't wait for it. +[2024-09-18 11:38:34,784][00268] Component RolloutWorker_w7 process died already! Don't wait for it. +[2024-09-18 11:38:34,764][03570] Stopping Batcher_0... +[2024-09-18 11:38:34,819][03570] Loop batcher_evt_loop terminating... +[2024-09-18 11:38:34,880][03570] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000805_3297280.pth +[2024-09-18 11:38:34,882][03583] Weights refcount: 2 0 +[2024-09-18 11:38:34,888][03583] Stopping InferenceWorker_p0-w0... +[2024-09-18 11:38:34,891][03583] Loop inference_proc0-0_evt_loop terminating... +[2024-09-18 11:38:34,892][00268] Component InferenceWorker_p0-w0 stopped! +[2024-09-18 11:38:34,898][03570] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-18 11:38:35,170][03570] Stopping LearnerWorker_p0... +[2024-09-18 11:38:35,172][03570] Loop learner_proc0_evt_loop terminating... +[2024-09-18 11:38:35,170][00268] Component LearnerWorker_p0 stopped! +[2024-09-18 11:38:35,431][00268] Component RolloutWorker_w4 stopped! +[2024-09-18 11:38:35,431][03590] Stopping RolloutWorker_w4... 
+[2024-09-18 11:38:35,444][03590] Loop rollout_proc4_evt_loop terminating... +[2024-09-18 11:38:35,493][00268] Component RolloutWorker_w2 stopped! +[2024-09-18 11:38:35,493][03586] Stopping RolloutWorker_w2... +[2024-09-18 11:38:35,497][03586] Loop rollout_proc2_evt_loop terminating... +[2024-09-18 11:38:35,536][00268] Component RolloutWorker_w5 stopped! +[2024-09-18 11:38:35,541][03589] Stopping RolloutWorker_w5... +[2024-09-18 11:38:35,542][03589] Loop rollout_proc5_evt_loop terminating... +[2024-09-18 11:38:35,563][00268] Component RolloutWorker_w3 stopped! +[2024-09-18 11:38:35,566][03587] Stopping RolloutWorker_w3... +[2024-09-18 11:38:35,572][03587] Loop rollout_proc3_evt_loop terminating... +[2024-09-18 11:38:35,582][00268] Component RolloutWorker_w1 stopped! +[2024-09-18 11:38:35,588][00268] Waiting for process learner_proc0 to stop... +[2024-09-18 11:38:35,592][03585] Stopping RolloutWorker_w1... +[2024-09-18 11:38:35,594][03585] Loop rollout_proc1_evt_loop terminating... +[2024-09-18 11:38:37,057][00268] Waiting for process inference_proc0-0 to join... +[2024-09-18 11:38:37,572][00268] Waiting for process rollout_proc0 to join... +[2024-09-18 11:38:37,575][00268] Waiting for process rollout_proc1 to join... +[2024-09-18 11:38:38,374][00268] Waiting for process rollout_proc2 to join... +[2024-09-18 11:38:38,378][00268] Waiting for process rollout_proc3 to join... +[2024-09-18 11:38:38,387][00268] Waiting for process rollout_proc4 to join... +[2024-09-18 11:38:38,391][00268] Waiting for process rollout_proc5 to join... +[2024-09-18 11:38:38,395][00268] Waiting for process rollout_proc6 to join... +[2024-09-18 11:38:38,397][00268] Waiting for process rollout_proc7 to join... 
+[2024-09-18 11:38:38,399][00268] Batcher 0 profile tree view: +batching: 24.2802, releasing_batches: 0.0240 +[2024-09-18 11:38:38,400][00268] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 475.0022 +update_model: 8.5899 + weight_update: 0.0013 +one_step: 0.0141 + handle_policy_step: 608.9279 + deserialize: 15.9187, stack: 3.4847, obs_to_device_normalize: 130.2658, forward: 312.9496, send_messages: 22.9440 + prepare_outputs: 92.1547 + to_cpu: 59.4190 +[2024-09-18 11:38:38,402][00268] Learner 0 profile tree view: +misc: 0.0054, prepare_batch: 15.7238 +train: 70.6340 + epoch_init: 0.0060, minibatch_init: 0.0069, losses_postprocess: 0.4975, kl_divergence: 0.5171, after_optimizer: 32.7668 + calculate_losses: 22.8399 + losses_init: 0.0044, forward_head: 1.6228, bptt_initial: 14.9823, tail: 1.0065, advantages_returns: 0.2676, losses: 2.4548 + bptt: 2.1843 + bptt_forward_core: 2.0930 + update: 13.4511 + clip: 1.3997 +[2024-09-18 11:38:38,404][00268] Loop Runner_EvtLoop terminating... +[2024-09-18 11:38:38,406][00268] Runner profile tree view: +main_loop: 1158.9681 +[2024-09-18 11:38:38,408][00268] Collected {0: 4005888}, FPS: 3456.4 +[2024-09-18 11:38:38,688][00268] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2024-09-18 11:38:38,690][00268] Overriding arg 'num_workers' with value 1 passed from command line +[2024-09-18 11:38:38,692][00268] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-09-18 11:38:38,698][00268] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-09-18 11:38:38,700][00268] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-09-18 11:38:38,701][00268] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-09-18 11:38:38,706][00268] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! 
+[2024-09-18 11:38:38,708][00268] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-09-18 11:38:38,710][00268] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-09-18 11:38:38,711][00268] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-09-18 11:38:38,712][00268] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-09-18 11:38:38,713][00268] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-09-18 11:38:38,714][00268] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-09-18 11:38:38,715][00268] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-09-18 11:38:38,716][00268] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-09-18 11:38:38,740][00268] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-18 11:38:38,744][00268] RunningMeanStd input shape: (3, 72, 128) +[2024-09-18 11:38:38,747][00268] RunningMeanStd input shape: (1,) +[2024-09-18 11:38:38,763][00268] ConvEncoder: input_channels=3 +[2024-09-18 11:38:38,887][00268] Conv encoder output size: 512 +[2024-09-18 11:38:38,889][00268] Policy head output size: 512 +[2024-09-18 11:38:40,470][00268] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-18 11:38:41,315][00268] Num frames 100... +[2024-09-18 11:38:41,433][00268] Num frames 200... +[2024-09-18 11:38:41,554][00268] Num frames 300... +[2024-09-18 11:38:41,672][00268] Num frames 400... +[2024-09-18 11:38:41,783][00268] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2024-09-18 11:38:41,784][00268] Avg episode reward: 5.480, avg true_objective: 4.480 +[2024-09-18 11:38:41,852][00268] Num frames 500... +[2024-09-18 11:38:41,981][00268] Num frames 600... +[2024-09-18 11:38:42,106][00268] Num frames 700... 
+[2024-09-18 11:38:42,223][00268] Num frames 800... +[2024-09-18 11:38:42,340][00268] Num frames 900... +[2024-09-18 11:38:42,459][00268] Num frames 1000... +[2024-09-18 11:38:42,577][00268] Num frames 1100... +[2024-09-18 11:38:42,692][00268] Num frames 1200... +[2024-09-18 11:38:42,810][00268] Num frames 1300... +[2024-09-18 11:38:42,936][00268] Num frames 1400... +[2024-09-18 11:38:43,055][00268] Num frames 1500... +[2024-09-18 11:38:43,184][00268] Num frames 1600... +[2024-09-18 11:38:43,302][00268] Num frames 1700... +[2024-09-18 11:38:43,433][00268] Avg episode rewards: #0: 20.805, true rewards: #0: 8.805 +[2024-09-18 11:38:43,435][00268] Avg episode reward: 20.805, avg true_objective: 8.805 +[2024-09-18 11:38:43,487][00268] Num frames 1800... +[2024-09-18 11:38:43,603][00268] Num frames 1900... +[2024-09-18 11:38:43,721][00268] Num frames 2000... +[2024-09-18 11:38:43,840][00268] Num frames 2100... +[2024-09-18 11:38:43,970][00268] Num frames 2200... +[2024-09-18 11:38:44,038][00268] Avg episode rewards: #0: 15.697, true rewards: #0: 7.363 +[2024-09-18 11:38:44,039][00268] Avg episode reward: 15.697, avg true_objective: 7.363 +[2024-09-18 11:38:44,155][00268] Num frames 2300... +[2024-09-18 11:38:44,273][00268] Num frames 2400... +[2024-09-18 11:38:44,392][00268] Num frames 2500... +[2024-09-18 11:38:44,513][00268] Num frames 2600... +[2024-09-18 11:38:44,631][00268] Num frames 2700... +[2024-09-18 11:38:44,750][00268] Num frames 2800... +[2024-09-18 11:38:44,871][00268] Num frames 2900... +[2024-09-18 11:38:45,007][00268] Num frames 3000... +[2024-09-18 11:38:45,126][00268] Num frames 3100... +[2024-09-18 11:38:45,254][00268] Num frames 3200... +[2024-09-18 11:38:45,374][00268] Num frames 3300... +[2024-09-18 11:38:45,499][00268] Num frames 3400... +[2024-09-18 11:38:45,619][00268] Num frames 3500... +[2024-09-18 11:38:45,740][00268] Num frames 3600... +[2024-09-18 11:38:45,861][00268] Num frames 3700... +[2024-09-18 11:38:45,995][00268] Num frames 3800... 
+[2024-09-18 11:38:46,114][00268] Num frames 3900... +[2024-09-18 11:38:46,242][00268] Num frames 4000... +[2024-09-18 11:38:46,366][00268] Num frames 4100... +[2024-09-18 11:38:46,487][00268] Num frames 4200... +[2024-09-18 11:38:46,606][00268] Num frames 4300... +[2024-09-18 11:38:46,675][00268] Avg episode rewards: #0: 26.522, true rewards: #0: 10.773 +[2024-09-18 11:38:46,677][00268] Avg episode reward: 26.522, avg true_objective: 10.773 +[2024-09-18 11:38:46,786][00268] Num frames 4400... +[2024-09-18 11:38:46,918][00268] Num frames 4500... +[2024-09-18 11:38:47,045][00268] Num frames 4600... +[2024-09-18 11:38:47,164][00268] Num frames 4700... +[2024-09-18 11:38:47,297][00268] Num frames 4800... +[2024-09-18 11:38:47,417][00268] Num frames 4900... +[2024-09-18 11:38:47,550][00268] Num frames 5000... +[2024-09-18 11:38:47,698][00268] Num frames 5100... +[2024-09-18 11:38:47,868][00268] Num frames 5200... +[2024-09-18 11:38:48,041][00268] Num frames 5300... +[2024-09-18 11:38:48,204][00268] Num frames 5400... +[2024-09-18 11:38:48,378][00268] Num frames 5500... +[2024-09-18 11:38:48,545][00268] Num frames 5600... +[2024-09-18 11:38:48,707][00268] Num frames 5700... +[2024-09-18 11:38:48,872][00268] Num frames 5800... +[2024-09-18 11:38:49,004][00268] Avg episode rewards: #0: 28.280, true rewards: #0: 11.680 +[2024-09-18 11:38:49,007][00268] Avg episode reward: 28.280, avg true_objective: 11.680 +[2024-09-18 11:38:49,106][00268] Num frames 5900... +[2024-09-18 11:38:49,275][00268] Num frames 6000... +[2024-09-18 11:38:49,459][00268] Num frames 6100... +[2024-09-18 11:38:49,628][00268] Num frames 6200... +[2024-09-18 11:38:49,796][00268] Num frames 6300... +[2024-09-18 11:38:49,962][00268] Num frames 6400... +[2024-09-18 11:38:50,083][00268] Num frames 6500... +[2024-09-18 11:38:50,204][00268] Num frames 6600... +[2024-09-18 11:38:50,326][00268] Num frames 6700... 
+[2024-09-18 11:38:50,478][00268] Avg episode rewards: #0: 27.621, true rewards: #0: 11.288 +[2024-09-18 11:38:50,479][00268] Avg episode reward: 27.621, avg true_objective: 11.288 +[2024-09-18 11:38:50,516][00268] Num frames 6800... +[2024-09-18 11:38:50,630][00268] Num frames 6900... +[2024-09-18 11:38:50,751][00268] Num frames 7000... +[2024-09-18 11:38:50,870][00268] Num frames 7100... +[2024-09-18 11:38:50,999][00268] Num frames 7200... +[2024-09-18 11:38:51,115][00268] Num frames 7300... +[2024-09-18 11:38:51,233][00268] Num frames 7400... +[2024-09-18 11:38:51,358][00268] Num frames 7500... +[2024-09-18 11:38:51,479][00268] Num frames 7600... +[2024-09-18 11:38:51,597][00268] Num frames 7700... +[2024-09-18 11:38:51,718][00268] Num frames 7800... +[2024-09-18 11:38:51,839][00268] Num frames 7900... +[2024-09-18 11:38:51,972][00268] Num frames 8000... +[2024-09-18 11:38:52,089][00268] Num frames 8100... +[2024-09-18 11:38:52,208][00268] Num frames 8200... +[2024-09-18 11:38:52,339][00268] Num frames 8300... +[2024-09-18 11:38:52,472][00268] Num frames 8400... +[2024-09-18 11:38:52,595][00268] Num frames 8500... +[2024-09-18 11:38:52,715][00268] Num frames 8600... +[2024-09-18 11:38:52,836][00268] Num frames 8700... +[2024-09-18 11:38:52,965][00268] Num frames 8800... +[2024-09-18 11:38:53,109][00268] Avg episode rewards: #0: 32.390, true rewards: #0: 12.676 +[2024-09-18 11:38:53,111][00268] Avg episode reward: 32.390, avg true_objective: 12.676 +[2024-09-18 11:38:53,147][00268] Num frames 8900... +[2024-09-18 11:38:53,263][00268] Num frames 9000... +[2024-09-18 11:38:53,385][00268] Num frames 9100... +[2024-09-18 11:38:53,518][00268] Num frames 9200... +[2024-09-18 11:38:53,637][00268] Num frames 9300... +[2024-09-18 11:38:53,759][00268] Num frames 9400... +[2024-09-18 11:38:53,881][00268] Num frames 9500... +[2024-09-18 11:38:54,017][00268] Num frames 9600... +[2024-09-18 11:38:54,147][00268] Num frames 9700... 
+[2024-09-18 11:38:54,271][00268] Num frames 9800...
+[2024-09-18 11:38:54,422][00268] Avg episode rewards: #0: 30.846, true rewards: #0: 12.346
+[2024-09-18 11:38:54,424][00268] Avg episode reward: 30.846, avg true_objective: 12.346
+[2024-09-18 11:38:54,465][00268] Num frames 9900...
+[2024-09-18 11:38:54,581][00268] Num frames 10000...
+[2024-09-18 11:38:54,697][00268] Num frames 10100...
+[2024-09-18 11:38:54,818][00268] Num frames 10200...
+[2024-09-18 11:38:54,945][00268] Num frames 10300...
+[2024-09-18 11:38:55,063][00268] Num frames 10400...
+[2024-09-18 11:38:55,179][00268] Num frames 10500...
+[2024-09-18 11:38:55,298][00268] Num frames 10600...
+[2024-09-18 11:38:55,416][00268] Num frames 10700...
+[2024-09-18 11:38:55,548][00268] Num frames 10800...
+[2024-09-18 11:38:55,665][00268] Num frames 10900...
+[2024-09-18 11:38:55,788][00268] Num frames 11000...
+[2024-09-18 11:38:55,944][00268] Avg episode rewards: #0: 30.316, true rewards: #0: 12.317
+[2024-09-18 11:38:55,947][00268] Avg episode reward: 30.316, avg true_objective: 12.317
+[2024-09-18 11:38:55,968][00268] Num frames 11100...
+[2024-09-18 11:38:56,084][00268] Num frames 11200...
+[2024-09-18 11:38:56,204][00268] Num frames 11300...
+[2024-09-18 11:38:56,323][00268] Num frames 11400...
+[2024-09-18 11:38:56,444][00268] Num frames 11500...
+[2024-09-18 11:38:56,573][00268] Num frames 11600...
+[2024-09-18 11:38:56,692][00268] Num frames 11700...
+[2024-09-18 11:38:56,814][00268] Num frames 11800...
+[2024-09-18 11:38:56,944][00268] Num frames 11900...
+[2024-09-18 11:38:57,070][00268] Num frames 12000...
+[2024-09-18 11:38:57,191][00268] Num frames 12100...
+[2024-09-18 11:38:57,308][00268] Num frames 12200...
+[2024-09-18 11:38:57,427][00268] Num frames 12300...
+[2024-09-18 11:38:57,562][00268] Num frames 12400...
+[2024-09-18 11:38:57,685][00268] Num frames 12500...
+[2024-09-18 11:38:57,805][00268] Num frames 12600...
+[2024-09-18 11:38:57,932][00268] Num frames 12700...
+[2024-09-18 11:38:58,055][00268] Num frames 12800...
+[2024-09-18 11:38:58,176][00268] Num frames 12900...
+[2024-09-18 11:38:58,289][00268] Avg episode rewards: #0: 32.245, true rewards: #0: 12.945
+[2024-09-18 11:38:58,290][00268] Avg episode reward: 32.245, avg true_objective: 12.945
+[2024-09-18 11:40:20,882][00268] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-09-18 11:41:19,372][00268] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-09-18 11:41:19,374][00268] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-18 11:41:19,376][00268] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-18 11:41:19,378][00268] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-18 11:41:19,380][00268] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-18 11:41:19,382][00268] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-18 11:41:19,385][00268] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-09-18 11:41:19,387][00268] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-18 11:41:19,388][00268] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-09-18 11:41:19,391][00268] Adding new argument 'hf_repository'='mkdem/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-09-18 11:41:19,392][00268] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-18 11:41:19,394][00268] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-18 11:41:19,395][00268] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-18 11:41:19,396][00268] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-18 11:41:19,399][00268] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-18 11:41:19,410][00268] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-18 11:41:19,412][00268] RunningMeanStd input shape: (1,)
+[2024-09-18 11:41:19,425][00268] ConvEncoder: input_channels=3
+[2024-09-18 11:41:19,462][00268] Conv encoder output size: 512
+[2024-09-18 11:41:19,463][00268] Policy head output size: 512
+[2024-09-18 11:41:19,482][00268] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-09-18 11:41:20,009][00268] Num frames 100...
+[2024-09-18 11:41:20,124][00268] Num frames 200...
+[2024-09-18 11:41:20,243][00268] Num frames 300...
+[2024-09-18 11:41:20,360][00268] Num frames 400...
+[2024-09-18 11:41:20,477][00268] Num frames 500...
+[2024-09-18 11:41:20,599][00268] Num frames 600...
+[2024-09-18 11:41:20,717][00268] Num frames 700...
+[2024-09-18 11:41:20,835][00268] Num frames 800...
+[2024-09-18 11:41:20,971][00268] Num frames 900...
+[2024-09-18 11:41:21,096][00268] Num frames 1000...
+[2024-09-18 11:41:21,213][00268] Num frames 1100...
+[2024-09-18 11:41:21,335][00268] Num frames 1200...
+[2024-09-18 11:41:21,456][00268] Num frames 1300...
+[2024-09-18 11:41:21,596][00268] Num frames 1400...
+[2024-09-18 11:41:21,767][00268] Num frames 1500...
+[2024-09-18 11:41:21,994][00268] Num frames 1600...
+[2024-09-18 11:41:22,171][00268] Num frames 1700...
+[2024-09-18 11:41:22,360][00268] Num frames 1800...
+[2024-09-18 11:41:22,540][00268] Num frames 1900...
+[2024-09-18 11:41:22,786][00268] Num frames 2000...
+[2024-09-18 11:41:23,096][00268] Num frames 2100...
+[2024-09-18 11:41:23,148][00268] Avg episode rewards: #0: 60.999, true rewards: #0: 21.000
+[2024-09-18 11:41:23,154][00268] Avg episode reward: 60.999, avg true_objective: 21.000
+[2024-09-18 11:41:23,386][00268] Num frames 2200...
+[2024-09-18 11:41:23,677][00268] Num frames 2300...
+[2024-09-18 11:41:24,024][00268] Num frames 2400...
+[2024-09-18 11:41:24,195][00268] Num frames 2500...
+[2024-09-18 11:41:24,403][00268] Num frames 2600...
+[2024-09-18 11:41:24,619][00268] Num frames 2700...
+[2024-09-18 11:41:24,950][00268] Num frames 2800...
+[2024-09-18 11:41:25,262][00268] Num frames 2900...
+[2024-09-18 11:41:25,512][00268] Num frames 3000...
+[2024-09-18 11:41:25,757][00268] Avg episode rewards: #0: 43.459, true rewards: #0: 15.460
+[2024-09-18 11:41:25,759][00268] Avg episode reward: 43.459, avg true_objective: 15.460
+[2024-09-18 11:41:25,773][00268] Num frames 3100...
+[2024-09-18 11:41:25,894][00268] Num frames 3200...
+[2024-09-18 11:41:26,020][00268] Num frames 3300...
+[2024-09-18 11:41:26,146][00268] Num frames 3400...
+[2024-09-18 11:41:26,269][00268] Num frames 3500...
+[2024-09-18 11:41:26,396][00268] Num frames 3600...
+[2024-09-18 11:41:26,517][00268] Num frames 3700...
+[2024-09-18 11:41:26,635][00268] Num frames 3800...
+[2024-09-18 11:41:26,753][00268] Num frames 3900...
+[2024-09-18 11:41:26,879][00268] Num frames 4000...
+[2024-09-18 11:41:27,051][00268] Avg episode rewards: #0: 36.633, true rewards: #0: 13.633
+[2024-09-18 11:41:27,053][00268] Avg episode reward: 36.633, avg true_objective: 13.633
+[2024-09-18 11:41:27,068][00268] Num frames 4100...
+[2024-09-18 11:41:27,198][00268] Num frames 4200...
+[2024-09-18 11:41:27,369][00268] Num frames 4300...
+[2024-09-18 11:41:27,539][00268] Num frames 4400...
+[2024-09-18 11:41:27,698][00268] Num frames 4500...
+[2024-09-18 11:41:27,870][00268] Num frames 4600...
+[2024-09-18 11:41:28,042][00268] Num frames 4700...
+[2024-09-18 11:41:28,206][00268] Num frames 4800...
+[2024-09-18 11:41:28,374][00268] Num frames 4900...
+[2024-09-18 11:41:28,470][00268] Avg episode rewards: #0: 31.555, true rewards: #0: 12.305
+[2024-09-18 11:41:28,472][00268] Avg episode reward: 31.555, avg true_objective: 12.305
+[2024-09-18 11:41:28,602][00268] Num frames 5000...
+[2024-09-18 11:41:28,770][00268] Num frames 5100...
+[2024-09-18 11:41:28,942][00268] Num frames 5200...
+[2024-09-18 11:41:29,107][00268] Num frames 5300...
+[2024-09-18 11:41:29,289][00268] Num frames 5400...
+[2024-09-18 11:41:29,459][00268] Num frames 5500...
+[2024-09-18 11:41:29,625][00268] Num frames 5600...
+[2024-09-18 11:41:29,774][00268] Num frames 5700...
+[2024-09-18 11:41:29,903][00268] Num frames 5800...
+[2024-09-18 11:41:30,031][00268] Num frames 5900...
+[2024-09-18 11:41:30,154][00268] Num frames 6000...
+[2024-09-18 11:41:30,284][00268] Num frames 6100...
+[2024-09-18 11:41:30,406][00268] Num frames 6200...
+[2024-09-18 11:41:30,525][00268] Num frames 6300...
+[2024-09-18 11:41:30,642][00268] Num frames 6400...
+[2024-09-18 11:41:30,760][00268] Num frames 6500...
+[2024-09-18 11:41:30,880][00268] Num frames 6600...
+[2024-09-18 11:41:31,002][00268] Num frames 6700...
+[2024-09-18 11:41:31,075][00268] Avg episode rewards: #0: 34.228, true rewards: #0: 13.428
+[2024-09-18 11:41:31,077][00268] Avg episode reward: 34.228, avg true_objective: 13.428
+[2024-09-18 11:41:31,176][00268] Num frames 6800...
+[2024-09-18 11:41:31,305][00268] Num frames 6900...
+[2024-09-18 11:41:31,433][00268] Num frames 7000...
+[2024-09-18 11:41:31,550][00268] Num frames 7100...
+[2024-09-18 11:41:31,668][00268] Num frames 7200...
+[2024-09-18 11:41:31,787][00268] Num frames 7300...
+[2024-09-18 11:41:31,904][00268] Num frames 7400...
+[2024-09-18 11:41:32,030][00268] Num frames 7500...
+[2024-09-18 11:41:32,148][00268] Num frames 7600...
+[2024-09-18 11:41:32,271][00268] Num frames 7700...
+[2024-09-18 11:41:32,417][00268] Num frames 7800...
+[2024-09-18 11:41:32,533][00268] Num frames 7900...
+[2024-09-18 11:41:32,653][00268] Num frames 8000...
+[2024-09-18 11:41:32,775][00268] Num frames 8100...
+[2024-09-18 11:41:32,892][00268] Num frames 8200...
+[2024-09-18 11:41:33,023][00268] Num frames 8300...
+[2024-09-18 11:41:33,139][00268] Num frames 8400...
+[2024-09-18 11:41:33,257][00268] Num frames 8500...
+[2024-09-18 11:41:33,391][00268] Num frames 8600...
+[2024-09-18 11:41:33,511][00268] Num frames 8700...
+[2024-09-18 11:41:33,607][00268] Avg episode rewards: #0: 38.050, true rewards: #0: 14.550
+[2024-09-18 11:41:33,609][00268] Avg episode reward: 38.050, avg true_objective: 14.550
+[2024-09-18 11:41:33,694][00268] Num frames 8800...
+[2024-09-18 11:41:33,815][00268] Num frames 8900...
+[2024-09-18 11:41:33,942][00268] Num frames 9000...
+[2024-09-18 11:41:34,056][00268] Num frames 9100...
+[2024-09-18 11:41:34,177][00268] Num frames 9200...
+[2024-09-18 11:41:34,282][00268] Avg episode rewards: #0: 34.060, true rewards: #0: 13.203
+[2024-09-18 11:41:34,284][00268] Avg episode reward: 34.060, avg true_objective: 13.203
+[2024-09-18 11:41:34,358][00268] Num frames 9300...
+[2024-09-18 11:41:34,496][00268] Num frames 9400...
+[2024-09-18 11:41:34,616][00268] Num frames 9500...
+[2024-09-18 11:41:34,730][00268] Num frames 9600...
+[2024-09-18 11:41:34,851][00268] Num frames 9700...
+[2024-09-18 11:41:34,974][00268] Num frames 9800...
+[2024-09-18 11:41:35,094][00268] Num frames 9900...
+[2024-09-18 11:41:35,213][00268] Num frames 10000...
+[2024-09-18 11:41:35,330][00268] Num frames 10100...
+[2024-09-18 11:41:35,464][00268] Num frames 10200...
+[2024-09-18 11:41:35,582][00268] Num frames 10300...
+[2024-09-18 11:41:35,702][00268] Num frames 10400...
+[2024-09-18 11:41:35,827][00268] Num frames 10500...
+[2024-09-18 11:41:35,954][00268] Num frames 10600...
+[2024-09-18 11:41:36,076][00268] Num frames 10700...
+[2024-09-18 11:41:36,191][00268] Num frames 10800...
+[2024-09-18 11:41:36,315][00268] Num frames 10900...
+[2024-09-18 11:41:36,451][00268] Num frames 11000...
+[2024-09-18 11:41:36,607][00268] Num frames 11100...
+[2024-09-18 11:41:36,767][00268] Num frames 11200...
+[2024-09-18 11:41:36,932][00268] Num frames 11300...
+[2024-09-18 11:41:37,057][00268] Avg episode rewards: #0: 36.802, true rewards: #0: 14.178
+[2024-09-18 11:41:37,059][00268] Avg episode reward: 36.802, avg true_objective: 14.178
+[2024-09-18 11:41:37,128][00268] Num frames 11400...
+[2024-09-18 11:41:37,246][00268] Num frames 11500...
+[2024-09-18 11:41:37,369][00268] Num frames 11600...
+[2024-09-18 11:41:37,496][00268] Num frames 11700...
+[2024-09-18 11:41:37,613][00268] Num frames 11800...
+[2024-09-18 11:41:37,723][00268] Avg episode rewards: #0: 34.160, true rewards: #0: 13.160
+[2024-09-18 11:41:37,725][00268] Avg episode reward: 34.160, avg true_objective: 13.160
+[2024-09-18 11:41:37,793][00268] Num frames 11900...
+[2024-09-18 11:41:37,915][00268] Num frames 12000...
+[2024-09-18 11:41:38,041][00268] Num frames 12100...
+[2024-09-18 11:41:38,165][00268] Num frames 12200...
+[2024-09-18 11:41:38,291][00268] Num frames 12300...
+[2024-09-18 11:41:38,413][00268] Num frames 12400...
+[2024-09-18 11:41:38,541][00268] Num frames 12500...
+[2024-09-18 11:41:38,665][00268] Num frames 12600...
+[2024-09-18 11:41:38,785][00268] Num frames 12700...
+[2024-09-18 11:41:38,907][00268] Num frames 12800...
+[2024-09-18 11:41:39,031][00268] Num frames 12900...
+[2024-09-18 11:41:39,152][00268] Num frames 13000...
+[2024-09-18 11:41:39,271][00268] Num frames 13100...
+[2024-09-18 11:41:39,396][00268] Num frames 13200...
+[2024-09-18 11:41:39,516][00268] Avg episode rewards: #0: 34.451, true rewards: #0: 13.251
+[2024-09-18 11:41:39,518][00268] Avg episode reward: 34.451, avg true_objective: 13.251
+[2024-09-18 11:43:05,352][00268] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-09-18 11:43:42,216][00268] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-09-18 11:43:42,218][00268] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-18 11:43:42,220][00268] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-18 11:43:42,222][00268] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-18 11:43:42,224][00268] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-18 11:43:42,225][00268] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-18 11:43:42,227][00268] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-09-18 11:43:42,229][00268] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-18 11:43:42,230][00268] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-09-18 11:43:42,231][00268] Adding new argument 'hf_repository'='mkdem/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-09-18 11:43:42,232][00268] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-18 11:43:42,233][00268] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-18 11:43:42,234][00268] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-18 11:43:42,235][00268] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-18 11:43:42,236][00268] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-18 11:43:42,256][00268] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-18 11:43:42,257][00268] RunningMeanStd input shape: (1,)
+[2024-09-18 11:43:42,271][00268] ConvEncoder: input_channels=3
+[2024-09-18 11:43:42,308][00268] Conv encoder output size: 512
+[2024-09-18 11:43:42,309][00268] Policy head output size: 512
+[2024-09-18 11:43:42,328][00268] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-09-18 11:43:42,803][00268] Num frames 100...
+[2024-09-18 11:43:42,933][00268] Num frames 200...
+[2024-09-18 11:43:43,065][00268] Num frames 300...
+[2024-09-18 11:43:43,185][00268] Num frames 400...
+[2024-09-18 11:43:43,303][00268] Num frames 500...
+[2024-09-18 11:43:43,423][00268] Num frames 600...
+[2024-09-18 11:43:43,544][00268] Num frames 700...
+[2024-09-18 11:43:43,671][00268] Num frames 800...
+[2024-09-18 11:43:43,795][00268] Num frames 900...
+[2024-09-18 11:43:43,920][00268] Num frames 1000...
+[2024-09-18 11:43:44,044][00268] Num frames 1100...
+[2024-09-18 11:43:44,167][00268] Num frames 1200...
+[2024-09-18 11:43:44,287][00268] Num frames 1300...
+[2024-09-18 11:43:44,413][00268] Num frames 1400...
+[2024-09-18 11:43:44,537][00268] Num frames 1500...
+[2024-09-18 11:43:44,660][00268] Num frames 1600...
+[2024-09-18 11:43:44,793][00268] Num frames 1700...
+[2024-09-18 11:43:44,919][00268] Num frames 1800...
+[2024-09-18 11:43:45,044][00268] Num frames 1900...
+[2024-09-18 11:43:45,168][00268] Num frames 2000...
+[2024-09-18 11:43:45,291][00268] Num frames 2100...
+[2024-09-18 11:43:45,346][00268] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000
+[2024-09-18 11:43:45,348][00268] Avg episode reward: 58.999, avg true_objective: 21.000
+[2024-09-18 11:43:45,476][00268] Num frames 2200...
+[2024-09-18 11:43:45,601][00268] Num frames 2300...
+[2024-09-18 11:43:45,738][00268] Num frames 2400...
+[2024-09-18 11:43:45,858][00268] Num frames 2500...
+[2024-09-18 11:43:45,991][00268] Num frames 2600...
+[2024-09-18 11:43:46,061][00268] Avg episode rewards: #0: 33.559, true rewards: #0: 13.060
+[2024-09-18 11:43:46,063][00268] Avg episode reward: 33.559, avg true_objective: 13.060
+[2024-09-18 11:43:46,166][00268] Num frames 2700...
+[2024-09-18 11:43:46,281][00268] Num frames 2800...
+[2024-09-18 11:43:46,400][00268] Num frames 2900...
+[2024-09-18 11:43:46,515][00268] Num frames 3000...
+[2024-09-18 11:43:46,638][00268] Avg episode rewards: #0: 24.200, true rewards: #0: 10.200
+[2024-09-18 11:43:46,644][00268] Avg episode reward: 24.200, avg true_objective: 10.200
+[2024-09-18 11:43:46,696][00268] Num frames 3100...
+[2024-09-18 11:43:46,824][00268] Num frames 3200...
+[2024-09-18 11:43:46,954][00268] Num frames 3300...
+[2024-09-18 11:43:47,072][00268] Num frames 3400...
+[2024-09-18 11:43:47,190][00268] Num frames 3500...
+[2024-09-18 11:43:47,310][00268] Num frames 3600...
+[2024-09-18 11:43:47,428][00268] Num frames 3700...
+[2024-09-18 11:43:47,556][00268] Num frames 3800...
+[2024-09-18 11:43:47,662][00268] Avg episode rewards: #0: 21.820, true rewards: #0: 9.570
+[2024-09-18 11:43:47,664][00268] Avg episode reward: 21.820, avg true_objective: 9.570
+[2024-09-18 11:43:47,793][00268] Num frames 3900...
+[2024-09-18 11:43:47,976][00268] Num frames 4000...
+[2024-09-18 11:43:48,135][00268] Num frames 4100...
+[2024-09-18 11:43:48,301][00268] Num frames 4200...
+[2024-09-18 11:43:48,463][00268] Num frames 4300...
+[2024-09-18 11:43:48,631][00268] Num frames 4400...
+[2024-09-18 11:43:48,789][00268] Num frames 4500...
+[2024-09-18 11:43:48,974][00268] Num frames 4600...
+[2024-09-18 11:43:49,145][00268] Num frames 4700...
+[2024-09-18 11:43:49,322][00268] Num frames 4800...
+[2024-09-18 11:43:49,486][00268] Num frames 4900...
+[2024-09-18 11:43:49,655][00268] Num frames 5000...
+[2024-09-18 11:43:49,829][00268] Num frames 5100...
+[2024-09-18 11:43:50,007][00268] Num frames 5200...
+[2024-09-18 11:43:50,155][00268] Num frames 5300...
+[2024-09-18 11:43:50,276][00268] Num frames 5400...
+[2024-09-18 11:43:50,397][00268] Num frames 5500...
+[2024-09-18 11:43:50,516][00268] Num frames 5600...
+[2024-09-18 11:43:50,631][00268] Num frames 5700...
+[2024-09-18 11:43:50,755][00268] Num frames 5800...
+[2024-09-18 11:43:50,876][00268] Num frames 5900...
+[2024-09-18 11:43:50,969][00268] Avg episode rewards: #0: 29.456, true rewards: #0: 11.856
+[2024-09-18 11:43:50,972][00268] Avg episode reward: 29.456, avg true_objective: 11.856
+[2024-09-18 11:43:51,062][00268] Num frames 6000...
+[2024-09-18 11:43:51,181][00268] Num frames 6100...
+[2024-09-18 11:43:51,299][00268] Num frames 6200...
+[2024-09-18 11:43:51,429][00268] Num frames 6300...
+[2024-09-18 11:43:51,548][00268] Num frames 6400...
+[2024-09-18 11:43:51,664][00268] Num frames 6500...
+[2024-09-18 11:43:51,725][00268] Avg episode rewards: #0: 26.506, true rewards: #0: 10.840
+[2024-09-18 11:43:51,726][00268] Avg episode reward: 26.506, avg true_objective: 10.840
+[2024-09-18 11:43:51,843][00268] Num frames 6600...
+[2024-09-18 11:43:51,979][00268] Num frames 6700...
+[2024-09-18 11:43:52,100][00268] Num frames 6800...
+[2024-09-18 11:43:52,216][00268] Num frames 6900...
+[2024-09-18 11:43:52,331][00268] Num frames 7000...
+[2024-09-18 11:43:52,452][00268] Num frames 7100...
+[2024-09-18 11:43:52,595][00268] Avg episode rewards: #0: 24.394, true rewards: #0: 10.251
+[2024-09-18 11:43:52,596][00268] Avg episode reward: 24.394, avg true_objective: 10.251
+[2024-09-18 11:43:52,627][00268] Num frames 7200...
+[2024-09-18 11:43:52,747][00268] Num frames 7300...
+[2024-09-18 11:43:52,868][00268] Num frames 7400...
+[2024-09-18 11:43:53,021][00268] Num frames 7500...
+[2024-09-18 11:43:53,149][00268] Num frames 7600...
+[2024-09-18 11:43:53,273][00268] Num frames 7700...
+[2024-09-18 11:43:53,403][00268] Num frames 7800...
+[2024-09-18 11:43:53,519][00268] Num frames 7900...
+[2024-09-18 11:43:53,640][00268] Num frames 8000...
+[2024-09-18 11:43:53,759][00268] Num frames 8100...
+[2024-09-18 11:43:53,877][00268] Num frames 8200...
+[2024-09-18 11:43:54,013][00268] Num frames 8300...
+[2024-09-18 11:43:54,135][00268] Num frames 8400...
+[2024-09-18 11:43:54,256][00268] Num frames 8500...
+[2024-09-18 11:43:54,379][00268] Num frames 8600...
+[2024-09-18 11:43:54,497][00268] Num frames 8700...
+[2024-09-18 11:43:54,641][00268] Avg episode rewards: #0: 26.095, true rewards: #0: 10.970
+[2024-09-18 11:43:54,643][00268] Avg episode reward: 26.095, avg true_objective: 10.970
+[2024-09-18 11:43:54,673][00268] Num frames 8800...
+[2024-09-18 11:43:54,792][00268] Num frames 8900...
+[2024-09-18 11:43:54,910][00268] Num frames 9000...
+[2024-09-18 11:43:55,053][00268] Num frames 9100...
+[2024-09-18 11:43:55,167][00268] Num frames 9200...
+[2024-09-18 11:43:55,284][00268] Num frames 9300...
+[2024-09-18 11:43:55,403][00268] Num frames 9400...
+[2024-09-18 11:43:55,520][00268] Num frames 9500...
+[2024-09-18 11:43:55,635][00268] Num frames 9600...
+[2024-09-18 11:43:55,701][00268] Avg episode rewards: #0: 25.453, true rewards: #0: 10.676
+[2024-09-18 11:43:55,703][00268] Avg episode reward: 25.453, avg true_objective: 10.676
+[2024-09-18 11:43:55,808][00268] Num frames 9700...
+[2024-09-18 11:43:55,933][00268] Num frames 9800...
+[2024-09-18 11:43:56,054][00268] Num frames 9900...
+[2024-09-18 11:43:56,172][00268] Num frames 10000...
+[2024-09-18 11:43:56,288][00268] Num frames 10100...
+[2024-09-18 11:43:56,408][00268] Num frames 10200...
+[2024-09-18 11:43:56,527][00268] Num frames 10300...
+[2024-09-18 11:43:56,649][00268] Num frames 10400...
+[2024-09-18 11:43:56,774][00268] Num frames 10500...
+[2024-09-18 11:43:56,907][00268] Num frames 10600...
+[2024-09-18 11:43:57,039][00268] Num frames 10700...
+[2024-09-18 11:43:57,178][00268] Num frames 10800...
+[2024-09-18 11:43:57,308][00268] Num frames 10900...
+[2024-09-18 11:43:57,430][00268] Num frames 11000...
+[2024-09-18 11:43:57,554][00268] Num frames 11100...
+[2024-09-18 11:43:57,673][00268] Num frames 11200...
+[2024-09-18 11:43:57,791][00268] Num frames 11300...
+[2024-09-18 11:43:57,919][00268] Num frames 11400...
+[2024-09-18 11:43:58,047][00268] Num frames 11500...
+[2024-09-18 11:43:58,174][00268] Num frames 11600...
+[2024-09-18 11:43:58,294][00268] Num frames 11700...
+[2024-09-18 11:43:58,360][00268] Avg episode rewards: #0: 28.408, true rewards: #0: 11.708
+[2024-09-18 11:43:58,361][00268] Avg episode reward: 28.408, avg true_objective: 11.708
+[2024-09-18 11:45:12,588][00268] Replay video saved to /content/train_dir/default_experiment/replay.mp4!