|
[2024-10-08 03:44:08,357][04317] Saving configuration to /content/train_dir/default_experiment/config.json... |
|
[2024-10-08 03:44:08,359][04317] Rollout worker 0 uses device cpu |
|
[2024-10-08 03:44:08,360][04317] Rollout worker 1 uses device cpu |
|
[2024-10-08 03:44:08,362][04317] Rollout worker 2 uses device cpu |
|
[2024-10-08 03:44:08,363][04317] Rollout worker 3 uses device cpu |
|
[2024-10-08 03:44:08,365][04317] Rollout worker 4 uses device cpu |
|
[2024-10-08 03:44:08,366][04317] Rollout worker 5 uses device cpu |
|
[2024-10-08 03:44:08,367][04317] Rollout worker 6 uses device cpu |
|
[2024-10-08 03:44:08,368][04317] Rollout worker 7 uses device cpu |
|
[2024-10-08 03:44:08,423][04317] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2024-10-08 03:44:08,424][04317] InferenceWorker_p0-w0: min num requests: 2 |
|
[2024-10-08 03:44:08,458][04317] Starting all processes... |
|
[2024-10-08 03:44:08,459][04317] Starting process learner_proc0 |
|
[2024-10-08 03:44:08,508][04317] Starting all processes... |
|
[2024-10-08 03:44:08,513][04317] Starting process inference_proc0-0 |
|
[2024-10-08 03:44:08,514][04317] Starting process rollout_proc0 |
|
[2024-10-08 03:44:08,516][04317] Starting process rollout_proc1 |
|
[2024-10-08 03:44:08,520][04317] Starting process rollout_proc2 |
|
[2024-10-08 03:44:08,520][04317] Starting process rollout_proc3 |
|
[2024-10-08 03:44:08,521][04317] Starting process rollout_proc4 |
|
[2024-10-08 03:44:08,521][04317] Starting process rollout_proc5 |
|
[2024-10-08 03:44:08,521][04317] Starting process rollout_proc6 |
|
[2024-10-08 03:44:08,521][04317] Starting process rollout_proc7 |
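For context, a run that produces the startup sequence above is typically launched through Sample Factory's `run_rl` entry point. Below is a minimal sketch modeled on the Hugging Face Deep RL course notebook; the flag values marked as assumptions are not visible in this log (only the 8 rollout workers and the ~4.0M total frames are).

```python
import functools

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.envs.env_utils import register_env
from sample_factory.train import run_rl
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec


def register_vizdoom_envs():
    # Make the VizDoom environments known to Sample Factory's registry.
    for env_spec in DOOM_ENVS:
        register_env(env_spec.name, functools.partial(make_doom_env_from_spec, env_spec))


def main():
    argv = [
        "--env=doom_health_gathering_supreme",  # assumption: the env name is not shown in this log
        "--num_workers=8",                      # matches the 8 rollout workers above
        "--num_envs_per_worker=4",              # assumption (course default)
        "--train_for_env_steps=4000000",        # consistent with the ~4.0M frames collected at the end
    ]
    register_vizdoom_envs()
    parser, _ = parse_sf_args(argv=argv, evaluation=False)
    add_doom_env_args(parser)
    doom_override_defaults(parser)
    cfg = parse_full_cfg(parser, argv=argv)
    run_rl(cfg)


if __name__ == "__main__":
    main()
```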
|
[2024-10-08 03:44:10,457][07032] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:44:10,614][07027] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:44:10,633][07031] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:44:10,680][07008] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2024-10-08 03:44:10,681][07008] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 |
|
[2024-10-08 03:44:10,699][07008] Num visible devices: 1 |
|
[2024-10-08 03:44:10,731][07008] Starting seed is not provided |
|
[2024-10-08 03:44:10,732][07008] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2024-10-08 03:44:10,732][07008] Initializing actor-critic model on device cuda:0 |
|
[2024-10-08 03:44:10,732][07008] RunningMeanStd input shape: (3, 72, 128) |
|
[2024-10-08 03:44:10,734][07008] RunningMeanStd input shape: (1,) |
|
[2024-10-08 03:44:10,754][07008] ConvEncoder: input_channels=3 |
|
[2024-10-08 03:44:10,764][07029] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:44:10,803][07021] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2024-10-08 03:44:10,803][07021] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 |
|
[2024-10-08 03:44:10,810][07022] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:44:10,819][07021] Num visible devices: 1 |
|
[2024-10-08 03:44:10,846][07030] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:44:10,909][07044] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:44:10,940][07028] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:44:10,947][07008] Conv encoder output size: 512 |
|
[2024-10-08 03:44:10,948][07008] Policy head output size: 512 |
|
[2024-10-08 03:44:10,965][07008] Created Actor Critic model with architecture:
[2024-10-08 03:44:10,965][07008] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
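For readers who want concrete shapes, here is a rough plain-PyTorch equivalent of the module printed above. The conv spec (32, 8, 4) / (64, 4, 2) / (128, 3, 2) is Sample Factory's default simple convnet and is an assumption here; the log itself only confirms the 512-d encoder and policy-head outputs, the GRU(512, 512) core, and the 1-d value / 5-d action heads. The observation and returns normalizers are omitted.

```python
import torch
from torch import nn


class ActorCriticSketch(nn.Module):
    """Shape-for-shape sketch of the logged ActorCriticSharedWeights model."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),    # (3,72,128) -> (32,17,31)
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),   # -> (64,7,14)
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),  # -> (128,3,6)
        )
        self.mlp_layers = nn.Sequential(nn.Linear(128 * 3 * 6, 512), nn.ELU())
        self.core = nn.GRU(512, 512)            # recurrent core, as in the log
        self.critic_linear = nn.Linear(512, 1)  # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # action logits

    def forward(self, obs, rnn_state):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))  # (B, 512)
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # single-step GRU
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state


model = ActorCriticSketch()
logits, value, h = model(torch.zeros(4, 3, 72, 128), torch.zeros(1, 4, 512))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```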
|
[2024-10-08 03:44:12,800][07008] Using optimizer <class 'torch.optim.adam.Adam'> |
|
[2024-10-08 03:44:12,800][07008] No checkpoints found |
|
[2024-10-08 03:44:12,801][07008] Did not load from checkpoint, starting from scratch! |
|
[2024-10-08 03:44:12,801][07008] Initialized policy 0 weights for model version 0 |
|
[2024-10-08 03:44:12,803][07008] LearnerWorker_p0 finished initialization! |
|
[2024-10-08 03:44:12,803][07008] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2024-10-08 03:44:12,878][07021] RunningMeanStd input shape: (3, 72, 128) |
|
[2024-10-08 03:44:12,879][07021] RunningMeanStd input shape: (1,) |
|
[2024-10-08 03:44:12,891][07021] ConvEncoder: input_channels=3 |
|
[2024-10-08 03:44:12,995][07021] Conv encoder output size: 512 |
|
[2024-10-08 03:44:12,995][07021] Policy head output size: 512 |
|
[2024-10-08 03:44:14,748][04317] Inference worker 0-0 is ready! |
|
[2024-10-08 03:44:14,749][04317] All inference workers are ready! Signal rollout workers to start! |
|
[2024-10-08 03:44:14,763][07031] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:44:14,764][07032] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:44:14,764][07030] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:44:14,764][07028] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:44:14,768][07029] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:44:14,768][07027] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:44:14,769][07022] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:44:14,769][07044] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:44:15,090][07029] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:44:15,090][07032] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:44:15,090][07022] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:44:15,091][07027] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:44:15,091][07031] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:44:15,091][07030] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:44:15,341][07028] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:44:15,354][07022] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:44:15,355][07029] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:44:15,357][07027] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:44:15,358][07044] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:44:15,505][04317] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) |
|
[2024-10-08 03:44:15,590][07030] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:44:15,610][07044] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:44:15,612][07028] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:44:15,643][07032] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:44:15,671][07022] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:44:15,767][07031] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:44:15,788][07027] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:44:15,883][07029] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:44:15,917][07028] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:44:15,919][07044] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:44:15,962][07022] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:44:16,082][07031] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:44:16,099][07027] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:44:16,160][07030] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:44:16,185][07029] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:44:16,377][07031] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:44:16,417][07028] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:44:16,446][07030] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:44:16,472][07044] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:44:16,727][07032] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:44:17,117][07032] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:44:17,315][07008] Signal inference workers to stop experience collection... |
|
[2024-10-08 03:44:17,330][07021] InferenceWorker_p0-w0: stopping experience collection |
|
[2024-10-08 03:44:18,345][07008] Signal inference workers to resume experience collection... |
|
[2024-10-08 03:44:18,346][07021] InferenceWorker_p0-w0: resuming experience collection |
|
[2024-10-08 03:44:20,505][04317] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6553.6). Total num frames: 32768. Throughput: 0: 592.8. Samples: 2964. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
|
[2024-10-08 03:44:20,508][04317] Avg episode reward: [(0, '3.746')] |
|
[2024-10-08 03:44:20,783][07021] Updated weights for policy 0, policy_version 10 (0.0369) |
|
[2024-10-08 03:44:22,786][07021] Updated weights for policy 0, policy_version 20 (0.0012) |
|
[2024-10-08 03:44:24,901][07021] Updated weights for policy 0, policy_version 30 (0.0012) |
|
[2024-10-08 03:44:25,505][04317] Fps is (10 sec: 13516.8, 60 sec: 13516.8, 300 sec: 13516.8). Total num frames: 135168. Throughput: 0: 2542.6. Samples: 25426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2024-10-08 03:44:25,507][04317] Avg episode reward: [(0, '4.579')] |
|
[2024-10-08 03:44:25,515][07008] Saving new best policy, reward=4.579! |
|
[2024-10-08 03:44:26,869][07021] Updated weights for policy 0, policy_version 40 (0.0012) |
|
[2024-10-08 03:44:28,417][04317] Heartbeat connected on Batcher_0 |
|
[2024-10-08 03:44:28,430][04317] Heartbeat connected on InferenceWorker_p0-w0 |
|
[2024-10-08 03:44:28,432][04317] Heartbeat connected on RolloutWorker_w0 |
|
[2024-10-08 03:44:28,436][04317] Heartbeat connected on RolloutWorker_w1 |
|
[2024-10-08 03:44:28,440][04317] Heartbeat connected on RolloutWorker_w2 |
|
[2024-10-08 03:44:28,443][04317] Heartbeat connected on RolloutWorker_w3 |
|
[2024-10-08 03:44:28,446][04317] Heartbeat connected on LearnerWorker_p0 |
|
[2024-10-08 03:44:28,448][04317] Heartbeat connected on RolloutWorker_w4 |
|
[2024-10-08 03:44:28,452][04317] Heartbeat connected on RolloutWorker_w5 |
|
[2024-10-08 03:44:28,454][04317] Heartbeat connected on RolloutWorker_w6 |
|
[2024-10-08 03:44:28,459][04317] Heartbeat connected on RolloutWorker_w7 |
|
[2024-10-08 03:44:28,843][07021] Updated weights for policy 0, policy_version 50 (0.0012) |
|
[2024-10-08 03:44:30,505][04317] Fps is (10 sec: 20480.0, 60 sec: 15837.9, 300 sec: 15837.9). Total num frames: 237568. Throughput: 0: 3761.3. Samples: 56420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2024-10-08 03:44:30,507][04317] Avg episode reward: [(0, '4.508')] |
|
[2024-10-08 03:44:30,856][07021] Updated weights for policy 0, policy_version 60 (0.0011) |
|
[2024-10-08 03:44:32,980][07021] Updated weights for policy 0, policy_version 70 (0.0012) |
|
[2024-10-08 03:44:35,035][07021] Updated weights for policy 0, policy_version 80 (0.0011) |
|
[2024-10-08 03:44:35,505][04317] Fps is (10 sec: 20070.1, 60 sec: 16793.5, 300 sec: 16793.5). Total num frames: 335872. Throughput: 0: 3565.4. Samples: 71308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:44:35,508][04317] Avg episode reward: [(0, '4.285')] |
|
[2024-10-08 03:44:37,035][07021] Updated weights for policy 0, policy_version 90 (0.0011) |
|
[2024-10-08 03:44:39,028][07021] Updated weights for policy 0, policy_version 100 (0.0011) |
|
[2024-10-08 03:44:40,505][04317] Fps is (10 sec: 20070.4, 60 sec: 17530.9, 300 sec: 17530.9). Total num frames: 438272. Throughput: 0: 4072.9. Samples: 101822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:44:40,508][04317] Avg episode reward: [(0, '4.472')] |
|
[2024-10-08 03:44:41,013][07021] Updated weights for policy 0, policy_version 110 (0.0011) |
|
[2024-10-08 03:44:42,979][07021] Updated weights for policy 0, policy_version 120 (0.0011) |
|
[2024-10-08 03:44:45,049][07021] Updated weights for policy 0, policy_version 130 (0.0011) |
|
[2024-10-08 03:44:45,505][04317] Fps is (10 sec: 20480.4, 60 sec: 18022.4, 300 sec: 18022.4). Total num frames: 540672. Throughput: 0: 4413.2. Samples: 132396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:44:45,507][04317] Avg episode reward: [(0, '4.779')] |
|
[2024-10-08 03:44:45,516][07008] Saving new best policy, reward=4.779! |
|
[2024-10-08 03:44:47,184][07021] Updated weights for policy 0, policy_version 140 (0.0012) |
|
[2024-10-08 03:44:49,265][07021] Updated weights for policy 0, policy_version 150 (0.0011) |
|
[2024-10-08 03:44:50,505][04317] Fps is (10 sec: 20070.4, 60 sec: 18256.5, 300 sec: 18256.5). Total num frames: 638976. Throughput: 0: 4196.3. Samples: 146872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:44:50,507][04317] Avg episode reward: [(0, '4.530')] |
|
[2024-10-08 03:44:51,250][07021] Updated weights for policy 0, policy_version 160 (0.0011) |
|
[2024-10-08 03:44:53,333][07021] Updated weights for policy 0, policy_version 170 (0.0012) |
|
[2024-10-08 03:44:55,487][07021] Updated weights for policy 0, policy_version 180 (0.0012) |
|
[2024-10-08 03:44:55,505][04317] Fps is (10 sec: 19660.7, 60 sec: 18432.0, 300 sec: 18432.0). Total num frames: 737280. Throughput: 0: 4420.9. Samples: 176836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:44:55,507][04317] Avg episode reward: [(0, '4.479')] |
|
[2024-10-08 03:44:57,496][07021] Updated weights for policy 0, policy_version 190 (0.0011) |
|
[2024-10-08 03:44:59,493][07021] Updated weights for policy 0, policy_version 200 (0.0011) |
|
[2024-10-08 03:45:00,505][04317] Fps is (10 sec: 19660.8, 60 sec: 18568.6, 300 sec: 18568.6). Total num frames: 835584. Throughput: 0: 4598.9. Samples: 206950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:45:00,507][04317] Avg episode reward: [(0, '4.344')] |
|
[2024-10-08 03:45:01,606][07021] Updated weights for policy 0, policy_version 210 (0.0011) |
|
[2024-10-08 03:45:03,666][07021] Updated weights for policy 0, policy_version 220 (0.0011) |
|
[2024-10-08 03:45:05,505][04317] Fps is (10 sec: 20070.3, 60 sec: 18759.7, 300 sec: 18759.7). Total num frames: 937984. Throughput: 0: 4857.5. Samples: 221550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 03:45:05,508][04317] Avg episode reward: [(0, '4.417')] |
|
[2024-10-08 03:45:05,651][07021] Updated weights for policy 0, policy_version 230 (0.0011) |
|
[2024-10-08 03:45:07,624][07021] Updated weights for policy 0, policy_version 240 (0.0011) |
|
[2024-10-08 03:45:09,616][07021] Updated weights for policy 0, policy_version 250 (0.0011) |
|
[2024-10-08 03:45:10,505][04317] Fps is (10 sec: 20480.2, 60 sec: 18916.1, 300 sec: 18916.1). Total num frames: 1040384. Throughput: 0: 5043.5. Samples: 252382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:45:10,507][04317] Avg episode reward: [(0, '4.859')] |
|
[2024-10-08 03:45:10,510][07008] Saving new best policy, reward=4.859! |
|
[2024-10-08 03:45:11,606][07021] Updated weights for policy 0, policy_version 260 (0.0011) |
|
[2024-10-08 03:45:13,636][07021] Updated weights for policy 0, policy_version 270 (0.0012) |
|
[2024-10-08 03:45:15,505][04317] Fps is (10 sec: 20070.4, 60 sec: 18978.1, 300 sec: 18978.1). Total num frames: 1138688. Throughput: 0: 5030.4. Samples: 282786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:45:15,508][04317] Avg episode reward: [(0, '4.682')] |
|
[2024-10-08 03:45:15,735][07021] Updated weights for policy 0, policy_version 280 (0.0012) |
|
[2024-10-08 03:45:17,787][07021] Updated weights for policy 0, policy_version 290 (0.0011) |
|
[2024-10-08 03:45:19,788][07021] Updated weights for policy 0, policy_version 300 (0.0011) |
|
[2024-10-08 03:45:20,505][04317] Fps is (10 sec: 20069.9, 60 sec: 20138.6, 300 sec: 19093.6). Total num frames: 1241088. Throughput: 0: 5030.0. Samples: 297656. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:45:20,508][04317] Avg episode reward: [(0, '4.792')] |
|
[2024-10-08 03:45:21,798][07021] Updated weights for policy 0, policy_version 310 (0.0011) |
|
[2024-10-08 03:45:23,776][07021] Updated weights for policy 0, policy_version 320 (0.0011) |
|
[2024-10-08 03:45:25,505][04317] Fps is (10 sec: 20480.1, 60 sec: 20138.7, 300 sec: 19192.7). Total num frames: 1343488. Throughput: 0: 5035.3. Samples: 328412. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:45:25,507][04317] Avg episode reward: [(0, '4.972')] |
|
[2024-10-08 03:45:25,515][07008] Saving new best policy, reward=4.972! |
|
[2024-10-08 03:45:25,823][07021] Updated weights for policy 0, policy_version 330 (0.0012) |
|
[2024-10-08 03:45:27,865][07021] Updated weights for policy 0, policy_version 340 (0.0012) |
|
[2024-10-08 03:45:29,985][07021] Updated weights for policy 0, policy_version 350 (0.0011) |
|
[2024-10-08 03:45:30,505][04317] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 19223.9). Total num frames: 1441792. Throughput: 0: 5016.9. Samples: 358158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:45:30,507][04317] Avg episode reward: [(0, '4.786')] |
|
[2024-10-08 03:45:32,006][07021] Updated weights for policy 0, policy_version 360 (0.0011) |
|
[2024-10-08 03:45:34,012][07021] Updated weights for policy 0, policy_version 370 (0.0011) |
|
[2024-10-08 03:45:35,505][04317] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 19302.4). Total num frames: 1544192. Throughput: 0: 5031.2. Samples: 373278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:45:35,507][04317] Avg episode reward: [(0, '4.970')] |
|
[2024-10-08 03:45:36,002][07021] Updated weights for policy 0, policy_version 380 (0.0011) |
|
[2024-10-08 03:45:37,997][07021] Updated weights for policy 0, policy_version 390 (0.0011) |
|
[2024-10-08 03:45:39,995][07021] Updated weights for policy 0, policy_version 400 (0.0011) |
|
[2024-10-08 03:45:40,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20138.7, 300 sec: 19371.7). Total num frames: 1646592. Throughput: 0: 5049.9. Samples: 404080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:45:40,507][04317] Avg episode reward: [(0, '5.066')] |
|
[2024-10-08 03:45:40,509][07008] Saving new best policy, reward=5.066! |
|
[2024-10-08 03:45:42,046][07021] Updated weights for policy 0, policy_version 410 (0.0011) |
|
[2024-10-08 03:45:44,122][07021] Updated weights for policy 0, policy_version 420 (0.0011) |
|
[2024-10-08 03:45:45,505][04317] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19387.7). Total num frames: 1744896. Throughput: 0: 5047.1. Samples: 434070. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 03:45:45,508][04317] Avg episode reward: [(0, '5.450')] |
|
[2024-10-08 03:45:45,515][07008] Saving new best policy, reward=5.450! |
|
[2024-10-08 03:45:46,159][07021] Updated weights for policy 0, policy_version 430 (0.0011) |
|
[2024-10-08 03:45:48,154][07021] Updated weights for policy 0, policy_version 440 (0.0011) |
|
[2024-10-08 03:45:50,136][07021] Updated weights for policy 0, policy_version 450 (0.0011) |
|
[2024-10-08 03:45:50,505][04317] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 19445.2). Total num frames: 1847296. Throughput: 0: 5062.3. Samples: 449354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) |
|
[2024-10-08 03:45:50,507][04317] Avg episode reward: [(0, '7.312')] |
|
[2024-10-08 03:45:50,534][07008] Saving new best policy, reward=7.312! |
|
[2024-10-08 03:45:52,122][07021] Updated weights for policy 0, policy_version 460 (0.0011) |
|
[2024-10-08 03:45:54,103][07021] Updated weights for policy 0, policy_version 470 (0.0011) |
|
[2024-10-08 03:45:55,505][04317] Fps is (10 sec: 20889.5, 60 sec: 20275.2, 300 sec: 19537.9). Total num frames: 1953792. Throughput: 0: 5064.3. Samples: 480278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:45:55,508][04317] Avg episode reward: [(0, '8.058')] |
|
[2024-10-08 03:45:55,515][07008] Saving new best policy, reward=8.058! |
|
[2024-10-08 03:45:56,098][07021] Updated weights for policy 0, policy_version 480 (0.0011) |
|
[2024-10-08 03:45:58,150][07021] Updated weights for policy 0, policy_version 490 (0.0012) |
|
[2024-10-08 03:46:00,178][07021] Updated weights for policy 0, policy_version 500 (0.0011) |
|
[2024-10-08 03:46:00,505][04317] Fps is (10 sec: 20479.8, 60 sec: 20275.2, 300 sec: 19543.8). Total num frames: 2052096. Throughput: 0: 5058.8. Samples: 510434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:46:00,508][04317] Avg episode reward: [(0, '8.725')] |
|
[2024-10-08 03:46:00,510][07008] Saving new best policy, reward=8.725! |
|
[2024-10-08 03:46:02,167][07021] Updated weights for policy 0, policy_version 510 (0.0011) |
|
[2024-10-08 03:46:04,129][07021] Updated weights for policy 0, policy_version 520 (0.0011) |
|
[2024-10-08 03:46:05,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20343.5, 300 sec: 19623.6). Total num frames: 2158592. Throughput: 0: 5072.6. Samples: 525922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
|
[2024-10-08 03:46:05,508][04317] Avg episode reward: [(0, '10.800')] |
|
[2024-10-08 03:46:05,515][07008] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000527_2158592.pth... |
|
[2024-10-08 03:46:05,584][07008] Saving new best policy, reward=10.800! |
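The checkpoint filename appears to encode the policy version and the total env frames at save time: version 527 at 2,158,592 frames here, i.e. 4096 env frames per policy version (the final checkpoint below, 000000978 at 4,005,888 frames, has the same ratio). A tiny, hypothetical parse:

```python
name = "checkpoint_000000527_2158592.pth"
_, version, frames = name.removesuffix(".pth").split("_")
print(int(version), int(frames))    # 527 2158592
print(int(frames) // int(version))  # 4096 env frames per policy version
```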
|
[2024-10-08 03:46:06,099][07021] Updated weights for policy 0, policy_version 530 (0.0011) |
|
[2024-10-08 03:46:08,054][07021] Updated weights for policy 0, policy_version 540 (0.0012) |
|
[2024-10-08 03:46:10,077][07021] Updated weights for policy 0, policy_version 550 (0.0011) |
|
[2024-10-08 03:46:10,505][04317] Fps is (10 sec: 20889.7, 60 sec: 20343.4, 300 sec: 19660.8). Total num frames: 2260992. Throughput: 0: 5081.2. Samples: 557066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2024-10-08 03:46:10,508][04317] Avg episode reward: [(0, '10.034')] |
|
[2024-10-08 03:46:12,195][07021] Updated weights for policy 0, policy_version 560 (0.0011) |
|
[2024-10-08 03:46:14,225][07021] Updated weights for policy 0, policy_version 570 (0.0011) |
|
[2024-10-08 03:46:15,505][04317] Fps is (10 sec: 20070.4, 60 sec: 20343.5, 300 sec: 19660.8). Total num frames: 2359296. Throughput: 0: 5086.8. Samples: 587064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:46:15,508][04317] Avg episode reward: [(0, '12.551')] |
|
[2024-10-08 03:46:15,515][07008] Saving new best policy, reward=12.551! |
|
[2024-10-08 03:46:16,207][07021] Updated weights for policy 0, policy_version 580 (0.0011) |
|
[2024-10-08 03:46:18,183][07021] Updated weights for policy 0, policy_version 590 (0.0011) |
|
[2024-10-08 03:46:20,143][07021] Updated weights for policy 0, policy_version 600 (0.0011) |
|
[2024-10-08 03:46:20,505][04317] Fps is (10 sec: 20070.4, 60 sec: 20343.5, 300 sec: 19693.6). Total num frames: 2461696. Throughput: 0: 5093.4. Samples: 602482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2024-10-08 03:46:20,508][04317] Avg episode reward: [(0, '13.357')] |
|
[2024-10-08 03:46:20,534][07008] Saving new best policy, reward=13.357! |
|
[2024-10-08 03:46:22,131][07021] Updated weights for policy 0, policy_version 610 (0.0011) |
|
[2024-10-08 03:46:24,133][07021] Updated weights for policy 0, policy_version 620 (0.0011) |
|
[2024-10-08 03:46:25,505][04317] Fps is (10 sec: 20480.2, 60 sec: 20343.5, 300 sec: 19723.8). Total num frames: 2564096. Throughput: 0: 5098.9. Samples: 633530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2024-10-08 03:46:25,510][04317] Avg episode reward: [(0, '14.219')] |
|
[2024-10-08 03:46:25,519][07008] Saving new best policy, reward=14.219! |
|
[2024-10-08 03:46:26,237][07021] Updated weights for policy 0, policy_version 630 (0.0011) |
|
[2024-10-08 03:46:28,266][07021] Updated weights for policy 0, policy_version 640 (0.0011) |
|
[2024-10-08 03:46:30,247][07021] Updated weights for policy 0, policy_version 650 (0.0011) |
|
[2024-10-08 03:46:30,505][04317] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 19751.8). Total num frames: 2666496. Throughput: 0: 5101.1. Samples: 663618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2024-10-08 03:46:30,508][04317] Avg episode reward: [(0, '14.796')] |
|
[2024-10-08 03:46:30,510][07008] Saving new best policy, reward=14.796! |
|
[2024-10-08 03:46:32,196][07021] Updated weights for policy 0, policy_version 660 (0.0011) |
|
[2024-10-08 03:46:34,172][07021] Updated weights for policy 0, policy_version 670 (0.0012) |
|
[2024-10-08 03:46:35,505][04317] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 19777.8). Total num frames: 2768896. Throughput: 0: 5110.9. Samples: 679344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
|
[2024-10-08 03:46:35,507][04317] Avg episode reward: [(0, '16.917')] |
|
[2024-10-08 03:46:35,515][07008] Saving new best policy, reward=16.917! |
|
[2024-10-08 03:46:36,141][07021] Updated weights for policy 0, policy_version 680 (0.0011) |
|
[2024-10-08 03:46:38,172][07021] Updated weights for policy 0, policy_version 690 (0.0011) |
|
[2024-10-08 03:46:40,264][07021] Updated weights for policy 0, policy_version 700 (0.0011) |
|
[2024-10-08 03:46:40,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20411.7, 300 sec: 19802.0). Total num frames: 2871296. Throughput: 0: 5107.3. Samples: 710108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:46:40,508][04317] Avg episode reward: [(0, '17.158')] |
|
[2024-10-08 03:46:40,510][07008] Saving new best policy, reward=17.158! |
|
[2024-10-08 03:46:42,273][07021] Updated weights for policy 0, policy_version 710 (0.0011) |
|
[2024-10-08 03:46:44,254][07021] Updated weights for policy 0, policy_version 720 (0.0011) |
|
[2024-10-08 03:46:45,505][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 19824.7). Total num frames: 2973696. Throughput: 0: 5114.0. Samples: 740564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:46:45,507][04317] Avg episode reward: [(0, '17.717')] |
|
[2024-10-08 03:46:45,516][07008] Saving new best policy, reward=17.717! |
|
[2024-10-08 03:46:46,237][07021] Updated weights for policy 0, policy_version 730 (0.0011) |
|
[2024-10-08 03:46:48,181][07021] Updated weights for policy 0, policy_version 740 (0.0011) |
|
[2024-10-08 03:46:50,161][07021] Updated weights for policy 0, policy_version 750 (0.0011) |
|
[2024-10-08 03:46:50,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19845.8). Total num frames: 3076096. Throughput: 0: 5115.9. Samples: 756138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:46:50,508][04317] Avg episode reward: [(0, '17.503')] |
|
[2024-10-08 03:46:52,150][07021] Updated weights for policy 0, policy_version 760 (0.0011) |
|
[2024-10-08 03:46:54,257][07021] Updated weights for policy 0, policy_version 770 (0.0012) |
|
[2024-10-08 03:46:55,505][04317] Fps is (10 sec: 20070.2, 60 sec: 20343.5, 300 sec: 19840.0). Total num frames: 3174400. Throughput: 0: 5103.7. Samples: 786734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:46:55,507][04317] Avg episode reward: [(0, '20.499')] |
|
[2024-10-08 03:46:55,515][07008] Saving new best policy, reward=20.499! |
|
[2024-10-08 03:46:56,298][07021] Updated weights for policy 0, policy_version 780 (0.0011) |
|
[2024-10-08 03:46:58,263][07021] Updated weights for policy 0, policy_version 790 (0.0012) |
|
[2024-10-08 03:47:00,228][07021] Updated weights for policy 0, policy_version 800 (0.0011) |
|
[2024-10-08 03:47:00,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19884.2). Total num frames: 3280896. Throughput: 0: 5120.5. Samples: 817488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:47:00,508][04317] Avg episode reward: [(0, '20.218')] |
|
[2024-10-08 03:47:02,224][07021] Updated weights for policy 0, policy_version 810 (0.0011) |
|
[2024-10-08 03:47:04,180][07021] Updated weights for policy 0, policy_version 820 (0.0011) |
|
[2024-10-08 03:47:05,505][04317] Fps is (10 sec: 20889.7, 60 sec: 20411.7, 300 sec: 19901.7). Total num frames: 3383296. Throughput: 0: 5123.0. Samples: 833016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) |
|
[2024-10-08 03:47:05,507][04317] Avg episode reward: [(0, '17.323')] |
|
[2024-10-08 03:47:06,164][07021] Updated weights for policy 0, policy_version 830 (0.0011) |
|
[2024-10-08 03:47:08,220][07021] Updated weights for policy 0, policy_version 840 (0.0011) |
|
[2024-10-08 03:47:10,279][07021] Updated weights for policy 0, policy_version 850 (0.0011) |
|
[2024-10-08 03:47:10,505][04317] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 19918.3). Total num frames: 3485696. Throughput: 0: 5108.6. Samples: 863418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:47:10,507][04317] Avg episode reward: [(0, '21.221')] |
|
[2024-10-08 03:47:10,510][07008] Saving new best policy, reward=21.221! |
|
[2024-10-08 03:47:12,259][07021] Updated weights for policy 0, policy_version 860 (0.0011) |
|
[2024-10-08 03:47:14,225][07021] Updated weights for policy 0, policy_version 870 (0.0011) |
|
[2024-10-08 03:47:15,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19933.9). Total num frames: 3588096. Throughput: 0: 5129.9. Samples: 894462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:47:15,508][04317] Avg episode reward: [(0, '21.242')] |
|
[2024-10-08 03:47:15,516][07008] Saving new best policy, reward=21.242! |
|
[2024-10-08 03:47:16,208][07021] Updated weights for policy 0, policy_version 880 (0.0012) |
|
[2024-10-08 03:47:18,209][07021] Updated weights for policy 0, policy_version 890 (0.0011) |
|
[2024-10-08 03:47:20,203][07021] Updated weights for policy 0, policy_version 900 (0.0011) |
|
[2024-10-08 03:47:20,505][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19948.6). Total num frames: 3690496. Throughput: 0: 5124.3. Samples: 909936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:47:20,507][04317] Avg episode reward: [(0, '21.015')] |
|
[2024-10-08 03:47:22,270][07021] Updated weights for policy 0, policy_version 910 (0.0011) |
|
[2024-10-08 03:47:24,295][07021] Updated weights for policy 0, policy_version 920 (0.0012) |
|
[2024-10-08 03:47:25,505][04317] Fps is (10 sec: 20479.9, 60 sec: 20480.0, 300 sec: 19962.6). Total num frames: 3792896. Throughput: 0: 5107.0. Samples: 939922. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:47:25,508][04317] Avg episode reward: [(0, '22.854')] |
|
[2024-10-08 03:47:25,516][07008] Saving new best policy, reward=22.854! |
|
[2024-10-08 03:47:26,256][07021] Updated weights for policy 0, policy_version 930 (0.0012) |
|
[2024-10-08 03:47:28,270][07021] Updated weights for policy 0, policy_version 940 (0.0011) |
|
[2024-10-08 03:47:30,224][07021] Updated weights for policy 0, policy_version 950 (0.0012) |
|
[2024-10-08 03:47:30,505][04317] Fps is (10 sec: 20479.9, 60 sec: 20480.0, 300 sec: 19975.9). Total num frames: 3895296. Throughput: 0: 5121.6. Samples: 971038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:47:30,508][04317] Avg episode reward: [(0, '20.259')] |
|
[2024-10-08 03:47:32,193][07021] Updated weights for policy 0, policy_version 960 (0.0010) |
|
[2024-10-08 03:47:34,180][07021] Updated weights for policy 0, policy_version 970 (0.0011) |
|
[2024-10-08 03:47:35,505][04317] Fps is (10 sec: 20480.2, 60 sec: 20480.0, 300 sec: 19988.5). Total num frames: 3997696. Throughput: 0: 5123.8. Samples: 986708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:47:35,508][04317] Avg episode reward: [(0, '19.869')] |
|
[2024-10-08 03:47:35,855][07008] Stopping Batcher_0... |
|
[2024-10-08 03:47:35,855][07008] Loop batcher_evt_loop terminating... |
|
[2024-10-08 03:47:35,856][07008] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... |
|
[2024-10-08 03:47:35,855][04317] Component Batcher_0 stopped! |
|
[2024-10-08 03:47:35,871][07021] Weights refcount: 2 0 |
|
[2024-10-08 03:47:35,873][07021] Stopping InferenceWorker_p0-w0... |
|
[2024-10-08 03:47:35,873][07021] Loop inference_proc0-0_evt_loop terminating... |
|
[2024-10-08 03:47:35,873][04317] Component InferenceWorker_p0-w0 stopped! |
|
[2024-10-08 03:47:35,911][07044] Stopping RolloutWorker_w7... |
|
[2024-10-08 03:47:35,912][07044] Loop rollout_proc7_evt_loop terminating... |
|
[2024-10-08 03:47:35,912][04317] Component RolloutWorker_w7 stopped! |
|
[2024-10-08 03:47:35,917][07031] Stopping RolloutWorker_w5... |
|
[2024-10-08 03:47:35,918][07031] Loop rollout_proc5_evt_loop terminating... |
|
[2024-10-08 03:47:35,918][07027] Stopping RolloutWorker_w0... |
|
[2024-10-08 03:47:35,918][07029] Stopping RolloutWorker_w2... |
|
[2024-10-08 03:47:35,918][07027] Loop rollout_proc0_evt_loop terminating... |
|
[2024-10-08 03:47:35,918][07029] Loop rollout_proc2_evt_loop terminating... |
|
[2024-10-08 03:47:35,917][04317] Component RolloutWorker_w5 stopped! |
|
[2024-10-08 03:47:35,919][07022] Stopping RolloutWorker_w1... |
|
[2024-10-08 03:47:35,920][07022] Loop rollout_proc1_evt_loop terminating... |
|
[2024-10-08 03:47:35,919][04317] Component RolloutWorker_w0 stopped! |
|
[2024-10-08 03:47:35,921][07030] Stopping RolloutWorker_w4... |
|
[2024-10-08 03:47:35,921][07032] Stopping RolloutWorker_w6... |
|
[2024-10-08 03:47:35,922][07030] Loop rollout_proc4_evt_loop terminating... |
|
[2024-10-08 03:47:35,922][07032] Loop rollout_proc6_evt_loop terminating... |
|
[2024-10-08 03:47:35,923][07028] Stopping RolloutWorker_w3... |
|
[2024-10-08 03:47:35,921][04317] Component RolloutWorker_w2 stopped! |
|
[2024-10-08 03:47:35,923][07028] Loop rollout_proc3_evt_loop terminating... |
|
[2024-10-08 03:47:35,923][04317] Component RolloutWorker_w1 stopped! |
|
[2024-10-08 03:47:35,925][04317] Component RolloutWorker_w4 stopped! |
|
[2024-10-08 03:47:35,926][04317] Component RolloutWorker_w6 stopped! |
|
[2024-10-08 03:47:35,928][04317] Component RolloutWorker_w3 stopped! |
|
[2024-10-08 03:47:35,938][07008] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... |
|
[2024-10-08 03:47:36,053][07008] Stopping LearnerWorker_p0... |
|
[2024-10-08 03:47:36,054][07008] Loop learner_proc0_evt_loop terminating... |
|
[2024-10-08 03:47:36,053][04317] Component LearnerWorker_p0 stopped! |
|
[2024-10-08 03:47:36,056][04317] Waiting for process learner_proc0 to stop... |
|
[2024-10-08 03:47:36,785][04317] Waiting for process inference_proc0-0 to join... |
|
[2024-10-08 03:47:36,788][04317] Waiting for process rollout_proc0 to join... |
|
[2024-10-08 03:47:36,790][04317] Waiting for process rollout_proc1 to join... |
|
[2024-10-08 03:47:36,792][04317] Waiting for process rollout_proc2 to join... |
|
[2024-10-08 03:47:36,794][04317] Waiting for process rollout_proc3 to join... |
|
[2024-10-08 03:47:36,796][04317] Waiting for process rollout_proc4 to join... |
|
[2024-10-08 03:47:36,798][04317] Waiting for process rollout_proc5 to join... |
|
[2024-10-08 03:47:36,800][04317] Waiting for process rollout_proc6 to join... |
|
[2024-10-08 03:47:36,801][04317] Waiting for process rollout_proc7 to join... |
|
[2024-10-08 03:47:36,803][04317] Batcher 0 profile tree view:
batching: 15.9929, releasing_batches: 0.0206
[2024-10-08 03:47:36,804][04317] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 3.7280
update_model: 3.1113
  weight_update: 0.0011
one_step: 0.0023
  handle_policy_step: 183.7676
    deserialize: 7.5591, stack: 1.2014, obs_to_device_normalize: 44.3436, forward: 85.3254, send_messages: 13.4925
    prepare_outputs: 23.7374
      to_cpu: 15.2000
[2024-10-08 03:47:36,807][04317] Learner 0 profile tree view:
misc: 0.0049, prepare_batch: 8.9619
train: 19.3725
  epoch_init: 0.0057, minibatch_init: 0.0063, losses_postprocess: 0.3112, kl_divergence: 0.4220, after_optimizer: 1.4131
  calculate_losses: 7.8730
    losses_init: 0.0033, forward_head: 0.9183, bptt_initial: 3.5001, tail: 0.6437, advantages_returns: 0.1728, losses: 1.0377
    bptt: 1.4136
      bptt_forward_core: 1.3582
  update: 8.9947
    clip: 1.1159
[2024-10-08 03:47:36,808][04317] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1452, enqueue_policy_requests: 7.6561, env_step: 125.3836, overhead: 6.2686, complete_rollouts: 0.2347
save_policy_outputs: 10.3628
  split_output_tensors: 3.5845
[2024-10-08 03:47:36,809][04317] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1455, enqueue_policy_requests: 7.6012, env_step: 125.4226, overhead: 6.2732, complete_rollouts: 0.2301
save_policy_outputs: 10.2584
  split_output_tensors: 3.5595
[2024-10-08 03:47:36,815][04317] Loop Runner_EvtLoop terminating...
[2024-10-08 03:47:36,816][04317] Runner profile tree view:
main_loop: 208.3583
[2024-10-08 03:47:36,817][04317] Collected {0: 4005888}, FPS: 19226.0
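The final throughput figure is simply the total env frames divided by the runner's main-loop time, which checks out against the numbers above:

```python
print(4005888 / 208.3583)  # ~19225.96, matching the reported FPS of 19226.0
```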
|
[2024-10-08 03:47:47,529][04317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json |
|
[2024-10-08 03:47:47,531][04317] Overriding arg 'num_workers' with value 1 passed from command line |
|
[2024-10-08 03:47:47,532][04317] Adding new argument 'no_render'=True that is not in the saved config file! |
|
[2024-10-08 03:47:47,535][04317] Adding new argument 'save_video'=True that is not in the saved config file! |
|
[2024-10-08 03:47:47,536][04317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! |
|
[2024-10-08 03:47:47,537][04317] Adding new argument 'video_name'=None that is not in the saved config file! |
|
[2024-10-08 03:47:47,539][04317] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! |
|
[2024-10-08 03:47:47,540][04317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! |
|
[2024-10-08 03:47:47,541][04317] Adding new argument 'push_to_hub'=False that is not in the saved config file! |
|
[2024-10-08 03:47:47,542][04317] Adding new argument 'hf_repository'=None that is not in the saved config file! |
|
[2024-10-08 03:47:47,544][04317] Adding new argument 'policy_index'=0 that is not in the saved config file! |
|
[2024-10-08 03:47:47,546][04317] Adding new argument 'eval_deterministic'=False that is not in the saved config file! |
|
[2024-10-08 03:47:47,547][04317] Adding new argument 'train_script'=None that is not in the saved config file! |
|
[2024-10-08 03:47:47,548][04317] Adding new argument 'enjoy_script'=None that is not in the saved config file! |
|
[2024-10-08 03:47:47,550][04317] Using frameskip 1 and render_action_repeat=4 for evaluation |
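The evaluation pass that follows is Sample Factory's `enjoy` entry point: it reloads the saved config, overrides a few arguments from the command line, and rolls out the policy without rendering while recording a video. A minimal sketch, assuming the same Sample Factory 2.x API as the training sketch above:

```python
import functools

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.enjoy import enjoy
from sample_factory.envs.env_utils import register_env
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec

# Env registration, as in the training sketch.
for env_spec in DOOM_ENVS:
    register_env(env_spec.name, functools.partial(make_doom_env_from_spec, env_spec))

argv = [
    "--env=doom_health_gathering_supreme",  # assumption: the env name is not shown in this log
    "--num_workers=1",
    "--no_render",
    "--save_video",
    "--max_num_episodes=10",
]
parser, _ = parse_sf_args(argv=argv, evaluation=True)
add_doom_env_args(parser)
doom_override_defaults(parser)
cfg = parse_full_cfg(parser, argv=argv)
status = enjoy(cfg)
```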
|
[2024-10-08 03:47:47,562][04317] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:47:47,565][04317] RunningMeanStd input shape: (3, 72, 128) |
|
[2024-10-08 03:47:47,567][04317] RunningMeanStd input shape: (1,) |
|
[2024-10-08 03:47:47,581][04317] ConvEncoder: input_channels=3 |
|
[2024-10-08 03:47:47,715][04317] Conv encoder output size: 512 |
|
[2024-10-08 03:47:47,717][04317] Policy head output size: 512 |
|
[2024-10-08 03:47:49,581][04317] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... |
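The .pth file holds the learner state. The exact key names vary across Sample Factory versions, so rather than assume them, a quick inspection:

```python
import torch

ckpt = torch.load(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth",
    map_location="cpu",
)
print(sorted(ckpt.keys()))  # model/optimizer state dicts plus step counters, depending on version
```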
|
[2024-10-08 03:47:50,495][04317] Num frames 100... |
|
[2024-10-08 03:47:50,611][04317] Num frames 200... |
|
[2024-10-08 03:47:50,725][04317] Num frames 300... |
|
[2024-10-08 03:47:50,842][04317] Num frames 400... |
|
[2024-10-08 03:47:50,955][04317] Num frames 500... |
|
[2024-10-08 03:47:51,097][04317] Avg episode rewards: #0: 10.760, true rewards: #0: 5.760 |
|
[2024-10-08 03:47:51,098][04317] Avg episode reward: 10.760, avg true_objective: 5.760 |
|
[2024-10-08 03:47:51,129][04317] Num frames 600... |
|
[2024-10-08 03:47:51,241][04317] Num frames 700... |
|
[2024-10-08 03:47:51,365][04317] Num frames 800... |
|
[2024-10-08 03:47:51,480][04317] Num frames 900... |
|
[2024-10-08 03:47:51,597][04317] Num frames 1000... |
|
[2024-10-08 03:47:51,711][04317] Num frames 1100... |
|
[2024-10-08 03:47:51,827][04317] Num frames 1200... |
|
[2024-10-08 03:47:51,942][04317] Num frames 1300... |
|
[2024-10-08 03:47:52,056][04317] Num frames 1400... |
|
[2024-10-08 03:47:52,171][04317] Num frames 1500... |
|
[2024-10-08 03:47:52,288][04317] Num frames 1600... |
|
[2024-10-08 03:47:52,406][04317] Num frames 1700... |
|
[2024-10-08 03:47:52,523][04317] Num frames 1800... |
|
[2024-10-08 03:47:52,641][04317] Num frames 1900... |
|
[2024-10-08 03:47:52,736][04317] Avg episode rewards: #0: 21.670, true rewards: #0: 9.670 |
|
[2024-10-08 03:47:52,737][04317] Avg episode reward: 21.670, avg true_objective: 9.670 |
|
[2024-10-08 03:47:52,814][04317] Num frames 2000... |
|
[2024-10-08 03:47:52,926][04317] Num frames 2100... |
|
[2024-10-08 03:47:53,083][04317] Avg episode rewards: #0: 15.300, true rewards: #0: 7.300 |
|
[2024-10-08 03:47:53,085][04317] Avg episode reward: 15.300, avg true_objective: 7.300 |
|
[2024-10-08 03:47:53,098][04317] Num frames 2200... |
|
[2024-10-08 03:47:53,210][04317] Num frames 2300... |
|
[2024-10-08 03:47:53,327][04317] Num frames 2400... |
|
[2024-10-08 03:47:53,440][04317] Num frames 2500... |
|
[2024-10-08 03:47:53,558][04317] Num frames 2600... |
|
[2024-10-08 03:47:53,674][04317] Num frames 2700... |
|
[2024-10-08 03:47:53,787][04317] Num frames 2800... |
|
[2024-10-08 03:47:53,899][04317] Num frames 2900... |
|
[2024-10-08 03:47:54,016][04317] Num frames 3000... |
|
[2024-10-08 03:47:54,132][04317] Num frames 3100... |
|
[2024-10-08 03:47:54,247][04317] Num frames 3200... |
|
[2024-10-08 03:47:54,364][04317] Num frames 3300... |
|
[2024-10-08 03:47:54,481][04317] Num frames 3400... |
|
[2024-10-08 03:47:54,599][04317] Num frames 3500... |
|
[2024-10-08 03:47:54,714][04317] Num frames 3600... |
|
[2024-10-08 03:47:54,829][04317] Num frames 3700... |
|
[2024-10-08 03:47:54,944][04317] Num frames 3800... |
|
[2024-10-08 03:47:55,062][04317] Num frames 3900... |
|
[2024-10-08 03:47:55,175][04317] Num frames 4000... |
|
[2024-10-08 03:47:55,292][04317] Num frames 4100... |
|
[2024-10-08 03:47:55,407][04317] Num frames 4200... |
|
[2024-10-08 03:47:55,564][04317] Avg episode rewards: #0: 25.725, true rewards: #0: 10.725 |
|
[2024-10-08 03:47:55,565][04317] Avg episode reward: 25.725, avg true_objective: 10.725 |
|
[2024-10-08 03:47:55,578][04317] Num frames 4300... |
|
[2024-10-08 03:47:55,690][04317] Num frames 4400... |
|
[2024-10-08 03:47:55,803][04317] Num frames 4500... |
|
[2024-10-08 03:47:55,916][04317] Num frames 4600... |
|
[2024-10-08 03:47:56,032][04317] Num frames 4700... |
|
[2024-10-08 03:47:56,132][04317] Avg episode rewards: #0: 22.076, true rewards: #0: 9.476 |
|
[2024-10-08 03:47:56,133][04317] Avg episode reward: 22.076, avg true_objective: 9.476 |
|
[2024-10-08 03:47:56,206][04317] Num frames 4800... |
|
[2024-10-08 03:47:56,323][04317] Num frames 4900... |
|
[2024-10-08 03:47:56,441][04317] Num frames 5000... |
|
[2024-10-08 03:47:56,599][04317] Avg episode rewards: #0: 19.150, true rewards: #0: 8.483 |
|
[2024-10-08 03:47:56,600][04317] Avg episode reward: 19.150, avg true_objective: 8.483 |
|
[2024-10-08 03:47:56,615][04317] Num frames 5100... |
|
[2024-10-08 03:47:56,729][04317] Num frames 5200... |
|
[2024-10-08 03:47:56,842][04317] Num frames 5300... |
|
[2024-10-08 03:47:56,955][04317] Num frames 5400... |
|
[2024-10-08 03:47:57,072][04317] Num frames 5500... |
|
[2024-10-08 03:47:57,184][04317] Num frames 5600... |
|
[2024-10-08 03:47:57,316][04317] Avg episode rewards: #0: 17.951, true rewards: #0: 8.094 |
|
[2024-10-08 03:47:57,317][04317] Avg episode reward: 17.951, avg true_objective: 8.094 |
|
[2024-10-08 03:47:57,359][04317] Num frames 5700... |
|
[2024-10-08 03:47:57,474][04317] Num frames 5800... |
|
[2024-10-08 03:47:57,589][04317] Num frames 5900... |
|
[2024-10-08 03:47:57,703][04317] Num frames 6000... |
|
[2024-10-08 03:47:57,818][04317] Num frames 6100... |
|
[2024-10-08 03:47:57,933][04317] Num frames 6200... |
|
[2024-10-08 03:47:58,036][04317] Avg episode rewards: #0: 16.802, true rewards: #0: 7.802 |
|
[2024-10-08 03:47:58,037][04317] Avg episode reward: 16.802, avg true_objective: 7.802 |
|
[2024-10-08 03:47:58,104][04317] Num frames 6300... |
|
[2024-10-08 03:47:58,220][04317] Num frames 6400... |
|
[2024-10-08 03:47:58,341][04317] Num frames 6500... |
|
[2024-10-08 03:47:58,459][04317] Num frames 6600... |
|
[2024-10-08 03:47:58,575][04317] Num frames 6700... |
|
[2024-10-08 03:47:58,691][04317] Num frames 6800... |
|
[2024-10-08 03:47:58,813][04317] Num frames 6900... |
|
[2024-10-08 03:47:58,939][04317] Num frames 7000... |
|
[2024-10-08 03:47:59,068][04317] Num frames 7100... |
|
[2024-10-08 03:47:59,188][04317] Num frames 7200... |
|
[2024-10-08 03:47:59,309][04317] Num frames 7300... |
|
[2024-10-08 03:47:59,431][04317] Num frames 7400... |
|
[2024-10-08 03:47:59,550][04317] Num frames 7500... |
|
[2024-10-08 03:47:59,667][04317] Num frames 7600... |
|
[2024-10-08 03:47:59,791][04317] Num frames 7700... |
|
[2024-10-08 03:47:59,955][04317] Avg episode rewards: #0: 19.323, true rewards: #0: 8.657 |
|
[2024-10-08 03:47:59,957][04317] Avg episode reward: 19.323, avg true_objective: 8.657 |
|
[2024-10-08 03:47:59,970][04317] Num frames 7800... |
|
[2024-10-08 03:48:00,083][04317] Num frames 7900... |
|
[2024-10-08 03:48:00,195][04317] Num frames 8000... |
|
[2024-10-08 03:48:00,311][04317] Num frames 8100... |
|
[2024-10-08 03:48:00,429][04317] Num frames 8200... |
|
[2024-10-08 03:48:00,546][04317] Num frames 8300... |
|
[2024-10-08 03:48:00,662][04317] Num frames 8400... |
|
[2024-10-08 03:48:00,780][04317] Num frames 8500... |
|
[2024-10-08 03:48:00,898][04317] Num frames 8600... |
|
[2024-10-08 03:48:01,015][04317] Num frames 8700... |
|
[2024-10-08 03:48:01,132][04317] Num frames 8800... |
|
[2024-10-08 03:48:01,251][04317] Num frames 8900... |
|
[2024-10-08 03:48:01,376][04317] Num frames 9000... |
|
[2024-10-08 03:48:01,497][04317] Num frames 9100... |
|
[2024-10-08 03:48:01,620][04317] Num frames 9200... |
|
[2024-10-08 03:48:01,745][04317] Num frames 9300... |
|
[2024-10-08 03:48:01,864][04317] Num frames 9400... |
|
[2024-10-08 03:48:01,985][04317] Avg episode rewards: #0: 21.455, true rewards: #0: 9.455 |
|
[2024-10-08 03:48:01,986][04317] Avg episode reward: 21.455, avg true_objective: 9.455 |
|
[2024-10-08 03:48:24,521][04317] Replay video saved to /content/train_dir/default_experiment/replay.mp4! |
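One way to preview the saved replay inside a Colab/Jupyter notebook (a standard IPython trick, not part of Sample Factory itself):

```python
from base64 import b64encode
from pathlib import Path

from IPython.display import HTML

mp4 = Path("/content/train_dir/default_experiment/replay.mp4").read_bytes()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f'<video width=640 controls><source src="{data_url}" type="video/mp4"></video>')
```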
|
[2024-10-08 03:50:12,684][04317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json |
|
[2024-10-08 03:50:12,686][04317] Overriding arg 'num_workers' with value 1 passed from command line |
|
[2024-10-08 03:50:12,687][04317] Adding new argument 'no_render'=True that is not in the saved config file! |
|
[2024-10-08 03:50:12,689][04317] Adding new argument 'save_video'=True that is not in the saved config file! |
|
[2024-10-08 03:50:12,691][04317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! |
|
[2024-10-08 03:50:12,692][04317] Adding new argument 'video_name'=None that is not in the saved config file! |
|
[2024-10-08 03:50:12,694][04317] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! |
|
[2024-10-08 03:50:12,695][04317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! |
|
[2024-10-08 03:50:12,697][04317] Adding new argument 'push_to_hub'=True that is not in the saved config file! |
|
[2024-10-08 03:50:12,699][04317] Adding new argument 'hf_repository'='EntropicLettuce/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! |
|
[2024-10-08 03:50:12,700][04317] Adding new argument 'policy_index'=0 that is not in the saved config file! |
|
[2024-10-08 03:50:12,702][04317] Adding new argument 'eval_deterministic'=False that is not in the saved config file! |
|
[2024-10-08 03:50:12,703][04317] Adding new argument 'train_script'=None that is not in the saved config file! |
|
[2024-10-08 03:50:12,705][04317] Adding new argument 'enjoy_script'=None that is not in the saved config file! |
|
[2024-10-08 03:50:12,707][04317] Using frameskip 1 and render_action_repeat=4 for evaluation |
|
[2024-10-08 03:50:12,714][04317] RunningMeanStd input shape: (3, 72, 128) |
|
[2024-10-08 03:50:12,716][04317] RunningMeanStd input shape: (1,) |
|
[2024-10-08 03:50:12,728][04317] ConvEncoder: input_channels=3 |
|
[2024-10-08 03:50:12,764][04317] Conv encoder output size: 512 |
|
[2024-10-08 03:50:12,765][04317] Policy head output size: 512 |
|
[2024-10-08 03:50:12,786][04317] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... |
|
[2024-10-08 03:50:13,263][04317] Num frames 100... |
|
[2024-10-08 03:50:13,376][04317] Num frames 200... |
|
[2024-10-08 03:50:13,488][04317] Num frames 300... |
|
[2024-10-08 03:50:13,599][04317] Num frames 400... |
|
[2024-10-08 03:50:13,716][04317] Num frames 500... |
|
[2024-10-08 03:50:13,830][04317] Num frames 600... |
|
[2024-10-08 03:50:13,942][04317] Num frames 700... |
|
[2024-10-08 03:50:14,056][04317] Num frames 800... |
|
[2024-10-08 03:50:14,168][04317] Num frames 900... |
|
[2024-10-08 03:50:14,285][04317] Num frames 1000... |
|
[2024-10-08 03:50:14,398][04317] Num frames 1100... |
|
[2024-10-08 03:50:14,513][04317] Num frames 1200... |
|
[2024-10-08 03:50:14,631][04317] Num frames 1300... |
|
[2024-10-08 03:50:14,746][04317] Num frames 1400... |
|
[2024-10-08 03:50:14,860][04317] Num frames 1500... |
|
[2024-10-08 03:50:14,977][04317] Num frames 1600... |
|
[2024-10-08 03:50:15,094][04317] Num frames 1700... |
|
[2024-10-08 03:50:15,181][04317] Avg episode rewards: #0: 47.279, true rewards: #0: 17.280 |
|
[2024-10-08 03:50:15,182][04317] Avg episode reward: 47.279, avg true_objective: 17.280 |
|
[2024-10-08 03:50:15,265][04317] Num frames 1800... |
|
[2024-10-08 03:50:15,378][04317] Num frames 1900... |
|
[2024-10-08 03:50:15,491][04317] Num frames 2000... |
|
[2024-10-08 03:50:15,603][04317] Num frames 2100... |
|
[2024-10-08 03:50:15,723][04317] Num frames 2200... |
|
[2024-10-08 03:50:15,839][04317] Num frames 2300... |
|
[2024-10-08 03:50:15,952][04317] Num frames 2400... |
|
[2024-10-08 03:50:16,070][04317] Num frames 2500... |
|
[2024-10-08 03:50:16,185][04317] Num frames 2600... |
|
[2024-10-08 03:50:16,301][04317] Num frames 2700... |
|
[2024-10-08 03:50:16,427][04317] Num frames 2800... |
|
[2024-10-08 03:50:16,541][04317] Num frames 2900... |
|
[2024-10-08 03:50:16,653][04317] Num frames 3000... |
|
[2024-10-08 03:50:16,790][04317] Avg episode rewards: #0: 38.845, true rewards: #0: 15.345 |
|
[2024-10-08 03:50:16,791][04317] Avg episode reward: 38.845, avg true_objective: 15.345 |
|
[2024-10-08 03:50:16,828][04317] Num frames 3100... |
|
[2024-10-08 03:50:16,944][04317] Num frames 3200... |
|
[2024-10-08 03:50:17,061][04317] Num frames 3300... |
|
[2024-10-08 03:50:17,174][04317] Num frames 3400... |
|
[2024-10-08 03:50:17,289][04317] Num frames 3500... |
|
[2024-10-08 03:50:17,401][04317] Num frames 3600... |
|
[2024-10-08 03:50:17,515][04317] Num frames 3700... |
|
[2024-10-08 03:50:17,630][04317] Num frames 3800... |
|
[2024-10-08 03:50:17,755][04317] Num frames 3900... |
|
[2024-10-08 03:50:17,877][04317] Num frames 4000... |
|
[2024-10-08 03:50:17,991][04317] Num frames 4100... |
|
[2024-10-08 03:50:18,110][04317] Num frames 4200... |
|
[2024-10-08 03:50:18,223][04317] Num frames 4300... |
|
[2024-10-08 03:50:18,335][04317] Num frames 4400... |
|
[2024-10-08 03:50:18,455][04317] Num frames 4500... |
|
[2024-10-08 03:50:18,578][04317] Num frames 4600... |
|
[2024-10-08 03:50:18,694][04317] Num frames 4700... |
|
[2024-10-08 03:50:18,813][04317] Num frames 4800... |
|
[2024-10-08 03:50:18,932][04317] Num frames 4900... |
|
[2024-10-08 03:50:19,089][04317] Avg episode rewards: #0: 42.296, true rewards: #0: 16.630 |
|
[2024-10-08 03:50:19,091][04317] Avg episode reward: 42.296, avg true_objective: 16.630 |
|
[2024-10-08 03:50:19,107][04317] Num frames 5000... |
|
[2024-10-08 03:50:19,232][04317] Num frames 5100... |
|
[2024-10-08 03:50:19,352][04317] Num frames 5200... |
|
[2024-10-08 03:50:19,468][04317] Num frames 5300... |
|
[2024-10-08 03:50:19,582][04317] Num frames 5400... |
|
[2024-10-08 03:50:19,697][04317] Num frames 5500... |
|
[2024-10-08 03:50:19,813][04317] Num frames 5600... |
|
[2024-10-08 03:50:19,925][04317] Num frames 5700... |
|
[2024-10-08 03:50:20,051][04317] Num frames 5800... |
|
[2024-10-08 03:50:20,171][04317] Num frames 5900... |
|
[2024-10-08 03:50:20,286][04317] Num frames 6000... |
|
[2024-10-08 03:50:20,404][04317] Num frames 6100... |
|
[2024-10-08 03:50:20,522][04317] Num frames 6200... |
|
[2024-10-08 03:50:20,640][04317] Num frames 6300... |
|
[2024-10-08 03:50:20,754][04317] Num frames 6400... |
|
[2024-10-08 03:50:20,870][04317] Num frames 6500... |
|
[2024-10-08 03:50:20,985][04317] Num frames 6600... |
|
[2024-10-08 03:50:21,102][04317] Num frames 6700... |
|
[2024-10-08 03:50:21,224][04317] Num frames 6800... |
|
[2024-10-08 03:50:21,342][04317] Num frames 6900... |
|
[2024-10-08 03:50:21,460][04317] Num frames 7000... |
|
[2024-10-08 03:50:21,621][04317] Avg episode rewards: #0: 44.722, true rewards: #0: 17.723 |
|
[2024-10-08 03:50:21,623][04317] Avg episode reward: 44.722, avg true_objective: 17.723 |
|
[2024-10-08 03:50:21,637][04317] Num frames 7100... |
|
[2024-10-08 03:50:21,750][04317] Num frames 7200... |
|
[2024-10-08 03:50:21,862][04317] Num frames 7300... |
|
[2024-10-08 03:50:21,975][04317] Num frames 7400... |
|
[2024-10-08 03:50:22,089][04317] Num frames 7500... |
|
[2024-10-08 03:50:22,222][04317] Avg episode rewards: #0: 37.537, true rewards: #0: 15.138 |
|
[2024-10-08 03:50:22,223][04317] Avg episode reward: 37.537, avg true_objective: 15.138 |
|
[2024-10-08 03:50:22,258][04317] Num frames 7600... |
|
[2024-10-08 03:50:22,376][04317] Num frames 7700... |
|
[2024-10-08 03:50:22,492][04317] Num frames 7800... |
|
[2024-10-08 03:50:22,608][04317] Num frames 7900... |
|
[2024-10-08 03:50:22,724][04317] Num frames 8000... |
|
[2024-10-08 03:50:22,838][04317] Num frames 8100... |
|
[2024-10-08 03:50:22,951][04317] Num frames 8200... |
|
[2024-10-08 03:50:23,068][04317] Num frames 8300... |
|
[2024-10-08 03:50:23,184][04317] Num frames 8400... |
|
[2024-10-08 03:50:23,301][04317] Num frames 8500... |
|
[2024-10-08 03:50:23,418][04317] Num frames 8600... |
|
[2024-10-08 03:50:23,537][04317] Num frames 8700... |
|
[2024-10-08 03:50:23,656][04317] Num frames 8800... |
|
[2024-10-08 03:50:23,774][04317] Num frames 8900... |
|
[2024-10-08 03:50:23,890][04317] Num frames 9000... |
|
[2024-10-08 03:50:24,006][04317] Num frames 9100... |
|
[2024-10-08 03:50:24,125][04317] Num frames 9200... |
|
[2024-10-08 03:50:24,241][04317] Num frames 9300... |
|
[2024-10-08 03:50:24,357][04317] Num frames 9400... |
|
[2024-10-08 03:50:24,474][04317] Num frames 9500... |
|
[2024-10-08 03:50:24,592][04317] Num frames 9600... |
|
[2024-10-08 03:50:24,730][04317] Avg episode rewards: #0: 40.448, true rewards: #0: 16.115 |
|
[2024-10-08 03:50:24,731][04317] Avg episode reward: 40.448, avg true_objective: 16.115 |
|
[2024-10-08 03:50:24,770][04317] Num frames 9700... |
|
[2024-10-08 03:50:24,884][04317] Num frames 9800... |
|
[2024-10-08 03:50:24,999][04317] Num frames 9900... |
|
[2024-10-08 03:50:25,115][04317] Num frames 10000... |
|
[2024-10-08 03:50:25,230][04317] Avg episode rewards: #0: 35.218, true rewards: #0: 14.361 |
|
[2024-10-08 03:50:25,232][04317] Avg episode reward: 35.218, avg true_objective: 14.361 |
|
[2024-10-08 03:50:25,288][04317] Num frames 10100... |
|
[2024-10-08 03:50:25,404][04317] Num frames 10200... |
|
[2024-10-08 03:50:25,519][04317] Num frames 10300... |
|
[2024-10-08 03:50:25,636][04317] Num frames 10400... |
|
[2024-10-08 03:50:25,754][04317] Num frames 10500... |
|
[2024-10-08 03:50:25,871][04317] Num frames 10600... |
|
[2024-10-08 03:50:25,986][04317] Num frames 10700... |
|
[2024-10-08 03:50:26,101][04317] Num frames 10800... |
|
[2024-10-08 03:50:26,217][04317] Num frames 10900... |
|
[2024-10-08 03:50:26,336][04317] Num frames 11000... |
|
[2024-10-08 03:50:26,442][04317] Avg episode rewards: #0: 33.306, true rewards: #0: 13.806 |
|
[2024-10-08 03:50:26,444][04317] Avg episode reward: 33.306, avg true_objective: 13.806 |
|
[2024-10-08 03:50:26,509][04317] Num frames 11100... |
|
[2024-10-08 03:50:26,625][04317] Num frames 11200... |
|
[2024-10-08 03:50:26,740][04317] Num frames 11300... |
|
[2024-10-08 03:50:26,856][04317] Num frames 11400... |
|
[2024-10-08 03:50:26,974][04317] Num frames 11500... |
|
[2024-10-08 03:50:27,088][04317] Num frames 11600... |
|
[2024-10-08 03:50:27,203][04317] Num frames 11700... |
|
[2024-10-08 03:50:27,279][04317] Avg episode rewards: #0: 31.018, true rewards: #0: 13.019 |
|
[2024-10-08 03:50:27,280][04317] Avg episode reward: 31.018, avg true_objective: 13.019 |
|
[2024-10-08 03:50:27,377][04317] Num frames 11800... |
|
[2024-10-08 03:50:27,491][04317] Num frames 11900... |
|
[2024-10-08 03:50:27,607][04317] Num frames 12000... |
|
[2024-10-08 03:50:27,723][04317] Num frames 12100... |
|
[2024-10-08 03:50:27,841][04317] Num frames 12200... |
|
[2024-10-08 03:50:27,957][04317] Num frames 12300... |
|
[2024-10-08 03:50:28,070][04317] Num frames 12400... |
|
[2024-10-08 03:50:28,186][04317] Num frames 12500... |
|
[2024-10-08 03:50:28,303][04317] Num frames 12600... |
|
[2024-10-08 03:50:28,420][04317] Num frames 12700... |
|
[2024-10-08 03:50:28,535][04317] Num frames 12800... |
|
[2024-10-08 03:50:28,652][04317] Num frames 12900... |
|
[2024-10-08 03:50:28,768][04317] Num frames 13000... |
|
[2024-10-08 03:50:28,883][04317] Num frames 13100... |
|
[2024-10-08 03:50:28,997][04317] Num frames 13200... |
|
[2024-10-08 03:50:29,117][04317] Num frames 13300... |
|
[2024-10-08 03:50:29,237][04317] Num frames 13400... |
|
[2024-10-08 03:50:29,358][04317] Num frames 13500... |
|
[2024-10-08 03:50:29,480][04317] Num frames 13600... |
|
[2024-10-08 03:50:29,597][04317] Num frames 13700... |
|
[2024-10-08 03:50:29,717][04317] Num frames 13800... |
|
[2024-10-08 03:50:29,792][04317] Avg episode rewards: #0: 33.717, true rewards: #0: 13.817 |
|
[2024-10-08 03:50:29,794][04317] Avg episode reward: 33.717, avg true_objective: 13.817 |
|
[2024-10-08 03:51:02,121][04317] Replay video saved to /content/train_dir/default_experiment/replay.mp4! |
|
[2024-10-08 03:51:06,565][04317] The model has been pushed to https://huggingface.co./EntropicLettuce/rl_course_vizdoom_health_gathering_supreme |
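
The replay above and this Hub upload come from Sample Factory's evaluation ("enjoy") entry point run with --push_to_hub. A rough sketch of that invocation follows; the module path and main() entry are assumptions based on the sf_examples layout, and the flags mirror the evaluation arguments logged in this file:

```python
# Sketch only: module path and main() are assumptions based on sf_examples.
import sys
from sf_examples.vizdoom.enjoy_vizdoom import main

sys.argv = [
    "enjoy",
    "--env=doom_health_gathering_supreme",
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",
    "--max_num_episodes=10",  # ten evaluation episodes, as in the stats above
    "--no_render",
    "--save_video",           # writes replay.mp4, which is then uploaded
    "--push_to_hub",
    "--hf_repository=EntropicLettuce/rl_course_vizdoom_health_gathering_supreme",
]
main()
```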
|
[2024-10-08 03:51:57,185][04317] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json |
|
[2024-10-08 03:51:57,187][04317] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json |
|
[2024-10-08 03:51:57,188][04317] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line |
|
[2024-10-08 03:51:57,189][04317] Overriding arg 'train_dir' with value 'train_dir' passed from command line |
|
[2024-10-08 03:51:57,191][04317] Overriding arg 'num_workers' with value 1 passed from command line |
|
[2024-10-08 03:51:57,193][04317] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! |
|
[2024-10-08 03:51:57,194][04317] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! |
|
[2024-10-08 03:51:57,195][04317] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! |
|
[2024-10-08 03:51:57,197][04317] Adding new argument 'no_render'=True that is not in the saved config file! |
|
[2024-10-08 03:51:57,198][04317] Adding new argument 'save_video'=True that is not in the saved config file! |
|
[2024-10-08 03:51:57,200][04317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! |
|
[2024-10-08 03:51:57,201][04317] Adding new argument 'video_name'=None that is not in the saved config file! |
|
[2024-10-08 03:51:57,202][04317] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! |
|
[2024-10-08 03:51:57,205][04317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! |
|
[2024-10-08 03:51:57,206][04317] Adding new argument 'push_to_hub'=False that is not in the saved config file! |
|
[2024-10-08 03:51:57,207][04317] Adding new argument 'hf_repository'=None that is not in the saved config file! |
|
[2024-10-08 03:51:57,209][04317] Adding new argument 'policy_index'=0 that is not in the saved config file! |
|
[2024-10-08 03:51:57,210][04317] Adding new argument 'eval_deterministic'=False that is not in the saved config file! |
|
[2024-10-08 03:51:57,212][04317] Adding new argument 'train_script'=None that is not in the saved config file! |
|
[2024-10-08 03:51:57,213][04317] Adding new argument 'enjoy_script'=None that is not in the saved config file! |
|
[2024-10-08 03:51:57,214][04317] Using frameskip 1 and render_action_repeat=4 for evaluation |
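
This evaluation renders every frame (frameskip 1) but repeats each policy action four times, so the agent still acts at the cadence it was trained with (env_frameskip=4) while the video stays smooth. The arithmetic, with values taken from this log:

```python
# Values are read from this log; fps=35 matches ViZDoom's native 35 Hz tic rate.
env_frameskip = 4         # training: the policy acts every 4th frame
eval_env_frameskip = 1    # evaluation: render every frame
render_action_repeat = env_frameskip // eval_env_frameskip
assert render_action_repeat == 4  # matches "render_action_repeat=4" above
```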
|
[2024-10-08 03:51:57,221][04317] RunningMeanStd input shape: (3, 72, 128) |
|
[2024-10-08 03:51:57,223][04317] RunningMeanStd input shape: (1,) |
|
[2024-10-08 03:51:57,234][04317] ConvEncoder: input_channels=3 |
|
[2024-10-08 03:51:57,280][04317] Conv encoder output size: 512 |
|
[2024-10-08 03:51:57,281][04317] Policy head output size: 512 |
|
[2024-10-08 03:51:57,304][04317] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth... |
|
[2024-10-08 03:51:57,802][04317] Num frames 100... |
|
[2024-10-08 03:51:57,918][04317] Num frames 200... |
|
[2024-10-08 03:51:58,035][04317] Num frames 300... |
|
[2024-10-08 03:51:58,150][04317] Num frames 400... |
|
[2024-10-08 03:51:58,264][04317] Num frames 500... |
|
[2024-10-08 03:51:58,383][04317] Num frames 600... |
|
[2024-10-08 03:51:58,500][04317] Num frames 700... |
|
[2024-10-08 03:51:58,618][04317] Num frames 800... |
|
[2024-10-08 03:51:58,734][04317] Num frames 900... |
|
[2024-10-08 03:51:58,851][04317] Num frames 1000... |
|
[2024-10-08 03:51:58,967][04317] Num frames 1100... |
|
[2024-10-08 03:51:59,084][04317] Num frames 1200... |
|
[2024-10-08 03:51:59,200][04317] Num frames 1300... |
|
[2024-10-08 03:51:59,316][04317] Num frames 1400... |
|
[2024-10-08 03:51:59,438][04317] Num frames 1500... |
|
[2024-10-08 03:51:59,557][04317] Num frames 1600... |
|
[2024-10-08 03:51:59,674][04317] Num frames 1700... |
|
[2024-10-08 03:51:59,791][04317] Num frames 1800... |
|
[2024-10-08 03:51:59,909][04317] Num frames 1900... |
|
[2024-10-08 03:52:00,027][04317] Num frames 2000... |
|
[2024-10-08 03:52:00,153][04317] Num frames 2100... |
|
[2024-10-08 03:52:00,205][04317] Avg episode rewards: #0: 62.998, true rewards: #0: 21.000 |
|
[2024-10-08 03:52:00,207][04317] Avg episode reward: 62.998, avg true_objective: 21.000 |
|
[2024-10-08 03:52:00,324][04317] Num frames 2200... |
|
[2024-10-08 03:52:00,447][04317] Num frames 2300... |
|
[2024-10-08 03:52:00,567][04317] Num frames 2400... |
|
[2024-10-08 03:52:00,682][04317] Num frames 2500... |
|
[2024-10-08 03:52:00,804][04317] Num frames 2600... |
|
[2024-10-08 03:52:00,917][04317] Num frames 2700... |
|
[2024-10-08 03:52:01,032][04317] Num frames 2800... |
|
[2024-10-08 03:52:01,150][04317] Num frames 2900... |
|
[2024-10-08 03:52:01,267][04317] Num frames 3000... |
|
[2024-10-08 03:52:01,391][04317] Num frames 3100... |
|
[2024-10-08 03:52:01,510][04317] Num frames 3200... |
|
[2024-10-08 03:52:01,628][04317] Num frames 3300... |
|
[2024-10-08 03:52:01,745][04317] Num frames 3400... |
|
[2024-10-08 03:52:01,865][04317] Num frames 3500... |
|
[2024-10-08 03:52:01,986][04317] Num frames 3600... |
|
[2024-10-08 03:52:02,104][04317] Num frames 3700... |
|
[2024-10-08 03:52:02,222][04317] Num frames 3800... |
|
[2024-10-08 03:52:02,345][04317] Num frames 3900... |
|
[2024-10-08 03:52:02,465][04317] Num frames 4000... |
|
[2024-10-08 03:52:02,585][04317] Num frames 4100... |
|
[2024-10-08 03:52:02,703][04317] Num frames 4200... |
|
[2024-10-08 03:52:02,756][04317] Avg episode rewards: #0: 62.499, true rewards: #0: 21.000 |
|
[2024-10-08 03:52:02,758][04317] Avg episode reward: 62.499, avg true_objective: 21.000 |
|
[2024-10-08 03:52:02,875][04317] Num frames 4300... |
|
[2024-10-08 03:52:02,994][04317] Num frames 4400... |
|
[2024-10-08 03:52:03,115][04317] Num frames 4500... |
|
[2024-10-08 03:52:03,240][04317] Num frames 4600... |
|
[2024-10-08 03:52:03,364][04317] Num frames 4700... |
|
[2024-10-08 03:52:03,488][04317] Num frames 4800... |
|
[2024-10-08 03:52:03,617][04317] Num frames 4900... |
|
[2024-10-08 03:52:03,743][04317] Num frames 5000... |
|
[2024-10-08 03:52:03,860][04317] Num frames 5100... |
|
[2024-10-08 03:52:03,991][04317] Num frames 5200... |
|
[2024-10-08 03:52:04,116][04317] Num frames 5300... |
|
[2024-10-08 03:52:04,240][04317] Num frames 5400... |
|
[2024-10-08 03:52:04,315][04317] Avg episode rewards: #0: 52.052, true rewards: #0: 18.053 |
|
[2024-10-08 03:52:04,317][04317] Avg episode reward: 52.052, avg true_objective: 18.053 |
|
[2024-10-08 03:52:04,423][04317] Num frames 5500... |
|
[2024-10-08 03:52:04,543][04317] Num frames 5600... |
|
[2024-10-08 03:52:04,670][04317] Num frames 5700... |
|
[2024-10-08 03:52:04,790][04317] Num frames 5800... |
|
[2024-10-08 03:52:04,904][04317] Num frames 5900... |
|
[2024-10-08 03:52:05,030][04317] Num frames 6000... |
|
[2024-10-08 03:52:05,154][04317] Num frames 6100... |
|
[2024-10-08 03:52:05,273][04317] Num frames 6200... |
|
[2024-10-08 03:52:05,395][04317] Num frames 6300... |
|
[2024-10-08 03:52:05,514][04317] Num frames 6400... |
|
[2024-10-08 03:52:05,638][04317] Num frames 6500... |
|
[2024-10-08 03:52:05,760][04317] Num frames 6600... |
|
[2024-10-08 03:52:05,878][04317] Num frames 6700... |
|
[2024-10-08 03:52:05,996][04317] Num frames 6800... |
|
[2024-10-08 03:52:06,114][04317] Num frames 6900... |
|
[2024-10-08 03:52:06,230][04317] Num frames 7000... |
|
[2024-10-08 03:52:06,348][04317] Num frames 7100... |
|
[2024-10-08 03:52:06,464][04317] Num frames 7200... |
|
[2024-10-08 03:52:06,581][04317] Num frames 7300... |
|
[2024-10-08 03:52:06,699][04317] Num frames 7400... |
|
[2024-10-08 03:52:06,817][04317] Num frames 7500... |
|
[2024-10-08 03:52:06,891][04317] Avg episode rewards: #0: 54.289, true rewards: #0: 18.790 |
|
[2024-10-08 03:52:06,892][04317] Avg episode reward: 54.289, avg true_objective: 18.790 |
|
[2024-10-08 03:52:06,992][04317] Num frames 7600... |
|
[2024-10-08 03:52:07,110][04317] Num frames 7700... |
|
[2024-10-08 03:52:07,226][04317] Num frames 7800... |
|
[2024-10-08 03:52:07,342][04317] Num frames 7900... |
|
[2024-10-08 03:52:07,460][04317] Num frames 8000... |
|
[2024-10-08 03:52:07,576][04317] Num frames 8100... |
|
[2024-10-08 03:52:07,700][04317] Num frames 8200... |
|
[2024-10-08 03:52:07,821][04317] Num frames 8300... |
|
[2024-10-08 03:52:07,939][04317] Num frames 8400... |
|
[2024-10-08 03:52:08,058][04317] Num frames 8500... |
|
[2024-10-08 03:52:08,176][04317] Num frames 8600... |
|
[2024-10-08 03:52:08,294][04317] Num frames 8700... |
|
[2024-10-08 03:52:08,414][04317] Num frames 8800... |
|
[2024-10-08 03:52:08,536][04317] Num frames 8900... |
|
[2024-10-08 03:52:08,655][04317] Num frames 9000... |
|
[2024-10-08 03:52:08,771][04317] Num frames 9100... |
|
[2024-10-08 03:52:08,887][04317] Num frames 9200... |
|
[2024-10-08 03:52:09,007][04317] Num frames 9300... |
|
[2024-10-08 03:52:09,128][04317] Num frames 9400... |
|
[2024-10-08 03:52:09,247][04317] Num frames 9500... |
|
[2024-10-08 03:52:09,368][04317] Num frames 9600... |
|
[2024-10-08 03:52:09,442][04317] Avg episode rewards: #0: 57.431, true rewards: #0: 19.232 |
|
[2024-10-08 03:52:09,443][04317] Avg episode reward: 57.431, avg true_objective: 19.232 |
|
[2024-10-08 03:52:09,546][04317] Num frames 9700... |
|
[2024-10-08 03:52:09,664][04317] Num frames 9800... |
|
[2024-10-08 03:52:09,782][04317] Num frames 9900... |
|
[2024-10-08 03:52:09,900][04317] Num frames 10000... |
|
[2024-10-08 03:52:10,018][04317] Num frames 10100... |
|
[2024-10-08 03:52:10,136][04317] Num frames 10200... |
|
[2024-10-08 03:52:10,260][04317] Num frames 10300... |
|
[2024-10-08 03:52:10,387][04317] Num frames 10400... |
|
[2024-10-08 03:52:10,504][04317] Num frames 10500... |
|
[2024-10-08 03:52:10,620][04317] Num frames 10600... |
|
[2024-10-08 03:52:10,737][04317] Num frames 10700... |
|
[2024-10-08 03:52:10,854][04317] Num frames 10800... |
|
[2024-10-08 03:52:10,972][04317] Num frames 10900... |
|
[2024-10-08 03:52:11,088][04317] Num frames 11000... |
|
[2024-10-08 03:52:11,202][04317] Num frames 11100... |
|
[2024-10-08 03:52:11,319][04317] Num frames 11200... |
|
[2024-10-08 03:52:11,444][04317] Num frames 11300... |
|
[2024-10-08 03:52:11,562][04317] Num frames 11400... |
|
[2024-10-08 03:52:11,681][04317] Num frames 11500... |
|
[2024-10-08 03:52:11,801][04317] Num frames 11600... |
|
[2024-10-08 03:52:11,925][04317] Num frames 11700... |
|
[2024-10-08 03:52:12,000][04317] Avg episode rewards: #0: 58.859, true rewards: #0: 19.527 |
|
[2024-10-08 03:52:12,002][04317] Avg episode reward: 58.859, avg true_objective: 19.527 |
|
[2024-10-08 03:52:12,102][04317] Num frames 11800... |
|
[2024-10-08 03:52:12,218][04317] Num frames 11900... |
|
[2024-10-08 03:52:12,335][04317] Num frames 12000... |
|
[2024-10-08 03:52:12,456][04317] Num frames 12100... |
|
[2024-10-08 03:52:12,572][04317] Num frames 12200... |
|
[2024-10-08 03:52:12,689][04317] Num frames 12300... |
|
[2024-10-08 03:52:12,808][04317] Num frames 12400... |
|
[2024-10-08 03:52:12,928][04317] Num frames 12500... |
|
[2024-10-08 03:52:13,046][04317] Num frames 12600... |
|
[2024-10-08 03:52:13,165][04317] Num frames 12700... |
|
[2024-10-08 03:52:13,284][04317] Num frames 12800... |
|
[2024-10-08 03:52:13,403][04317] Num frames 12900... |
|
[2024-10-08 03:52:13,519][04317] Num frames 13000... |
|
[2024-10-08 03:52:13,634][04317] Num frames 13100... |
|
[2024-10-08 03:52:13,754][04317] Num frames 13200... |
|
[2024-10-08 03:52:13,872][04317] Num frames 13300... |
|
[2024-10-08 03:52:13,992][04317] Num frames 13400... |
|
[2024-10-08 03:52:14,111][04317] Num frames 13500... |
|
[2024-10-08 03:52:14,230][04317] Num frames 13600... |
|
[2024-10-08 03:52:14,352][04317] Num frames 13700... |
|
[2024-10-08 03:52:14,473][04317] Num frames 13800... |
|
[2024-10-08 03:52:14,547][04317] Avg episode rewards: #0: 59.308, true rewards: #0: 19.737 |
|
[2024-10-08 03:52:14,548][04317] Avg episode reward: 59.308, avg true_objective: 19.737 |
|
[2024-10-08 03:52:14,647][04317] Num frames 13900... |
|
[2024-10-08 03:52:14,766][04317] Num frames 14000... |
|
[2024-10-08 03:52:14,883][04317] Num frames 14100... |
|
[2024-10-08 03:52:15,006][04317] Num frames 14200... |
|
[2024-10-08 03:52:15,124][04317] Num frames 14300... |
|
[2024-10-08 03:52:15,250][04317] Num frames 14400... |
|
[2024-10-08 03:52:15,379][04317] Num frames 14500... |
|
[2024-10-08 03:52:15,504][04317] Num frames 14600... |
|
[2024-10-08 03:52:15,621][04317] Num frames 14700... |
|
[2024-10-08 03:52:15,739][04317] Num frames 14800... |
|
[2024-10-08 03:52:15,856][04317] Num frames 14900... |
|
[2024-10-08 03:52:15,978][04317] Num frames 15000... |
|
[2024-10-08 03:52:16,101][04317] Num frames 15100... |
|
[2024-10-08 03:52:16,219][04317] Num frames 15200... |
|
[2024-10-08 03:52:16,340][04317] Num frames 15300... |
|
[2024-10-08 03:52:16,462][04317] Num frames 15400... |
|
[2024-10-08 03:52:16,582][04317] Num frames 15500... |
|
[2024-10-08 03:52:16,709][04317] Num frames 15600... |
|
[2024-10-08 03:52:16,838][04317] Num frames 15700... |
|
[2024-10-08 03:52:16,958][04317] Num frames 15800... |
|
[2024-10-08 03:52:17,077][04317] Num frames 15900... |
|
[2024-10-08 03:52:17,152][04317] Avg episode rewards: #0: 59.894, true rewards: #0: 19.895 |
|
[2024-10-08 03:52:17,153][04317] Avg episode reward: 59.894, avg true_objective: 19.895 |
|
[2024-10-08 03:52:17,252][04317] Num frames 16000... |
|
[2024-10-08 03:52:17,373][04317] Num frames 16100... |
|
[2024-10-08 03:52:17,491][04317] Num frames 16200... |
|
[2024-10-08 03:52:17,612][04317] Num frames 16300... |
|
[2024-10-08 03:52:17,731][04317] Num frames 16400... |
|
[2024-10-08 03:52:17,853][04317] Num frames 16500... |
|
[2024-10-08 03:52:17,980][04317] Num frames 16600... |
|
[2024-10-08 03:52:18,106][04317] Num frames 16700... |
|
[2024-10-08 03:52:18,222][04317] Num frames 16800... |
|
[2024-10-08 03:52:18,342][04317] Num frames 16900... |
|
[2024-10-08 03:52:18,462][04317] Num frames 17000... |
|
[2024-10-08 03:52:18,581][04317] Num frames 17100... |
|
[2024-10-08 03:52:18,703][04317] Num frames 17200... |
|
[2024-10-08 03:52:18,830][04317] Num frames 17300... |
|
[2024-10-08 03:52:18,952][04317] Num frames 17400... |
|
[2024-10-08 03:52:19,072][04317] Num frames 17500... |
|
[2024-10-08 03:52:19,196][04317] Num frames 17600... |
|
[2024-10-08 03:52:19,324][04317] Num frames 17700... |
|
[2024-10-08 03:52:19,451][04317] Num frames 17800... |
|
[2024-10-08 03:52:19,578][04317] Num frames 17900... |
|
[2024-10-08 03:52:19,705][04317] Num frames 18000... |
|
[2024-10-08 03:52:19,781][04317] Avg episode rewards: #0: 59.906, true rewards: #0: 20.018 |
|
[2024-10-08 03:52:19,783][04317] Avg episode reward: 59.906, avg true_objective: 20.018 |
|
[2024-10-08 03:52:19,890][04317] Num frames 18100... |
|
[2024-10-08 03:52:20,017][04317] Num frames 18200... |
|
[2024-10-08 03:52:20,145][04317] Num frames 18300... |
|
[2024-10-08 03:52:20,268][04317] Num frames 18400... |
|
[2024-10-08 03:52:20,395][04317] Num frames 18500... |
|
[2024-10-08 03:52:20,522][04317] Num frames 18600... |
|
[2024-10-08 03:52:20,649][04317] Num frames 18700... |
|
[2024-10-08 03:52:20,772][04317] Num frames 18800... |
|
[2024-10-08 03:52:20,892][04317] Num frames 18900... |
|
[2024-10-08 03:52:21,012][04317] Num frames 19000... |
|
[2024-10-08 03:52:21,131][04317] Num frames 19100... |
|
[2024-10-08 03:52:21,248][04317] Num frames 19200... |
|
[2024-10-08 03:52:21,370][04317] Num frames 19300... |
|
[2024-10-08 03:52:21,491][04317] Num frames 19400... |
|
[2024-10-08 03:52:21,618][04317] Num frames 19500... |
|
[2024-10-08 03:52:21,744][04317] Num frames 19600... |
|
[2024-10-08 03:52:21,869][04317] Num frames 19700... |
|
[2024-10-08 03:52:21,998][04317] Num frames 19800... |
|
[2024-10-08 03:52:22,128][04317] Num frames 19900... |
|
[2024-10-08 03:52:22,257][04317] Num frames 20000... |
|
[2024-10-08 03:52:22,379][04317] Num frames 20100... |
|
[2024-10-08 03:52:22,453][04317] Avg episode rewards: #0: 59.815, true rewards: #0: 20.116 |
|
[2024-10-08 03:52:22,454][04317] Avg episode reward: 59.815, avg true_objective: 20.116 |
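
The "Avg episode rewards" lines are cumulative means over the evaluation episodes finished so far, ending here at 59.815 over ten episodes. Individual episode rewards can be recovered, up to the 3-decimal rounding in the log, by differencing the running sums:

```python
# Recover per-episode rewards from the cumulative means printed above.
# Small rounding error is expected: the log prints means to 3 decimals.
def episode_rewards(cumulative_means: list[float]) -> list[float]:
    rewards, prev_sum = [], 0.0
    for n, mean in enumerate(cumulative_means, start=1):
        total = mean * n
        rewards.append(total - prev_sum)
        prev_sum = total
    return rewards

# First three means of this run: 62.998, 62.499, 52.052 ->
# episodes of roughly 62.998, 62.000, 31.158 points.
print(episode_rewards([62.998, 62.499, 52.052]))
```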
|
[2024-10-08 03:53:09,881][04317] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4! |
|
[2024-10-08 03:56:17,887][04317] Environment doom_basic already registered, overwriting... |
|
[2024-10-08 03:56:17,889][04317] Environment doom_two_colors_easy already registered, overwriting... |
|
[2024-10-08 03:56:17,890][04317] Environment doom_two_colors_hard already registered, overwriting... |
|
[2024-10-08 03:56:17,892][04317] Environment doom_dm already registered, overwriting... |
|
[2024-10-08 03:56:17,893][04317] Environment doom_dwango5 already registered, overwriting... |
|
[2024-10-08 03:56:17,895][04317] Environment doom_my_way_home_flat_actions already registered, overwriting... |
|
[2024-10-08 03:56:17,896][04317] Environment doom_defend_the_center_flat_actions already registered, overwriting... |
|
[2024-10-08 03:56:17,897][04317] Environment doom_my_way_home already registered, overwriting... |
|
[2024-10-08 03:56:17,899][04317] Environment doom_deadly_corridor already registered, overwriting... |
|
[2024-10-08 03:56:17,900][04317] Environment doom_defend_the_center already registered, overwriting... |
|
[2024-10-08 03:56:17,901][04317] Environment doom_defend_the_line already registered, overwriting... |
|
[2024-10-08 03:56:17,902][04317] Environment doom_health_gathering already registered, overwriting... |
|
[2024-10-08 03:56:17,903][04317] Environment doom_health_gathering_supreme already registered, overwriting... |
|
[2024-10-08 03:56:17,904][04317] Environment doom_battle already registered, overwriting... |
|
[2024-10-08 03:56:17,906][04317] Environment doom_battle2 already registered, overwriting... |
|
[2024-10-08 03:56:17,907][04317] Environment doom_duel_bots already registered, overwriting... |
|
[2024-10-08 03:56:17,908][04317] Environment doom_deathmatch_bots already registered, overwriting... |
|
[2024-10-08 03:56:17,910][04317] Environment doom_duel already registered, overwriting... |
|
[2024-10-08 03:56:17,911][04317] Environment doom_deathmatch_full already registered, overwriting... |
|
[2024-10-08 03:56:17,912][04317] Environment doom_benchmark already registered, overwriting... |
|
[2024-10-08 03:56:17,913][04317] register_encoder_factory: <function make_vizdoom_encoder at 0x7c5edf5485e0> |
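
Sample Factory builds models through a factory, and the VizDoom example swaps in its own convolutional encoder via this hook. A minimal sketch of the registration pattern; the API names assume Sample Factory 2.x, and the real make_vizdoom_encoder lives in sf_examples.vizdoom:

```python
# Sketch of the hook logged above; the Encoder subclass body is omitted on purpose.
from sample_factory.algo.utils.context import global_model_factory
from sample_factory.model.encoder import Encoder

def make_vizdoom_encoder(cfg, obs_space) -> Encoder:
    ...  # in sf_examples this returns a VizdoomEncoder wrapping a ConvEncoder

global_model_factory().register_encoder_factory(make_vizdoom_encoder)
```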
|
[2024-10-08 03:56:17,924][04317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json |
|
[2024-10-08 03:56:17,925][04317] Overriding arg 'train_for_env_steps' with value 12000000 passed from command line |
|
[2024-10-08 03:56:17,931][04317] Experiment dir /content/train_dir/default_experiment already exists! |
|
[2024-10-08 03:56:17,931][04317] Resuming existing experiment from /content/train_dir/default_experiment... |
|
[2024-10-08 03:56:17,932][04317] Weights and Biases integration disabled |
|
[2024-10-08 03:56:17,936][04317] Environment var CUDA_VISIBLE_DEVICES is 0 |
|
[2024-10-08 03:56:19,417][04317] Starting experiment with the following configuration: |
|
help=False |
|
algo=APPO |
|
env=doom_health_gathering_supreme |
|
experiment=default_experiment |
|
train_dir=/content/train_dir |
|
restart_behavior=resume |
|
device=gpu |
|
seed=None |
|
num_policies=1 |
|
async_rl=True |
|
serial_mode=False |
|
batched_sampling=False |
|
num_batches_to_accumulate=2 |
|
worker_num_splits=2 |
|
policy_workers_per_policy=1 |
|
max_policy_lag=1000 |
|
num_workers=8 |
|
num_envs_per_worker=4 |
|
batch_size=1024 |
|
num_batches_per_epoch=1 |
|
num_epochs=1 |
|
rollout=32 |
|
recurrence=32 |
|
shuffle_minibatches=False |
|
gamma=0.99 |
|
reward_scale=1.0 |
|
reward_clip=1000.0 |
|
value_bootstrap=False |
|
normalize_returns=True |
|
exploration_loss_coeff=0.001 |
|
value_loss_coeff=0.5 |
|
kl_loss_coeff=0.0 |
|
exploration_loss=symmetric_kl |
|
gae_lambda=0.95 |
|
ppo_clip_ratio=0.1 |
|
ppo_clip_value=0.2 |
|
with_vtrace=False |
|
vtrace_rho=1.0 |
|
vtrace_c=1.0 |
|
optimizer=adam |
|
adam_eps=1e-06 |
|
adam_beta1=0.9 |
|
adam_beta2=0.999 |
|
max_grad_norm=4.0 |
|
learning_rate=0.0001 |
|
lr_schedule=constant |
|
lr_schedule_kl_threshold=0.008 |
|
lr_adaptive_min=1e-06 |
|
lr_adaptive_max=0.01 |
|
obs_subtract_mean=0.0 |
|
obs_scale=255.0 |
|
normalize_input=True |
|
normalize_input_keys=None |
|
decorrelate_experience_max_seconds=0 |
|
decorrelate_envs_on_one_worker=True |
|
actor_worker_gpus=[] |
|
set_workers_cpu_affinity=True |
|
force_envs_single_thread=False |
|
default_niceness=0 |
|
log_to_file=True |
|
experiment_summaries_interval=10 |
|
flush_summaries_interval=30 |
|
stats_avg=100 |
|
summaries_use_frameskip=True |
|
heartbeat_interval=20 |
|
heartbeat_reporting_interval=600 |
|
train_for_env_steps=12000000 |
|
train_for_seconds=10000000000 |
|
save_every_sec=120 |
|
keep_checkpoints=2 |
|
load_checkpoint_kind=latest |
|
save_milestones_sec=-1 |
|
save_best_every_sec=5 |
|
save_best_metric=reward |
|
save_best_after=100000 |
|
benchmark=False |
|
encoder_mlp_layers=[512, 512] |
|
encoder_conv_architecture=convnet_simple |
|
encoder_conv_mlp_layers=[512] |
|
use_rnn=True |
|
rnn_size=512 |
|
rnn_type=gru |
|
rnn_num_layers=1 |
|
decoder_mlp_layers=[] |
|
nonlinearity=elu |
|
policy_initialization=orthogonal |
|
policy_init_gain=1.0 |
|
actor_critic_share_weights=True |
|
adaptive_stddev=True |
|
continuous_tanh_scale=0.0 |
|
initial_stddev=1.0 |
|
use_env_info_cache=False |
|
env_gpu_actions=False |
|
env_gpu_observations=True |
|
env_frameskip=4 |
|
env_framestack=1 |
|
pixel_format=CHW |
|
use_record_episode_statistics=False |
|
with_wandb=False |
|
wandb_user=None |
|
wandb_project=sample_factory |
|
wandb_group=None |
|
wandb_job_type=SF |
|
wandb_tags=[] |
|
with_pbt=False |
|
pbt_mix_policies_in_one_env=True |
|
pbt_period_env_steps=5000000 |
|
pbt_start_mutation=20000000 |
|
pbt_replace_fraction=0.3 |
|
pbt_mutation_rate=0.15 |
|
pbt_replace_reward_gap=0.1 |
|
pbt_replace_reward_gap_absolute=1e-06 |
|
pbt_optimize_gamma=False |
|
pbt_target_objective=true_objective |
|
pbt_perturb_min=1.1 |
|
pbt_perturb_max=1.5 |
|
num_agents=-1 |
|
num_humans=0 |
|
num_bots=-1 |
|
start_bot_difficulty=None |
|
timelimit=None |
|
res_w=128 |
|
res_h=72 |
|
wide_aspect_ratio=False |
|
eval_env_frameskip=1 |
|
fps=35 |
|
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 |
|
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} |
|
git_hash=unknown |
|
git_repo_name=not a git repository |
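
The dump above is the effective configuration after merging the saved config.json with the command-line overrides: `command_line` records the original launch, and this resume only raises train_for_env_steps from 4000000 to 12000000. A sketch of the equivalent relaunch, with the train entry point assumed from the sf_examples layout:

```python
# Sketch only: the module path is an assumption; flags are copied from
# command_line above plus the train_for_env_steps override logged earlier.
import sys
from sf_examples.vizdoom.train_vizdoom import main

sys.argv = [
    "train",
    "--env=doom_health_gathering_supreme",
    "--num_workers=8",
    "--num_envs_per_worker=4",
    "--train_for_env_steps=12000000",    # was 4000000 in the first run
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",   # restart_behavior=resume finds the checkpoint
]
main()
```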
|
[2024-10-08 03:56:19,419][04317] Saving configuration to /content/train_dir/default_experiment/config.json... |
|
[2024-10-08 03:56:19,422][04317] Rollout worker 0 uses device cpu |
|
[2024-10-08 03:56:19,423][04317] Rollout worker 1 uses device cpu |
|
[2024-10-08 03:56:19,425][04317] Rollout worker 2 uses device cpu |
|
[2024-10-08 03:56:19,425][04317] Rollout worker 3 uses device cpu |
|
[2024-10-08 03:56:19,427][04317] Rollout worker 4 uses device cpu |
|
[2024-10-08 03:56:19,429][04317] Rollout worker 5 uses device cpu |
|
[2024-10-08 03:56:19,430][04317] Rollout worker 6 uses device cpu |
|
[2024-10-08 03:56:19,432][04317] Rollout worker 7 uses device cpu |
|
[2024-10-08 03:56:19,473][04317] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2024-10-08 03:56:19,474][04317] InferenceWorker_p0-w0: min num requests: 2 |
|
[2024-10-08 03:56:19,507][04317] Starting all processes... |
|
[2024-10-08 03:56:19,508][04317] Starting process learner_proc0 |
|
[2024-10-08 03:56:19,557][04317] Starting all processes... |
|
[2024-10-08 03:56:19,561][04317] Starting process inference_proc0-0 |
|
[2024-10-08 03:56:19,561][04317] Starting process rollout_proc0 |
|
[2024-10-08 03:56:19,562][04317] Starting process rollout_proc1 |
|
[2024-10-08 03:56:19,563][04317] Starting process rollout_proc2 |
|
[2024-10-08 03:56:19,564][04317] Starting process rollout_proc3 |
|
[2024-10-08 03:56:19,565][04317] Starting process rollout_proc4 |
|
[2024-10-08 03:56:19,567][04317] Starting process rollout_proc5 |
|
[2024-10-08 03:56:19,576][04317] Starting process rollout_proc6 |
|
[2024-10-08 03:56:19,582][04317] Starting process rollout_proc7 |
|
[2024-10-08 03:56:21,574][11573] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:56:21,709][11578] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:56:21,847][11574] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:56:21,852][11575] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:56:21,854][11559] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2024-10-08 03:56:21,854][11559] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 |
|
[2024-10-08 03:56:21,867][11580] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:56:21,870][11559] Num visible devices: 1 |
|
[2024-10-08 03:56:21,872][11572] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2024-10-08 03:56:21,872][11572] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 |
|
[2024-10-08 03:56:21,895][11572] Num visible devices: 1 |
|
[2024-10-08 03:56:21,896][11576] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:56:21,926][11559] Starting seed is not provided |
|
[2024-10-08 03:56:21,926][11559] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2024-10-08 03:56:21,926][11559] Initializing actor-critic model on device cuda:0 |
|
[2024-10-08 03:56:21,927][11559] RunningMeanStd input shape: (3, 72, 128) |
|
[2024-10-08 03:56:21,928][11559] RunningMeanStd input shape: (1,) |
|
[2024-10-08 03:56:21,943][11559] ConvEncoder: input_channels=3 |
|
[2024-10-08 03:56:21,949][11577] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:56:22,006][11579] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] |
|
[2024-10-08 03:56:22,060][11559] Conv encoder output size: 512 |
|
[2024-10-08 03:56:22,061][11559] Policy head output size: 512 |
|
[2024-10-08 03:56:22,076][11559] Created Actor Critic model with architecture: |
|
[2024-10-08 03:56:22,076][11559] ActorCriticSharedWeights( |
|
(obs_normalizer): ObservationNormalizer( |
|
(running_mean_std): RunningMeanStdDictInPlace( |
|
(running_mean_std): ModuleDict( |
|
(obs): RunningMeanStdInPlace() |
|
) |
|
) |
|
) |
|
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) |
|
(encoder): VizdoomEncoder( |
|
(basic_encoder): ConvEncoder( |
|
(enc): RecursiveScriptModule( |
|
original_name=ConvEncoderImpl |
|
(conv_head): RecursiveScriptModule( |
|
original_name=Sequential |
|
(0): RecursiveScriptModule(original_name=Conv2d) |
|
(1): RecursiveScriptModule(original_name=ELU) |
|
(2): RecursiveScriptModule(original_name=Conv2d) |
|
(3): RecursiveScriptModule(original_name=ELU) |
|
(4): RecursiveScriptModule(original_name=Conv2d) |
|
(5): RecursiveScriptModule(original_name=ELU) |
|
) |
|
(mlp_layers): RecursiveScriptModule( |
|
original_name=Sequential |
|
(0): RecursiveScriptModule(original_name=Linear) |
|
(1): RecursiveScriptModule(original_name=ELU) |
|
) |
|
) |
|
) |
|
) |
|
(core): ModelCoreRNN( |
|
(core): GRU(512, 512) |
|
) |
|
(decoder): MlpDecoder( |
|
(mlp): Identity() |
|
) |
|
(critic_linear): Linear(in_features=512, out_features=1, bias=True) |
|
(action_parameterization): ActionParameterizationDefault( |
|
(distribution_linear): Linear(in_features=512, out_features=5, bias=True) |
|
) |
|
) |
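
The printed module tree corresponds to a compact recurrent actor-critic: a three-layer conv encoder feeding a 512-unit MLP, a single-layer GRU core, and two linear heads (a scalar value and logits over the 5 discrete actions). Below is a standalone PyTorch sketch with the same shapes; the conv kernel sizes and strides are assumptions based on Sample Factory's convnet_simple, and observation normalization is omitted:

```python
import torch
from torch import nn

class ActorCriticSketch(nn.Module):
    """Shape-faithful sketch of the ActorCriticSharedWeights printed above."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        # convnet_simple is assumed to be the Atari-style 8/4, 4/2, 3/2 stack with ELU.
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened conv size for 3x72x128 inputs
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
        self.mlp = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())  # encoder output 512
        self.core = nn.GRU(512, 512)                 # rnn_type=gru, rnn_size=512
        self.critic_linear = nn.Linear(512, 1)
        self.action_logits = nn.Linear(512, num_actions)

    def forward(self, obs, rnn_state):
        x = self.mlp(self.conv_head(obs).flatten(1))
        core_out, new_state = self.core(x.unsqueeze(0), rnn_state)
        core_out = core_out.squeeze(0)
        return self.action_logits(core_out), self.critic_linear(core_out), new_state

net = ActorCriticSketch()
obs = torch.zeros(4, 3, 72, 128)   # a batch of 4 normalized observations
h0 = torch.zeros(1, 4, 512)        # GRU state: (num_layers, batch, rnn_size)
logits, value, h1 = net(obs, h0)   # logits: (4, 5), value: (4, 1)
```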
|
[2024-10-08 03:56:23,901][11559] Using optimizer <class 'torch.optim.adam.Adam'> |
|
[2024-10-08 03:56:23,901][11559] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... |
|
[2024-10-08 03:56:23,935][11559] Loading model from checkpoint |
|
[2024-10-08 03:56:23,939][11559] Loaded experiment state at self.train_step=978, self.env_steps=4005888 |
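
The checkpoint is a plain torch state dict bundling model weights, optimizer state, and the training counters echoed in the line above. A quick inspection sketch; the key names are inferred from the log and may differ:

```python
import torch

state = torch.load(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth",
    map_location="cpu",
)
print(state["train_step"], state["env_steps"])  # expected: 978 4005888
print(sorted(state.keys()))                     # model / optimizer / counters, ...
```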
|
[2024-10-08 03:56:23,939][11559] Initialized policy 0 weights for model version 978 |
|
[2024-10-08 03:56:23,941][11559] LearnerWorker_p0 finished initialization! |
|
[2024-10-08 03:56:23,941][11559] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
|
[2024-10-08 03:56:24,027][11572] RunningMeanStd input shape: (3, 72, 128) |
|
[2024-10-08 03:56:24,028][11572] RunningMeanStd input shape: (1,) |
|
[2024-10-08 03:56:24,040][11572] ConvEncoder: input_channels=3 |
|
[2024-10-08 03:56:24,144][11572] Conv encoder output size: 512 |
|
[2024-10-08 03:56:24,145][11572] Policy head output size: 512 |
|
[2024-10-08 03:56:25,891][04317] Inference worker 0-0 is ready! |
|
[2024-10-08 03:56:25,892][04317] All inference workers are ready! Signal rollout workers to start! |
|
[2024-10-08 03:56:25,907][11577] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:56:25,907][11576] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:56:25,907][11580] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:56:25,907][11575] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:56:25,912][11573] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:56:25,912][11579] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:56:25,912][11578] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:56:25,912][11574] Doom resolution: 160x120, resize resolution: (128, 72) |
|
[2024-10-08 03:56:26,199][11576] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:56:26,202][11573] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:56:26,202][11577] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:56:26,202][11580] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:56:26,204][11575] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:56:26,205][11579] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:56:26,455][11580] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:56:26,455][11576] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:56:26,457][11573] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:56:26,458][11577] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:56:26,696][11579] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:56:26,710][11575] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:56:26,743][11577] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:56:26,763][11576] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:56:26,765][11573] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:56:26,950][11580] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:56:26,990][11579] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:56:27,016][11575] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:56:27,056][11576] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:56:27,225][11578] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:56:27,230][11577] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:56:27,284][11575] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:56:27,330][11573] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:56:27,473][11578] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:56:27,510][11574] Decorrelating experience for 0 frames... |
|
[2024-10-08 03:56:27,575][11579] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:56:27,790][11574] Decorrelating experience for 32 frames... |
|
[2024-10-08 03:56:27,799][11578] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:56:27,809][11580] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:56:27,936][04317] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) |
|
[2024-10-08 03:56:27,939][04317] Avg episode reward: [(0, '0.320')] |
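
From here the run prints a stats pair every five seconds: windowed throughput (the first report is nan, since no window has elapsed yet) plus the running average episode reward. A small parser for tracking progress from this log; the regex assumes the exact wording used here:

```python
import re

FPS_RE = re.compile(
    r"Fps is \(10 sec: (nan|[\d.]+), 60 sec: (nan|[\d.]+), 300 sec: (nan|[\d.]+)\)\. "
    r"Total num frames: (\d+)"
)

def throughput(log_text: str):
    """Yield (fps_over_10s, total_env_frames) for each stats line."""
    for m in FPS_RE.finditer(log_text):
        yield float(m.group(1)), int(m.group(4))  # float("nan") handles the first report
```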
|
[2024-10-08 03:56:28,168][11574] Decorrelating experience for 64 frames... |
|
[2024-10-08 03:56:28,173][11578] Decorrelating experience for 96 frames... |
|
[2024-10-08 03:56:28,310][11559] Signal inference workers to stop experience collection... |
|
[2024-10-08 03:56:28,315][11572] InferenceWorker_p0-w0: stopping experience collection |
|
[2024-10-08 03:56:28,486][11574] Decorrelating experience for 96 frames... |
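
With decorrelate_envs_on_one_worker=True, the four envs on each worker are warmed up by different frame counts so their rollouts do not start in lockstep. The 0/32/64/96 pattern above is one rollout length (rollout=32) per env slot; a sketch of the inferred rule:

```python
rollout = 32               # from the config dump above
num_envs_per_worker = 4
offsets = [env_slot * rollout for env_slot in range(num_envs_per_worker)]
assert offsets == [0, 32, 64, 96]  # the warm-up counts logged above
```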
|
[2024-10-08 03:56:29,363][11559] Signal inference workers to resume experience collection... |
|
[2024-10-08 03:56:29,364][11572] InferenceWorker_p0-w0: resuming experience collection |
|
[2024-10-08 03:56:31,711][11572] Updated weights for policy 0, policy_version 988 (0.0367) |
|
[2024-10-08 03:56:32,936][04317] Fps is (10 sec: 13107.3, 60 sec: 13107.3, 300 sec: 13107.3). Total num frames: 4071424. Throughput: 0: 2352.4. Samples: 11762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
|
[2024-10-08 03:56:32,939][04317] Avg episode reward: [(0, '14.564')] |
|
[2024-10-08 03:56:33,778][11572] Updated weights for policy 0, policy_version 998 (0.0011) |
|
[2024-10-08 03:56:35,844][11572] Updated weights for policy 0, policy_version 1008 (0.0012) |
|
[2024-10-08 03:56:37,840][11572] Updated weights for policy 0, policy_version 1018 (0.0012) |
|
[2024-10-08 03:56:37,936][04317] Fps is (10 sec: 16384.0, 60 sec: 16384.0, 300 sec: 16384.0). Total num frames: 4169728. Throughput: 0: 4189.4. Samples: 41894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:56:37,938][04317] Avg episode reward: [(0, '23.495')] |
|
[2024-10-08 03:56:37,945][11559] Saving new best policy, reward=23.495! |
|
[2024-10-08 03:56:39,464][04317] Heartbeat connected on Batcher_0 |
|
[2024-10-08 03:56:39,468][04317] Heartbeat connected on LearnerWorker_p0 |
|
[2024-10-08 03:56:39,477][04317] Heartbeat connected on InferenceWorker_p0-w0 |
|
[2024-10-08 03:56:39,482][04317] Heartbeat connected on RolloutWorker_w0 |
|
[2024-10-08 03:56:39,485][04317] Heartbeat connected on RolloutWorker_w1 |
|
[2024-10-08 03:56:39,488][04317] Heartbeat connected on RolloutWorker_w2 |
|
[2024-10-08 03:56:39,494][04317] Heartbeat connected on RolloutWorker_w3 |
|
[2024-10-08 03:56:39,497][04317] Heartbeat connected on RolloutWorker_w4 |
|
[2024-10-08 03:56:39,500][04317] Heartbeat connected on RolloutWorker_w5 |
|
[2024-10-08 03:56:39,503][04317] Heartbeat connected on RolloutWorker_w6 |
|
[2024-10-08 03:56:39,509][04317] Heartbeat connected on RolloutWorker_w7 |
|
[2024-10-08 03:56:39,798][11572] Updated weights for policy 0, policy_version 1028 (0.0011) |
|
[2024-10-08 03:56:41,888][11572] Updated weights for policy 0, policy_version 1038 (0.0012) |
|
[2024-10-08 03:56:42,936][04317] Fps is (10 sec: 20070.4, 60 sec: 17749.4, 300 sec: 17749.4). Total num frames: 4272128. Throughput: 0: 3788.4. Samples: 56826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:56:42,938][04317] Avg episode reward: [(0, '22.151')] |
|
[2024-10-08 03:56:43,864][11572] Updated weights for policy 0, policy_version 1048 (0.0011) |
|
[2024-10-08 03:56:45,832][11572] Updated weights for policy 0, policy_version 1058 (0.0011) |
|
[2024-10-08 03:56:47,936][04317] Fps is (10 sec: 20070.3, 60 sec: 18227.2, 300 sec: 18227.2). Total num frames: 4370432. Throughput: 0: 4394.3. Samples: 87886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:56:47,940][04317] Avg episode reward: [(0, '19.779')] |
|
[2024-10-08 03:56:47,993][11572] Updated weights for policy 0, policy_version 1068 (0.0011) |
|
[2024-10-08 03:56:50,094][11572] Updated weights for policy 0, policy_version 1078 (0.0011) |
|
[2024-10-08 03:56:52,090][11572] Updated weights for policy 0, policy_version 1088 (0.0012) |
|
[2024-10-08 03:56:52,936][04317] Fps is (10 sec: 20070.3, 60 sec: 18677.8, 300 sec: 18677.8). Total num frames: 4472832. Throughput: 0: 4704.7. Samples: 117618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:56:52,939][04317] Avg episode reward: [(0, '19.246')] |
|
[2024-10-08 03:56:54,057][11572] Updated weights for policy 0, policy_version 1098 (0.0011) |
|
[2024-10-08 03:56:56,005][11572] Updated weights for policy 0, policy_version 1108 (0.0011) |
|
[2024-10-08 03:56:57,936][04317] Fps is (10 sec: 20480.0, 60 sec: 18978.1, 300 sec: 18978.1). Total num frames: 4575232. Throughput: 0: 4442.5. Samples: 133276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:56:57,938][04317] Avg episode reward: [(0, '20.329')] |
|
[2024-10-08 03:56:57,959][11572] Updated weights for policy 0, policy_version 1118 (0.0011) |
|
[2024-10-08 03:56:59,914][11572] Updated weights for policy 0, policy_version 1128 (0.0011) |
|
[2024-10-08 03:57:01,928][11572] Updated weights for policy 0, policy_version 1138 (0.0011) |
|
[2024-10-08 03:57:02,937][04317] Fps is (10 sec: 20479.6, 60 sec: 19192.6, 300 sec: 19192.6). Total num frames: 4677632. Throughput: 0: 4697.9. Samples: 164428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:57:02,939][04317] Avg episode reward: [(0, '21.221')] |
|
[2024-10-08 03:57:03,986][11572] Updated weights for policy 0, policy_version 1148 (0.0011) |
|
[2024-10-08 03:57:05,965][11572] Updated weights for policy 0, policy_version 1158 (0.0011) |
|
[2024-10-08 03:57:07,936][04317] Fps is (10 sec: 20480.2, 60 sec: 19353.6, 300 sec: 19353.6). Total num frames: 4780032. Throughput: 0: 4875.9. Samples: 195036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:57:07,938][04317] Avg episode reward: [(0, '22.147')] |
|
[2024-10-08 03:57:07,947][11572] Updated weights for policy 0, policy_version 1168 (0.0011) |
|
[2024-10-08 03:57:09,921][11572] Updated weights for policy 0, policy_version 1178 (0.0011) |
|
[2024-10-08 03:57:11,904][11572] Updated weights for policy 0, policy_version 1188 (0.0011) |
|
[2024-10-08 03:57:12,936][04317] Fps is (10 sec: 20889.7, 60 sec: 19569.7, 300 sec: 19569.7). Total num frames: 4886528. Throughput: 0: 4681.1. Samples: 210648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:57:12,939][04317] Avg episode reward: [(0, '22.034')] |
|
[2024-10-08 03:57:13,883][11572] Updated weights for policy 0, policy_version 1198 (0.0011) |
|
[2024-10-08 03:57:15,919][11572] Updated weights for policy 0, policy_version 1208 (0.0012) |
|
[2024-10-08 03:57:17,936][04317] Fps is (10 sec: 20479.8, 60 sec: 19578.9, 300 sec: 19578.9). Total num frames: 4984832. Throughput: 0: 5100.6. Samples: 241290. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 03:57:17,939][04317] Avg episode reward: [(0, '23.041')] |
|
[2024-10-08 03:57:17,977][11572] Updated weights for policy 0, policy_version 1218 (0.0012) |
|
[2024-10-08 03:57:20,011][11572] Updated weights for policy 0, policy_version 1228 (0.0011) |
|
[2024-10-08 03:57:21,955][11572] Updated weights for policy 0, policy_version 1238 (0.0011) |
|
[2024-10-08 03:57:22,936][04317] Fps is (10 sec: 20480.3, 60 sec: 19735.3, 300 sec: 19735.3). Total num frames: 5091328. Throughput: 0: 5116.0. Samples: 272114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 03:57:22,938][04317] Avg episode reward: [(0, '23.406')] |
|
[2024-10-08 03:57:23,909][11572] Updated weights for policy 0, policy_version 1248 (0.0011) |
|
[2024-10-08 03:57:25,870][11572] Updated weights for policy 0, policy_version 1258 (0.0012) |
|
[2024-10-08 03:57:27,831][11572] Updated weights for policy 0, policy_version 1268 (0.0011) |
|
[2024-10-08 03:57:27,936][04317] Fps is (10 sec: 20889.7, 60 sec: 19797.3, 300 sec: 19797.3). Total num frames: 5193728. Throughput: 0: 5131.5. Samples: 287742. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 03:57:27,939][04317] Avg episode reward: [(0, '22.520')] |
|
[2024-10-08 03:57:29,825][11572] Updated weights for policy 0, policy_version 1278 (0.0012) |
|
[2024-10-08 03:57:31,902][11572] Updated weights for policy 0, policy_version 1288 (0.0011) |
|
[2024-10-08 03:57:32,936][04317] Fps is (10 sec: 20480.2, 60 sec: 20411.8, 300 sec: 19849.9). Total num frames: 5296128. Throughput: 0: 5118.4. Samples: 318214. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:57:32,939][04317] Avg episode reward: [(0, '21.058')] |
|
[2024-10-08 03:57:33,913][11572] Updated weights for policy 0, policy_version 1298 (0.0011) |
|
[2024-10-08 03:57:35,855][11572] Updated weights for policy 0, policy_version 1308 (0.0011) |
|
[2024-10-08 03:57:37,819][11572] Updated weights for policy 0, policy_version 1318 (0.0012) |
|
[2024-10-08 03:57:37,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19894.9). Total num frames: 5398528. Throughput: 0: 5151.2. Samples: 349420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:57:37,938][04317] Avg episode reward: [(0, '23.475')] |
|
[2024-10-08 03:57:39,779][11572] Updated weights for policy 0, policy_version 1328 (0.0011) |
|
[2024-10-08 03:57:41,763][11572] Updated weights for policy 0, policy_version 1338 (0.0011) |
|
[2024-10-08 03:57:42,936][04317] Fps is (10 sec: 20479.8, 60 sec: 20480.0, 300 sec: 19933.9). Total num frames: 5500928. Throughput: 0: 5150.7. Samples: 365058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 03:57:42,938][04317] Avg episode reward: [(0, '21.367')] |
|
[2024-10-08 03:57:43,842][11572] Updated weights for policy 0, policy_version 1348 (0.0012) |
|
[2024-10-08 03:57:45,945][11572] Updated weights for policy 0, policy_version 1358 (0.0012) |
|
[2024-10-08 03:57:47,936][04317] Fps is (10 sec: 20070.3, 60 sec: 20480.0, 300 sec: 19916.8). Total num frames: 5599232. Throughput: 0: 5121.6. Samples: 394898. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) |
|
[2024-10-08 03:57:47,938][04317] Avg episode reward: [(0, '21.273')] |
|
[2024-10-08 03:57:47,955][11572] Updated weights for policy 0, policy_version 1368 (0.0011) |
|
[2024-10-08 03:57:49,898][11572] Updated weights for policy 0, policy_version 1378 (0.0011) |
|
[2024-10-08 03:57:51,853][11572] Updated weights for policy 0, policy_version 1388 (0.0012) |
|
[2024-10-08 03:57:52,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20548.3, 300 sec: 19998.1). Total num frames: 5705728. Throughput: 0: 5137.0. Samples: 426202. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 03:57:52,939][04317] Avg episode reward: [(0, '22.656')] |
|
[2024-10-08 03:57:53,822][11572] Updated weights for policy 0, policy_version 1398 (0.0012) |
|
[2024-10-08 03:57:55,755][11572] Updated weights for policy 0, policy_version 1408 (0.0011) |
|
[2024-10-08 03:57:57,738][11572] Updated weights for policy 0, policy_version 1418 (0.0012) |
|
[2024-10-08 03:57:57,936][04317] Fps is (10 sec: 21299.4, 60 sec: 20616.5, 300 sec: 20070.4). Total num frames: 5812224. Throughput: 0: 5141.0. Samples: 441992. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 03:57:57,938][04317] Avg episode reward: [(0, '25.252')] |
|
[2024-10-08 03:57:57,946][11559] Saving new best policy, reward=25.252! |
|
[2024-10-08 03:57:59,789][11572] Updated weights for policy 0, policy_version 1428 (0.0012) |
|
[2024-10-08 03:58:01,804][11572] Updated weights for policy 0, policy_version 1438 (0.0011) |
|
[2024-10-08 03:58:02,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20548.3, 300 sec: 20048.8). Total num frames: 5910528. Throughput: 0: 5130.9. Samples: 472182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:58:02,938][04317] Avg episode reward: [(0, '23.229')] |
|
[2024-10-08 03:58:03,788][11572] Updated weights for policy 0, policy_version 1448 (0.0011) |
|
[2024-10-08 03:58:05,745][11572] Updated weights for policy 0, policy_version 1458 (0.0011) |
|
[2024-10-08 03:58:07,696][11572] Updated weights for policy 0, policy_version 1468 (0.0011) |
|
[2024-10-08 03:58:07,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20616.5, 300 sec: 20111.4). Total num frames: 6017024. Throughput: 0: 5141.4. Samples: 503478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:58:07,939][04317] Avg episode reward: [(0, '23.110')] |
|
[2024-10-08 03:58:09,655][11572] Updated weights for policy 0, policy_version 1478 (0.0011) |
|
[2024-10-08 03:58:11,644][11572] Updated weights for policy 0, policy_version 1488 (0.0011) |
|
[2024-10-08 03:58:12,937][04317] Fps is (10 sec: 20889.0, 60 sec: 20548.2, 300 sec: 20128.9). Total num frames: 6119424. Throughput: 0: 5144.6. Samples: 519252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:58:12,939][04317] Avg episode reward: [(0, '24.684')] |
|
[2024-10-08 03:58:13,756][11572] Updated weights for policy 0, policy_version 1498 (0.0011) |
|
[2024-10-08 03:58:15,764][11572] Updated weights for policy 0, policy_version 1508 (0.0012) |
|
[2024-10-08 03:58:17,722][11572] Updated weights for policy 0, policy_version 1518 (0.0011) |
|
[2024-10-08 03:58:17,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20616.6, 300 sec: 20144.9). Total num frames: 6221824. Throughput: 0: 5140.4. Samples: 549534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:58:17,938][04317] Avg episode reward: [(0, '21.926')] |
|
[2024-10-08 03:58:17,946][11559] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001519_6221824.pth... |
|
[2024-10-08 03:58:18,015][11559] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000527_2158592.pth |
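
With keep_checkpoints=2, every save is followed by deleting the oldest remaining checkpoint, as in the pair of lines above. A sketch of that housekeeping (assumed logic, not the Sample Factory source); it relies on the zero-padded checkpoint_<train_step>_<env_steps>.pth names sorting in training order:

```python
from pathlib import Path

def prune_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
    ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
    for old in ckpts[:-keep]:  # everything but the newest `keep` files
        old.unlink()
```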
|
[2024-10-08 03:58:19,708][11572] Updated weights for policy 0, policy_version 1528 (0.0011) |
|
[2024-10-08 03:58:21,672][11572] Updated weights for policy 0, policy_version 1538 (0.0011) |
|
[2024-10-08 03:58:22,936][04317] Fps is (10 sec: 20480.6, 60 sec: 20548.2, 300 sec: 20159.4). Total num frames: 6324224. Throughput: 0: 5138.9. Samples: 580672. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:58:22,938][04317] Avg episode reward: [(0, '23.383')] |
|
[2024-10-08 03:58:23,642][11572] Updated weights for policy 0, policy_version 1548 (0.0011) |
|
[2024-10-08 03:58:25,603][11572] Updated weights for policy 0, policy_version 1558 (0.0011) |
|
[2024-10-08 03:58:27,682][11572] Updated weights for policy 0, policy_version 1568 (0.0012) |
|
[2024-10-08 03:58:27,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20548.3, 300 sec: 20172.8). Total num frames: 6426624. Throughput: 0: 5136.9. Samples: 596220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:58:27,938][04317] Avg episode reward: [(0, '21.375')] |
|
[2024-10-08 03:58:29,699][11572] Updated weights for policy 0, policy_version 1578 (0.0011) |
|
[2024-10-08 03:58:31,644][11572] Updated weights for policy 0, policy_version 1588 (0.0011) |
|
[2024-10-08 03:58:32,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20548.2, 300 sec: 20185.1). Total num frames: 6529024. Throughput: 0: 5151.0. Samples: 626694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:58:32,938][04317] Avg episode reward: [(0, '22.632')] |
|
[2024-10-08 03:58:33,614][11572] Updated weights for policy 0, policy_version 1598 (0.0012) |
|
[2024-10-08 03:58:35,577][11572] Updated weights for policy 0, policy_version 1608 (0.0011) |
|
[2024-10-08 03:58:37,531][11572] Updated weights for policy 0, policy_version 1618 (0.0011) |
|
[2024-10-08 03:58:37,936][04317] Fps is (10 sec: 20889.8, 60 sec: 20616.6, 300 sec: 20227.9). Total num frames: 6635520. Throughput: 0: 5153.1. Samples: 658090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:58:37,939][04317] Avg episode reward: [(0, '21.081')] |
|
[2024-10-08 03:58:39,476][11572] Updated weights for policy 0, policy_version 1628 (0.0011) |
|
[2024-10-08 03:58:41,565][11572] Updated weights for policy 0, policy_version 1638 (0.0012) |
|
[2024-10-08 03:58:42,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20548.3, 300 sec: 20206.9). Total num frames: 6733824. Throughput: 0: 5143.7. Samples: 673460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:58:42,939][04317] Avg episode reward: [(0, '22.563')] |
|
[2024-10-08 03:58:43,616][11572] Updated weights for policy 0, policy_version 1648 (0.0012) |
|
[2024-10-08 03:58:45,577][11572] Updated weights for policy 0, policy_version 1658 (0.0011) |
|
[2024-10-08 03:58:47,531][11572] Updated weights for policy 0, policy_version 1668 (0.0012) |
|
[2024-10-08 03:58:47,937][04317] Fps is (10 sec: 20478.8, 60 sec: 20684.6, 300 sec: 20245.9). Total num frames: 6840320. Throughput: 0: 5150.1. Samples: 703940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:58:47,941][04317] Avg episode reward: [(0, '24.330')] |
|
[2024-10-08 03:58:49,526][11572] Updated weights for policy 0, policy_version 1678 (0.0012) |
|
[2024-10-08 03:58:51,483][11572] Updated weights for policy 0, policy_version 1688 (0.0011) |
|
[2024-10-08 03:58:52,936][04317] Fps is (10 sec: 20889.6, 60 sec: 20616.5, 300 sec: 20254.0). Total num frames: 6942720. Throughput: 0: 5147.2. Samples: 735100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:58:52,938][04317] Avg episode reward: [(0, '24.165')] |
|
[2024-10-08 03:58:53,440][11572] Updated weights for policy 0, policy_version 1698 (0.0012) |
|
[2024-10-08 03:58:55,510][11572] Updated weights for policy 0, policy_version 1708 (0.0012) |
|
[2024-10-08 03:58:57,601][11572] Updated weights for policy 0, policy_version 1718 (0.0012) |
|
[2024-10-08 03:58:57,936][04317] Fps is (10 sec: 20071.4, 60 sec: 20480.0, 300 sec: 20234.2). Total num frames: 7041024. Throughput: 0: 5129.5. Samples: 750078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 03:58:57,938][04317] Avg episode reward: [(0, '23.100')] |
|
[2024-10-08 03:58:59,590][11572] Updated weights for policy 0, policy_version 1728 (0.0011) |
|
[2024-10-08 03:59:01,542][11572] Updated weights for policy 0, policy_version 1738 (0.0011) |
|
[2024-10-08 03:59:02,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20616.6, 300 sec: 20268.6). Total num frames: 7147520. Throughput: 0: 5141.2. Samples: 780888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:59:02,939][04317] Avg episode reward: [(0, '23.079')] |
|
[2024-10-08 03:59:03,476][11572] Updated weights for policy 0, policy_version 1748 (0.0012) |
|
[2024-10-08 03:59:05,456][11572] Updated weights for policy 0, policy_version 1758 (0.0011) |
|
[2024-10-08 03:59:07,418][11572] Updated weights for policy 0, policy_version 1768 (0.0011) |
|
[2024-10-08 03:59:07,936][04317] Fps is (10 sec: 20889.4, 60 sec: 20548.2, 300 sec: 20275.2). Total num frames: 7249920. Throughput: 0: 5146.9. Samples: 812284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:59:07,939][04317] Avg episode reward: [(0, '23.355')] |
|
[2024-10-08 03:59:09,442][11572] Updated weights for policy 0, policy_version 1778 (0.0011) |
|
[2024-10-08 03:59:11,547][11572] Updated weights for policy 0, policy_version 1788 (0.0011) |
|
[2024-10-08 03:59:12,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20548.4, 300 sec: 20281.4). Total num frames: 7352320. Throughput: 0: 5129.9. Samples: 827066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 03:59:12,939][04317] Avg episode reward: [(0, '24.926')] |
|
[2024-10-08 03:59:13,510][11572] Updated weights for policy 0, policy_version 1798 (0.0011) |
|
[2024-10-08 03:59:15,471][11572] Updated weights for policy 0, policy_version 1808 (0.0012) |
|
[2024-10-08 03:59:17,450][11572] Updated weights for policy 0, policy_version 1818 (0.0012) |
|
[2024-10-08 03:59:17,936][04317] Fps is (10 sec: 20480.3, 60 sec: 20548.3, 300 sec: 20287.3). Total num frames: 7454720. Throughput: 0: 5142.8. Samples: 858120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:59:17,938][04317] Avg episode reward: [(0, '24.797')] |
|
[2024-10-08 03:59:19,427][11572] Updated weights for policy 0, policy_version 1828 (0.0012) |
|
[2024-10-08 03:59:21,399][11572] Updated weights for policy 0, policy_version 1838 (0.0012) |
|
[2024-10-08 03:59:22,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20548.3, 300 sec: 20292.8). Total num frames: 7557120. Throughput: 0: 5134.9. Samples: 889160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:59:22,938][04317] Avg episode reward: [(0, '22.900')] |
|
[2024-10-08 03:59:23,422][11572] Updated weights for policy 0, policy_version 1848 (0.0012) |
|
[2024-10-08 03:59:25,493][11572] Updated weights for policy 0, policy_version 1858 (0.0011) |
|
[2024-10-08 03:59:27,504][11572] Updated weights for policy 0, policy_version 1868 (0.0011) |
|
[2024-10-08 03:59:27,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20548.3, 300 sec: 20298.0). Total num frames: 7659520. Throughput: 0: 5121.8. Samples: 903942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:59:27,939][04317] Avg episode reward: [(0, '22.327')] |
|
[2024-10-08 03:59:29,471][11572] Updated weights for policy 0, policy_version 1878 (0.0011) |
|
[2024-10-08 03:59:31,445][11572] Updated weights for policy 0, policy_version 1888 (0.0011) |
|
[2024-10-08 03:59:32,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20548.3, 300 sec: 20302.9). Total num frames: 7761920. Throughput: 0: 5134.1. Samples: 934970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:59:32,938][04317] Avg episode reward: [(0, '24.365')] |
|
[2024-10-08 03:59:33,424][11572] Updated weights for policy 0, policy_version 1898 (0.0012) |
|
[2024-10-08 03:59:35,432][11572] Updated weights for policy 0, policy_version 1908 (0.0011) |
|
[2024-10-08 03:59:37,434][11572] Updated weights for policy 0, policy_version 1918 (0.0011) |
|
[2024-10-08 03:59:37,936][04317] Fps is (10 sec: 20479.8, 60 sec: 20480.0, 300 sec: 20307.5). Total num frames: 7864320. Throughput: 0: 5121.7. Samples: 965578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:59:37,938][04317] Avg episode reward: [(0, '24.042')] |
|
[2024-10-08 03:59:39,534][11572] Updated weights for policy 0, policy_version 1928 (0.0011) |
|
[2024-10-08 03:59:41,544][11572] Updated weights for policy 0, policy_version 1938 (0.0011) |
|
[2024-10-08 03:59:42,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20548.3, 300 sec: 20312.0). Total num frames: 7966720. Throughput: 0: 5120.1. Samples: 980480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:59:42,939][04317] Avg episode reward: [(0, '25.167')] |
|
[2024-10-08 03:59:43,521][11572] Updated weights for policy 0, policy_version 1948 (0.0012) |
|
[2024-10-08 03:59:45,534][11572] Updated weights for policy 0, policy_version 1958 (0.0011) |
|
[2024-10-08 03:59:47,477][11572] Updated weights for policy 0, policy_version 1968 (0.0012) |
|
[2024-10-08 03:59:47,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.2, 300 sec: 20316.2). Total num frames: 8069120. Throughput: 0: 5125.1. Samples: 1011516. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:59:47,938][04317] Avg episode reward: [(0, '21.983')] |
|
[2024-10-08 03:59:49,482][11572] Updated weights for policy 0, policy_version 1978 (0.0011) |
|
[2024-10-08 03:59:51,516][11572] Updated weights for policy 0, policy_version 1988 (0.0011) |
|
[2024-10-08 03:59:52,936][04317] Fps is (10 sec: 20070.3, 60 sec: 20411.7, 300 sec: 20300.2). Total num frames: 8167424. Throughput: 0: 5101.8. Samples: 1041864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 03:59:52,939][04317] Avg episode reward: [(0, '25.703')] |
|
[2024-10-08 03:59:52,941][11559] Saving new best policy, reward=25.703! |
|
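`Saving new best policy` fires whenever the running average episode reward exceeds the best value seen so far. A hypothetical sketch of that bookkeeping (all names illustrative):

```python
import torch

class BestPolicySaver:
    """Persist the model whenever average episode reward sets a new record."""

    def __init__(self, path="best_policy.pth"):
        self.best_reward = float("-inf")
        self.path = path

    def maybe_save(self, avg_reward, model):
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            torch.save(model.state_dict(), self.path)
            print(f"Saving new best policy, reward={avg_reward:.3f}!")
```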
[2024-10-08 03:59:53,627][11572] Updated weights for policy 0, policy_version 1998 (0.0012) |
|
[2024-10-08 03:59:55,635][11572] Updated weights for policy 0, policy_version 2008 (0.0012) |
|
[2024-10-08 03:59:57,590][11572] Updated weights for policy 0, policy_version 2018 (0.0011) |
|
[2024-10-08 03:59:57,936][04317] Fps is (10 sec: 20070.4, 60 sec: 20480.0, 300 sec: 20304.5). Total num frames: 8269824. Throughput: 0: 5109.3. Samples: 1056984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 03:59:57,938][04317] Avg episode reward: [(0, '24.339')] |
|
[2024-10-08 03:59:59,570][11572] Updated weights for policy 0, policy_version 2028 (0.0011) |
|
[2024-10-08 04:00:01,521][11572] Updated weights for policy 0, policy_version 2038 (0.0011) |
|
[2024-10-08 04:00:02,936][04317] Fps is (10 sec: 20889.8, 60 sec: 20480.0, 300 sec: 20327.6). Total num frames: 8376320. Throughput: 0: 5115.6. Samples: 1088324. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 04:00:02,939][04317] Avg episode reward: [(0, '23.672')] |
|
[2024-10-08 04:00:03,492][11572] Updated weights for policy 0, policy_version 2048 (0.0011) |
|
[2024-10-08 04:00:05,511][11572] Updated weights for policy 0, policy_version 2058 (0.0011) |
|
[2024-10-08 04:00:07,620][11572] Updated weights for policy 0, policy_version 2068 (0.0012) |
|
[2024-10-08 04:00:07,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20411.8, 300 sec: 20312.4). Total num frames: 8474624. Throughput: 0: 5097.0. Samples: 1118526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 04:00:07,939][04317] Avg episode reward: [(0, '22.304')] |
|
[2024-10-08 04:00:09,626][11572] Updated weights for policy 0, policy_version 2078 (0.0011) |
|
[2024-10-08 04:00:11,610][11572] Updated weights for policy 0, policy_version 2088 (0.0011) |
|
[2024-10-08 04:00:12,936][04317] Fps is (10 sec: 20070.4, 60 sec: 20411.8, 300 sec: 20316.2). Total num frames: 8577024. Throughput: 0: 5109.4. Samples: 1133864. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 04:00:12,939][04317] Avg episode reward: [(0, '26.880')] |
|
[2024-10-08 04:00:12,941][11559] Saving new best policy, reward=26.880! |
|
[2024-10-08 04:00:13,594][11572] Updated weights for policy 0, policy_version 2098 (0.0011) |
|
[2024-10-08 04:00:15,552][11572] Updated weights for policy 0, policy_version 2108 (0.0011) |
|
[2024-10-08 04:00:17,525][11572] Updated weights for policy 0, policy_version 2118 (0.0011) |
|
[2024-10-08 04:00:17,936][04317] Fps is (10 sec: 20889.5, 60 sec: 20480.0, 300 sec: 20337.5). Total num frames: 8683520. Throughput: 0: 5108.3. Samples: 1164842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) |
|
[2024-10-08 04:00:17,939][04317] Avg episode reward: [(0, '22.555')] |
|
[2024-10-08 04:00:17,947][11559] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002120_8683520.pth... |
|
[2024-10-08 04:00:18,018][11559] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth |
|
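Each `Saving .../checkpoint_...pth` above is paired with a `Removing ...` of the oldest file, i.e. a rolling retention policy. Because the zero-padded `<version>_<frames>` suffix makes lexicographic order chronological, pruning can be a simple sort-and-drop; a sketch under that assumption:

```python
import os

def prune_checkpoints(checkpoint_dir, keep=2):
    """Delete all but the `keep` newest checkpoint_*.pth files."""
    ckpts = sorted(
        f for f in os.listdir(checkpoint_dir)
        if f.startswith("checkpoint_") and f.endswith(".pth")
    )
    for stale in ckpts[:-keep]:
        path = os.path.join(checkpoint_dir, stale)
        print(f"Removing {path}")
        os.remove(path)
```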
[2024-10-08 04:00:19,564][11572] Updated weights for policy 0, policy_version 2128 (0.0012) |
|
[2024-10-08 04:00:21,654][11572] Updated weights for policy 0, policy_version 2138 (0.0012) |
|
[2024-10-08 04:00:22,936][04317] Fps is (10 sec: 20479.8, 60 sec: 20411.7, 300 sec: 20323.1). Total num frames: 8781824. Throughput: 0: 5090.2. Samples: 1194638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
|
[2024-10-08 04:00:22,939][04317] Avg episode reward: [(0, '25.120')] |
|
[2024-10-08 04:00:23,672][11572] Updated weights for policy 0, policy_version 2148 (0.0011) |
|
[2024-10-08 04:00:25,652][11572] Updated weights for policy 0, policy_version 2158 (0.0011) |
|
[2024-10-08 04:00:27,604][11572] Updated weights for policy 0, policy_version 2168 (0.0011) |
|
[2024-10-08 04:00:27,936][04317] Fps is (10 sec: 20070.3, 60 sec: 20411.7, 300 sec: 20326.4). Total num frames: 8884224. Throughput: 0: 5106.4. Samples: 1210268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
|
[2024-10-08 04:00:27,938][04317] Avg episode reward: [(0, '24.576')] |
|
[2024-10-08 04:00:29,594][11572] Updated weights for policy 0, policy_version 2178 (0.0012) |
|
[2024-10-08 04:00:31,552][11572] Updated weights for policy 0, policy_version 2188 (0.0011) |
|
[2024-10-08 04:00:32,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20411.7, 300 sec: 20329.5). Total num frames: 8986624. Throughput: 0: 5111.4. Samples: 1241530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) |
|
[2024-10-08 04:00:32,939][04317] Avg episode reward: [(0, '23.833')] |
|
[2024-10-08 04:00:33,582][11572] Updated weights for policy 0, policy_version 2198 (0.0011) |
|
[2024-10-08 04:00:35,694][11572] Updated weights for policy 0, policy_version 2208 (0.0012) |
|
[2024-10-08 04:00:37,746][11572] Updated weights for policy 0, policy_version 2218 (0.0011) |
|
[2024-10-08 04:00:37,936][04317] Fps is (10 sec: 20480.2, 60 sec: 20411.7, 300 sec: 20332.5). Total num frames: 9089024. Throughput: 0: 5103.4. Samples: 1271518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) |
|
[2024-10-08 04:00:37,939][04317] Avg episode reward: [(0, '22.379')] |
|
[2024-10-08 04:00:39,725][11572] Updated weights for policy 0, policy_version 2228 (0.0012) |
|
[2024-10-08 04:00:41,675][11572] Updated weights for policy 0, policy_version 2238 (0.0011) |
|
[2024-10-08 04:00:42,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20411.7, 300 sec: 20335.4). Total num frames: 9191424. Throughput: 0: 5114.3. Samples: 1287126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) |
|
[2024-10-08 04:00:42,939][04317] Avg episode reward: [(0, '23.347')] |
|
[2024-10-08 04:00:43,623][11572] Updated weights for policy 0, policy_version 2248 (0.0011) |
|
[2024-10-08 04:00:45,580][11572] Updated weights for policy 0, policy_version 2258 (0.0011) |
|
[2024-10-08 04:00:47,590][11572] Updated weights for policy 0, policy_version 2268 (0.0012) |
|
[2024-10-08 04:00:47,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 20338.2). Total num frames: 9293824. Throughput: 0: 5116.3. Samples: 1318560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) |
|
[2024-10-08 04:00:47,938][04317] Avg episode reward: [(0, '25.236')] |
|
[2024-10-08 04:00:49,700][11572] Updated weights for policy 0, policy_version 2278 (0.0012) |
|
[2024-10-08 04:00:51,695][11572] Updated weights for policy 0, policy_version 2288 (0.0011) |
|
[2024-10-08 04:00:52,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 20340.9). Total num frames: 9396224. Throughput: 0: 5109.5. Samples: 1348452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:00:52,938][04317] Avg episode reward: [(0, '24.036')] |
|
[2024-10-08 04:00:53,643][11572] Updated weights for policy 0, policy_version 2298 (0.0011) |
|
[2024-10-08 04:00:55,599][11572] Updated weights for policy 0, policy_version 2308 (0.0011) |
|
[2024-10-08 04:00:57,567][11572] Updated weights for policy 0, policy_version 2318 (0.0012) |
|
[2024-10-08 04:00:57,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20343.5). Total num frames: 9498624. Throughput: 0: 5115.9. Samples: 1364082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 04:00:57,939][04317] Avg episode reward: [(0, '27.558')] |
|
[2024-10-08 04:00:57,958][11559] Saving new best policy, reward=27.558! |
|
[2024-10-08 04:00:59,558][11572] Updated weights for policy 0, policy_version 2328 (0.0012) |
|
[2024-10-08 04:01:01,556][11572] Updated weights for policy 0, policy_version 2338 (0.0012) |
|
[2024-10-08 04:01:02,936][04317] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 20345.9). Total num frames: 9601024. Throughput: 0: 5117.3. Samples: 1395122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 04:01:02,938][04317] Avg episode reward: [(0, '24.142')] |
|
[2024-10-08 04:01:03,643][11572] Updated weights for policy 0, policy_version 2348 (0.0011) |
|
[2024-10-08 04:01:05,713][11572] Updated weights for policy 0, policy_version 2358 (0.0012) |
|
[2024-10-08 04:01:07,669][11572] Updated weights for policy 0, policy_version 2368 (0.0011) |
|
[2024-10-08 04:01:07,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 20348.3). Total num frames: 9703424. Throughput: 0: 5125.6. Samples: 1425292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:01:07,939][04317] Avg episode reward: [(0, '24.929')] |
|
[2024-10-08 04:01:09,647][11572] Updated weights for policy 0, policy_version 2378 (0.0012) |
|
[2024-10-08 04:01:11,609][11572] Updated weights for policy 0, policy_version 2388 (0.0011) |
|
[2024-10-08 04:01:12,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20350.7). Total num frames: 9805824. Throughput: 0: 5126.0. Samples: 1440936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:01:12,938][04317] Avg episode reward: [(0, '26.500')] |
|
[2024-10-08 04:01:13,542][11572] Updated weights for policy 0, policy_version 2398 (0.0011) |
|
[2024-10-08 04:01:15,512][11572] Updated weights for policy 0, policy_version 2408 (0.0011) |
|
[2024-10-08 04:01:17,620][11572] Updated weights for policy 0, policy_version 2418 (0.0011) |
|
[2024-10-08 04:01:17,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20411.8, 300 sec: 20352.9). Total num frames: 9908224. Throughput: 0: 5118.2. Samples: 1471850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:01:17,939][04317] Avg episode reward: [(0, '26.629')] |
|
[2024-10-08 04:01:19,724][11572] Updated weights for policy 0, policy_version 2428 (0.0011) |
|
[2024-10-08 04:01:21,697][11572] Updated weights for policy 0, policy_version 2438 (0.0011) |
|
[2024-10-08 04:01:22,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20355.0). Total num frames: 10010624. Throughput: 0: 5128.2. Samples: 1502288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 04:01:22,939][04317] Avg episode reward: [(0, '25.941')] |
|
[2024-10-08 04:01:23,639][11572] Updated weights for policy 0, policy_version 2448 (0.0011) |
|
[2024-10-08 04:01:25,617][11572] Updated weights for policy 0, policy_version 2458 (0.0011) |
|
[2024-10-08 04:01:27,569][11572] Updated weights for policy 0, policy_version 2468 (0.0011) |
|
[2024-10-08 04:01:27,937][04317] Fps is (10 sec: 20478.5, 60 sec: 20479.8, 300 sec: 20480.0). Total num frames: 10113024. Throughput: 0: 5128.7. Samples: 1517922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:01:27,939][04317] Avg episode reward: [(0, '27.799')] |
|
[2024-10-08 04:01:27,948][11559] Saving new best policy, reward=27.799! |
|
[2024-10-08 04:01:29,531][11572] Updated weights for policy 0, policy_version 2478 (0.0011) |
|
[2024-10-08 04:01:31,607][11572] Updated weights for policy 0, policy_version 2488 (0.0012) |
|
[2024-10-08 04:01:32,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20493.9). Total num frames: 10215424. Throughput: 0: 5110.3. Samples: 1548524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:01:32,939][04317] Avg episode reward: [(0, '24.315')] |
|
[2024-10-08 04:01:33,671][11572] Updated weights for policy 0, policy_version 2498 (0.0011) |
|
[2024-10-08 04:01:35,648][11572] Updated weights for policy 0, policy_version 2508 (0.0012) |
|
[2024-10-08 04:01:37,601][11572] Updated weights for policy 0, policy_version 2518 (0.0011) |
|
[2024-10-08 04:01:37,937][04317] Fps is (10 sec: 20481.0, 60 sec: 20479.9, 300 sec: 20493.9). Total num frames: 10317824. Throughput: 0: 5130.6. Samples: 1579332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:01:37,939][04317] Avg episode reward: [(0, '24.389')] |
|
[2024-10-08 04:01:39,574][11572] Updated weights for policy 0, policy_version 2528 (0.0012) |
|
[2024-10-08 04:01:41,530][11572] Updated weights for policy 0, policy_version 2538 (0.0011) |
|
[2024-10-08 04:01:42,936][04317] Fps is (10 sec: 20889.7, 60 sec: 20548.3, 300 sec: 20521.7). Total num frames: 10424320. Throughput: 0: 5131.1. Samples: 1594982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:01:42,938][04317] Avg episode reward: [(0, '22.886')] |
|
[2024-10-08 04:01:43,497][11572] Updated weights for policy 0, policy_version 2548 (0.0011) |
|
[2024-10-08 04:01:45,553][11572] Updated weights for policy 0, policy_version 2558 (0.0012) |
|
[2024-10-08 04:01:47,647][11572] Updated weights for policy 0, policy_version 2568 (0.0012) |
|
[2024-10-08 04:01:47,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 20507.8). Total num frames: 10522624. Throughput: 0: 5114.9. Samples: 1625292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:01:47,938][04317] Avg episode reward: [(0, '24.352')] |
|
[2024-10-08 04:01:49,668][11572] Updated weights for policy 0, policy_version 2578 (0.0011) |
|
[2024-10-08 04:01:51,632][11572] Updated weights for policy 0, policy_version 2588 (0.0012) |
|
[2024-10-08 04:01:52,936][04317] Fps is (10 sec: 20070.4, 60 sec: 20480.0, 300 sec: 20507.8). Total num frames: 10625024. Throughput: 0: 5127.6. Samples: 1656032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:01:52,938][04317] Avg episode reward: [(0, '26.593')] |
|
[2024-10-08 04:01:53,611][11572] Updated weights for policy 0, policy_version 2598 (0.0011) |
|
[2024-10-08 04:01:55,585][11572] Updated weights for policy 0, policy_version 2608 (0.0011) |
|
[2024-10-08 04:01:57,539][11572] Updated weights for policy 0, policy_version 2618 (0.0011) |
|
[2024-10-08 04:01:57,936][04317] Fps is (10 sec: 20889.8, 60 sec: 20548.3, 300 sec: 20521.7). Total num frames: 10731520. Throughput: 0: 5127.3. Samples: 1671666. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 04:01:57,939][04317] Avg episode reward: [(0, '26.743')] |
|
[2024-10-08 04:01:59,616][11572] Updated weights for policy 0, policy_version 2628 (0.0011) |
|
[2024-10-08 04:02:01,695][11572] Updated weights for policy 0, policy_version 2638 (0.0011) |
|
[2024-10-08 04:02:02,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 20507.8). Total num frames: 10829824. Throughput: 0: 5109.9. Samples: 1701796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:02:02,939][04317] Avg episode reward: [(0, '24.577')] |
|
[2024-10-08 04:02:03,664][11572] Updated weights for policy 0, policy_version 2648 (0.0011) |
|
[2024-10-08 04:02:05,605][11572] Updated weights for policy 0, policy_version 2658 (0.0011) |
|
[2024-10-08 04:02:07,574][11572] Updated weights for policy 0, policy_version 2668 (0.0011) |
|
[2024-10-08 04:02:07,936][04317] Fps is (10 sec: 20070.5, 60 sec: 20480.0, 300 sec: 20493.9). Total num frames: 10932224. Throughput: 0: 5132.2. Samples: 1733238. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 04:02:07,940][04317] Avg episode reward: [(0, '24.281')] |
|
[2024-10-08 04:02:09,504][11572] Updated weights for policy 0, policy_version 2678 (0.0011) |
|
[2024-10-08 04:02:11,463][11572] Updated weights for policy 0, policy_version 2688 (0.0012) |
|
[2024-10-08 04:02:12,936][04317] Fps is (10 sec: 20889.5, 60 sec: 20548.3, 300 sec: 20521.7). Total num frames: 11038720. Throughput: 0: 5133.7. Samples: 1748934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:02:12,939][04317] Avg episode reward: [(0, '23.592')] |
|
[2024-10-08 04:02:13,476][11572] Updated weights for policy 0, policy_version 2698 (0.0012) |
|
[2024-10-08 04:02:15,564][11572] Updated weights for policy 0, policy_version 2708 (0.0012) |
|
[2024-10-08 04:02:17,540][11572] Updated weights for policy 0, policy_version 2718 (0.0011) |
|
[2024-10-08 04:02:17,937][04317] Fps is (10 sec: 20479.6, 60 sec: 20479.9, 300 sec: 20493.9). Total num frames: 11137024. Throughput: 0: 5124.2. Samples: 1779114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 04:02:17,939][04317] Avg episode reward: [(0, '26.850')] |
|
[2024-10-08 04:02:17,956][11559] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002720_11141120.pth... |
|
[2024-10-08 04:02:18,026][11559] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001519_6221824.pth |
|
[2024-10-08 04:02:19,538][11572] Updated weights for policy 0, policy_version 2728 (0.0011) |
|
[2024-10-08 04:02:21,496][11572] Updated weights for policy 0, policy_version 2738 (0.0010) |
|
[2024-10-08 04:02:22,936][04317] Fps is (10 sec: 20480.0, 60 sec: 20548.3, 300 sec: 20507.8). Total num frames: 11243520. Throughput: 0: 5131.4. Samples: 1810242. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 04:02:22,939][04317] Avg episode reward: [(0, '23.670')] |
|
[2024-10-08 04:02:23,453][11572] Updated weights for policy 0, policy_version 2748 (0.0012) |
|
[2024-10-08 04:02:25,434][11572] Updated weights for policy 0, policy_version 2758 (0.0011) |
|
[2024-10-08 04:02:27,458][11572] Updated weights for policy 0, policy_version 2768 (0.0011) |
|
[2024-10-08 04:02:27,936][04317] Fps is (10 sec: 20889.7, 60 sec: 20548.5, 300 sec: 20507.8). Total num frames: 11345920. Throughput: 0: 5130.5. Samples: 1825854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
|
[2024-10-08 04:02:27,939][04317] Avg episode reward: [(0, '26.611')] |
|
[2024-10-08 04:02:29,549][11572] Updated weights for policy 0, policy_version 2778 (0.0011) |
|
[2024-10-08 04:02:31,561][11572] Updated weights for policy 0, policy_version 2788 (0.0011) |
|
[2024-10-08 04:02:32,943][04317] Fps is (10 sec: 20466.7, 60 sec: 20546.1, 300 sec: 20507.3). Total num frames: 11448320. Throughput: 0: 5125.5. Samples: 1855974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:02:32,947][04317] Avg episode reward: [(0, '26.837')] |
|
[2024-10-08 04:02:33,534][11572] Updated weights for policy 0, policy_version 2798 (0.0011) |
|
[2024-10-08 04:02:35,514][11572] Updated weights for policy 0, policy_version 2808 (0.0011) |
|
[2024-10-08 04:02:37,466][11572] Updated weights for policy 0, policy_version 2818 (0.0011) |
|
[2024-10-08 04:02:37,936][04317] Fps is (10 sec: 20480.2, 60 sec: 20548.3, 300 sec: 20507.8). Total num frames: 11550720. Throughput: 0: 5137.2. Samples: 1887204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:02:37,938][04317] Avg episode reward: [(0, '25.663')] |
|
[2024-10-08 04:02:39,443][11572] Updated weights for policy 0, policy_version 2828 (0.0011) |
|
[2024-10-08 04:02:41,492][11572] Updated weights for policy 0, policy_version 2838 (0.0011) |
|
[2024-10-08 04:02:42,936][04317] Fps is (10 sec: 20083.2, 60 sec: 20411.7, 300 sec: 20507.8). Total num frames: 11649024. Throughput: 0: 5131.3. Samples: 1902576. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:02:42,939][04317] Avg episode reward: [(0, '24.921')] |
|
[2024-10-08 04:02:43,607][11572] Updated weights for policy 0, policy_version 2848 (0.0012) |
|
[2024-10-08 04:02:45,610][11572] Updated weights for policy 0, policy_version 2858 (0.0012) |
|
[2024-10-08 04:02:47,575][11572] Updated weights for policy 0, policy_version 2868 (0.0011) |
|
[2024-10-08 04:02:47,936][04317] Fps is (10 sec: 20070.4, 60 sec: 20480.0, 300 sec: 20493.9). Total num frames: 11751424. Throughput: 0: 5131.0. Samples: 1932692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) |
|
[2024-10-08 04:02:47,939][04317] Avg episode reward: [(0, '25.724')] |
|
[2024-10-08 04:02:49,580][11572] Updated weights for policy 0, policy_version 2878 (0.0011) |
|
[2024-10-08 04:02:51,543][11572] Updated weights for policy 0, policy_version 2888 (0.0011) |
|
[2024-10-08 04:02:52,936][04317] Fps is (10 sec: 20889.7, 60 sec: 20548.3, 300 sec: 20493.9). Total num frames: 11857920. Throughput: 0: 5121.6. Samples: 1963710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) |
|
[2024-10-08 04:02:52,938][04317] Avg episode reward: [(0, '26.737')] |
|
[2024-10-08 04:02:53,498][11572] Updated weights for policy 0, policy_version 2898 (0.0011) |
|
[2024-10-08 04:02:55,504][11572] Updated weights for policy 0, policy_version 2908 (0.0011) |
|
[2024-10-08 04:02:57,596][11572] Updated weights for policy 0, policy_version 2918 (0.0011) |
|
[2024-10-08 04:02:57,936][04317] Fps is (10 sec: 20480.1, 60 sec: 20411.7, 300 sec: 20493.9). Total num frames: 11956224. Throughput: 0: 5110.2. Samples: 1978892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
|
[2024-10-08 04:02:57,938][04317] Avg episode reward: [(0, '26.981')] |
|
[2024-10-08 04:02:59,600][11572] Updated weights for policy 0, policy_version 2928 (0.0012) |
|
[2024-10-08 04:03:00,187][11559] Stopping Batcher_0... |
|
[2024-10-08 04:03:00,188][11559] Loop batcher_evt_loop terminating... |
|
[2024-10-08 04:03:00,188][11559] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... |
|
[2024-10-08 04:03:00,187][04317] Component Batcher_0 stopped! |
|
[2024-10-08 04:03:00,203][11572] Weights refcount: 2 0 |
|
[2024-10-08 04:03:00,205][11572] Stopping InferenceWorker_p0-w0... |
|
[2024-10-08 04:03:00,205][11572] Loop inference_proc0-0_evt_loop terminating... |
|
[2024-10-08 04:03:00,205][04317] Component InferenceWorker_p0-w0 stopped! |
|
[2024-10-08 04:03:00,234][11576] Stopping RolloutWorker_w3... |
|
[2024-10-08 04:03:00,235][11576] Loop rollout_proc3_evt_loop terminating... |
|
[2024-10-08 04:03:00,235][11580] Stopping RolloutWorker_w7... |
|
[2024-10-08 04:03:00,235][11574] Stopping RolloutWorker_w1... |
|
[2024-10-08 04:03:00,235][11575] Stopping RolloutWorker_w2... |
|
[2024-10-08 04:03:00,235][11580] Loop rollout_proc7_evt_loop terminating... |
|
[2024-10-08 04:03:00,235][11574] Loop rollout_proc1_evt_loop terminating... |
|
[2024-10-08 04:03:00,235][11575] Loop rollout_proc2_evt_loop terminating... |
|
[2024-10-08 04:03:00,236][11573] Stopping RolloutWorker_w0... |
|
[2024-10-08 04:03:00,237][11573] Loop rollout_proc0_evt_loop terminating... |
|
[2024-10-08 04:03:00,234][04317] Component RolloutWorker_w3 stopped! |
|
[2024-10-08 04:03:00,238][11578] Stopping RolloutWorker_w5... |
|
[2024-10-08 04:03:00,239][11577] Stopping RolloutWorker_w4... |
|
[2024-10-08 04:03:00,239][11578] Loop rollout_proc5_evt_loop terminating... |
|
[2024-10-08 04:03:00,237][04317] Component RolloutWorker_w7 stopped! |
|
[2024-10-08 04:03:00,239][11577] Loop rollout_proc4_evt_loop terminating... |
|
[2024-10-08 04:03:00,239][04317] Component RolloutWorker_w1 stopped! |
|
[2024-10-08 04:03:00,240][11579] Stopping RolloutWorker_w6... |
|
[2024-10-08 04:03:00,242][11579] Loop rollout_proc6_evt_loop terminating... |
|
[2024-10-08 04:03:00,241][04317] Component RolloutWorker_w2 stopped! |
|
[2024-10-08 04:03:00,243][04317] Component RolloutWorker_w0 stopped! |
|
[2024-10-08 04:03:00,244][04317] Component RolloutWorker_w5 stopped! |
|
[2024-10-08 04:03:00,246][04317] Component RolloutWorker_w4 stopped! |
|
[2024-10-08 04:03:00,247][04317] Component RolloutWorker_w6 stopped! |
|
[2024-10-08 04:03:00,269][11559] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002120_8683520.pth |
|
[2024-10-08 04:03:00,279][11559] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... |
|
[2024-10-08 04:03:00,402][11559] Stopping LearnerWorker_p0... |
|
[2024-10-08 04:03:00,403][11559] Loop learner_proc0_evt_loop terminating... |
|
[2024-10-08 04:03:00,402][04317] Component LearnerWorker_p0 stopped! |
|
[2024-10-08 04:03:00,406][04317] Waiting for process learner_proc0 to stop... |
|
[2024-10-08 04:03:01,041][04317] Waiting for process inference_proc0-0 to join... |
|
[2024-10-08 04:03:01,043][04317] Waiting for process rollout_proc0 to join... |
|
[2024-10-08 04:03:01,046][04317] Waiting for process rollout_proc1 to join... |
|
[2024-10-08 04:03:01,048][04317] Waiting for process rollout_proc2 to join... |
|
[2024-10-08 04:03:01,050][04317] Waiting for process rollout_proc3 to join... |
|
[2024-10-08 04:03:01,053][04317] Waiting for process rollout_proc4 to join... |
|
[2024-10-08 04:03:01,055][04317] Waiting for process rollout_proc5 to join... |
|
[2024-10-08 04:03:01,057][04317] Waiting for process rollout_proc6 to join... |
|
[2024-10-08 04:03:01,061][04317] Waiting for process rollout_proc7 to join... |
|
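The shutdown sequence above follows the usual multi-process pattern: each component's event loop is told to stop, then the runner joins every child process before exiting. A generic sketch of that pattern (not Sample Factory's actual signal/slot machinery):

```python
import multiprocessing as mp

def shut_down(processes, stop_event):
    """Ask worker loops to terminate, then wait for their processes to exit."""
    stop_event.set()  # each worker's loop polls this flag and breaks out
    for name, proc in processes.items():
        print(f"Waiting for process {name} to join...")
        proc.join(timeout=30)
        if proc.is_alive():  # escalate if a worker hangs
            proc.terminate()
```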
[2024-10-08 04:03:01,063][04317] Batcher 0 profile tree view: |
|
batching: 31.8651, releasing_batches: 0.0427 |
|
[2024-10-08 04:03:01,065][04317] InferenceWorker_p0-w0 profile tree view: |
|
wait_policy: 0.0000 |
|
  wait_policy_total: 6.1122 |
|
update_model: 5.8496 |
|
  weight_update: 0.0012 |
|
one_step: 0.0025 |
|
  handle_policy_step: 362.9040 |
|
    deserialize: 15.0768, stack: 2.3418, obs_to_device_normalize: 88.5838, forward: 166.3535, send_messages: 26.4312 |
|
    prepare_outputs: 47.7377 |
|
      to_cpu: 30.6777 |
|
[2024-10-08 04:03:01,066][04317] Learner 0 profile tree view: |
|
misc: 0.0094, prepare_batch: 15.7864 |
|
train: 46.5480 |
|
  epoch_init: 0.0110, minibatch_init: 0.0112, losses_postprocess: 0.6159, kl_divergence: 0.8342, after_optimizer: 1.1864 |
|
  calculate_losses: 15.2727 |
|
    losses_init: 0.0070, forward_head: 1.8093, bptt_initial: 6.5186, tail: 1.2817, advantages_returns: 0.3389, losses: 2.1307 |
|
    bptt: 2.8336 |
|
      bptt_forward_core: 2.7219 |
|
  update: 27.9222 |
|
    clip: 2.2075 |
|
[2024-10-08 04:03:01,068][04317] RolloutWorker_w0 profile tree view: |
|
wait_for_trajectories: 0.2941, enqueue_policy_requests: 15.2562, env_step: 250.2268, overhead: 12.3881, complete_rollouts: 0.4594 |
|
save_policy_outputs: 20.7463 |
|
  split_output_tensors: 7.1634 |
|
[2024-10-08 04:03:01,069][04317] RolloutWorker_w7 profile tree view: |
|
wait_for_trajectories: 0.2953, enqueue_policy_requests: 15.3496, env_step: 249.8908, overhead: 12.3257, complete_rollouts: 0.4593 |
|
save_policy_outputs: 20.5782 |
|
  split_output_tensors: 7.1420 |
|
[2024-10-08 04:03:01,071][04317] Loop Runner_EvtLoop terminating... |
|
[2024-10-08 04:03:01,072][04317] Runner profile tree view: |
|
main_loop: 401.5654 |
|
[2024-10-08 04:03:01,074][04317] Collected {0: 12005376}, FPS: 19920.8 |
|
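The final `FPS: 19920.8` is consistent with frames gathered in this session divided by the runner's `main_loop` time, assuming the run resumed from the 4,005,888-frame checkpoint whose file was pruned earlier in the log:

```python
frames_total = 12005376      # Collected {0: 12005376}
frames_at_resume = 4005888   # checkpoint_000000978_4005888.pth (assumed resume point)
main_loop_s = 401.5654       # Runner profile: main_loop

print((frames_total - frames_at_resume) / main_loop_s)  # ~19920.8
```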
[2024-10-08 04:03:09,445][04317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json |
|
[2024-10-08 04:03:09,446][04317] Overriding arg 'num_workers' with value 1 passed from command line |
|
[2024-10-08 04:03:09,447][04317] Adding new argument 'no_render'=True that is not in the saved config file! |
|
[2024-10-08 04:03:09,449][04317] Adding new argument 'save_video'=True that is not in the saved config file! |
|
[2024-10-08 04:03:09,450][04317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! |
|
[2024-10-08 04:03:09,451][04317] Adding new argument 'video_name'=None that is not in the saved config file! |
|
[2024-10-08 04:03:09,452][04317] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! |
|
[2024-10-08 04:03:09,454][04317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! |
|
[2024-10-08 04:03:09,455][04317] Adding new argument 'push_to_hub'=False that is not in the saved config file! |
|
[2024-10-08 04:03:09,456][04317] Adding new argument 'hf_repository'=None that is not in the saved config file! |
|
[2024-10-08 04:03:09,457][04317] Adding new argument 'policy_index'=0 that is not in the saved config file! |
|
[2024-10-08 04:03:09,458][04317] Adding new argument 'eval_deterministic'=False that is not in the saved config file! |
|
[2024-10-08 04:03:09,461][04317] Adding new argument 'train_script'=None that is not in the saved config file! |
|
[2024-10-08 04:03:09,462][04317] Adding new argument 'enjoy_script'=None that is not in the saved config file! |
|
[2024-10-08 04:03:09,464][04317] Using frameskip 1 and render_action_repeat=4 for evaluation |
|
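The block above shows evaluation reusing the training `config.json`, overriding saved keys from the command line and appending evaluation-only ones. A minimal stand-alone sketch of that merge logic (illustrative, not the library's parser):

```python
import json

def load_cfg_with_overrides(cfg_path, overrides):
    """Merge command-line overrides into a saved experiment config."""
    with open(cfg_path) as f:
        cfg = json.load(f)
    for key, value in overrides.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg

cfg = load_cfg_with_overrides(
    "/content/train_dir/default_experiment/config.json",
    {"num_workers": 1, "no_render": True, "save_video": True, "max_num_episodes": 10},
)
```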
[2024-10-08 04:03:09,470][04317] RunningMeanStd input shape: (3, 72, 128) |
|
[2024-10-08 04:03:09,472][04317] RunningMeanStd input shape: (1,) |
|
[2024-10-08 04:03:09,483][04317] ConvEncoder: input_channels=3 |
|
[2024-10-08 04:03:09,520][04317] Conv encoder output size: 512 |
|
[2024-10-08 04:03:09,521][04317] Policy head output size: 512 |
|
[2024-10-08 04:03:09,542][04317] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... |
|
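`Loading state from checkpoint ...` restores the actor-critic weights saved during training. A hedged sketch of inspecting such a `.pth` file; the `"model"` key follows Sample Factory's learner checkpoint layout, but treat it as an assumption:

```python
import torch

ckpt_path = (
    "/content/train_dir/default_experiment/"
    "checkpoint_p0/checkpoint_000002931_12005376.pth"
)
checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint["model"]  # actor-critic weights (key name assumed)
print(sorted(state_dict)[:5])     # peek at the first few parameter names
```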
[2024-10-08 04:03:10,048][04317] Num frames 100... |
|
[2024-10-08 04:03:10,171][04317] Num frames 200... |
|
[2024-10-08 04:03:10,290][04317] Num frames 300... |
|
[2024-10-08 04:03:10,411][04317] Num frames 400... |
|
[2024-10-08 04:03:10,535][04317] Num frames 500... |
|
[2024-10-08 04:03:10,657][04317] Num frames 600... |
|
[2024-10-08 04:03:10,774][04317] Num frames 700... |
|
[2024-10-08 04:03:10,870][04317] Avg episode rewards: #0: 13.360, true rewards: #0: 7.360 |
|
[2024-10-08 04:03:10,872][04317] Avg episode reward: 13.360, avg true_objective: 7.360 |
|
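Each evaluation episode ends with an `Avg episode rewards` line: the running mean of the shaped reward and of the raw environment ("true") reward over the episodes finished so far. A small sketch of that bookkeeping (illustrative only):

```python
shaped_rewards, true_rewards = [], []

def on_episode_end(shaped, true):
    """Record one finished episode and report running means."""
    shaped_rewards.append(shaped)
    true_rewards.append(true)
    n = len(shaped_rewards)
    print(
        f"Avg episode rewards: #0: {sum(shaped_rewards) / n:.3f}, "
        f"true rewards: #0: {sum(true_rewards) / n:.3f}"
    )

on_episode_end(13.36, 7.36)   # matches the first episode above
on_episode_end(37.00, 16.00)  # would yield the 25.180 / 11.680 averages
```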
[2024-10-08 04:03:10,945][04317] Num frames 800... |
|
[2024-10-08 04:03:11,059][04317] Num frames 900... |
|
[2024-10-08 04:03:11,177][04317] Num frames 1000... |
|
[2024-10-08 04:03:11,291][04317] Num frames 1100... |
|
[2024-10-08 04:03:11,406][04317] Num frames 1200... |
|
[2024-10-08 04:03:11,528][04317] Num frames 1300... |
|
[2024-10-08 04:03:11,650][04317] Num frames 1400... |
|
[2024-10-08 04:03:11,767][04317] Num frames 1500... |
|
[2024-10-08 04:03:11,880][04317] Num frames 1600... |
|
[2024-10-08 04:03:11,997][04317] Num frames 1700... |
|
[2024-10-08 04:03:12,113][04317] Num frames 1800... |
|
[2024-10-08 04:03:12,227][04317] Num frames 1900... |
|
[2024-10-08 04:03:12,343][04317] Num frames 2000... |
|
[2024-10-08 04:03:12,460][04317] Num frames 2100... |
|
[2024-10-08 04:03:12,575][04317] Num frames 2200... |
|
[2024-10-08 04:03:12,693][04317] Num frames 2300... |
|
[2024-10-08 04:03:12,789][04317] Avg episode rewards: #0: 25.180, true rewards: #0: 11.680 |
|
[2024-10-08 04:03:12,790][04317] Avg episode reward: 25.180, avg true_objective: 11.680 |
|
[2024-10-08 04:03:12,864][04317] Num frames 2400... |
|
[2024-10-08 04:03:12,978][04317] Num frames 2500... |
|
[2024-10-08 04:03:13,095][04317] Num frames 2600... |
|
[2024-10-08 04:03:13,210][04317] Num frames 2700... |
|
[2024-10-08 04:03:13,322][04317] Num frames 2800... |
|
[2024-10-08 04:03:13,438][04317] Num frames 2900... |
|
[2024-10-08 04:03:13,553][04317] Num frames 3000... |
|
[2024-10-08 04:03:13,671][04317] Num frames 3100... |
|
[2024-10-08 04:03:13,786][04317] Num frames 3200... |
|
[2024-10-08 04:03:13,901][04317] Num frames 3300... |
|
[2024-10-08 04:03:14,018][04317] Num frames 3400... |
|
[2024-10-08 04:03:14,138][04317] Num frames 3500... |
|
[2024-10-08 04:03:14,253][04317] Num frames 3600... |
|
[2024-10-08 04:03:14,368][04317] Num frames 3700... |
|
[2024-10-08 04:03:14,487][04317] Num frames 3800... |
|
[2024-10-08 04:03:14,603][04317] Num frames 3900... |
|
[2024-10-08 04:03:14,725][04317] Avg episode rewards: #0: 30.187, true rewards: #0: 13.187 |
|
[2024-10-08 04:03:14,727][04317] Avg episode reward: 30.187, avg true_objective: 13.187 |
|
[2024-10-08 04:03:14,778][04317] Num frames 4000... |
|
[2024-10-08 04:03:14,892][04317] Num frames 4100... |
|
[2024-10-08 04:03:15,006][04317] Num frames 4200... |
|
[2024-10-08 04:03:15,122][04317] Num frames 4300... |
|
[2024-10-08 04:03:15,239][04317] Num frames 4400... |
|
[2024-10-08 04:03:15,356][04317] Num frames 4500... |
|
[2024-10-08 04:03:15,471][04317] Num frames 4600... |
|
[2024-10-08 04:03:15,589][04317] Num frames 4700... |
|
[2024-10-08 04:03:15,708][04317] Num frames 4800... |
|
[2024-10-08 04:03:15,825][04317] Num frames 4900... |
|
[2024-10-08 04:03:15,944][04317] Num frames 5000... |
|
[2024-10-08 04:03:16,062][04317] Num frames 5100... |
|
[2024-10-08 04:03:16,176][04317] Num frames 5200... |
|
[2024-10-08 04:03:16,291][04317] Num frames 5300... |
|
[2024-10-08 04:03:16,418][04317] Avg episode rewards: #0: 30.410, true rewards: #0: 13.410 |
|
[2024-10-08 04:03:16,419][04317] Avg episode reward: 30.410, avg true_objective: 13.410 |
|
[2024-10-08 04:03:16,465][04317] Num frames 5400... |
|
[2024-10-08 04:03:16,578][04317] Num frames 5500... |
|
[2024-10-08 04:03:16,695][04317] Num frames 5600... |
|
[2024-10-08 04:03:16,809][04317] Num frames 5700... |
|
[2024-10-08 04:03:16,921][04317] Num frames 5800... |
|
[2024-10-08 04:03:17,037][04317] Num frames 5900... |
|
[2024-10-08 04:03:17,151][04317] Num frames 6000... |
|
[2024-10-08 04:03:17,264][04317] Num frames 6100... |
|
[2024-10-08 04:03:17,379][04317] Num frames 6200... |
|
[2024-10-08 04:03:17,493][04317] Num frames 6300... |
|
[2024-10-08 04:03:17,611][04317] Num frames 6400... |
|
[2024-10-08 04:03:17,730][04317] Num frames 6500... |
|
[2024-10-08 04:03:17,868][04317] Avg episode rewards: #0: 30.142, true rewards: #0: 13.142 |
|
[2024-10-08 04:03:17,869][04317] Avg episode reward: 30.142, avg true_objective: 13.142 |
|
[2024-10-08 04:03:17,905][04317] Num frames 6600... |
|
[2024-10-08 04:03:18,019][04317] Num frames 6700... |
|
[2024-10-08 04:03:18,134][04317] Num frames 6800... |
|
[2024-10-08 04:03:18,251][04317] Num frames 6900... |
|
[2024-10-08 04:03:18,366][04317] Num frames 7000... |
|
[2024-10-08 04:03:18,483][04317] Num frames 7100... |
|
[2024-10-08 04:03:18,600][04317] Num frames 7200... |
|
[2024-10-08 04:03:18,718][04317] Num frames 7300... |
|
[2024-10-08 04:03:18,834][04317] Num frames 7400... |
|
[2024-10-08 04:03:18,949][04317] Num frames 7500... |
|
[2024-10-08 04:03:19,068][04317] Num frames 7600... |
|
[2024-10-08 04:03:19,184][04317] Num frames 7700... |
|
[2024-10-08 04:03:19,298][04317] Num frames 7800... |
|
[2024-10-08 04:03:19,377][04317] Avg episode rewards: #0: 29.865, true rewards: #0: 13.032 |
|
[2024-10-08 04:03:19,379][04317] Avg episode reward: 29.865, avg true_objective: 13.032 |
|
[2024-10-08 04:03:19,477][04317] Num frames 7900... |
|
[2024-10-08 04:03:19,594][04317] Num frames 8000... |
|
[2024-10-08 04:03:19,752][04317] Avg episode rewards: #0: 26.411, true rewards: #0: 11.554 |
|
[2024-10-08 04:03:19,754][04317] Avg episode reward: 26.411, avg true_objective: 11.554 |
|
[2024-10-08 04:03:19,770][04317] Num frames 8100... |
|
[2024-10-08 04:03:19,884][04317] Num frames 8200... |
|
[2024-10-08 04:03:20,000][04317] Num frames 8300... |
|
[2024-10-08 04:03:20,114][04317] Num frames 8400... |
|
[2024-10-08 04:03:20,236][04317] Num frames 8500... |
|
[2024-10-08 04:03:20,352][04317] Num frames 8600... |
|
[2024-10-08 04:03:20,469][04317] Num frames 8700... |
|
[2024-10-08 04:03:20,583][04317] Num frames 8800... |
|
[2024-10-08 04:03:20,708][04317] Num frames 8900... |
|
[2024-10-08 04:03:20,824][04317] Num frames 9000... |
|
[2024-10-08 04:03:20,898][04317] Avg episode rewards: #0: 25.645, true rewards: #0: 11.270 |
|
[2024-10-08 04:03:20,900][04317] Avg episode reward: 25.645, avg true_objective: 11.270 |
|
[2024-10-08 04:03:20,999][04317] Num frames 9100... |
|
[2024-10-08 04:03:21,121][04317] Num frames 9200... |
|
[2024-10-08 04:03:21,243][04317] Num frames 9300... |
|
[2024-10-08 04:03:21,367][04317] Num frames 9400... |
|
[2024-10-08 04:03:21,484][04317] Num frames 9500... |
|
[2024-10-08 04:03:21,603][04317] Num frames 9600... |
|
[2024-10-08 04:03:21,724][04317] Num frames 9700... |
|
[2024-10-08 04:03:21,844][04317] Num frames 9800... |
|
[2024-10-08 04:03:21,962][04317] Num frames 9900... |
|
[2024-10-08 04:03:22,067][04317] Avg episode rewards: #0: 25.493, true rewards: #0: 11.049 |
|
[2024-10-08 04:03:22,069][04317] Avg episode reward: 25.493, avg true_objective: 11.049 |
|
[2024-10-08 04:03:22,137][04317] Num frames 10000... |
|
[2024-10-08 04:03:22,250][04317] Num frames 10100... |
|
[2024-10-08 04:03:22,365][04317] Num frames 10200... |
|
[2024-10-08 04:03:22,482][04317] Num frames 10300... |
|
[2024-10-08 04:03:22,597][04317] Num frames 10400... |
|
[2024-10-08 04:03:22,719][04317] Num frames 10500... |
|
[2024-10-08 04:03:22,833][04317] Num frames 10600... |
|
[2024-10-08 04:03:22,948][04317] Num frames 10700... |
|
[2024-10-08 04:03:23,065][04317] Num frames 10800... |
|
[2024-10-08 04:03:23,183][04317] Num frames 10900... |
|
[2024-10-08 04:03:23,298][04317] Num frames 11000... |
|
[2024-10-08 04:03:23,414][04317] Num frames 11100... |
|
[2024-10-08 04:03:23,530][04317] Num frames 11200... |
|
[2024-10-08 04:03:23,649][04317] Num frames 11300... |
|
[2024-10-08 04:03:23,767][04317] Num frames 11400... |
|
[2024-10-08 04:03:23,881][04317] Num frames 11500... |
|
[2024-10-08 04:03:23,997][04317] Num frames 11600... |
|
[2024-10-08 04:03:24,112][04317] Num frames 11700... |
|
[2024-10-08 04:03:24,229][04317] Num frames 11800... |
|
[2024-10-08 04:03:24,348][04317] Num frames 11900... |
|
[2024-10-08 04:03:24,476][04317] Avg episode rewards: #0: 28.460, true rewards: #0: 11.960 |
|
[2024-10-08 04:03:24,477][04317] Avg episode reward: 28.460, avg true_objective: 11.960 |
|
[2024-10-08 04:03:52,768][04317] Replay video saved to /content/train_dir/default_experiment/replay.mp4! |
|
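The replay is written out as an `.mp4` of the rendered episode frames. Sample Factory uses its own video writer; the snippet below is a hedged stand-in built on `imageio` (requires `imageio-ffmpeg`) just to show the idea:

```python
import imageio.v2 as imageio

def save_replay(frames, path="replay.mp4", fps=35):
    """Write a list of HxWx3 uint8 frames to an mp4 (VizDoom renders at 35 Hz)."""
    imageio.mimwrite(path, frames, fps=fps)
```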
[2024-10-08 04:04:55,645][04317] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json |
|
[2024-10-08 04:04:55,646][04317] Overriding arg 'num_workers' with value 1 passed from command line |
|
[2024-10-08 04:04:55,647][04317] Adding new argument 'no_render'=True that is not in the saved config file! |
|
[2024-10-08 04:04:55,649][04317] Adding new argument 'save_video'=True that is not in the saved config file! |
|
[2024-10-08 04:04:55,650][04317] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! |
|
[2024-10-08 04:04:55,652][04317] Adding new argument 'video_name'=None that is not in the saved config file! |
|
[2024-10-08 04:04:55,653][04317] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! |
|
[2024-10-08 04:04:55,654][04317] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! |
|
[2024-10-08 04:04:55,655][04317] Adding new argument 'push_to_hub'=True that is not in the saved config file! |
|
[2024-10-08 04:04:55,657][04317] Adding new argument 'hf_repository'='EntropicLettuce/rl_course_vizdoom_health_gathering_supreme_b' that is not in the saved config file! |
|
[2024-10-08 04:04:55,659][04317] Adding new argument 'policy_index'=0 that is not in the saved config file! |
|
[2024-10-08 04:04:55,660][04317] Adding new argument 'eval_deterministic'=False that is not in the saved config file! |
|
[2024-10-08 04:04:55,661][04317] Adding new argument 'train_script'=None that is not in the saved config file! |
|
[2024-10-08 04:04:55,663][04317] Adding new argument 'enjoy_script'=None that is not in the saved config file! |
|
[2024-10-08 04:04:55,664][04317] Using frameskip 1 and render_action_repeat=4 for evaluation |
|
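This second evaluation pass adds `push_to_hub=True` and an `hf_repository`, i.e. it re-runs the evaluation so the replay and stats can be uploaded. A sketch of an invocation consistent with these arguments: `parse_vizdoom_cfg` is the Deep RL course notebook's helper (assumed to be defined in scope), while `sample_factory.enjoy.enjoy` is the library's evaluation entry point.

```python
from sample_factory.enjoy import enjoy  # Sample Factory's evaluation entry point

env = "doom_health_gathering_supreme"  # inferred from the repository name
cfg = parse_vizdoom_cfg(  # course-notebook helper, assumed available
    argv=[
        f"--env={env}", "--num_workers=1", "--no_render", "--save_video",
        "--max_num_frames=100000", "--max_num_episodes=10", "--push_to_hub",
        "--hf_repository=EntropicLettuce/rl_course_vizdoom_health_gathering_supreme_b",
    ],
    evaluation=True,
)
status = enjoy(cfg)
```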
[2024-10-08 04:04:55,672][04317] RunningMeanStd input shape: (3, 72, 128) |
|
[2024-10-08 04:04:55,674][04317] RunningMeanStd input shape: (1,) |
|
[2024-10-08 04:04:55,685][04317] ConvEncoder: input_channels=3 |
|
[2024-10-08 04:04:55,722][04317] Conv encoder output size: 512 |
|
[2024-10-08 04:04:55,724][04317] Policy head output size: 512 |
|
[2024-10-08 04:04:55,744][04317] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... |
|
[2024-10-08 04:04:56,235][04317] Num frames 100... |
|
[2024-10-08 04:04:56,350][04317] Num frames 200... |
|
[2024-10-08 04:04:56,469][04317] Num frames 300... |
|
[2024-10-08 04:04:56,592][04317] Num frames 400... |
|
[2024-10-08 04:04:56,710][04317] Num frames 500... |
|
[2024-10-08 04:04:56,821][04317] Num frames 600... |
|
[2024-10-08 04:04:56,932][04317] Num frames 700... |
|
[2024-10-08 04:04:57,028][04317] Avg episode rewards: #0: 14.370, true rewards: #0: 7.370 |
|
[2024-10-08 04:04:57,030][04317] Avg episode reward: 14.370, avg true_objective: 7.370 |
|
[2024-10-08 04:04:57,104][04317] Num frames 800... |
|
[2024-10-08 04:04:57,218][04317] Num frames 900... |
|
[2024-10-08 04:04:57,333][04317] Num frames 1000... |
|
[2024-10-08 04:04:57,447][04317] Num frames 1100... |
|
[2024-10-08 04:04:57,560][04317] Num frames 1200... |
|
[2024-10-08 04:04:57,675][04317] Num frames 1300... |
|
[2024-10-08 04:04:57,788][04317] Num frames 1400... |
|
[2024-10-08 04:04:57,903][04317] Num frames 1500... |
|
[2024-10-08 04:04:58,016][04317] Num frames 1600... |
|
[2024-10-08 04:04:58,133][04317] Num frames 1700... |
|
[2024-10-08 04:04:58,258][04317] Avg episode rewards: #0: 16.305, true rewards: #0: 8.805 |
|
[2024-10-08 04:04:58,259][04317] Avg episode reward: 16.305, avg true_objective: 8.805 |
|
[2024-10-08 04:04:58,305][04317] Num frames 1800... |
|
[2024-10-08 04:04:58,421][04317] Num frames 1900... |
|
[2024-10-08 04:04:58,536][04317] Num frames 2000... |
|
[2024-10-08 04:04:58,656][04317] Num frames 2100... |
|
[2024-10-08 04:04:58,770][04317] Num frames 2200... |
|
[2024-10-08 04:04:58,881][04317] Num frames 2300... |
|
[2024-10-08 04:04:58,994][04317] Num frames 2400... |
|
[2024-10-08 04:04:59,088][04317] Avg episode rewards: #0: 14.443, true rewards: #0: 8.110 |
|
[2024-10-08 04:04:59,090][04317] Avg episode reward: 14.443, avg true_objective: 8.110 |
|
[2024-10-08 04:04:59,167][04317] Num frames 2500... |
|
[2024-10-08 04:04:59,281][04317] Num frames 2600... |
|
[2024-10-08 04:04:59,395][04317] Num frames 2700... |
|
[2024-10-08 04:04:59,509][04317] Num frames 2800... |
|
[2024-10-08 04:04:59,584][04317] Avg episode rewards: #0: 11.793, true rewards: #0: 7.042 |
|
[2024-10-08 04:04:59,585][04317] Avg episode reward: 11.793, avg true_objective: 7.042 |
|
[2024-10-08 04:04:59,684][04317] Num frames 2900... |
|
[2024-10-08 04:04:59,796][04317] Num frames 3000... |
|
[2024-10-08 04:04:59,909][04317] Num frames 3100... |
|
[2024-10-08 04:05:00,026][04317] Num frames 3200... |
|
[2024-10-08 04:05:00,102][04317] Avg episode rewards: #0: 11.436, true rewards: #0: 6.436 |
|
[2024-10-08 04:05:00,104][04317] Avg episode reward: 11.436, avg true_objective: 6.436 |
|
[2024-10-08 04:05:00,198][04317] Num frames 3300... |
|
[2024-10-08 04:05:00,315][04317] Num frames 3400... |
|
[2024-10-08 04:05:00,428][04317] Num frames 3500... |
|
[2024-10-08 04:05:00,546][04317] Num frames 3600... |
|
[2024-10-08 04:05:00,668][04317] Num frames 3700... |
|
[2024-10-08 04:05:00,784][04317] Num frames 3800... |
|
[2024-10-08 04:05:00,898][04317] Num frames 3900... |
|
[2024-10-08 04:05:01,013][04317] Num frames 4000... |
|
[2024-10-08 04:05:01,125][04317] Num frames 4100... |
|
[2024-10-08 04:05:01,239][04317] Num frames 4200... |
|
[2024-10-08 04:05:01,377][04317] Avg episode rewards: #0: 13.123, true rewards: #0: 7.123 |
|
[2024-10-08 04:05:01,378][04317] Avg episode reward: 13.123, avg true_objective: 7.123 |
|
[2024-10-08 04:05:01,410][04317] Num frames 4300... |
|
[2024-10-08 04:05:01,525][04317] Num frames 4400... |
|
[2024-10-08 04:05:01,645][04317] Num frames 4500... |
|
[2024-10-08 04:05:01,761][04317] Num frames 4600... |
|
[2024-10-08 04:05:01,880][04317] Num frames 4700... |
|
[2024-10-08 04:05:02,035][04317] Avg episode rewards: #0: 12.694, true rewards: #0: 6.837 |
|
[2024-10-08 04:05:02,037][04317] Avg episode reward: 12.694, avg true_objective: 6.837 |
|
[2024-10-08 04:05:02,055][04317] Num frames 4800... |
|
[2024-10-08 04:05:02,170][04317] Num frames 4900... |
|
[2024-10-08 04:05:02,287][04317] Num frames 5000... |
|
[2024-10-08 04:05:02,409][04317] Num frames 5100... |
|
[2024-10-08 04:05:02,528][04317] Num frames 5200... |
|
[2024-10-08 04:05:02,648][04317] Num frames 5300... |
|
[2024-10-08 04:05:02,769][04317] Num frames 5400... |
|
[2024-10-08 04:05:02,890][04317] Num frames 5500... |
|
[2024-10-08 04:05:03,011][04317] Num frames 5600... |
|
[2024-10-08 04:05:03,124][04317] Num frames 5700... |
|
[2024-10-08 04:05:03,231][04317] Avg episode rewards: #0: 13.308, true rewards: #0: 7.182 |
|
[2024-10-08 04:05:03,233][04317] Avg episode reward: 13.308, avg true_objective: 7.182 |
|
[2024-10-08 04:05:03,299][04317] Num frames 5800... |
|
[2024-10-08 04:05:03,422][04317] Num frames 5900... |
|
[2024-10-08 04:05:03,545][04317] Num frames 6000... |
|
[2024-10-08 04:05:03,667][04317] Num frames 6100... |
|
[2024-10-08 04:05:03,789][04317] Num frames 6200... |
|
[2024-10-08 04:05:03,915][04317] Num frames 6300... |
|
[2024-10-08 04:05:04,039][04317] Num frames 6400... |
|
[2024-10-08 04:05:04,164][04317] Num frames 6500... |
|
[2024-10-08 04:05:04,288][04317] Num frames 6600... |
|
[2024-10-08 04:05:04,410][04317] Num frames 6700... |
|
[2024-10-08 04:05:04,535][04317] Num frames 6800... |
|
[2024-10-08 04:05:04,654][04317] Num frames 6900... |
|
[2024-10-08 04:05:04,776][04317] Num frames 7000... |
|
[2024-10-08 04:05:04,897][04317] Num frames 7100... |
|
[2024-10-08 04:05:05,011][04317] Num frames 7200... |
|
[2024-10-08 04:05:05,135][04317] Avg episode rewards: #0: 15.622, true rewards: #0: 8.067 |
|
[2024-10-08 04:05:05,137][04317] Avg episode reward: 15.622, avg true_objective: 8.067 |
|
[2024-10-08 04:05:05,189][04317] Num frames 7300... |
|
[2024-10-08 04:05:05,309][04317] Num frames 7400... |
|
[2024-10-08 04:05:05,429][04317] Num frames 7500... |
|
[2024-10-08 04:05:05,551][04317] Num frames 7600... |
|
[2024-10-08 04:05:05,671][04317] Num frames 7700... |
|
[2024-10-08 04:05:05,788][04317] Num frames 7800... |
|
[2024-10-08 04:05:05,906][04317] Num frames 7900... |
|
[2024-10-08 04:05:06,029][04317] Num frames 8000... |
|
[2024-10-08 04:05:06,199][04317] Avg episode rewards: #0: 15.793, true rewards: #0: 8.093 |
|
[2024-10-08 04:05:06,201][04317] Avg episode reward: 15.793, avg true_objective: 8.093 |
|
[2024-10-08 04:05:25,177][04317] Replay video saved to /content/train_dir/default_experiment/replay.mp4! |
|
|