[2023-02-27 10:46:05,198][00394] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-27 10:46:05,201][00394] Rollout worker 0 uses device cpu
[2023-02-27 10:46:05,204][00394] Rollout worker 1 uses device cpu
[2023-02-27 10:46:05,205][00394] Rollout worker 2 uses device cpu
[2023-02-27 10:46:05,207][00394] Rollout worker 3 uses device cpu
[2023-02-27 10:46:05,209][00394] Rollout worker 4 uses device cpu
[2023-02-27 10:46:05,210][00394] Rollout worker 5 uses device cpu
[2023-02-27 10:46:05,211][00394] Rollout worker 6 uses device cpu
[2023-02-27 10:46:05,213][00394] Rollout worker 7 uses device cpu
[2023-02-27 10:46:05,405][00394] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-27 10:46:05,407][00394] InferenceWorker_p0-w0: min num requests: 2
[2023-02-27 10:46:05,440][00394] Starting all processes...
[2023-02-27 10:46:05,441][00394] Starting process learner_proc0
[2023-02-27 10:46:05,499][00394] Starting all processes...
[2023-02-27 10:46:05,507][00394] Starting process inference_proc0-0
[2023-02-27 10:46:05,508][00394] Starting process rollout_proc0
[2023-02-27 10:46:05,510][00394] Starting process rollout_proc1
[2023-02-27 10:46:05,510][00394] Starting process rollout_proc2
[2023-02-27 10:46:05,510][00394] Starting process rollout_proc3
[2023-02-27 10:46:05,510][00394] Starting process rollout_proc4
[2023-02-27 10:46:05,510][00394] Starting process rollout_proc5
[2023-02-27 10:46:05,510][00394] Starting process rollout_proc6
[2023-02-27 10:46:05,510][00394] Starting process rollout_proc7
[2023-02-27 10:46:14,657][11881] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-27 10:46:14,658][11881] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-27 10:46:15,299][11881] Num visible devices: 1
[2023-02-27 10:46:15,305][11899] Worker 4 uses CPU cores [0]
[2023-02-27 10:46:15,323][11881] Starting seed is not provided
[2023-02-27 10:46:15,324][11881] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-27 10:46:15,324][11881] Initializing actor-critic model on device cuda:0
[2023-02-27 10:46:15,325][11881] RunningMeanStd input shape: (3, 72, 128)
[2023-02-27 10:46:15,327][11881] RunningMeanStd input shape: (1,)
[2023-02-27 10:46:15,444][11881] ConvEncoder: input_channels=3
[2023-02-27 10:46:15,446][11897] Worker 1 uses CPU cores [1]
[2023-02-27 10:46:15,510][11901] Worker 5 uses CPU cores [1]
[2023-02-27 10:46:15,514][11900] Worker 3 uses CPU cores [1]
[2023-02-27 10:46:15,538][11895] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-27 10:46:15,538][11895] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-27 10:46:15,539][11898] Worker 2 uses CPU cores [0]
[2023-02-27 10:46:15,558][11895] Num visible devices: 1
[2023-02-27 10:46:15,560][11902] Worker 6 uses CPU cores [0]
[2023-02-27 10:46:15,596][11903] Worker 7 uses CPU cores [1]
[2023-02-27 10:46:15,712][11896] Worker 0 uses CPU cores [0]
[2023-02-27 10:46:15,919][11881] Conv encoder output size: 512
[2023-02-27 10:46:15,920][11881] Policy head output size: 512
[2023-02-27 10:46:15,978][11881] Created Actor Critic model with architecture:
[2023-02-27 10:46:15,978][11881] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-27 10:46:23,492][11881] Using optimizer
[2023-02-27 10:46:23,493][11881] No checkpoints found
[2023-02-27 10:46:23,494][11881] Did not load from checkpoint, starting from scratch!
[2023-02-27 10:46:23,494][11881] Initialized policy 0 weights for model version 0
[2023-02-27 10:46:23,497][11881] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-27 10:46:23,505][11881] LearnerWorker_p0 finished initialization!
[2023-02-27 10:46:23,688][11895] RunningMeanStd input shape: (3, 72, 128)
[2023-02-27 10:46:23,689][11895] RunningMeanStd input shape: (1,)
[2023-02-27 10:46:23,702][11895] ConvEncoder: input_channels=3
[2023-02-27 10:46:23,801][11895] Conv encoder output size: 512
[2023-02-27 10:46:23,801][11895] Policy head output size: 512
[2023-02-27 10:46:25,091][00394] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-27 10:46:25,398][00394] Heartbeat connected on Batcher_0
[2023-02-27 10:46:25,402][00394] Heartbeat connected on LearnerWorker_p0
[2023-02-27 10:46:25,416][00394] Heartbeat connected on RolloutWorker_w0
[2023-02-27 10:46:25,421][00394] Heartbeat connected on RolloutWorker_w1
[2023-02-27 10:46:25,425][00394] Heartbeat connected on RolloutWorker_w2
[2023-02-27 10:46:25,426][00394] Heartbeat connected on RolloutWorker_w3
[2023-02-27 10:46:25,430][00394] Heartbeat connected on RolloutWorker_w4
[2023-02-27 10:46:25,435][00394] Heartbeat connected on RolloutWorker_w5
[2023-02-27 10:46:25,436][00394] Heartbeat connected on RolloutWorker_w6
[2023-02-27 10:46:25,444][00394] Heartbeat connected on RolloutWorker_w7
[2023-02-27 10:46:26,160][00394] Inference worker 0-0 is ready!
[2023-02-27 10:46:26,162][00394] All inference workers are ready! Signal rollout workers to start!
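The architecture dump above reads as a compact actor-critic with a shared encoder: three Conv2d+ELU stages feed a Linear+ELU projection to 512 features, a GRU(512, 512) core adds memory, and two linear heads produce the value estimate and the 5 action logits. A minimal PyTorch sketch follows, for orientation only; the kernel/stride/channel choices (32/64/128 with 8x8 s4, 4x4 s2, 3x3 s2) are the usual Sample Factory "convnet_simple" defaults and are an assumption here, and the RunningMeanStd observation/returns normalizers are omitted.

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, num_actions=5):
        super().__init__()
        # Assumed conv geometry; the log only confirms the 512-dim outputs.
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # For (3, 72, 128) observations the conv head flattens to 128*3*6 = 2304.
        self.mlp_layers = nn.Sequential(nn.Linear(2304, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                     # ModelCoreRNN
        self.critic_linear = nn.Linear(512, 1)           # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # action logits

    def forward(self, obs, rnn_state):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```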
[2023-02-27 10:46:26,167][00394] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-27 10:46:26,294][11897] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 10:46:26,297][11903] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 10:46:26,338][11900] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 10:46:26,343][11901] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 10:46:26,346][11898] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 10:46:26,346][11896] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 10:46:26,351][11899] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 10:46:26,357][11902] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-27 10:46:27,217][11900] Decorrelating experience for 0 frames...
[2023-02-27 10:46:27,217][11899] Decorrelating experience for 0 frames...
[2023-02-27 10:46:27,218][11903] Decorrelating experience for 0 frames...
[2023-02-27 10:46:27,215][11902] Decorrelating experience for 0 frames...
[2023-02-27 10:46:27,602][11902] Decorrelating experience for 32 frames...
[2023-02-27 10:46:28,027][11902] Decorrelating experience for 64 frames...
[2023-02-27 10:46:28,194][11901] Decorrelating experience for 0 frames...
[2023-02-27 10:46:28,207][11903] Decorrelating experience for 32 frames...
[2023-02-27 10:46:28,226][11900] Decorrelating experience for 32 frames...
[2023-02-27 10:46:29,138][11901] Decorrelating experience for 32 frames...
[2023-02-27 10:46:29,153][11897] Decorrelating experience for 0 frames...
[2023-02-27 10:46:29,446][11900] Decorrelating experience for 64 frames...
[2023-02-27 10:46:29,448][11903] Decorrelating experience for 64 frames...
[2023-02-27 10:46:30,091][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-27 10:46:30,595][11897] Decorrelating experience for 32 frames...
[2023-02-27 10:46:30,799][11901] Decorrelating experience for 64 frames...
[2023-02-27 10:46:30,923][11903] Decorrelating experience for 96 frames...
[2023-02-27 10:46:31,585][11902] Decorrelating experience for 96 frames...
[2023-02-27 10:46:31,926][11897] Decorrelating experience for 64 frames...
[2023-02-27 10:46:31,984][11901] Decorrelating experience for 96 frames...
[2023-02-27 10:46:32,799][11900] Decorrelating experience for 96 frames...
[2023-02-27 10:46:32,850][11897] Decorrelating experience for 96 frames...
[2023-02-27 10:46:33,120][11899] Decorrelating experience for 32 frames...
[2023-02-27 10:46:34,085][11896] Decorrelating experience for 0 frames...
[2023-02-27 10:46:34,154][11899] Decorrelating experience for 64 frames...
[2023-02-27 10:46:35,091][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-27 10:46:35,156][11896] Decorrelating experience for 32 frames...
[2023-02-27 10:46:35,350][11899] Decorrelating experience for 96 frames...
[2023-02-27 10:46:36,140][11898] Decorrelating experience for 0 frames...
[2023-02-27 10:46:36,239][11896] Decorrelating experience for 64 frames...
[2023-02-27 10:46:37,860][11898] Decorrelating experience for 32 frames...
[2023-02-27 10:46:38,301][11896] Decorrelating experience for 96 frames...
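The "Decorrelating experience for N frames..." entries show each rollout worker burning a different number of warm-up frames before real collection starts, so the eight parallel workers do not begin from near-identical states and emit correlated trajectories. A rough sketch of the idea, with a hypothetical Gym-style `env` standing in for the actual VizDoom wrapper:

```python
import random

def decorrelate(env, max_frames=96, step=32):
    # Pick a worker-specific warm-up length: 0, 32, 64, or 96 frames.
    target = random.randrange(0, max_frames + step, step)
    env.reset()
    for frames_done in range(0, target, step):
        print(f"Decorrelating experience for {frames_done} frames...")
        for _ in range(step):
            obs, reward, done, info = env.step(env.action_space.sample())
            if done:
                env.reset()
```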
[2023-02-27 10:46:40,091][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 79.1. Samples: 1186. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-27 10:46:40,099][00394] Avg episode reward: [(0, '2.258')]
[2023-02-27 10:46:40,606][11881] Signal inference workers to stop experience collection...
[2023-02-27 10:46:40,624][11895] InferenceWorker_p0-w0: stopping experience collection
[2023-02-27 10:46:40,693][11898] Decorrelating experience for 64 frames...
[2023-02-27 10:46:41,062][11898] Decorrelating experience for 96 frames...
[2023-02-27 10:46:42,875][11881] Signal inference workers to resume experience collection...
[2023-02-27 10:46:42,877][11895] InferenceWorker_p0-w0: resuming experience collection
[2023-02-27 10:46:45,091][00394] Fps is (10 sec: 1228.8, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 12288. Throughput: 0: 176.9. Samples: 3538. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-27 10:46:45,094][00394] Avg episode reward: [(0, '3.078')]
[2023-02-27 10:46:50,091][00394] Fps is (10 sec: 2867.2, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 257.0. Samples: 6424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:46:50,097][00394] Avg episode reward: [(0, '3.711')]
[2023-02-27 10:46:53,617][11895] Updated weights for policy 0, policy_version 10 (0.0017)
[2023-02-27 10:46:55,091][00394] Fps is (10 sec: 3276.7, 60 sec: 1501.9, 300 sec: 1501.9). Total num frames: 45056. Throughput: 0: 353.9. Samples: 10618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:46:55,096][00394] Avg episode reward: [(0, '4.061')]
[2023-02-27 10:47:00,091][00394] Fps is (10 sec: 3276.8, 60 sec: 1755.4, 300 sec: 1755.4). Total num frames: 61440. Throughput: 0: 441.3. Samples: 15446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:47:00,093][00394] Avg episode reward: [(0, '4.278')]
[2023-02-27 10:47:04,185][11895] Updated weights for policy 0, policy_version 20 (0.0013)
[2023-02-27 10:47:05,091][00394] Fps is (10 sec: 3686.5, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 81920. Throughput: 0: 469.9. Samples: 18794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:47:05,095][00394] Avg episode reward: [(0, '4.538')]
[2023-02-27 10:47:10,091][00394] Fps is (10 sec: 4096.0, 60 sec: 2275.6, 300 sec: 2275.6). Total num frames: 102400. Throughput: 0: 561.9. Samples: 25284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:47:10,093][00394] Avg episode reward: [(0, '4.478')]
[2023-02-27 10:47:10,106][11881] Saving new best policy, reward=4.478!
[2023-02-27 10:47:15,091][00394] Fps is (10 sec: 3276.7, 60 sec: 2293.7, 300 sec: 2293.7). Total num frames: 114688. Throughput: 0: 654.9. Samples: 29470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-27 10:47:15,094][00394] Avg episode reward: [(0, '4.442')]
[2023-02-27 10:47:16,689][11895] Updated weights for policy 0, policy_version 30 (0.0025)
[2023-02-27 10:47:20,091][00394] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 135168. Throughput: 0: 702.0. Samples: 31592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:47:20,094][00394] Avg episode reward: [(0, '4.419')]
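Each status line reports frames-per-second over trailing 10 s / 60 s / 300 s windows, which is why the very first reports read nan: there is no earlier sample to difference against. A sketch, under assumptions, of how such windowed readings can be derived from (timestamp, total_frames) pairs:

```python
import time
from collections import deque

class FpsTracker:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # (timestamp, total_frames), oldest first

    def record(self, total_frames):
        now = time.time()
        self.history.append((now, total_frames))
        # Drop samples older than the largest window.
        while now - self.history[0][0] > max(self.windows):
            self.history.popleft()
        fps = {}
        for w in self.windows:
            # Oldest sample that still falls inside this window.
            t0, f0 = next((t, f) for t, f in self.history if now - t <= w)
            fps[w] = float("nan") if now == t0 else (total_frames - f0) / (now - t0)
        return fps
```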
[2023-02-27 10:47:25,091][00394] Fps is (10 sec: 4096.0, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 155648. Throughput: 0: 817.4. Samples: 37970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:47:25,096][00394] Avg episode reward: [(0, '4.367')]
[2023-02-27 10:47:26,944][11895] Updated weights for policy 0, policy_version 40 (0.0018)
[2023-02-27 10:47:30,091][00394] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2583.6). Total num frames: 167936. Throughput: 0: 877.6. Samples: 43030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:47:30,094][00394] Avg episode reward: [(0, '4.348')]
[2023-02-27 10:47:35,092][00394] Fps is (10 sec: 2457.4, 60 sec: 3003.7, 300 sec: 2574.6). Total num frames: 180224. Throughput: 0: 851.2. Samples: 44728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:47:35,095][00394] Avg episode reward: [(0, '4.427')]
[2023-02-27 10:47:40,091][00394] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 2566.8). Total num frames: 192512. Throughput: 0: 830.5. Samples: 47992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:47:40,095][00394] Avg episode reward: [(0, '4.464')]
[2023-02-27 10:47:43,270][11895] Updated weights for policy 0, policy_version 50 (0.0020)
[2023-02-27 10:47:45,091][00394] Fps is (10 sec: 3277.2, 60 sec: 3345.1, 300 sec: 2662.4). Total num frames: 212992. Throughput: 0: 833.9. Samples: 52970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:47:45,094][00394] Avg episode reward: [(0, '4.463')]
[2023-02-27 10:47:50,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 2746.7). Total num frames: 233472. Throughput: 0: 834.3. Samples: 56336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:47:50,094][00394] Avg episode reward: [(0, '4.405')]
[2023-02-27 10:47:52,397][11895] Updated weights for policy 0, policy_version 60 (0.0016)
[2023-02-27 10:47:55,094][00394] Fps is (10 sec: 3685.3, 60 sec: 3413.2, 300 sec: 2776.1). Total num frames: 249856. Throughput: 0: 831.8. Samples: 62716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:47:55,101][00394] Avg episode reward: [(0, '4.580')]
[2023-02-27 10:47:55,188][11881] Saving new best policy, reward=4.580!
[2023-02-27 10:48:00,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2802.5). Total num frames: 266240. Throughput: 0: 829.0. Samples: 66776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:48:00,094][00394] Avg episode reward: [(0, '4.795')]
[2023-02-27 10:48:00,103][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000065_266240.pth...
[2023-02-27 10:48:00,295][11881] Saving new best policy, reward=4.795!
[2023-02-27 10:48:05,091][00394] Fps is (10 sec: 3277.8, 60 sec: 3345.1, 300 sec: 2826.2). Total num frames: 282624. Throughput: 0: 828.6. Samples: 68878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:48:05,094][00394] Avg episode reward: [(0, '4.575')]
[2023-02-27 10:48:05,443][11895] Updated weights for policy 0, policy_version 70 (0.0035)
[2023-02-27 10:48:10,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 2925.7). Total num frames: 307200. Throughput: 0: 834.8. Samples: 75534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:48:10,093][00394] Avg episode reward: [(0, '4.513')]
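The "Policy #0 lag" triple is best read as version skew: experience is generated with whatever policy_version the inference worker held at the time, and by the time the learner consumes it the learner's own version may have advanced. A speculative sketch of the statistic (-1.0 meaning "no data yet"), not Sample Factory's actual code:

```python
def policy_lag(learner_version, versions_in_batch):
    # versions_in_batch: the policy_version each transition was collected with.
    if not versions_in_batch:
        return {"min": -1.0, "avg": -1.0, "max": -1.0}
    lags = [learner_version - v for v in versions_in_batch]
    return {"min": min(lags), "avg": sum(lags) / len(lags), "max": max(lags)}
```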
[2023-02-27 10:48:15,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2941.7). Total num frames: 323584. Throughput: 0: 853.0. Samples: 81416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:48:15,094][00394] Avg episode reward: [(0, '4.583')]
[2023-02-27 10:48:16,035][11895] Updated weights for policy 0, policy_version 80 (0.0013)
[2023-02-27 10:48:20,093][00394] Fps is (10 sec: 2866.6, 60 sec: 3345.0, 300 sec: 2920.6). Total num frames: 335872. Throughput: 0: 860.9. Samples: 83470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:48:20,096][00394] Avg episode reward: [(0, '4.648')]
[2023-02-27 10:48:25,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2969.6). Total num frames: 356352. Throughput: 0: 886.3. Samples: 87874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:48:25,093][00394] Avg episode reward: [(0, '4.414')]
[2023-02-27 10:48:27,627][11895] Updated weights for policy 0, policy_version 90 (0.0015)
[2023-02-27 10:48:30,091][00394] Fps is (10 sec: 4096.8, 60 sec: 3481.6, 300 sec: 3014.7). Total num frames: 376832. Throughput: 0: 924.3. Samples: 94564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:48:30,094][00394] Avg episode reward: [(0, '4.283')]
[2023-02-27 10:48:35,091][00394] Fps is (10 sec: 4095.8, 60 sec: 3618.2, 300 sec: 3056.2). Total num frames: 397312. Throughput: 0: 922.5. Samples: 97850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 10:48:35,094][00394] Avg episode reward: [(0, '4.520')]
[2023-02-27 10:48:38,798][11895] Updated weights for policy 0, policy_version 100 (0.0024)
[2023-02-27 10:48:40,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3034.1). Total num frames: 409600. Throughput: 0: 881.3. Samples: 102374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 10:48:40,099][00394] Avg episode reward: [(0, '4.614')]
[2023-02-27 10:48:45,091][00394] Fps is (10 sec: 3277.0, 60 sec: 3618.1, 300 sec: 3072.0). Total num frames: 430080. Throughput: 0: 894.9. Samples: 107048. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 10:48:45,093][00394] Avg episode reward: [(0, '4.602')]
[2023-02-27 10:48:49,686][11895] Updated weights for policy 0, policy_version 110 (0.0014)
[2023-02-27 10:48:50,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3107.3). Total num frames: 450560. Throughput: 0: 921.7. Samples: 110356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:48:50,096][00394] Avg episode reward: [(0, '4.400')]
[2023-02-27 10:48:55,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3113.0). Total num frames: 466944. Throughput: 0: 919.5. Samples: 116910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:48:55,097][00394] Avg episode reward: [(0, '4.426')]
[2023-02-27 10:49:00,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3118.2). Total num frames: 483328. Throughput: 0: 884.4. Samples: 121212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:49:00,094][00394] Avg episode reward: [(0, '4.502')]
[2023-02-27 10:49:02,394][11895] Updated weights for policy 0, policy_version 120 (0.0020)
[2023-02-27 10:49:05,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3123.2). Total num frames: 499712. Throughput: 0: 887.2. Samples: 123394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:49:05,099][00394] Avg episode reward: [(0, '4.613')]
[2023-02-27 10:49:10,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3177.5). Total num frames: 524288. Throughput: 0: 925.5. Samples: 129522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:49:10,097][00394] Avg episode reward: [(0, '4.571')]
[2023-02-27 10:49:11,981][11895] Updated weights for policy 0, policy_version 130 (0.0025)
[2023-02-27 10:49:15,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3180.4). Total num frames: 540672. Throughput: 0: 917.3. Samples: 135844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:49:15,094][00394] Avg episode reward: [(0, '4.442')]
[2023-02-27 10:49:20,093][00394] Fps is (10 sec: 3276.2, 60 sec: 3686.4, 300 sec: 3183.1). Total num frames: 557056. Throughput: 0: 890.4. Samples: 137920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:49:20,095][00394] Avg episode reward: [(0, '4.483')]
[2023-02-27 10:49:25,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3163.0). Total num frames: 569344. Throughput: 0: 885.8. Samples: 142234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:49:25,094][00394] Avg episode reward: [(0, '4.508')]
[2023-02-27 10:49:25,116][11895] Updated weights for policy 0, policy_version 140 (0.0026)
[2023-02-27 10:49:30,096][00394] Fps is (10 sec: 3685.2, 60 sec: 3617.8, 300 sec: 3210.3). Total num frames: 593920. Throughput: 0: 925.1. Samples: 148682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:49:30,100][00394] Avg episode reward: [(0, '4.674')]
[2023-02-27 10:49:33,926][11895] Updated weights for policy 0, policy_version 150 (0.0013)
[2023-02-27 10:49:35,091][00394] Fps is (10 sec: 4505.6, 60 sec: 3618.2, 300 sec: 3233.7). Total num frames: 614400. Throughput: 0: 926.0. Samples: 152028. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 10:49:35,096][00394] Avg episode reward: [(0, '4.588')]
[2023-02-27 10:49:40,091][00394] Fps is (10 sec: 3688.3, 60 sec: 3686.4, 300 sec: 3234.8). Total num frames: 630784. Throughput: 0: 892.0. Samples: 157052. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 10:49:40,094][00394] Avg episode reward: [(0, '4.361')]
[2023-02-27 10:49:45,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3215.4). Total num frames: 643072. Throughput: 0: 890.0. Samples: 161264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:49:45,094][00394] Avg episode reward: [(0, '4.558')]
[2023-02-27 10:49:47,068][11895] Updated weights for policy 0, policy_version 160 (0.0020)
[2023-02-27 10:49:50,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3256.8). Total num frames: 667648. Throughput: 0: 914.0. Samples: 164524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:49:50,093][00394] Avg episode reward: [(0, '4.765')]
[2023-02-27 10:49:55,095][00394] Fps is (10 sec: 4503.7, 60 sec: 3686.2, 300 sec: 3276.7). Total num frames: 688128. Throughput: 0: 930.7. Samples: 171406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:49:55,102][00394] Avg episode reward: [(0, '4.667')]
[2023-02-27 10:49:56,880][11895] Updated weights for policy 0, policy_version 170 (0.0032)
[2023-02-27 10:50:00,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 704512. Throughput: 0: 895.2. Samples: 176130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 10:50:00,093][00394] Avg episode reward: [(0, '4.381')]
[2023-02-27 10:50:00,112][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000172_704512.pth...
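The checkpoint file name saved above encodes the policy version and the environment frame count: checkpoint_000000172_704512.pth is model version 172 at 704512 frames. Loading it back for inspection is plain torch.load; the exact layout of the saved dict is a Sample Factory implementation detail, so listing its keys is the safe first step:

```python
import torch

path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000172_704512.pth"
checkpoint = torch.load(path, map_location="cpu")  # load onto CPU for inspection
print(sorted(checkpoint.keys()))  # model/optimizer state, version counters, ...
```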
[2023-02-27 10:50:05,091][00394] Fps is (10 sec: 2868.4, 60 sec: 3618.1, 300 sec: 3258.2). Total num frames: 716800. Throughput: 0: 894.7. Samples: 178180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:50:05,094][00394] Avg episode reward: [(0, '4.656')]
[2023-02-27 10:50:08,900][11895] Updated weights for policy 0, policy_version 180 (0.0026)
[2023-02-27 10:50:10,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3295.0). Total num frames: 741376. Throughput: 0: 929.0. Samples: 184038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:50:10,094][00394] Avg episode reward: [(0, '4.764')]
[2023-02-27 10:50:15,093][00394] Fps is (10 sec: 4504.5, 60 sec: 3686.3, 300 sec: 3312.4). Total num frames: 761856. Throughput: 0: 934.2. Samples: 190720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 10:50:15,102][00394] Avg episode reward: [(0, '4.482')]
[2023-02-27 10:50:19,997][11895] Updated weights for policy 0, policy_version 190 (0.0013)
[2023-02-27 10:50:20,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3311.7). Total num frames: 778240. Throughput: 0: 910.9. Samples: 193018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 10:50:20,095][00394] Avg episode reward: [(0, '4.550')]
[2023-02-27 10:50:25,091][00394] Fps is (10 sec: 2867.8, 60 sec: 3686.4, 300 sec: 3293.9). Total num frames: 790528. Throughput: 0: 894.9. Samples: 197324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:50:25,097][00394] Avg episode reward: [(0, '4.616')]
[2023-02-27 10:50:30,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3327.0). Total num frames: 815104. Throughput: 0: 935.6. Samples: 203368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:50:30,097][00394] Avg episode reward: [(0, '4.793')]
[2023-02-27 10:50:30,906][11895] Updated weights for policy 0, policy_version 200 (0.0023)
[2023-02-27 10:50:35,091][00394] Fps is (10 sec: 4505.7, 60 sec: 3686.4, 300 sec: 3342.3). Total num frames: 835584. Throughput: 0: 938.5. Samples: 206756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:50:35,096][00394] Avg episode reward: [(0, '4.526')]
[2023-02-27 10:50:40,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3325.0). Total num frames: 847872. Throughput: 0: 907.2. Samples: 212226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:50:40,093][00394] Avg episode reward: [(0, '4.335')]
[2023-02-27 10:50:43,418][11895] Updated weights for policy 0, policy_version 210 (0.0017)
[2023-02-27 10:50:45,091][00394] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3324.1). Total num frames: 864256. Throughput: 0: 892.0. Samples: 216268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:50:45,099][00394] Avg episode reward: [(0, '4.276')]
[2023-02-27 10:50:50,091][00394] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3338.6). Total num frames: 884736. Throughput: 0: 905.9. Samples: 218946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:50:50,100][00394] Avg episode reward: [(0, '4.339')]
[2023-02-27 10:50:53,421][11895] Updated weights for policy 0, policy_version 220 (0.0026)
[2023-02-27 10:50:55,091][00394] Fps is (10 sec: 4096.1, 60 sec: 3618.4, 300 sec: 3352.7). Total num frames: 905216. Throughput: 0: 920.6. Samples: 225466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:50:55,098][00394] Avg episode reward: [(0, '4.519')]
[2023-02-27 10:51:00,091][00394] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3351.3). Total num frames: 921600. Throughput: 0: 883.3. Samples: 230466. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-27 10:51:00,095][00394] Avg episode reward: [(0, '4.457')]
[2023-02-27 10:51:05,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3335.3). Total num frames: 933888. Throughput: 0: 876.8. Samples: 232474. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 10:51:05,099][00394] Avg episode reward: [(0, '4.572')]
[2023-02-27 10:51:06,886][11895] Updated weights for policy 0, policy_version 230 (0.0012)
[2023-02-27 10:51:10,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3348.7). Total num frames: 954368. Throughput: 0: 890.8. Samples: 237412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 10:51:10,099][00394] Avg episode reward: [(0, '4.712')]
[2023-02-27 10:51:15,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3361.5). Total num frames: 974848. Throughput: 0: 902.0. Samples: 243958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:51:15,096][00394] Avg episode reward: [(0, '4.564')]
[2023-02-27 10:51:16,454][11895] Updated weights for policy 0, policy_version 240 (0.0019)
[2023-02-27 10:51:20,094][00394] Fps is (10 sec: 3685.2, 60 sec: 3549.7, 300 sec: 3360.1). Total num frames: 991232. Throughput: 0: 889.1. Samples: 246770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:51:20,103][00394] Avg episode reward: [(0, '4.446')]
[2023-02-27 10:51:25,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3401.8). Total num frames: 1003520. Throughput: 0: 857.5. Samples: 250812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 10:51:25,098][00394] Avg episode reward: [(0, '4.326')]
[2023-02-27 10:51:30,091][00394] Fps is (10 sec: 2868.1, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 1019904. Throughput: 0: 870.7. Samples: 255450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:51:30,094][00394] Avg episode reward: [(0, '4.488')]
[2023-02-27 10:51:30,277][11895] Updated weights for policy 0, policy_version 250 (0.0021)
[2023-02-27 10:51:35,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 1036288. Throughput: 0: 876.9. Samples: 258406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:51:35,093][00394] Avg episode reward: [(0, '4.692')]
[2023-02-27 10:51:40,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 1048576. Throughput: 0: 822.4. Samples: 262476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 10:51:40,096][00394] Avg episode reward: [(0, '4.763')]
[2023-02-27 10:51:45,095][00394] Fps is (10 sec: 2456.6, 60 sec: 3276.6, 300 sec: 3498.9). Total num frames: 1060864. Throughput: 0: 786.5. Samples: 265862. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 10:51:45,102][00394] Avg episode reward: [(0, '4.635')]
[2023-02-27 10:51:45,614][11895] Updated weights for policy 0, policy_version 260 (0.0022)
[2023-02-27 10:51:50,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3499.0). Total num frames: 1077248. Throughput: 0: 788.0. Samples: 267934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:51:50,094][00394] Avg episode reward: [(0, '4.479')]
[2023-02-27 10:51:55,093][00394] Fps is (10 sec: 3687.1, 60 sec: 3208.4, 300 sec: 3512.8). Total num frames: 1097728. Throughput: 0: 809.4. Samples: 273838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:51:55,099][00394] Avg episode reward: [(0, '4.394')]
[2023-02-27 10:51:56,032][11895] Updated weights for policy 0, policy_version 270 (0.0018)
[2023-02-27 10:52:00,091][00394] Fps is (10 sec: 4505.6, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 1122304. Throughput: 0: 815.0. Samples: 280634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:52:00,093][00394] Avg episode reward: [(0, '4.500')]
[2023-02-27 10:52:00,113][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000274_1122304.pth...
[2023-02-27 10:52:00,296][11881] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000065_266240.pth
[2023-02-27 10:52:05,091][00394] Fps is (10 sec: 3687.2, 60 sec: 3345.1, 300 sec: 3499.0). Total num frames: 1134592. Throughput: 0: 800.2. Samples: 282778. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-27 10:52:05,094][00394] Avg episode reward: [(0, '4.565')]
[2023-02-27 10:52:08,296][11895] Updated weights for policy 0, policy_version 280 (0.0019)
[2023-02-27 10:52:10,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3512.8). Total num frames: 1150976. Throughput: 0: 807.8. Samples: 287162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:52:10,094][00394] Avg episode reward: [(0, '4.602')]
[2023-02-27 10:52:15,096][00394] Fps is (10 sec: 3684.5, 60 sec: 3276.5, 300 sec: 3512.8). Total num frames: 1171456. Throughput: 0: 842.6. Samples: 293370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:52:15,103][00394] Avg episode reward: [(0, '4.498')]
[2023-02-27 10:52:18,005][11895] Updated weights for policy 0, policy_version 290 (0.0013)
[2023-02-27 10:52:20,091][00394] Fps is (10 sec: 4505.6, 60 sec: 3413.5, 300 sec: 3526.7). Total num frames: 1196032. Throughput: 0: 849.6. Samples: 296640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:52:20,097][00394] Avg episode reward: [(0, '4.467')]
[2023-02-27 10:52:25,091][00394] Fps is (10 sec: 3688.3, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 1208320. Throughput: 0: 877.1. Samples: 301946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:52:25,098][00394] Avg episode reward: [(0, '4.652')]
[2023-02-27 10:52:30,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 1224704. Throughput: 0: 900.3. Samples: 306372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:52:30,094][00394] Avg episode reward: [(0, '4.686')]
[2023-02-27 10:52:30,675][11895] Updated weights for policy 0, policy_version 300 (0.0032)
[2023-02-27 10:52:35,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 1245184. Throughput: 0: 924.2. Samples: 309524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:52:35,094][00394] Avg episode reward: [(0, '4.778')]
[2023-02-27 10:52:39,752][11895] Updated weights for policy 0, policy_version 310 (0.0020)
[2023-02-27 10:52:40,092][00394] Fps is (10 sec: 4505.0, 60 sec: 3686.3, 300 sec: 3582.2). Total num frames: 1269760. Throughput: 0: 941.5. Samples: 316204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:52:40,096][00394] Avg episode reward: [(0, '4.845')]
[2023-02-27 10:52:40,108][11881] Saving new best policy, reward=4.845!
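Note the rotation above: saving checkpoint_000000274_1122304.pth immediately triggers removal of the oldest checkpoint_000000065_266240.pth, so only the most recent few checkpoints stay on disk (best-policy snapshots are tracked separately). A minimal sketch of that policy, assuming the zero-padded version number makes lexicographic order chronological and that keep_last=2 matches the observed behavior:

```python
from pathlib import Path

def rotate_checkpoints(ckpt_dir, keep_last=2):
    ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))  # oldest first
    for old in ckpts[:-keep_last]:
        old.unlink()  # delete everything but the newest keep_last files
```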
[2023-02-27 10:52:45,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3554.5). Total num frames: 1282048. Throughput: 0: 901.6. Samples: 321206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:52:45,095][00394] Avg episode reward: [(0, '4.879')]
[2023-02-27 10:52:45,099][11881] Saving new best policy, reward=4.879!
[2023-02-27 10:52:50,091][00394] Fps is (10 sec: 2867.6, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1298432. Throughput: 0: 899.8. Samples: 323270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:52:50,094][00394] Avg episode reward: [(0, '4.768')]
[2023-02-27 10:52:52,636][11895] Updated weights for policy 0, policy_version 320 (0.0026)
[2023-02-27 10:52:55,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3568.4). Total num frames: 1318912. Throughput: 0: 926.9. Samples: 328872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:52:55,093][00394] Avg episode reward: [(0, '4.615')]
[2023-02-27 10:53:00,091][00394] Fps is (10 sec: 4505.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1343488. Throughput: 0: 940.9. Samples: 335706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:53:00,094][00394] Avg episode reward: [(0, '4.531')]
[2023-02-27 10:53:02,282][11895] Updated weights for policy 0, policy_version 330 (0.0015)
[2023-02-27 10:53:05,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1355776. Throughput: 0: 926.6. Samples: 338338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:53:05,093][00394] Avg episode reward: [(0, '4.554')]
[2023-02-27 10:53:10,091][00394] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1372160. Throughput: 0: 901.7. Samples: 342522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-27 10:53:10,094][00394] Avg episode reward: [(0, '4.470')]
[2023-02-27 10:53:14,640][11895] Updated weights for policy 0, policy_version 340 (0.0014)
[2023-02-27 10:53:15,091][00394] Fps is (10 sec: 3686.3, 60 sec: 3686.7, 300 sec: 3582.3). Total num frames: 1392640. Throughput: 0: 933.4. Samples: 348376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:53:15,094][00394] Avg episode reward: [(0, '4.440')]
[2023-02-27 10:53:20,091][00394] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1417216. Throughput: 0: 938.7. Samples: 351764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:53:20,098][00394] Avg episode reward: [(0, '4.749')]
[2023-02-27 10:53:25,091][00394] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1429504. Throughput: 0: 915.0. Samples: 357380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:53:25,101][00394] Avg episode reward: [(0, '4.941')]
[2023-02-27 10:53:25,106][11881] Saving new best policy, reward=4.941!
[2023-02-27 10:53:25,772][11895] Updated weights for policy 0, policy_version 350 (0.0012)
[2023-02-27 10:53:30,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1445888. Throughput: 0: 896.5. Samples: 361550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:53:30,095][00394] Avg episode reward: [(0, '5.061')]
[2023-02-27 10:53:30,110][11881] Saving new best policy, reward=5.061!
[2023-02-27 10:53:35,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1466368. Throughput: 0: 907.3. Samples: 364098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:53:35,097][00394] Avg episode reward: [(0, '4.842')]
[2023-02-27 10:53:36,771][11895] Updated weights for policy 0, policy_version 360 (0.0027)
[2023-02-27 10:53:40,091][00394] Fps is (10 sec: 4095.9, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 1486848. Throughput: 0: 932.3. Samples: 370824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:53:40,094][00394] Avg episode reward: [(0, '4.757')]
[2023-02-27 10:53:45,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1503232. Throughput: 0: 902.4. Samples: 376314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:53:45,095][00394] Avg episode reward: [(0, '4.621')]
[2023-02-27 10:53:48,734][11895] Updated weights for policy 0, policy_version 370 (0.0018)
[2023-02-27 10:53:50,091][00394] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1519616. Throughput: 0: 890.7. Samples: 378418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-27 10:53:50,099][00394] Avg episode reward: [(0, '4.645')]
[2023-02-27 10:53:55,091][00394] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1540096. Throughput: 0: 913.6. Samples: 383634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 10:53:55,097][00394] Avg episode reward: [(0, '4.730')]
[2023-02-27 10:53:58,407][11895] Updated weights for policy 0, policy_version 380 (0.0025)
[2023-02-27 10:54:00,091][00394] Fps is (10 sec: 4096.1, 60 sec: 3618.2, 300 sec: 3596.1). Total num frames: 1560576. Throughput: 0: 938.0. Samples: 390586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:54:00,093][00394] Avg episode reward: [(0, '4.667')]
[2023-02-27 10:54:00,175][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000382_1564672.pth...
[2023-02-27 10:54:00,300][11881] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000172_704512.pth
[2023-02-27 10:54:05,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1576960. Throughput: 0: 928.7. Samples: 393554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:54:05,097][00394] Avg episode reward: [(0, '4.640')]
[2023-02-27 10:54:10,093][00394] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3568.4). Total num frames: 1593344. Throughput: 0: 899.5. Samples: 397860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:54:10,097][00394] Avg episode reward: [(0, '4.544')]
[2023-02-27 10:54:11,048][11895] Updated weights for policy 0, policy_version 390 (0.0034)
[2023-02-27 10:54:15,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1613824. Throughput: 0: 930.2. Samples: 403408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:54:15,093][00394] Avg episode reward: [(0, '4.669')]
[2023-02-27 10:54:20,091][00394] Fps is (10 sec: 4096.8, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1634304. Throughput: 0: 948.2. Samples: 406766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:54:20,094][00394] Avg episode reward: [(0, '4.552')]
[2023-02-27 10:54:20,165][11895] Updated weights for policy 0, policy_version 400 (0.0015)
[2023-02-27 10:54:25,104][00394] Fps is (10 sec: 4090.8, 60 sec: 3753.9, 300 sec: 3596.1). Total num frames: 1654784. Throughput: 0: 934.8. Samples: 412902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:54:25,108][00394] Avg episode reward: [(0, '4.492')]
[2023-02-27 10:54:30,095][00394] Fps is (10 sec: 3275.4, 60 sec: 3686.1, 300 sec: 3568.3). Total num frames: 1667072. Throughput: 0: 909.2. Samples: 417230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 10:54:30,098][00394] Avg episode reward: [(0, '4.833')]
[2023-02-27 10:54:32,925][11895] Updated weights for policy 0, policy_version 410 (0.0034)
[2023-02-27 10:54:35,091][00394] Fps is (10 sec: 3281.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1687552. Throughput: 0: 913.1. Samples: 419508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:54:35,096][00394] Avg episode reward: [(0, '4.803')]
[2023-02-27 10:54:40,091][00394] Fps is (10 sec: 4097.7, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 1708032. Throughput: 0: 952.6. Samples: 426500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:54:40,096][00394] Avg episode reward: [(0, '4.606')]
[2023-02-27 10:54:42,168][11895] Updated weights for policy 0, policy_version 420 (0.0022)
[2023-02-27 10:54:45,104][00394] Fps is (10 sec: 4090.5, 60 sec: 3753.8, 300 sec: 3596.0). Total num frames: 1728512. Throughput: 0: 926.2. Samples: 432276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:54:45,111][00394] Avg episode reward: [(0, '4.689')]
[2023-02-27 10:54:50,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1740800. Throughput: 0: 907.6. Samples: 434396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:54:50,098][00394] Avg episode reward: [(0, '5.034')]
[2023-02-27 10:54:54,597][11895] Updated weights for policy 0, policy_version 430 (0.0022)
[2023-02-27 10:54:55,091][00394] Fps is (10 sec: 3281.2, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1761280. Throughput: 0: 920.5. Samples: 439282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:54:55,097][00394] Avg episode reward: [(0, '5.295')]
[2023-02-27 10:54:55,101][11881] Saving new best policy, reward=5.295!
[2023-02-27 10:55:00,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 1781760. Throughput: 0: 946.4. Samples: 445994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:55:00,100][00394] Avg episode reward: [(0, '5.437')]
[2023-02-27 10:55:00,113][11881] Saving new best policy, reward=5.437!
[2023-02-27 10:55:04,946][11895] Updated weights for policy 0, policy_version 440 (0.0014)
[2023-02-27 10:55:05,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 1802240. Throughput: 0: 943.1. Samples: 449204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:55:05,097][00394] Avg episode reward: [(0, '5.254')]
[2023-02-27 10:55:10,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3568.4). Total num frames: 1814528. Throughput: 0: 902.7. Samples: 453510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:55:10,093][00394] Avg episode reward: [(0, '5.488')]
[2023-02-27 10:55:10,108][11881] Saving new best policy, reward=5.488!
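Alongside the rotating checkpoints, the learner tracks a running best: each time the average episode reward exceeds the best value seen so far (5.295, then 5.437, then 5.488 above), the current model is written out as the new best policy. A sketch of that bookkeeping, where save_fn is a hypothetical stand-in for the learner's actual save routine:

```python
class BestPolicyTracker:
    def __init__(self):
        self.best_reward = float("-inf")

    def update(self, avg_episode_reward, save_fn):
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            save_fn()  # hypothetical: persist current weights as the best policy
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
```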
[2023-02-27 10:55:15,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1835008. Throughput: 0: 918.9. Samples: 458578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:55:15,101][00394] Avg episode reward: [(0, '5.041')]
[2023-02-27 10:55:16,714][11895] Updated weights for policy 0, policy_version 450 (0.0022)
[2023-02-27 10:55:20,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 1855488. Throughput: 0: 941.8. Samples: 461890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:55:20,098][00394] Avg episode reward: [(0, '4.847')]
[2023-02-27 10:55:25,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.9, 300 sec: 3582.3). Total num frames: 1871872. Throughput: 0: 927.8. Samples: 468250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:55:25,097][00394] Avg episode reward: [(0, '5.037')]
[2023-02-27 10:55:27,954][11895] Updated weights for policy 0, policy_version 460 (0.0014)
[2023-02-27 10:55:30,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3568.4). Total num frames: 1888256. Throughput: 0: 894.4. Samples: 472512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 10:55:30,101][00394] Avg episode reward: [(0, '5.114')]
[2023-02-27 10:55:35,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1908736. Throughput: 0: 896.1. Samples: 474720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:55:35,093][00394] Avg episode reward: [(0, '5.236')]
[2023-02-27 10:55:38,578][11895] Updated weights for policy 0, policy_version 470 (0.0014)
[2023-02-27 10:55:40,092][00394] Fps is (10 sec: 4095.7, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 1929216. Throughput: 0: 937.2. Samples: 481456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:55:40,094][00394] Avg episode reward: [(0, '4.911')]
[2023-02-27 10:55:45,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3687.2, 300 sec: 3610.0). Total num frames: 1949696. Throughput: 0: 918.8. Samples: 487342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:55:45,094][00394] Avg episode reward: [(0, '4.651')]
[2023-02-27 10:55:50,091][00394] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1961984. Throughput: 0: 894.2. Samples: 489444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 10:55:50,098][00394] Avg episode reward: [(0, '4.544')]
[2023-02-27 10:55:50,872][11895] Updated weights for policy 0, policy_version 480 (0.0021)
[2023-02-27 10:55:55,094][00394] Fps is (10 sec: 2866.3, 60 sec: 3617.9, 300 sec: 3582.2). Total num frames: 1978368. Throughput: 0: 899.6. Samples: 493996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:55:55,103][00394] Avg episode reward: [(0, '4.671')]
[2023-02-27 10:56:00,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 2002944. Throughput: 0: 936.3. Samples: 500712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:56:00,094][00394] Avg episode reward: [(0, '4.563')]
[2023-02-27 10:56:00,105][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000489_2002944.pth...
[2023-02-27 10:56:00,224][11881] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000274_1122304.pth
[2023-02-27 10:56:00,879][11895] Updated weights for policy 0, policy_version 490 (0.0037)
[2023-02-27 10:56:05,091][00394] Fps is (10 sec: 4097.2, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 2019328. Throughput: 0: 934.9. Samples: 503960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:56:05,099][00394] Avg episode reward: [(0, '4.525')]
[2023-02-27 10:56:10,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 2035712. Throughput: 0: 892.9. Samples: 508430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:56:10,093][00394] Avg episode reward: [(0, '4.580')]
[2023-02-27 10:56:13,612][11895] Updated weights for policy 0, policy_version 500 (0.0029)
[2023-02-27 10:56:15,091][00394] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 2052096. Throughput: 0: 905.8. Samples: 513274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:56:15,094][00394] Avg episode reward: [(0, '4.593')]
[2023-02-27 10:56:20,095][00394] Fps is (10 sec: 3684.8, 60 sec: 3617.9, 300 sec: 3623.9). Total num frames: 2072576. Throughput: 0: 927.2. Samples: 516446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:56:20,102][00394] Avg episode reward: [(0, '4.710')]
[2023-02-27 10:56:22,830][11895] Updated weights for policy 0, policy_version 510 (0.0015)
[2023-02-27 10:56:25,100][00394] Fps is (10 sec: 4092.3, 60 sec: 3685.8, 300 sec: 3637.7). Total num frames: 2093056. Throughput: 0: 922.8. Samples: 522992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:56:25,109][00394] Avg episode reward: [(0, '4.703')]
[2023-02-27 10:56:30,095][00394] Fps is (10 sec: 3686.5, 60 sec: 3686.2, 300 sec: 3637.8). Total num frames: 2109440. Throughput: 0: 887.0. Samples: 527260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:56:30,099][00394] Avg episode reward: [(0, '4.625')]
[2023-02-27 10:56:35,091][00394] Fps is (10 sec: 3279.9, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2125824. Throughput: 0: 887.7. Samples: 529392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:56:35,098][00394] Avg episode reward: [(0, '4.499')]
[2023-02-27 10:56:35,765][11895] Updated weights for policy 0, policy_version 520 (0.0033)
[2023-02-27 10:56:40,092][00394] Fps is (10 sec: 3687.5, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2146304. Throughput: 0: 928.6. Samples: 535780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:56:40,096][00394] Avg episode reward: [(0, '4.336')]
[2023-02-27 10:56:45,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2166784. Throughput: 0: 921.5. Samples: 542178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:56:45,099][00394] Avg episode reward: [(0, '4.486')]
[2023-02-27 10:56:45,752][11895] Updated weights for policy 0, policy_version 530 (0.0013)
[2023-02-27 10:56:50,100][00394] Fps is (10 sec: 3274.1, 60 sec: 3617.6, 300 sec: 3665.5). Total num frames: 2179072. Throughput: 0: 895.1. Samples: 544246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:56:50,108][00394] Avg episode reward: [(0, '4.689')]
[2023-02-27 10:56:55,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3651.7). Total num frames: 2199552. Throughput: 0: 892.4. Samples: 548590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:56:55,093][00394] Avg episode reward: [(0, '4.882')]
[2023-02-27 10:56:57,676][11895] Updated weights for policy 0, policy_version 540 (0.0042)
[2023-02-27 10:57:00,091][00394] Fps is (10 sec: 4099.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2220032. Throughput: 0: 933.0. Samples: 555258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:57:00,093][00394] Avg episode reward: [(0, '4.583')]
[2023-02-27 10:57:05,091][00394] Fps is (10 sec: 4095.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2240512. Throughput: 0: 937.8. Samples: 558644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:57:05,099][00394] Avg episode reward: [(0, '4.628')]
[2023-02-27 10:57:08,599][11895] Updated weights for policy 0, policy_version 550 (0.0014)
[2023-02-27 10:57:10,094][00394] Fps is (10 sec: 3685.4, 60 sec: 3686.2, 300 sec: 3679.5). Total num frames: 2256896. Throughput: 0: 898.9. Samples: 563436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:57:10,095][00394] Avg episode reward: [(0, '4.688')]
[2023-02-27 10:57:15,091][00394] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2273280. Throughput: 0: 900.9. Samples: 567798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:57:15,094][00394] Avg episode reward: [(0, '4.626')]
[2023-02-27 10:57:19,757][11895] Updated weights for policy 0, policy_version 560 (0.0018)
[2023-02-27 10:57:20,091][00394] Fps is (10 sec: 3687.4, 60 sec: 3686.7, 300 sec: 3679.5). Total num frames: 2293760. Throughput: 0: 926.4. Samples: 571078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:57:20,097][00394] Avg episode reward: [(0, '4.682')]
[2023-02-27 10:57:25,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3687.0, 300 sec: 3693.3). Total num frames: 2314240. Throughput: 0: 931.5. Samples: 577696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:57:25,101][00394] Avg episode reward: [(0, '4.613')]
[2023-02-27 10:57:30,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3665.6). Total num frames: 2326528. Throughput: 0: 891.9. Samples: 582312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:57:30,101][00394] Avg episode reward: [(0, '4.590')]
[2023-02-27 10:57:31,742][11895] Updated weights for policy 0, policy_version 570 (0.0017)
[2023-02-27 10:57:35,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2342912. Throughput: 0: 894.1. Samples: 584470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:57:35,097][00394] Avg episode reward: [(0, '4.771')]
[2023-02-27 10:57:40,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3679.5). Total num frames: 2367488. Throughput: 0: 927.5. Samples: 590326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:57:40,094][00394] Avg episode reward: [(0, '4.931')]
[2023-02-27 10:57:41,805][11895] Updated weights for policy 0, policy_version 580 (0.0022)
[2023-02-27 10:57:45,091][00394] Fps is (10 sec: 4505.5, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2387968. Throughput: 0: 930.6. Samples: 597134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:57:45,101][00394] Avg episode reward: [(0, '4.799')]
[2023-02-27 10:57:50,095][00394] Fps is (10 sec: 3275.5, 60 sec: 3686.7, 300 sec: 3665.5). Total num frames: 2400256. Throughput: 0: 904.2. Samples: 599338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:57:50,101][00394] Avg episode reward: [(0, '4.697')]
[2023-02-27 10:57:54,837][11895] Updated weights for policy 0, policy_version 590 (0.0048)
[2023-02-27 10:57:55,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2416640. Throughput: 0: 890.6. Samples: 603512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:57:55,094][00394] Avg episode reward: [(0, '4.594')]
[2023-02-27 10:58:00,091][00394] Fps is (10 sec: 3687.9, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2437120. Throughput: 0: 932.4. Samples: 609756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:58:00,094][00394] Avg episode reward: [(0, '4.580')]
[2023-02-27 10:58:00,106][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000595_2437120.pth...
[2023-02-27 10:58:00,248][11881] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000382_1564672.pth
[2023-02-27 10:58:03,999][11895] Updated weights for policy 0, policy_version 600 (0.0014)
[2023-02-27 10:58:05,091][00394] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2461696. Throughput: 0: 932.2. Samples: 613026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:58:05,097][00394] Avg episode reward: [(0, '4.558')]
[2023-02-27 10:58:10,093][00394] Fps is (10 sec: 3685.6, 60 sec: 3618.2, 300 sec: 3665.5). Total num frames: 2473984. Throughput: 0: 901.8. Samples: 618280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:58:10,096][00394] Avg episode reward: [(0, '4.538')]
[2023-02-27 10:58:15,091][00394] Fps is (10 sec: 2867.1, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2490368. Throughput: 0: 892.7. Samples: 622486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:58:15,100][00394] Avg episode reward: [(0, '4.574')]
[2023-02-27 10:58:16,883][11895] Updated weights for policy 0, policy_version 610 (0.0025)
[2023-02-27 10:58:20,093][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.0, 300 sec: 3665.5). Total num frames: 2510848. Throughput: 0: 909.9. Samples: 625416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 10:58:20,101][00394] Avg episode reward: [(0, '4.678')]
[2023-02-27 10:58:25,091][00394] Fps is (10 sec: 4096.2, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2531328. Throughput: 0: 929.8. Samples: 632168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 10:58:25,094][00394] Avg episode reward: [(0, '4.738')]
[2023-02-27 10:58:26,502][11895] Updated weights for policy 0, policy_version 620 (0.0021)
[2023-02-27 10:58:30,091][00394] Fps is (10 sec: 3687.2, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2547712. Throughput: 0: 890.8. Samples: 637222. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 10:58:30,098][00394] Avg episode reward: [(0, '4.743')]
[2023-02-27 10:58:35,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2560000. Throughput: 0: 888.2. Samples: 639302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 10:58:35,098][00394] Avg episode reward: [(0, '4.675')]
[2023-02-27 10:58:38,975][11895] Updated weights for policy 0, policy_version 630 (0.0028)
[2023-02-27 10:58:40,092][00394] Fps is (10 sec: 3685.9, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2584576. Throughput: 0: 915.8. Samples: 644724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 10:58:40,099][00394] Avg episode reward: [(0, '4.629')]
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 10:58:45,094][00394] Avg episode reward: [(0, '4.921')] [2023-02-27 10:58:50,024][11895] Updated weights for policy 0, policy_version 640 (0.0018) [2023-02-27 10:58:50,097][00394] Fps is (10 sec: 3684.8, 60 sec: 3686.3, 300 sec: 3665.5). Total num frames: 2621440. Throughput: 0: 910.2. Samples: 653990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 10:58:50,099][00394] Avg episode reward: [(0, '4.830')] [2023-02-27 10:58:55,091][00394] Fps is (10 sec: 2867.1, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2633728. Throughput: 0: 887.6. Samples: 658220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 10:58:55,097][00394] Avg episode reward: [(0, '4.688')] [2023-02-27 10:59:00,091][00394] Fps is (10 sec: 3278.6, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2654208. Throughput: 0: 918.7. Samples: 663826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 10:59:00,093][00394] Avg episode reward: [(0, '4.633')] [2023-02-27 10:59:01,307][11895] Updated weights for policy 0, policy_version 650 (0.0023) [2023-02-27 10:59:05,091][00394] Fps is (10 sec: 4505.7, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2678784. Throughput: 0: 928.2. Samples: 667182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 10:59:05,094][00394] Avg episode reward: [(0, '4.841')] [2023-02-27 10:59:10,095][00394] Fps is (10 sec: 3684.9, 60 sec: 3618.0, 300 sec: 3651.6). Total num frames: 2691072. Throughput: 0: 905.1. Samples: 672900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 10:59:10,102][00394] Avg episode reward: [(0, '4.859')] [2023-02-27 10:59:13,325][11895] Updated weights for policy 0, policy_version 660 (0.0013) [2023-02-27 10:59:15,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 2707456. Throughput: 0: 887.2. Samples: 677144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 10:59:15,098][00394] Avg episode reward: [(0, '4.698')] [2023-02-27 10:59:20,091][00394] Fps is (10 sec: 3687.9, 60 sec: 3618.3, 300 sec: 3638.0). Total num frames: 2727936. Throughput: 0: 898.8. Samples: 679748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 10:59:20,098][00394] Avg episode reward: [(0, '4.608')] [2023-02-27 10:59:23,337][11895] Updated weights for policy 0, policy_version 670 (0.0017) [2023-02-27 10:59:25,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2748416. Throughput: 0: 927.7. Samples: 686468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 10:59:25,098][00394] Avg episode reward: [(0, '4.740')] [2023-02-27 10:59:30,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2764800. Throughput: 0: 900.0. Samples: 691900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 10:59:30,097][00394] Avg episode reward: [(0, '4.913')] [2023-02-27 10:59:35,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2781184. Throughput: 0: 888.8. Samples: 693982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 10:59:35,094][00394] Avg episode reward: [(0, '4.905')] [2023-02-27 10:59:36,072][11895] Updated weights for policy 0, policy_version 680 (0.0014) [2023-02-27 10:59:40,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3638.0). Total num frames: 2801664. Throughput: 0: 909.0. Samples: 699124. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 10:59:40,096][00394] Avg episode reward: [(0, '5.060')] [2023-02-27 10:59:45,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2822144. Throughput: 0: 936.3. Samples: 705958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 10:59:45,096][00394] Avg episode reward: [(0, '4.948')] [2023-02-27 10:59:45,286][11895] Updated weights for policy 0, policy_version 690 (0.0024) [2023-02-27 10:59:50,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.5, 300 sec: 3651.7). Total num frames: 2838528. Throughput: 0: 930.4. Samples: 709050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 10:59:50,099][00394] Avg episode reward: [(0, '4.965')] [2023-02-27 10:59:55,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2854912. Throughput: 0: 898.7. Samples: 713340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 10:59:55,098][00394] Avg episode reward: [(0, '4.864')] [2023-02-27 10:59:58,083][11895] Updated weights for policy 0, policy_version 700 (0.0012) [2023-02-27 11:00:00,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2875392. Throughput: 0: 923.6. Samples: 718706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:00:00,094][00394] Avg episode reward: [(0, '4.912')] [2023-02-27 11:00:00,103][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000702_2875392.pth... [2023-02-27 11:00:00,218][11881] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000489_2002944.pth [2023-02-27 11:00:05,091][00394] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2895872. Throughput: 0: 938.5. Samples: 721980. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-27 11:00:05,096][00394] Avg episode reward: [(0, '4.854')] [2023-02-27 11:00:07,250][11895] Updated weights for policy 0, policy_version 710 (0.0012) [2023-02-27 11:00:10,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3651.7). Total num frames: 2912256. Throughput: 0: 927.0. Samples: 728182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:00:10,100][00394] Avg episode reward: [(0, '4.650')] [2023-02-27 11:00:15,091][00394] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2928640. Throughput: 0: 901.3. Samples: 732460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:00:15,098][00394] Avg episode reward: [(0, '4.565')] [2023-02-27 11:00:19,892][11895] Updated weights for policy 0, policy_version 720 (0.0026) [2023-02-27 11:00:20,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2949120. Throughput: 0: 903.8. Samples: 734654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:00:20,099][00394] Avg episode reward: [(0, '4.593')] [2023-02-27 11:00:25,091][00394] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2969600. Throughput: 0: 936.4. Samples: 741264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:00:25,100][00394] Avg episode reward: [(0, '4.593')] [2023-02-27 11:00:30,091][00394] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2985984. Throughput: 0: 915.5. Samples: 747154. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:00:30,098][00394] Avg episode reward: [(0, '4.668')] [2023-02-27 11:00:30,545][11895] Updated weights for policy 0, policy_version 730 (0.0020) [2023-02-27 11:00:35,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3002368. Throughput: 0: 894.2. Samples: 749288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:00:35,093][00394] Avg episode reward: [(0, '4.537')] [2023-02-27 11:00:40,091][00394] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3018752. Throughput: 0: 902.4. Samples: 753948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:00:40,100][00394] Avg episode reward: [(0, '4.617')] [2023-02-27 11:00:42,078][11895] Updated weights for policy 0, policy_version 740 (0.0012) [2023-02-27 11:00:45,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3043328. Throughput: 0: 932.6. Samples: 760674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:00:45,098][00394] Avg episode reward: [(0, '4.546')] [2023-02-27 11:00:50,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3059712. Throughput: 0: 933.4. Samples: 763982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:00:50,098][00394] Avg episode reward: [(0, '4.621')] [2023-02-27 11:00:53,674][11895] Updated weights for policy 0, policy_version 750 (0.0019) [2023-02-27 11:00:55,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3076096. Throughput: 0: 891.9. Samples: 768318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:00:55,099][00394] Avg episode reward: [(0, '4.585')] [2023-02-27 11:01:00,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3092480. Throughput: 0: 909.2. Samples: 773374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:01:00,094][00394] Avg episode reward: [(0, '4.499')] [2023-02-27 11:01:04,235][11895] Updated weights for policy 0, policy_version 760 (0.0022) [2023-02-27 11:01:05,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3112960. Throughput: 0: 932.8. Samples: 776630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:01:05,093][00394] Avg episode reward: [(0, '4.640')] [2023-02-27 11:01:10,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3133440. Throughput: 0: 928.8. Samples: 783060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:01:10,093][00394] Avg episode reward: [(0, '4.705')] [2023-02-27 11:01:15,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.9). Total num frames: 3145728. Throughput: 0: 894.8. Samples: 787420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:01:15,098][00394] Avg episode reward: [(0, '4.611')] [2023-02-27 11:01:16,611][11895] Updated weights for policy 0, policy_version 770 (0.0025) [2023-02-27 11:01:20,091][00394] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3637.9). Total num frames: 3166208. Throughput: 0: 894.8. Samples: 789552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:01:20,093][00394] Avg episode reward: [(0, '4.614')] [2023-02-27 11:01:25,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3186688. Throughput: 0: 932.5. Samples: 795912. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:01:25,093][00394] Avg episode reward: [(0, '4.762')] [2023-02-27 11:01:26,264][11895] Updated weights for policy 0, policy_version 780 (0.0026) [2023-02-27 11:01:30,091][00394] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3207168. Throughput: 0: 920.4. Samples: 802090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:01:30,095][00394] Avg episode reward: [(0, '4.586')] [2023-02-27 11:01:35,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3219456. Throughput: 0: 894.1. Samples: 804218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:01:35,098][00394] Avg episode reward: [(0, '4.527')] [2023-02-27 11:01:38,969][11895] Updated weights for policy 0, policy_version 790 (0.0013) [2023-02-27 11:01:40,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3239936. Throughput: 0: 895.3. Samples: 808606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:01:40,098][00394] Avg episode reward: [(0, '4.563')] [2023-02-27 11:01:45,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.7). Total num frames: 3260416. Throughput: 0: 935.2. Samples: 815460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:01:45,098][00394] Avg episode reward: [(0, '4.822')] [2023-02-27 11:01:48,173][11895] Updated weights for policy 0, policy_version 800 (0.0015) [2023-02-27 11:01:50,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3280896. Throughput: 0: 935.6. Samples: 818730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:01:50,094][00394] Avg episode reward: [(0, '4.806')] [2023-02-27 11:01:55,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3293184. Throughput: 0: 893.5. Samples: 823268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:01:55,096][00394] Avg episode reward: [(0, '4.770')] [2023-02-27 11:02:00,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3313664. Throughput: 0: 901.6. Samples: 827990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:02:00,097][00394] Avg episode reward: [(0, '4.617')] [2023-02-27 11:02:00,107][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000809_3313664.pth... [2023-02-27 11:02:00,218][11881] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000595_2437120.pth [2023-02-27 11:02:00,988][11895] Updated weights for policy 0, policy_version 810 (0.0019) [2023-02-27 11:02:05,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3334144. Throughput: 0: 928.8. Samples: 831346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:02:05,093][00394] Avg episode reward: [(0, '4.651')] [2023-02-27 11:02:10,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3354624. Throughput: 0: 936.8. Samples: 838066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:02:10,098][00394] Avg episode reward: [(0, '4.839')] [2023-02-27 11:02:11,008][11895] Updated weights for policy 0, policy_version 820 (0.0014) [2023-02-27 11:02:15,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3366912. Throughput: 0: 895.3. Samples: 842380. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:02:15,098][00394] Avg episode reward: [(0, '4.789')] [2023-02-27 11:02:20,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3383296. Throughput: 0: 894.8. Samples: 844484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:02:20,094][00394] Avg episode reward: [(0, '4.735')] [2023-02-27 11:02:23,186][11895] Updated weights for policy 0, policy_version 830 (0.0019) [2023-02-27 11:02:25,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3407872. Throughput: 0: 931.8. Samples: 850538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:02:25,098][00394] Avg episode reward: [(0, '4.634')] [2023-02-27 11:02:30,096][00394] Fps is (10 sec: 4503.3, 60 sec: 3686.1, 300 sec: 3679.4). Total num frames: 3428352. Throughput: 0: 925.1. Samples: 857096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:02:30,100][00394] Avg episode reward: [(0, '4.475')] [2023-02-27 11:02:34,278][11895] Updated weights for policy 0, policy_version 840 (0.0023) [2023-02-27 11:02:35,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3440640. Throughput: 0: 899.2. Samples: 859196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:02:35,102][00394] Avg episode reward: [(0, '4.495')] [2023-02-27 11:02:40,091][00394] Fps is (10 sec: 2868.7, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3457024. Throughput: 0: 893.6. Samples: 863482. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:02:40,098][00394] Avg episode reward: [(0, '4.626')] [2023-02-27 11:02:44,952][11895] Updated weights for policy 0, policy_version 850 (0.0016) [2023-02-27 11:02:45,091][00394] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3481600. Throughput: 0: 934.8. Samples: 870058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:02:45,094][00394] Avg episode reward: [(0, '4.720')] [2023-02-27 11:02:50,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3497984. Throughput: 0: 935.3. Samples: 873434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:02:50,095][00394] Avg episode reward: [(0, '4.558')] [2023-02-27 11:02:55,095][00394] Fps is (10 sec: 3275.5, 60 sec: 3686.2, 300 sec: 3651.6). Total num frames: 3514368. Throughput: 0: 893.9. Samples: 878294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:02:55,097][00394] Avg episode reward: [(0, '4.391')] [2023-02-27 11:02:57,327][11895] Updated weights for policy 0, policy_version 860 (0.0017) [2023-02-27 11:03:00,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3530752. Throughput: 0: 896.3. Samples: 882714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:03:00,094][00394] Avg episode reward: [(0, '4.404')] [2023-02-27 11:03:05,091][00394] Fps is (10 sec: 3687.9, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3551232. Throughput: 0: 924.4. Samples: 886084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:03:05,097][00394] Avg episode reward: [(0, '4.604')] [2023-02-27 11:03:06,997][11895] Updated weights for policy 0, policy_version 870 (0.0019) [2023-02-27 11:03:10,091][00394] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3575808. Throughput: 0: 937.6. Samples: 892728. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:03:10,099][00394] Avg episode reward: [(0, '4.688')] [2023-02-27 11:03:15,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3588096. Throughput: 0: 896.5. Samples: 897436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:03:15,097][00394] Avg episode reward: [(0, '4.614')] [2023-02-27 11:03:19,737][11895] Updated weights for policy 0, policy_version 880 (0.0016) [2023-02-27 11:03:20,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3604480. Throughput: 0: 896.2. Samples: 899524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:03:20,093][00394] Avg episode reward: [(0, '4.798')] [2023-02-27 11:03:25,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3624960. Throughput: 0: 932.2. Samples: 905432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:03:25,099][00394] Avg episode reward: [(0, '4.654')] [2023-02-27 11:03:29,060][11895] Updated weights for policy 0, policy_version 890 (0.0013) [2023-02-27 11:03:30,092][00394] Fps is (10 sec: 4095.6, 60 sec: 3618.4, 300 sec: 3679.4). Total num frames: 3645440. Throughput: 0: 933.9. Samples: 912086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:03:30,099][00394] Avg episode reward: [(0, '4.436')] [2023-02-27 11:03:35,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3661824. Throughput: 0: 906.8. Samples: 914240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:03:35,099][00394] Avg episode reward: [(0, '4.556')] [2023-02-27 11:03:40,091][00394] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3678208. Throughput: 0: 894.6. Samples: 918546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:03:40,094][00394] Avg episode reward: [(0, '4.492')] [2023-02-27 11:03:41,786][11895] Updated weights for policy 0, policy_version 900 (0.0033) [2023-02-27 11:03:45,091][00394] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3651.8). Total num frames: 3698688. Throughput: 0: 936.5. Samples: 924858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:03:45,097][00394] Avg episode reward: [(0, '4.363')] [2023-02-27 11:03:50,091][00394] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3719168. Throughput: 0: 935.4. Samples: 928176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:03:50,096][00394] Avg episode reward: [(0, '4.451')] [2023-02-27 11:03:51,891][11895] Updated weights for policy 0, policy_version 910 (0.0018) [2023-02-27 11:03:55,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3665.6). Total num frames: 3735552. Throughput: 0: 901.4. Samples: 933292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:03:55,098][00394] Avg episode reward: [(0, '4.517')] [2023-02-27 11:04:00,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3747840. Throughput: 0: 891.5. Samples: 937552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:04:00,100][00394] Avg episode reward: [(0, '4.497')] [2023-02-27 11:04:00,113][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000915_3747840.pth... 
[2023-02-27 11:04:00,236][11881] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000702_2875392.pth [2023-02-27 11:04:03,914][11895] Updated weights for policy 0, policy_version 920 (0.0046) [2023-02-27 11:04:05,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3772416. Throughput: 0: 913.9. Samples: 940648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:04:05,093][00394] Avg episode reward: [(0, '4.528')] [2023-02-27 11:04:10,094][00394] Fps is (10 sec: 4504.2, 60 sec: 3617.9, 300 sec: 3679.4). Total num frames: 3792896. Throughput: 0: 935.0. Samples: 947508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:04:10,097][00394] Avg episode reward: [(0, '4.720')] [2023-02-27 11:04:14,801][11895] Updated weights for policy 0, policy_version 930 (0.0013) [2023-02-27 11:04:15,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3809280. Throughput: 0: 896.5. Samples: 952428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:04:15,098][00394] Avg episode reward: [(0, '4.708')] [2023-02-27 11:04:20,092][00394] Fps is (10 sec: 2867.9, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3821568. Throughput: 0: 896.7. Samples: 954592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:04:20,098][00394] Avg episode reward: [(0, '4.744')] [2023-02-27 11:04:25,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3846144. Throughput: 0: 925.7. Samples: 960204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:04:25,093][00394] Avg episode reward: [(0, '4.699')] [2023-02-27 11:04:25,848][11895] Updated weights for policy 0, policy_version 940 (0.0019) [2023-02-27 11:04:30,091][00394] Fps is (10 sec: 4505.9, 60 sec: 3686.5, 300 sec: 3679.5). Total num frames: 3866624. Throughput: 0: 930.4. Samples: 966728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:04:30,098][00394] Avg episode reward: [(0, '4.686')] [2023-02-27 11:04:35,091][00394] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3883008. Throughput: 0: 911.8. Samples: 969208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:04:35,098][00394] Avg episode reward: [(0, '4.696')] [2023-02-27 11:04:37,938][11895] Updated weights for policy 0, policy_version 950 (0.0012) [2023-02-27 11:04:40,091][00394] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3895296. Throughput: 0: 891.2. Samples: 973396. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:04:40,094][00394] Avg episode reward: [(0, '4.786')] [2023-02-27 11:04:45,091][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3915776. Throughput: 0: 924.2. Samples: 979142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:04:45,095][00394] Avg episode reward: [(0, '4.642')] [2023-02-27 11:04:48,443][11895] Updated weights for policy 0, policy_version 960 (0.0029) [2023-02-27 11:04:50,091][00394] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3936256. Throughput: 0: 928.2. Samples: 982418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:04:50,093][00394] Avg episode reward: [(0, '4.615')] [2023-02-27 11:04:55,091][00394] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3952640. Throughput: 0: 899.3. Samples: 987976. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:04:55,098][00394] Avg episode reward: [(0, '4.546')] [2023-02-27 11:05:00,091][00394] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3969024. Throughput: 0: 885.0. Samples: 992254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:05:00,094][00394] Avg episode reward: [(0, '4.628')] [2023-02-27 11:05:01,291][11895] Updated weights for policy 0, policy_version 970 (0.0011) [2023-02-27 11:05:05,091][00394] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3989504. Throughput: 0: 897.7. Samples: 994986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:05:05,098][00394] Avg episode reward: [(0, '4.637')] [2023-02-27 11:05:08,730][11881] Stopping Batcher_0... [2023-02-27 11:05:08,732][11881] Loop batcher_evt_loop terminating... [2023-02-27 11:05:08,734][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-27 11:05:08,738][00394] Component Batcher_0 stopped! [2023-02-27 11:05:08,783][11895] Weights refcount: 2 0 [2023-02-27 11:05:08,793][00394] Component InferenceWorker_p0-w0 stopped! [2023-02-27 11:05:08,793][11895] Stopping InferenceWorker_p0-w0... [2023-02-27 11:05:08,804][11895] Loop inference_proc0-0_evt_loop terminating... [2023-02-27 11:05:08,811][00394] Component RolloutWorker_w0 stopped! [2023-02-27 11:05:08,816][00394] Component RolloutWorker_w5 stopped! [2023-02-27 11:05:08,819][11901] Stopping RolloutWorker_w5... [2023-02-27 11:05:08,811][11896] Stopping RolloutWorker_w0... [2023-02-27 11:05:08,828][11902] Stopping RolloutWorker_w6... [2023-02-27 11:05:08,826][11896] Loop rollout_proc0_evt_loop terminating... [2023-02-27 11:05:08,827][00394] Component RolloutWorker_w6 stopped! [2023-02-27 11:05:08,820][11901] Loop rollout_proc5_evt_loop terminating... [2023-02-27 11:05:08,836][11899] Stopping RolloutWorker_w4... [2023-02-27 11:05:08,836][00394] Component RolloutWorker_w4 stopped! [2023-02-27 11:05:08,832][11902] Loop rollout_proc6_evt_loop terminating... [2023-02-27 11:05:08,849][00394] Component RolloutWorker_w3 stopped! [2023-02-27 11:05:08,850][11898] Stopping RolloutWorker_w2... [2023-02-27 11:05:08,838][11899] Loop rollout_proc4_evt_loop terminating... [2023-02-27 11:05:08,852][00394] Component RolloutWorker_w2 stopped! [2023-02-27 11:05:08,856][11900] Stopping RolloutWorker_w3... [2023-02-27 11:05:08,861][11898] Loop rollout_proc2_evt_loop terminating... [2023-02-27 11:05:08,871][00394] Component RolloutWorker_w1 stopped! [2023-02-27 11:05:08,875][11897] Stopping RolloutWorker_w1... [2023-02-27 11:05:08,875][11897] Loop rollout_proc1_evt_loop terminating... [2023-02-27 11:05:08,879][00394] Component RolloutWorker_w7 stopped! [2023-02-27 11:05:08,883][11903] Stopping RolloutWorker_w7... [2023-02-27 11:05:08,876][11900] Loop rollout_proc3_evt_loop terminating... [2023-02-27 11:05:08,889][11903] Loop rollout_proc7_evt_loop terminating... [2023-02-27 11:05:08,919][11881] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000809_3313664.pth [2023-02-27 11:05:08,932][11881] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-27 11:05:09,118][00394] Component LearnerWorker_p0 stopped! [2023-02-27 11:05:09,123][00394] Waiting for process learner_proc0 to stop... [2023-02-27 11:05:09,126][11881] Stopping LearnerWorker_p0... [2023-02-27 11:05:09,127][11881] Loop learner_proc0_evt_loop terminating... 
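The Saving/Removing pairs in this run follow a simple convention: checkpoints are named checkpoint_<policy_version>_<env_frames>.pth, env_frames is consistently policy_version × 4096 here (978 × 4096 = 4005888), and each periodic save is paired with deletion of the oldest file, so only the two most recent checkpoints remain on disk (595/382, 702/489, 809/595, 915/702, 978/809). The pretrained checkpoint loaded later in this log, checkpoint_000539850_4422451200.pth, matches the same pattern at 8192 frames per version. Below is a minimal sketch of parsing and rotating such files, assuming the keep-latest-2 behavior inferred from these log lines; it is illustrative Python, not Sample Factory's actual implementation.

import os
import re

CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

def parse_checkpoint_name(filename):
    # checkpoint_000000978_4005888.pth -> (978, 4005888)
    m = CKPT_RE.search(os.path.basename(filename))
    if m is None:
        raise ValueError(f"not a checkpoint file: {filename}")
    return int(m.group(1)), int(m.group(2))

def rotate_checkpoints(ckpt_dir, keep=2):
    # Keep only the `keep` highest-version checkpoints, mirroring the
    # Saving/Removing pairs seen in this log (assumed behavior).
    names = [f for f in os.listdir(ckpt_dir) if CKPT_RE.search(f)]
    names.sort(key=lambda f: parse_checkpoint_name(f)[0])
    for old in names[:-keep]:
        os.remove(os.path.join(ckpt_dir, old))

version, frames = parse_checkpoint_name("checkpoint_000000978_4005888.pth")
assert (version, frames) == (978, 4005888) and frames == version * 4096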
[2023-02-27 11:05:11,086][00394] Waiting for process inference_proc0-0 to join...
[2023-02-27 11:05:11,753][00394] Waiting for process rollout_proc0 to join...
[2023-02-27 11:05:11,760][00394] Waiting for process rollout_proc1 to join...
[2023-02-27 11:05:12,402][00394] Waiting for process rollout_proc2 to join...
[2023-02-27 11:05:12,411][00394] Waiting for process rollout_proc3 to join...
[2023-02-27 11:05:12,414][00394] Waiting for process rollout_proc4 to join...
[2023-02-27 11:05:12,416][00394] Waiting for process rollout_proc5 to join...
[2023-02-27 11:05:12,417][00394] Waiting for process rollout_proc6 to join...
[2023-02-27 11:05:12,421][00394] Waiting for process rollout_proc7 to join...
[2023-02-27 11:05:12,422][00394] Batcher 0 profile tree view:
batching: 25.7700, releasing_batches: 0.0255
[2023-02-27 11:05:12,424][00394] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 554.6697
update_model: 7.7464
  weight_update: 0.0014
one_step: 0.0059
  handle_policy_step: 516.4659
    deserialize: 15.0708, stack: 2.9974, obs_to_device_normalize: 114.0557, forward: 248.2549, send_messages: 26.1759
    prepare_outputs: 83.6037
      to_cpu: 51.9044
[2023-02-27 11:05:12,425][00394] Learner 0 profile tree view:
misc: 0.0067, prepare_batch: 16.5836
train: 76.4678
  epoch_init: 0.0097, minibatch_init: 0.0103, losses_postprocess: 0.5752, kl_divergence: 0.5372, after_optimizer: 32.9634
  calculate_losses: 27.5546
    losses_init: 0.0143, forward_head: 1.8299, bptt_initial: 18.0597, tail: 1.1343, advantages_returns: 0.2970, losses: 3.5739
    bptt: 2.3090
      bptt_forward_core: 2.2016
  update: 14.2526
    clip: 1.4293
[2023-02-27 11:05:12,428][00394] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4081, enqueue_policy_requests: 149.7261, env_step: 841.2801, overhead: 21.5995, complete_rollouts: 7.4514
save_policy_outputs: 20.6615
  split_output_tensors: 10.3818
[2023-02-27 11:05:12,431][00394] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3089, enqueue_policy_requests: 154.1436, env_step: 839.9402, overhead: 20.9316, complete_rollouts: 6.7222
save_policy_outputs: 20.2122
  split_output_tensors: 9.9854
[2023-02-27 11:05:12,432][00394] Loop Runner_EvtLoop terminating...
[2023-02-27 11:05:12,434][00394] Runner profile tree view:
main_loop: 1146.9944
[2023-02-27 11:05:12,437][00394] Collected {0: 4005888}, FPS: 3492.5
[2023-02-27 11:06:05,797][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-27 11:06:05,799][00394] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-27 11:06:05,804][00394] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-27 11:06:05,806][00394] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-27 11:06:05,809][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-27 11:06:05,812][00394] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-27 11:06:05,814][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-27 11:06:05,816][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-27 11:06:05,817][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-27 11:06:05,819][00394] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-27 11:06:05,821][00394] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-27 11:06:05,823][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-27 11:06:05,824][00394] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-27 11:06:05,826][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-27 11:06:05,828][00394] Using frameskip 1 and render_action_repeat=4 for evaluation
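Two details in the summary and configuration block above are worth checking. The reported throughput is internally consistent: 4005888 frames over the 1146.9944 s main_loop gives 4005888 / 1146.9944 ≈ 3492.5 FPS, exactly the figure logged. And the evaluation run reuses the training configuration: the saved config.json is loaded, command-line values override keys that already exist ('num_workers'), and keys that did not exist when the config was saved are added with a warning. A minimal sketch of that merge logic over a plain dict, using a hypothetical load_config_with_overrides helper (illustrative only, not Sample Factory's loader):

import json

def load_config_with_overrides(config_path, cli_args):
    # Load the saved experiment config, then fold in CLI arguments the way
    # the log above reports: override known keys, add unknown ones loudly.
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_args.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg

cfg = load_config_with_overrides(
    "/content/train_dir/default_experiment/config.json",
    {"num_workers": 1, "no_render": True, "save_video": True, "max_num_episodes": 10},
)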
[2023-02-27 11:06:05,853][00394] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:06:05,856][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:06:05,859][00394] RunningMeanStd input shape: (1,) [2023-02-27 11:06:05,876][00394] ConvEncoder: input_channels=3 [2023-02-27 11:06:06,533][00394] Conv encoder output size: 512 [2023-02-27 11:06:06,535][00394] Policy head output size: 512 [2023-02-27 11:06:08,982][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-27 11:06:10,281][00394] Num frames 100... [2023-02-27 11:06:10,402][00394] Num frames 200... [2023-02-27 11:06:10,529][00394] Num frames 300... [2023-02-27 11:06:10,679][00394] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-27 11:06:10,680][00394] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-27 11:06:10,713][00394] Num frames 400... [2023-02-27 11:06:10,827][00394] Num frames 500... [2023-02-27 11:06:10,951][00394] Num frames 600... [2023-02-27 11:06:11,073][00394] Num frames 700... [2023-02-27 11:06:11,196][00394] Num frames 800... [2023-02-27 11:06:11,290][00394] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2023-02-27 11:06:11,291][00394] Avg episode reward: 4.660, avg true_objective: 4.160 [2023-02-27 11:06:11,378][00394] Num frames 900... [2023-02-27 11:06:11,514][00394] Num frames 1000... [2023-02-27 11:06:11,632][00394] Num frames 1100... [2023-02-27 11:06:11,752][00394] Num frames 1200... [2023-02-27 11:06:11,869][00394] Num frames 1300... [2023-02-27 11:06:11,997][00394] Num frames 1400... [2023-02-27 11:06:12,065][00394] Avg episode rewards: #0: 6.360, true rewards: #0: 4.693 [2023-02-27 11:06:12,067][00394] Avg episode reward: 6.360, avg true_objective: 4.693 [2023-02-27 11:06:12,176][00394] Num frames 1500... [2023-02-27 11:06:12,303][00394] Num frames 1600... [2023-02-27 11:06:12,419][00394] Num frames 1700... [2023-02-27 11:06:12,587][00394] Avg episode rewards: #0: 5.730, true rewards: #0: 4.480 [2023-02-27 11:06:12,589][00394] Avg episode reward: 5.730, avg true_objective: 4.480 [2023-02-27 11:06:12,602][00394] Num frames 1800... [2023-02-27 11:06:12,712][00394] Num frames 1900... [2023-02-27 11:06:12,827][00394] Num frames 2000... [2023-02-27 11:06:12,947][00394] Num frames 2100... [2023-02-27 11:06:13,092][00394] Avg episode rewards: #0: 5.352, true rewards: #0: 4.352 [2023-02-27 11:06:13,095][00394] Avg episode reward: 5.352, avg true_objective: 4.352 [2023-02-27 11:06:13,125][00394] Num frames 2200... [2023-02-27 11:06:13,243][00394] Num frames 2300... [2023-02-27 11:06:13,362][00394] Num frames 2400... [2023-02-27 11:06:13,481][00394] Num frames 2500... [2023-02-27 11:06:13,612][00394] Num frames 2600... [2023-02-27 11:06:13,697][00394] Avg episode rewards: #0: 5.373, true rewards: #0: 4.373 [2023-02-27 11:06:13,700][00394] Avg episode reward: 5.373, avg true_objective: 4.373 [2023-02-27 11:06:13,791][00394] Num frames 2700...
[2023-02-27 11:06:13,911][00394] Num frames 2800... [2023-02-27 11:06:14,031][00394] Num frames 2900... [2023-02-27 11:06:14,155][00394] Num frames 3000... [2023-02-27 11:06:14,222][00394] Avg episode rewards: #0: 5.154, true rewards: #0: 4.297 [2023-02-27 11:06:14,223][00394] Avg episode reward: 5.154, avg true_objective: 4.297 [2023-02-27 11:06:14,336][00394] Num frames 3100... [2023-02-27 11:06:14,503][00394] Num frames 3200... [2023-02-27 11:06:14,686][00394] Num frames 3300... [2023-02-27 11:06:14,897][00394] Avg episode rewards: #0: 4.990, true rewards: #0: 4.240 [2023-02-27 11:06:14,902][00394] Avg episode reward: 4.990, avg true_objective: 4.240 [2023-02-27 11:06:14,930][00394] Num frames 3400... [2023-02-27 11:06:15,100][00394] Num frames 3500... [2023-02-27 11:06:15,263][00394] Num frames 3600... [2023-02-27 11:06:15,429][00394] Num frames 3700... [2023-02-27 11:06:15,602][00394] Num frames 3800... [2023-02-27 11:06:15,732][00394] Avg episode rewards: #0: 5.044, true rewards: #0: 4.267 [2023-02-27 11:06:15,738][00394] Avg episode reward: 5.044, avg true_objective: 4.267 [2023-02-27 11:06:15,845][00394] Num frames 3900... [2023-02-27 11:06:16,014][00394] Num frames 4000... [2023-02-27 11:06:16,181][00394] Num frames 4100... [2023-02-27 11:06:16,356][00394] Num frames 4200... [2023-02-27 11:06:16,456][00394] Avg episode rewards: #0: 4.924, true rewards: #0: 4.224 [2023-02-27 11:06:16,459][00394] Avg episode reward: 4.924, avg true_objective: 4.224 [2023-02-27 11:06:38,142][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 11:07:55,668][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 11:07:55,669][00394] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 11:07:55,673][00394] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 11:07:55,675][00394] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 11:07:55,677][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:07:55,679][00394] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 11:07:55,680][00394] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-27 11:07:55,681][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-27 11:07:55,683][00394] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-27 11:07:55,684][00394] Adding new argument 'hf_repository'='Clawoo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-27 11:07:55,685][00394] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 11:07:55,686][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 11:07:55,687][00394] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 11:07:55,689][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2023-02-27 11:07:55,690][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 11:07:55,719][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:07:55,721][00394] RunningMeanStd input shape: (1,) [2023-02-27 11:07:55,736][00394] ConvEncoder: input_channels=3 [2023-02-27 11:07:55,775][00394] Conv encoder output size: 512 [2023-02-27 11:07:55,779][00394] Policy head output size: 512 [2023-02-27 11:07:55,799][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-27 11:07:56,280][00394] Num frames 100... [2023-02-27 11:07:56,402][00394] Num frames 200... [2023-02-27 11:07:56,521][00394] Num frames 300... [2023-02-27 11:07:56,647][00394] Num frames 400... [2023-02-27 11:07:56,765][00394] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2023-02-27 11:07:56,767][00394] Avg episode reward: 5.480, avg true_objective: 4.480 [2023-02-27 11:07:56,840][00394] Num frames 500... [2023-02-27 11:07:56,964][00394] Num frames 600... [2023-02-27 11:07:57,096][00394] Num frames 700... [2023-02-27 11:07:57,225][00394] Num frames 800... [2023-02-27 11:07:57,329][00394] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2023-02-27 11:07:57,331][00394] Avg episode reward: 4.660, avg true_objective: 4.160 [2023-02-27 11:07:57,424][00394] Num frames 900... [2023-02-27 11:07:57,554][00394] Num frames 1000... [2023-02-27 11:07:57,688][00394] Num frames 1100... [2023-02-27 11:07:57,811][00394] Num frames 1200... [2023-02-27 11:07:57,894][00394] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 [2023-02-27 11:07:57,895][00394] Avg episode reward: 4.387, avg true_objective: 4.053 [2023-02-27 11:07:57,998][00394] Num frames 1300... [2023-02-27 11:07:58,138][00394] Num frames 1400... [2023-02-27 11:07:58,318][00394] Num frames 1500... [2023-02-27 11:07:58,489][00394] Num frames 1600... [2023-02-27 11:07:58,653][00394] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2023-02-27 11:07:58,659][00394] Avg episode reward: 4.660, avg true_objective: 4.160 [2023-02-27 11:07:58,720][00394] Num frames 1700... [2023-02-27 11:07:58,881][00394] Num frames 1800... [2023-02-27 11:07:59,047][00394] Num frames 1900... [2023-02-27 11:07:59,207][00394] Num frames 2000... [2023-02-27 11:07:59,386][00394] Num frames 2100... [2023-02-27 11:07:59,468][00394] Avg episode rewards: #0: 4.824, true rewards: #0: 4.224 [2023-02-27 11:07:59,474][00394] Avg episode reward: 4.824, avg true_objective: 4.224 [2023-02-27 11:07:59,614][00394] Num frames 2200... [2023-02-27 11:07:59,776][00394] Num frames 2300... [2023-02-27 11:07:59,945][00394] Num frames 2400... [2023-02-27 11:08:00,159][00394] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2023-02-27 11:08:00,162][00394] Avg episode reward: 4.660, avg true_objective: 4.160 [2023-02-27 11:08:00,173][00394] Num frames 2500... [2023-02-27 11:08:00,347][00394] Num frames 2600... [2023-02-27 11:08:00,516][00394] Num frames 2700... [2023-02-27 11:08:00,680][00394] Num frames 2800... [2023-02-27 11:08:00,866][00394] Avg episode rewards: #0: 4.543, true rewards: #0: 4.114 [2023-02-27 11:08:00,869][00394] Avg episode reward: 4.543, avg true_objective: 4.114 [2023-02-27 11:08:00,909][00394] Num frames 2900... [2023-02-27 11:08:01,076][00394] Num frames 3000... [2023-02-27 11:08:01,238][00394] Num frames 3100... [2023-02-27 11:08:01,409][00394] Num frames 3200... 
[2023-02-27 11:08:01,579][00394] Avg episode rewards: #0: 4.455, true rewards: #0: 4.080 [2023-02-27 11:08:01,581][00394] Avg episode reward: 4.455, avg true_objective: 4.080 [2023-02-27 11:08:01,642][00394] Num frames 3300... [2023-02-27 11:08:01,772][00394] Num frames 3400... [2023-02-27 11:08:01,901][00394] Num frames 3500... [2023-02-27 11:08:02,025][00394] Num frames 3600... [2023-02-27 11:08:02,136][00394] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 [2023-02-27 11:08:02,138][00394] Avg episode reward: 4.387, avg true_objective: 4.053 [2023-02-27 11:08:02,201][00394] Num frames 3700... [2023-02-27 11:08:02,318][00394] Num frames 3800... [2023-02-27 11:08:02,442][00394] Num frames 3900... [2023-02-27 11:08:02,570][00394] Num frames 4000... [2023-02-27 11:08:02,663][00394] Avg episode rewards: #0: 4.332, true rewards: #0: 4.032 [2023-02-27 11:08:02,665][00394] Avg episode reward: 4.332, avg true_objective: 4.032 [2023-02-27 11:08:21,442][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 11:08:25,610][00394] The model has been pushed to https://huggingface.co./Clawoo/rl_course_vizdoom_health_gathering_supreme [2023-02-27 11:09:47,580][00394] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json [2023-02-27 11:09:47,582][00394] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json [2023-02-27 11:09:47,584][00394] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line [2023-02-27 11:09:47,588][00394] Overriding arg 'train_dir' with value 'train_dir' passed from command line [2023-02-27 11:09:47,590][00394] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 11:09:47,592][00394] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! [2023-02-27 11:09:47,594][00394] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! [2023-02-27 11:09:47,595][00394] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! [2023-02-27 11:09:47,596][00394] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 11:09:47,597][00394] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 11:09:47,599][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:09:47,600][00394] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 11:09:47,601][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:09:47,603][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-27 11:09:47,604][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-27 11:09:47,605][00394] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-27 11:09:47,606][00394] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 11:09:47,608][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 11:09:47,609][00394] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 11:09:47,610][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2023-02-27 11:09:47,611][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 11:09:47,648][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:09:47,650][00394] RunningMeanStd input shape: (1,) [2023-02-27 11:09:47,668][00394] ConvEncoder: input_channels=3 [2023-02-27 11:09:47,711][00394] Conv encoder output size: 512 [2023-02-27 11:09:47,713][00394] Policy head output size: 512 [2023-02-27 11:09:47,736][00394] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth... [2023-02-27 11:09:48,209][00394] Num frames 100... [2023-02-27 11:09:48,342][00394] Num frames 200... [2023-02-27 11:09:48,460][00394] Num frames 300... [2023-02-27 11:09:48,581][00394] Num frames 400... [2023-02-27 11:09:48,701][00394] Num frames 500... [2023-02-27 11:09:48,826][00394] Num frames 600... [2023-02-27 11:09:48,953][00394] Num frames 700... [2023-02-27 11:09:49,094][00394] Num frames 800... [2023-02-27 11:09:49,212][00394] Num frames 900... [2023-02-27 11:09:49,338][00394] Num frames 1000... [2023-02-27 11:09:49,467][00394] Num frames 1100... [2023-02-27 11:09:49,588][00394] Num frames 1200... [2023-02-27 11:09:49,720][00394] Num frames 1300... [2023-02-27 11:09:49,841][00394] Num frames 1400... [2023-02-27 11:09:49,971][00394] Num frames 1500... [2023-02-27 11:09:50,094][00394] Num frames 1600... [2023-02-27 11:09:50,218][00394] Num frames 1700... [2023-02-27 11:09:50,350][00394] Num frames 1800... [2023-02-27 11:09:50,468][00394] Num frames 1900... [2023-02-27 11:09:50,591][00394] Num frames 2000... [2023-02-27 11:09:50,720][00394] Num frames 2100... [2023-02-27 11:09:50,772][00394] Avg episode rewards: #0: 64.998, true rewards: #0: 21.000 [2023-02-27 11:09:50,774][00394] Avg episode reward: 64.998, avg true_objective: 21.000 [2023-02-27 11:09:50,897][00394] Num frames 2200... [2023-02-27 11:09:51,029][00394] Num frames 2300... [2023-02-27 11:09:51,151][00394] Num frames 2400... [2023-02-27 11:09:51,277][00394] Num frames 2500... [2023-02-27 11:09:51,405][00394] Num frames 2600... [2023-02-27 11:09:51,520][00394] Num frames 2700... [2023-02-27 11:09:51,637][00394] Num frames 2800... [2023-02-27 11:09:51,759][00394] Num frames 2900... [2023-02-27 11:09:51,874][00394] Num frames 3000... [2023-02-27 11:09:51,993][00394] Num frames 3100... [2023-02-27 11:09:52,112][00394] Num frames 3200... [2023-02-27 11:09:52,238][00394] Num frames 3300... [2023-02-27 11:09:52,373][00394] Num frames 3400... [2023-02-27 11:09:52,490][00394] Num frames 3500... [2023-02-27 11:09:52,615][00394] Num frames 3600... [2023-02-27 11:09:52,733][00394] Num frames 3700... [2023-02-27 11:09:52,857][00394] Num frames 3800... [2023-02-27 11:09:52,985][00394] Num frames 3900... [2023-02-27 11:09:53,107][00394] Num frames 4000... [2023-02-27 11:09:53,234][00394] Num frames 4100... [2023-02-27 11:09:53,372][00394] Num frames 4200... [2023-02-27 11:09:53,426][00394] Avg episode rewards: #0: 63.499, true rewards: #0: 21.000 [2023-02-27 11:09:53,429][00394] Avg episode reward: 63.499, avg true_objective: 21.000 [2023-02-27 11:09:53,550][00394] Num frames 4300... [2023-02-27 11:09:53,673][00394] Num frames 4400... [2023-02-27 11:09:53,793][00394] Num frames 4500... [2023-02-27 11:09:53,918][00394] Num frames 4600... [2023-02-27 11:09:54,036][00394] Num frames 4700... [2023-02-27 11:09:54,153][00394] Num frames 4800... [2023-02-27 11:09:54,282][00394] Num frames 4900... [2023-02-27 11:09:54,417][00394] Num frames 5000... 
[2023-02-27 11:09:54,534][00394] Num frames 5100... [2023-02-27 11:09:54,657][00394] Num frames 5200... [2023-02-27 11:09:54,772][00394] Num frames 5300... [2023-02-27 11:09:54,904][00394] Num frames 5400... [2023-02-27 11:09:55,034][00394] Num frames 5500... [2023-02-27 11:09:55,153][00394] Num frames 5600... [2023-02-27 11:09:55,282][00394] Num frames 5700... [2023-02-27 11:09:55,414][00394] Num frames 5800... [2023-02-27 11:09:55,539][00394] Num frames 5900... [2023-02-27 11:09:55,658][00394] Num frames 6000... [2023-02-27 11:09:55,779][00394] Num frames 6100... [2023-02-27 11:09:55,942][00394] Num frames 6200... [2023-02-27 11:09:56,116][00394] Num frames 6300... [2023-02-27 11:09:56,171][00394] Avg episode rewards: #0: 62.665, true rewards: #0: 21.000 [2023-02-27 11:09:56,173][00394] Avg episode reward: 62.665, avg true_objective: 21.000 [2023-02-27 11:09:56,344][00394] Num frames 6400... [2023-02-27 11:09:56,526][00394] Num frames 6500... [2023-02-27 11:09:56,692][00394] Num frames 6600... [2023-02-27 11:09:56,868][00394] Num frames 6700... [2023-02-27 11:09:57,041][00394] Num frames 6800... [2023-02-27 11:09:57,212][00394] Num frames 6900... [2023-02-27 11:09:57,399][00394] Num frames 7000... [2023-02-27 11:09:57,574][00394] Num frames 7100... [2023-02-27 11:09:57,736][00394] Num frames 7200... [2023-02-27 11:09:57,904][00394] Num frames 7300... [2023-02-27 11:09:58,082][00394] Num frames 7400... [2023-02-27 11:09:58,248][00394] Num frames 7500... [2023-02-27 11:09:58,431][00394] Num frames 7600... [2023-02-27 11:09:58,620][00394] Num frames 7700... [2023-02-27 11:09:58,791][00394] Num frames 7800... [2023-02-27 11:09:58,980][00394] Num frames 7900... [2023-02-27 11:09:59,157][00394] Num frames 8000... [2023-02-27 11:09:59,340][00394] Num frames 8100... [2023-02-27 11:09:59,508][00394] Num frames 8200... [2023-02-27 11:09:59,664][00394] Num frames 8300... [2023-02-27 11:09:59,794][00394] Num frames 8400... [2023-02-27 11:09:59,850][00394] Avg episode rewards: #0: 63.249, true rewards: #0: 21.000 [2023-02-27 11:09:59,852][00394] Avg episode reward: 63.249, avg true_objective: 21.000 [2023-02-27 11:09:59,976][00394] Num frames 8500... [2023-02-27 11:10:00,090][00394] Num frames 8600... [2023-02-27 11:10:00,210][00394] Num frames 8700... [2023-02-27 11:10:00,335][00394] Num frames 8800... [2023-02-27 11:10:00,455][00394] Num frames 8900... [2023-02-27 11:10:00,576][00394] Num frames 9000... [2023-02-27 11:10:00,701][00394] Num frames 9100... [2023-02-27 11:10:00,819][00394] Num frames 9200... [2023-02-27 11:10:00,942][00394] Num frames 9300... [2023-02-27 11:10:01,064][00394] Num frames 9400... [2023-02-27 11:10:01,185][00394] Num frames 9500... [2023-02-27 11:10:01,308][00394] Num frames 9600... [2023-02-27 11:10:01,434][00394] Num frames 9700... [2023-02-27 11:10:01,549][00394] Num frames 9800... [2023-02-27 11:10:01,680][00394] Num frames 9900... [2023-02-27 11:10:01,800][00394] Num frames 10000... [2023-02-27 11:10:01,927][00394] Num frames 10100... [2023-02-27 11:10:02,050][00394] Num frames 10200... [2023-02-27 11:10:02,165][00394] Num frames 10300... [2023-02-27 11:10:02,302][00394] Num frames 10400... [2023-02-27 11:10:02,427][00394] Num frames 10500... [2023-02-27 11:10:02,479][00394] Avg episode rewards: #0: 61.399, true rewards: #0: 21.000 [2023-02-27 11:10:02,482][00394] Avg episode reward: 61.399, avg true_objective: 21.000 [2023-02-27 11:10:02,600][00394] Num frames 10600... [2023-02-27 11:10:02,730][00394] Num frames 10700... 
[2023-02-27 11:10:02,852][00394] Num frames 10800... [2023-02-27 11:10:02,967][00394] Num frames 10900... [2023-02-27 11:10:03,085][00394] Num frames 11000... [2023-02-27 11:10:03,201][00394] Num frames 11100... [2023-02-27 11:10:03,317][00394] Num frames 11200... [2023-02-27 11:10:03,443][00394] Num frames 11300... [2023-02-27 11:10:03,559][00394] Num frames 11400... [2023-02-27 11:10:03,693][00394] Num frames 11500... [2023-02-27 11:10:03,810][00394] Num frames 11600... [2023-02-27 11:10:03,933][00394] Num frames 11700... [2023-02-27 11:10:04,057][00394] Num frames 11800... [2023-02-27 11:10:04,185][00394] Num frames 11900... [2023-02-27 11:10:04,311][00394] Num frames 12000... [2023-02-27 11:10:04,437][00394] Num frames 12100... [2023-02-27 11:10:04,565][00394] Num frames 12200... [2023-02-27 11:10:04,688][00394] Num frames 12300... [2023-02-27 11:10:04,810][00394] Num frames 12400... [2023-02-27 11:10:04,934][00394] Num frames 12500... [2023-02-27 11:10:05,065][00394] Num frames 12600... [2023-02-27 11:10:05,117][00394] Avg episode rewards: #0: 61.332, true rewards: #0: 21.000 [2023-02-27 11:10:05,119][00394] Avg episode reward: 61.332, avg true_objective: 21.000 [2023-02-27 11:10:05,240][00394] Num frames 12700... [2023-02-27 11:10:05,359][00394] Num frames 12800... [2023-02-27 11:10:05,483][00394] Num frames 12900... [2023-02-27 11:10:05,600][00394] Num frames 13000... [2023-02-27 11:10:05,720][00394] Num frames 13100... [2023-02-27 11:10:05,847][00394] Num frames 13200... [2023-02-27 11:10:05,965][00394] Num frames 13300... [2023-02-27 11:10:06,090][00394] Num frames 13400... [2023-02-27 11:10:06,209][00394] Num frames 13500... [2023-02-27 11:10:06,346][00394] Num frames 13600... [2023-02-27 11:10:06,477][00394] Num frames 13700... [2023-02-27 11:10:06,593][00394] Num frames 13800... [2023-02-27 11:10:06,723][00394] Num frames 13900... [2023-02-27 11:10:06,840][00394] Num frames 14000... [2023-02-27 11:10:06,965][00394] Num frames 14100... [2023-02-27 11:10:07,084][00394] Num frames 14200... [2023-02-27 11:10:07,211][00394] Num frames 14300... [2023-02-27 11:10:07,375][00394] Avg episode rewards: #0: 59.844, true rewards: #0: 20.560 [2023-02-27 11:10:07,377][00394] Avg episode reward: 59.844, avg true_objective: 20.560 [2023-02-27 11:10:07,390][00394] Num frames 14400... [2023-02-27 11:10:07,507][00394] Num frames 14500... [2023-02-27 11:10:07,627][00394] Num frames 14600... [2023-02-27 11:10:07,747][00394] Num frames 14700... [2023-02-27 11:10:07,870][00394] Num frames 14800... [2023-02-27 11:10:07,987][00394] Num frames 14900... [2023-02-27 11:10:08,107][00394] Num frames 15000... [2023-02-27 11:10:08,224][00394] Num frames 15100... [2023-02-27 11:10:08,353][00394] Num frames 15200... [2023-02-27 11:10:08,470][00394] Num frames 15300... [2023-02-27 11:10:08,588][00394] Num frames 15400... [2023-02-27 11:10:08,706][00394] Num frames 15500... [2023-02-27 11:10:08,840][00394] Num frames 15600... [2023-02-27 11:10:08,963][00394] Num frames 15700... [2023-02-27 11:10:09,083][00394] Num frames 15800... [2023-02-27 11:10:09,203][00394] Num frames 15900... [2023-02-27 11:10:09,336][00394] Num frames 16000... [2023-02-27 11:10:09,455][00394] Num frames 16100... [2023-02-27 11:10:09,581][00394] Num frames 16200... [2023-02-27 11:10:09,744][00394] Num frames 16300... [2023-02-27 11:10:09,920][00394] Num frames 16400... 
[2023-02-27 11:10:10,136][00394] Avg episode rewards: #0: 60.739, true rewards: #0: 20.615 [2023-02-27 11:10:10,138][00394] Avg episode reward: 60.739, avg true_objective: 20.615 [2023-02-27 11:10:10,154][00394] Num frames 16500... [2023-02-27 11:10:10,325][00394] Num frames 16600... [2023-02-27 11:10:10,494][00394] Num frames 16700... [2023-02-27 11:10:10,664][00394] Num frames 16800... [2023-02-27 11:10:10,845][00394] Num frames 16900... [2023-02-27 11:10:11,021][00394] Num frames 17000... [2023-02-27 11:10:11,189][00394] Num frames 17100... [2023-02-27 11:10:11,361][00394] Num frames 17200... [2023-02-27 11:10:11,546][00394] Num frames 17300... [2023-02-27 11:10:11,719][00394] Num frames 17400... [2023-02-27 11:10:11,898][00394] Num frames 17500... [2023-02-27 11:10:12,071][00394] Num frames 17600... [2023-02-27 11:10:12,245][00394] Num frames 17700... [2023-02-27 11:10:12,424][00394] Num frames 17800... [2023-02-27 11:10:12,599][00394] Num frames 17900... [2023-02-27 11:10:12,772][00394] Num frames 18000... [2023-02-27 11:10:12,969][00394] Num frames 18100... [2023-02-27 11:10:13,144][00394] Num frames 18200... [2023-02-27 11:10:13,316][00394] Num frames 18300... [2023-02-27 11:10:13,446][00394] Num frames 18400... [2023-02-27 11:10:13,566][00394] Num frames 18500... [2023-02-27 11:10:13,739][00394] Avg episode rewards: #0: 61.212, true rewards: #0: 20.658 [2023-02-27 11:10:13,741][00394] Avg episode reward: 61.212, avg true_objective: 20.658 [2023-02-27 11:10:13,754][00394] Num frames 18600... [2023-02-27 11:10:13,881][00394] Num frames 18700... [2023-02-27 11:10:14,010][00394] Num frames 18800... [2023-02-27 11:10:14,133][00394] Num frames 18900... [2023-02-27 11:10:14,250][00394] Num frames 19000... [2023-02-27 11:10:14,374][00394] Num frames 19100... [2023-02-27 11:10:14,492][00394] Num frames 19200... [2023-02-27 11:10:14,611][00394] Num frames 19300... [2023-02-27 11:10:14,735][00394] Num frames 19400... [2023-02-27 11:10:14,853][00394] Num frames 19500... [2023-02-27 11:10:14,993][00394] Num frames 19600... [2023-02-27 11:10:15,117][00394] Num frames 19700... [2023-02-27 11:10:15,244][00394] Num frames 19800... [2023-02-27 11:10:15,373][00394] Num frames 19900... [2023-02-27 11:10:15,492][00394] Num frames 20000... [2023-02-27 11:10:15,612][00394] Num frames 20100... [2023-02-27 11:10:15,729][00394] Num frames 20200... [2023-02-27 11:10:15,859][00394] Num frames 20300... [2023-02-27 11:10:15,983][00394] Num frames 20400... [2023-02-27 11:10:16,110][00394] Num frames 20500... [2023-02-27 11:10:16,229][00394] Num frames 20600... [2023-02-27 11:10:16,405][00394] Avg episode rewards: #0: 61.591, true rewards: #0: 20.692 [2023-02-27 11:10:16,407][00394] Avg episode reward: 61.591, avg true_objective: 20.692 [2023-02-27 11:12:28,617][00394] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4! [2023-02-27 11:16:16,032][00394] Environment doom_basic already registered, overwriting... [2023-02-27 11:16:16,038][00394] Environment doom_two_colors_easy already registered, overwriting... [2023-02-27 11:16:16,040][00394] Environment doom_two_colors_hard already registered, overwriting... [2023-02-27 11:16:16,042][00394] Environment doom_dm already registered, overwriting... [2023-02-27 11:16:16,044][00394] Environment doom_dwango5 already registered, overwriting... [2023-02-27 11:16:16,045][00394] Environment doom_my_way_home_flat_actions already registered, overwriting... 
[2023-02-27 11:16:16,047][00394] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-02-27 11:16:16,048][00394] Environment doom_my_way_home already registered, overwriting... [2023-02-27 11:16:16,050][00394] Environment doom_deadly_corridor already registered, overwriting... [2023-02-27 11:16:16,051][00394] Environment doom_defend_the_center already registered, overwriting... [2023-02-27 11:16:16,052][00394] Environment doom_defend_the_line already registered, overwriting... [2023-02-27 11:16:16,056][00394] Environment doom_health_gathering already registered, overwriting... [2023-02-27 11:16:16,057][00394] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-27 11:16:16,058][00394] Environment doom_battle already registered, overwriting... [2023-02-27 11:16:16,059][00394] Environment doom_battle2 already registered, overwriting... [2023-02-27 11:16:16,061][00394] Environment doom_duel_bots already registered, overwriting... [2023-02-27 11:16:16,062][00394] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-27 11:16:16,063][00394] Environment doom_duel already registered, overwriting... [2023-02-27 11:16:16,065][00394] Environment doom_deathmatch_full already registered, overwriting... [2023-02-27 11:16:16,066][00394] Environment doom_benchmark already registered, overwriting... [2023-02-27 11:16:16,067][00394] register_encoder_factory: [2023-02-27 11:16:16,105][00394] Loading legacy config file train_dir/doom_deathmatch_bots_2222/cfg.json instead of train_dir/doom_deathmatch_bots_2222/config.json [2023-02-27 11:16:16,107][00394] Loading existing experiment configuration from train_dir/doom_deathmatch_bots_2222/config.json [2023-02-27 11:16:16,108][00394] Overriding arg 'experiment' with value 'doom_deathmatch_bots_2222' passed from command line [2023-02-27 11:16:16,110][00394] Overriding arg 'train_dir' with value 'train_dir' passed from command line [2023-02-27 11:16:16,111][00394] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 11:16:16,114][00394] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! [2023-02-27 11:16:16,115][00394] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! [2023-02-27 11:16:16,116][00394] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! [2023-02-27 11:16:16,118][00394] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 11:16:16,119][00394] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 11:16:16,120][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:16:16,121][00394] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 11:16:16,123][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:16:16,124][00394] Adding new argument 'max_num_episodes'=1 that is not in the saved config file! [2023-02-27 11:16:16,125][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-27 11:16:16,127][00394] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-27 11:16:16,128][00394] Adding new argument 'policy_index'=0 that is not in the saved config file! 
[2023-02-27 11:16:16,129][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 11:16:16,131][00394] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 11:16:16,132][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-27 11:16:16,133][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 11:16:16,177][00394] Port 40300 is available [2023-02-27 11:16:16,179][00394] Using port 40300 [2023-02-27 11:16:16,183][00394] RunningMeanStd input shape: (23,) [2023-02-27 11:16:16,186][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:16:16,189][00394] RunningMeanStd input shape: (1,) [2023-02-27 11:16:16,204][00394] ConvEncoder: input_channels=3 [2023-02-27 11:16:16,251][00394] Conv encoder output size: 512 [2023-02-27 11:16:16,254][00394] Policy head output size: 512 [2023-02-27 11:16:16,300][00394] Loading state from checkpoint train_dir/doom_deathmatch_bots_2222/checkpoint_p0/checkpoint_000282220_2311946240.pth... [2023-02-27 11:16:16,336][00394] Using port 40300 on host... [2023-02-27 11:16:16,669][00394] Initialized w:0 v:0 player:0 [2023-02-27 11:16:16,841][00394] Num frames 100... [2023-02-27 11:16:17,008][00394] Num frames 200... [2023-02-27 11:16:17,189][00394] Num frames 300... [2023-02-27 11:16:17,377][00394] Num frames 400... [2023-02-27 11:16:17,553][00394] Num frames 500... [2023-02-27 11:16:17,722][00394] Num frames 600... [2023-02-27 11:16:17,887][00394] Num frames 700... [2023-02-27 11:16:18,056][00394] Num frames 800... [2023-02-27 11:16:18,236][00394] Num frames 900... [2023-02-27 11:16:18,419][00394] Num frames 1000... [2023-02-27 11:16:18,586][00394] Num frames 1100... [2023-02-27 11:16:18,760][00394] Num frames 1200... [2023-02-27 11:16:18,940][00394] Num frames 1300... [2023-02-27 11:16:19,126][00394] Num frames 1400... [2023-02-27 11:16:19,305][00394] Num frames 1500... [2023-02-27 11:16:19,478][00394] Num frames 1600... [2023-02-27 11:16:19,647][00394] Num frames 1700... [2023-02-27 11:16:19,829][00394] Num frames 1800... [2023-02-27 11:16:20,003][00394] Num frames 1900... [2023-02-27 11:16:20,170][00394] Num frames 2000... [2023-02-27 11:16:20,358][00394] Num frames 2100... [2023-02-27 11:16:20,533][00394] Num frames 2200... [2023-02-27 11:16:20,706][00394] Num frames 2300... [2023-02-27 11:16:20,892][00394] Num frames 2400... [2023-02-27 11:16:21,061][00394] Num frames 2500... [2023-02-27 11:16:21,234][00394] Num frames 2600... [2023-02-27 11:16:21,437][00394] Num frames 2700... [2023-02-27 11:16:21,608][00394] Num frames 2800... [2023-02-27 11:16:21,769][00394] Num frames 2900... [2023-02-27 11:16:21,941][00394] Num frames 3000... [2023-02-27 11:16:22,181][00394] Num frames 3100... [2023-02-27 11:16:22,459][00394] Num frames 3200... [2023-02-27 11:16:22,697][00394] Num frames 3300... [2023-02-27 11:16:22,940][00394] Num frames 3400... [2023-02-27 11:16:23,191][00394] Num frames 3500... [2023-02-27 11:16:23,483][00394] Num frames 3600... [2023-02-27 11:16:23,733][00394] Num frames 3700... [2023-02-27 11:16:23,993][00394] Num frames 3800... [2023-02-27 11:16:24,238][00394] Num frames 3900... [2023-02-27 11:16:24,494][00394] Num frames 4000... [2023-02-27 11:16:24,738][00394] Num frames 4100... [2023-02-27 11:16:24,991][00394] Num frames 4200... [2023-02-27 11:16:25,241][00394] Num frames 4300... [2023-02-27 11:16:25,494][00394] Num frames 4400... [2023-02-27 11:16:25,753][00394] Num frames 4500... 
[2023-02-27 11:16:25,944][00394] Num frames 4600... [2023-02-27 11:16:26,127][00394] Num frames 4700... [2023-02-27 11:16:26,305][00394] Num frames 4800... [2023-02-27 11:16:26,483][00394] Num frames 4900... [2023-02-27 11:16:26,680][00394] Num frames 5000... [2023-02-27 11:16:26,850][00394] Num frames 5100... [2023-02-27 11:16:27,029][00394] Num frames 5200... [2023-02-27 11:16:27,203][00394] Num frames 5300... [2023-02-27 11:16:27,381][00394] Num frames 5400... [2023-02-27 11:16:27,565][00394] Num frames 5500... [2023-02-27 11:16:27,739][00394] Num frames 5600... [2023-02-27 11:16:27,919][00394] Num frames 5700... [2023-02-27 11:16:28,091][00394] Num frames 5800... [2023-02-27 11:16:28,258][00394] Num frames 5900... [2023-02-27 11:16:28,436][00394] Num frames 6000... [2023-02-27 11:16:28,623][00394] Num frames 6100... [2023-02-27 11:16:28,801][00394] Num frames 6200... [2023-02-27 11:16:28,968][00394] Num frames 6300... [2023-02-27 11:16:29,150][00394] Num frames 6400... [2023-02-27 11:16:29,327][00394] Num frames 6500... [2023-02-27 11:16:29,502][00394] Num frames 6600... [2023-02-27 11:16:29,686][00394] Num frames 6700... [2023-02-27 11:16:29,855][00394] Num frames 6800... [2023-02-27 11:16:30,026][00394] Num frames 6900... [2023-02-27 11:16:30,204][00394] Num frames 7000... [2023-02-27 11:16:30,384][00394] Num frames 7100... [2023-02-27 11:16:30,556][00394] Num frames 7200... [2023-02-27 11:16:30,734][00394] Num frames 7300... [2023-02-27 11:16:30,904][00394] Num frames 7400... [2023-02-27 11:16:31,084][00394] Num frames 7500... [2023-02-27 11:16:31,267][00394] Num frames 7600... [2023-02-27 11:16:31,442][00394] Num frames 7700... [2023-02-27 11:16:31,651][00394] Num frames 7800... [2023-02-27 11:16:31,830][00394] Num frames 7900... [2023-02-27 11:16:32,004][00394] Num frames 8000... [2023-02-27 11:16:32,179][00394] Num frames 8100... [2023-02-27 11:16:32,360][00394] Num frames 8200... [2023-02-27 11:16:32,540][00394] Num frames 8300... [2023-02-27 11:16:32,715][00394] DAMAGECOUNT value on done: 6533.0 [2023-02-27 11:16:32,717][00394] Sum rewards: 91.524, reward structure: {'DEATHCOUNT': '-12.750', 'HEALTH': '-5.080', 'AMMO5': '0.007', 'AMMO2': '0.022', 'AMMO4': '0.107', 'AMMO3': '0.204', 'WEAPON4': '0.300', 'WEAPON5': '0.300', 'weapon5': '0.654', 'weapon4': '1.066', 'WEAPON3': '1.500', 'weapon2': '1.656', 'HITCOUNT': '3.430', 'weapon3': '12.508', 'DAMAGECOUNT': '19.599', 'FRAGCOUNT': '68.000'} [2023-02-27 11:16:32,781][00394] Avg episode rewards: #0: 91.519, true rewards: #0: 68.000 [2023-02-27 11:16:32,783][00394] Avg episode reward: 91.519, avg true_objective: 68.000 [2023-02-27 11:16:32,790][00394] Num frames 8400... [2023-02-27 11:17:26,161][00394] Replay video saved to train_dir/doom_deathmatch_bots_2222/replay.mp4! [2023-02-27 11:23:23,959][00394] Environment doom_basic already registered, overwriting... [2023-02-27 11:23:23,961][00394] Environment doom_two_colors_easy already registered, overwriting... [2023-02-27 11:23:23,963][00394] Environment doom_two_colors_hard already registered, overwriting... [2023-02-27 11:23:23,966][00394] Environment doom_dm already registered, overwriting... [2023-02-27 11:23:23,968][00394] Environment doom_dwango5 already registered, overwriting... [2023-02-27 11:23:23,969][00394] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-02-27 11:23:23,970][00394] Environment doom_defend_the_center_flat_actions already registered, overwriting... 
[2023-02-27 11:23:23,971][00394] Environment doom_my_way_home already registered, overwriting... [2023-02-27 11:23:23,974][00394] Environment doom_deadly_corridor already registered, overwriting... [2023-02-27 11:23:23,975][00394] Environment doom_defend_the_center already registered, overwriting... [2023-02-27 11:23:23,978][00394] Environment doom_defend_the_line already registered, overwriting... [2023-02-27 11:23:23,979][00394] Environment doom_health_gathering already registered, overwriting... [2023-02-27 11:23:23,981][00394] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-27 11:23:23,984][00394] Environment doom_battle already registered, overwriting... [2023-02-27 11:23:23,986][00394] Environment doom_battle2 already registered, overwriting... [2023-02-27 11:23:23,987][00394] Environment doom_duel_bots already registered, overwriting... [2023-02-27 11:23:23,988][00394] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-27 11:23:23,992][00394] Environment doom_duel already registered, overwriting... [2023-02-27 11:23:23,993][00394] Environment doom_deathmatch_full already registered, overwriting... [2023-02-27 11:23:23,994][00394] Environment doom_benchmark already registered, overwriting... [2023-02-27 11:23:23,995][00394] register_encoder_factory: [2023-02-27 11:23:24,031][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 11:23:24,042][00394] Experiment dir /content/train_dir/default_experiment already exists! [2023-02-27 11:23:24,043][00394] Resuming existing experiment from /content/train_dir/default_experiment... [2023-02-27 11:23:24,045][00394] Weights and Biases integration disabled [2023-02-27 11:23:24,050][00394] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-02-27 11:23:26,034][00394] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=4000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False 
encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-02-27 11:23:26,037][00394] Saving configuration to /content/train_dir/default_experiment/config.json... [2023-02-27 11:23:26,041][00394] Rollout worker 0 uses device cpu [2023-02-27 11:23:26,045][00394] Rollout worker 1 uses device cpu [2023-02-27 11:23:26,046][00394] Rollout worker 2 uses device cpu [2023-02-27 11:23:26,048][00394] Rollout worker 3 uses device cpu [2023-02-27 11:23:26,049][00394] Rollout worker 4 uses device cpu [2023-02-27 11:23:26,051][00394] Rollout worker 5 uses device cpu [2023-02-27 11:23:26,052][00394] Rollout worker 6 uses device cpu [2023-02-27 11:23:26,054][00394] Rollout worker 7 uses device cpu [2023-02-27 11:23:26,172][00394] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:23:26,173][00394] InferenceWorker_p0-w0: min num requests: 2 [2023-02-27 11:23:26,211][00394] Starting all processes... [2023-02-27 11:23:26,213][00394] Starting process learner_proc0 [2023-02-27 11:23:26,371][00394] Starting all processes... 
[2023-02-27 11:23:26,378][00394] Starting process inference_proc0-0 [2023-02-27 11:23:26,378][00394] Starting process rollout_proc0 [2023-02-27 11:23:26,381][00394] Starting process rollout_proc1 [2023-02-27 11:23:26,381][00394] Starting process rollout_proc2 [2023-02-27 11:23:26,381][00394] Starting process rollout_proc3 [2023-02-27 11:23:26,381][00394] Starting process rollout_proc4 [2023-02-27 11:23:26,381][00394] Starting process rollout_proc5 [2023-02-27 11:23:26,381][00394] Starting process rollout_proc6 [2023-02-27 11:23:26,381][00394] Starting process rollout_proc7 [2023-02-27 11:23:34,354][23894] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:23:34,361][23894] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-27 11:23:34,414][23894] Num visible devices: 1 [2023-02-27 11:23:34,451][23894] Starting seed is not provided [2023-02-27 11:23:34,452][23894] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:23:34,453][23894] Initializing actor-critic model on device cuda:0 [2023-02-27 11:23:34,454][23894] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:23:34,461][23894] RunningMeanStd input shape: (1,) [2023-02-27 11:23:34,562][23894] ConvEncoder: input_channels=3 [2023-02-27 11:23:35,408][23894] Conv encoder output size: 512 [2023-02-27 11:23:35,418][23894] Policy head output size: 512 [2023-02-27 11:23:35,440][23908] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:23:35,445][23908] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-02-27 11:23:35,517][23908] Num visible devices: 1 [2023-02-27 11:23:35,539][23894] Created Actor Critic model with architecture: [2023-02-27 11:23:35,567][23894] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-02-27 11:23:36,088][23909] Worker 0 uses CPU cores [0] [2023-02-27 11:23:37,035][23910] Worker 1 uses CPU cores [1] [2023-02-27 11:23:37,240][23917] Worker 3 uses CPU cores [1] [2023-02-27 11:23:37,626][23921] Worker 2 uses CPU cores [0] [2023-02-27 11:23:37,892][23923] Worker 4 uses CPU cores [0] [2023-02-27 11:23:38,086][23925] Worker 5 uses CPU cores [1] [2023-02-27 11:23:38,140][23931] Worker 7 uses CPU cores [1] [2023-02-27 11:23:38,395][23933] Worker 6 uses CPU cores [0] [2023-02-27 11:23:41,048][23894] Using optimizer [2023-02-27 11:23:41,049][23894] 
Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-27 11:23:41,085][23894] Loading model from checkpoint [2023-02-27 11:23:41,089][23894] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2023-02-27 11:23:41,090][23894] Initialized policy 0 weights for model version 978 [2023-02-27 11:23:41,092][23894] LearnerWorker_p0 finished initialization! [2023-02-27 11:23:41,094][23894] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:23:41,345][23908] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:23:41,346][23908] RunningMeanStd input shape: (1,) [2023-02-27 11:23:41,360][23908] ConvEncoder: input_channels=3 [2023-02-27 11:23:41,469][23908] Conv encoder output size: 512 [2023-02-27 11:23:41,470][23908] Policy head output size: 512 [2023-02-27 11:23:43,850][00394] Inference worker 0-0 is ready! [2023-02-27 11:23:43,853][00394] All inference workers are ready! Signal rollout workers to start! [2023-02-27 11:23:43,973][23909] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:23:43,979][23923] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:23:43,995][23921] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:23:44,014][23933] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:23:44,053][00394] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:23:44,101][23910] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:23:44,115][23931] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:23:44,118][23917] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:23:44,137][23925] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:23:45,000][23931] Decorrelating experience for 0 frames... [2023-02-27 11:23:45,002][23910] Decorrelating experience for 0 frames... [2023-02-27 11:23:45,573][23933] Decorrelating experience for 0 frames... [2023-02-27 11:23:45,578][23921] Decorrelating experience for 0 frames... [2023-02-27 11:23:45,581][23909] Decorrelating experience for 0 frames... [2023-02-27 11:23:45,583][23923] Decorrelating experience for 0 frames... [2023-02-27 11:23:45,817][23910] Decorrelating experience for 32 frames... [2023-02-27 11:23:45,824][23931] Decorrelating experience for 32 frames... [2023-02-27 11:23:46,164][00394] Heartbeat connected on Batcher_0 [2023-02-27 11:23:46,171][00394] Heartbeat connected on LearnerWorker_p0 [2023-02-27 11:23:46,223][00394] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-27 11:23:46,526][23923] Decorrelating experience for 32 frames... [2023-02-27 11:23:46,527][23909] Decorrelating experience for 32 frames... [2023-02-27 11:23:46,910][23925] Decorrelating experience for 0 frames... [2023-02-27 11:23:47,222][23910] Decorrelating experience for 64 frames... [2023-02-27 11:23:47,233][23923] Decorrelating experience for 64 frames... [2023-02-27 11:23:47,811][23921] Decorrelating experience for 32 frames... [2023-02-27 11:23:47,852][23931] Decorrelating experience for 64 frames... [2023-02-27 11:23:48,202][23923] Decorrelating experience for 96 frames... [2023-02-27 11:23:48,256][23925] Decorrelating experience for 32 frames... [2023-02-27 11:23:48,384][00394] Heartbeat connected on RolloutWorker_w4 [2023-02-27 11:23:48,555][23910] Decorrelating experience for 96 frames... 
[2023-02-27 11:23:48,811][00394] Heartbeat connected on RolloutWorker_w1 [2023-02-27 11:23:49,051][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:23:49,270][23931] Decorrelating experience for 96 frames... [2023-02-27 11:23:49,477][00394] Heartbeat connected on RolloutWorker_w7 [2023-02-27 11:23:49,631][23925] Decorrelating experience for 64 frames... [2023-02-27 11:23:49,780][23933] Decorrelating experience for 32 frames... [2023-02-27 11:23:49,899][23921] Decorrelating experience for 64 frames... [2023-02-27 11:23:51,415][23925] Decorrelating experience for 96 frames... [2023-02-27 11:23:51,489][23917] Decorrelating experience for 0 frames... [2023-02-27 11:23:51,819][00394] Heartbeat connected on RolloutWorker_w5 [2023-02-27 11:23:52,861][23921] Decorrelating experience for 96 frames... [2023-02-27 11:23:52,868][23933] Decorrelating experience for 64 frames... [2023-02-27 11:23:53,735][00394] Heartbeat connected on RolloutWorker_w2 [2023-02-27 11:23:54,054][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 119.6. Samples: 1196. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:23:54,065][00394] Avg episode reward: [(0, '2.480')] [2023-02-27 11:23:56,929][23917] Decorrelating experience for 32 frames... [2023-02-27 11:23:57,516][23894] Signal inference workers to stop experience collection... [2023-02-27 11:23:57,537][23908] InferenceWorker_p0-w0: stopping experience collection [2023-02-27 11:23:57,618][23909] Decorrelating experience for 64 frames... [2023-02-27 11:23:58,371][23917] Decorrelating experience for 64 frames... [2023-02-27 11:23:58,937][23933] Decorrelating experience for 96 frames... [2023-02-27 11:23:59,051][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 157.2. Samples: 2358. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:23:59,058][00394] Avg episode reward: [(0, '3.160')] [2023-02-27 11:23:59,201][23917] Decorrelating experience for 96 frames... [2023-02-27 11:23:59,194][00394] Heartbeat connected on RolloutWorker_w6 [2023-02-27 11:23:59,303][00394] Heartbeat connected on RolloutWorker_w3 [2023-02-27 11:23:59,443][23909] Decorrelating experience for 96 frames... [2023-02-27 11:23:59,503][00394] Heartbeat connected on RolloutWorker_w0 [2023-02-27 11:24:01,278][23894] Signal inference workers to resume experience collection... [2023-02-27 11:24:01,281][23894] Stopping Batcher_0... [2023-02-27 11:24:01,282][23894] Loop batcher_evt_loop terminating... [2023-02-27 11:24:01,325][23908] Weights refcount: 2 0 [2023-02-27 11:24:01,342][23908] Stopping InferenceWorker_p0-w0... [2023-02-27 11:24:01,343][23908] Loop inference_proc0-0_evt_loop terminating... [2023-02-27 11:24:01,375][00394] Component Batcher_0 stopped! [2023-02-27 11:24:01,378][00394] Component InferenceWorker_p0-w0 stopped! [2023-02-27 11:24:01,583][00394] Component RolloutWorker_w7 stopped! [2023-02-27 11:24:01,582][23931] Stopping RolloutWorker_w7... [2023-02-27 11:24:01,595][23931] Loop rollout_proc7_evt_loop terminating... [2023-02-27 11:24:01,603][23925] Stopping RolloutWorker_w5... [2023-02-27 11:24:01,605][00394] Component RolloutWorker_w5 stopped! [2023-02-27 11:24:01,614][23910] Stopping RolloutWorker_w1... [2023-02-27 11:24:01,615][23910] Loop rollout_proc1_evt_loop terminating... [2023-02-27 11:24:01,615][00394] Component RolloutWorker_w1 stopped! 
[2023-02-27 11:24:01,604][23925] Loop rollout_proc5_evt_loop terminating... [2023-02-27 11:24:01,626][23917] Stopping RolloutWorker_w3... [2023-02-27 11:24:01,626][23917] Loop rollout_proc3_evt_loop terminating... [2023-02-27 11:24:01,626][00394] Component RolloutWorker_w3 stopped! [2023-02-27 11:24:01,636][00394] Component RolloutWorker_w2 stopped! [2023-02-27 11:24:01,646][00394] Component RolloutWorker_w4 stopped! [2023-02-27 11:24:01,646][23923] Stopping RolloutWorker_w4... [2023-02-27 11:24:01,640][23921] Stopping RolloutWorker_w2... [2023-02-27 11:24:01,648][23923] Loop rollout_proc4_evt_loop terminating... [2023-02-27 11:24:01,660][23921] Loop rollout_proc2_evt_loop terminating... [2023-02-27 11:24:01,676][00394] Component RolloutWorker_w6 stopped! [2023-02-27 11:24:01,679][23933] Stopping RolloutWorker_w6... [2023-02-27 11:24:01,688][00394] Component RolloutWorker_w0 stopped! [2023-02-27 11:24:01,691][23909] Stopping RolloutWorker_w0... [2023-02-27 11:24:01,684][23933] Loop rollout_proc6_evt_loop terminating... [2023-02-27 11:24:01,696][23909] Loop rollout_proc0_evt_loop terminating... [2023-02-27 11:24:04,479][23894] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2023-02-27 11:24:04,614][23894] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000915_3747840.pth [2023-02-27 11:24:04,618][23894] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2023-02-27 11:24:04,751][23894] Stopping LearnerWorker_p0... [2023-02-27 11:24:04,752][23894] Loop learner_proc0_evt_loop terminating... [2023-02-27 11:24:04,751][00394] Component LearnerWorker_p0 stopped! [2023-02-27 11:24:04,755][00394] Waiting for process learner_proc0 to stop... [2023-02-27 11:24:05,883][00394] Waiting for process inference_proc0-0 to join... [2023-02-27 11:24:05,887][00394] Waiting for process rollout_proc0 to join... [2023-02-27 11:24:05,890][00394] Waiting for process rollout_proc1 to join... [2023-02-27 11:24:05,893][00394] Waiting for process rollout_proc2 to join... [2023-02-27 11:24:05,895][00394] Waiting for process rollout_proc3 to join... [2023-02-27 11:24:05,900][00394] Waiting for process rollout_proc4 to join... [2023-02-27 11:24:05,902][00394] Waiting for process rollout_proc5 to join... [2023-02-27 11:24:05,903][00394] Waiting for process rollout_proc6 to join... [2023-02-27 11:24:05,905][00394] Waiting for process rollout_proc7 to join... 
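Note on the checkpoint bookkeeping above: Sample Factory names checkpoints checkpoint_<train_step>_<env_steps>.pth, which is why this run resumed at self.train_step=978, self.env_steps=4005888 after loading checkpoint_000000978_4005888.pth, and why older files (e.g. checkpoint_000000915_3747840.pth) get removed once keep_checkpoints=2 is exceeded. A minimal sketch, not part of Sample Factory itself, that parses the naming convention seen in this log:

    import re

    def parse_checkpoint_name(path: str) -> tuple[int, int]:
        # checkpoint_<train_step>_<env_steps>.pth, as in the log lines above
        m = re.search(r"checkpoint_(\d+)_(\d+)\.pth$", path)
        if m is None:
            raise ValueError(f"not a checkpoint path: {path}")
        return int(m.group(1)), int(m.group(2))

    path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth"
    assert parse_checkpoint_name(path) == (980, 4014080)  # matches the save above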
[2023-02-27 11:24:05,907][00394] Batcher 0 profile tree view:
batching: 0.0452, releasing_batches: 0.0024
[2023-02-27 11:24:05,915][00394] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0912
wait_policy: 0.0046
  wait_policy_total: 8.8212
one_step: 0.0031
  handle_policy_step: 4.4875
    deserialize: 0.0534, stack: 0.0127, obs_to_device_normalize: 0.4584, forward: 3.3266, send_messages: 0.1138
    prepare_outputs: 0.3910
      to_cpu: 0.2344
[2023-02-27 11:24:05,917][00394] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 9.0379
train: 2.3827
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0006, kl_divergence: 0.0035, after_optimizer: 0.0410
  calculate_losses: 0.4591
    losses_init: 0.0000, forward_head: 0.1491, bptt_initial: 0.2404, tail: 0.0125, advantages_returns: 0.0064, losses: 0.0396
    bptt: 0.0090
      bptt_forward_core: 0.0088
  update: 1.8570
    clip: 0.0439
[2023-02-27 11:24:05,919][00394] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0006
[2023-02-27 11:24:05,920][00394] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0015, enqueue_policy_requests: 1.6415, env_step: 4.3093, overhead: 0.2027, complete_rollouts: 0.0084
save_policy_outputs: 0.1547
  split_output_tensors: 0.0689
[2023-02-27 11:24:05,927][00394] Loop Runner_EvtLoop terminating...
[2023-02-27 11:24:05,929][00394] Runner profile tree view:
main_loop: 39.7182
[2023-02-27 11:24:05,933][00394] Collected {0: 4014080}, FPS: 206.3
[2023-02-27 11:24:32,485][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 11:24:32,488][00394] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 11:24:32,489][00394] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 11:24:32,492][00394] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 11:24:32,495][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:24:32,497][00394] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 11:24:32,498][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:24:32,500][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-27 11:24:32,505][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-27 11:24:32,507][00394] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-27 11:24:32,511][00394] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 11:24:32,513][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 11:24:32,514][00394] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 11:24:32,515][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-27 11:24:32,517][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 11:24:32,560][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:24:32,563][00394] RunningMeanStd input shape: (1,) [2023-02-27 11:24:32,594][00394] ConvEncoder: input_channels=3 [2023-02-27 11:24:32,747][00394] Conv encoder output size: 512 [2023-02-27 11:24:32,750][00394] Policy head output size: 512 [2023-02-27 11:24:32,848][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2023-02-27 11:24:33,902][00394] Num frames 100... [2023-02-27 11:24:34,023][00394] Num frames 200... [2023-02-27 11:24:34,145][00394] Num frames 300... [2023-02-27 11:24:34,271][00394] Num frames 400... [2023-02-27 11:24:34,347][00394] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160 [2023-02-27 11:24:34,349][00394] Avg episode reward: 5.160, avg true_objective: 4.160 [2023-02-27 11:24:34,462][00394] Num frames 500... [2023-02-27 11:24:34,579][00394] Num frames 600... [2023-02-27 11:24:34,714][00394] Num frames 700... [2023-02-27 11:24:34,838][00394] Num frames 800... [2023-02-27 11:24:34,893][00394] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 [2023-02-27 11:24:34,896][00394] Avg episode reward: 4.500, avg true_objective: 4.000 [2023-02-27 11:24:35,008][00394] Num frames 900... [2023-02-27 11:24:35,131][00394] Num frames 1000... [2023-02-27 11:24:35,254][00394] Num frames 1100... [2023-02-27 11:24:35,375][00394] Num frames 1200... [2023-02-27 11:24:35,487][00394] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160 [2023-02-27 11:24:35,489][00394] Avg episode reward: 4.827, avg true_objective: 4.160 [2023-02-27 11:24:35,559][00394] Num frames 1300... [2023-02-27 11:24:35,677][00394] Num frames 1400... [2023-02-27 11:24:35,795][00394] Num frames 1500... [2023-02-27 11:24:35,911][00394] Num frames 1600... [2023-02-27 11:24:36,004][00394] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 [2023-02-27 11:24:36,006][00394] Avg episode reward: 4.580, avg true_objective: 4.080 [2023-02-27 11:24:36,090][00394] Num frames 1700... [2023-02-27 11:24:36,208][00394] Num frames 1800... [2023-02-27 11:24:36,334][00394] Num frames 1900... [2023-02-27 11:24:36,458][00394] Num frames 2000... [2023-02-27 11:24:36,577][00394] Avg episode rewards: #0: 4.496, true rewards: #0: 4.096 [2023-02-27 11:24:36,581][00394] Avg episode reward: 4.496, avg true_objective: 4.096 [2023-02-27 11:24:36,644][00394] Num frames 2100... [2023-02-27 11:24:36,762][00394] Num frames 2200... [2023-02-27 11:24:36,875][00394] Num frames 2300... [2023-02-27 11:24:36,980][00394] Avg episode rewards: #0: 4.393, true rewards: #0: 3.893 [2023-02-27 11:24:36,982][00394] Avg episode reward: 4.393, avg true_objective: 3.893 [2023-02-27 11:24:37,060][00394] Num frames 2400... [2023-02-27 11:24:37,179][00394] Num frames 2500... [2023-02-27 11:24:37,304][00394] Num frames 2600... [2023-02-27 11:24:37,425][00394] Num frames 2700... [2023-02-27 11:24:37,561][00394] Num frames 2800... [2023-02-27 11:24:37,681][00394] Num frames 2900... [2023-02-27 11:24:37,787][00394] Avg episode rewards: #0: 5.063, true rewards: #0: 4.206 [2023-02-27 11:24:37,789][00394] Avg episode reward: 5.063, avg true_objective: 4.206 [2023-02-27 11:24:37,860][00394] Num frames 3000... [2023-02-27 11:24:37,975][00394] Num frames 3100... [2023-02-27 11:24:38,092][00394] Num frames 3200... [2023-02-27 11:24:38,209][00394] Num frames 3300... 
[2023-02-27 11:24:38,334][00394] Num frames 3400... [2023-02-27 11:24:38,419][00394] Avg episode rewards: #0: 5.280, true rewards: #0: 4.280 [2023-02-27 11:24:38,420][00394] Avg episode reward: 5.280, avg true_objective: 4.280 [2023-02-27 11:24:38,519][00394] Num frames 3500... [2023-02-27 11:24:38,647][00394] Num frames 3600... [2023-02-27 11:24:38,761][00394] Num frames 3700... [2023-02-27 11:24:38,882][00394] Num frames 3800... [2023-02-27 11:24:38,947][00394] Avg episode rewards: #0: 5.120, true rewards: #0: 4.231 [2023-02-27 11:24:38,950][00394] Avg episode reward: 5.120, avg true_objective: 4.231 [2023-02-27 11:24:39,067][00394] Num frames 3900... [2023-02-27 11:24:39,181][00394] Num frames 4000... [2023-02-27 11:24:39,305][00394] Num frames 4100... [2023-02-27 11:24:39,467][00394] Avg episode rewards: #0: 4.992, true rewards: #0: 4.192 [2023-02-27 11:24:39,469][00394] Avg episode reward: 4.992, avg true_objective: 4.192 [2023-02-27 11:25:02,582][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 11:26:49,971][00394] Environment doom_basic already registered, overwriting... [2023-02-27 11:26:49,974][00394] Environment doom_two_colors_easy already registered, overwriting... [2023-02-27 11:26:49,976][00394] Environment doom_two_colors_hard already registered, overwriting... [2023-02-27 11:26:49,979][00394] Environment doom_dm already registered, overwriting... [2023-02-27 11:26:49,982][00394] Environment doom_dwango5 already registered, overwriting... [2023-02-27 11:26:49,983][00394] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-02-27 11:26:49,985][00394] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-02-27 11:26:49,988][00394] Environment doom_my_way_home already registered, overwriting... [2023-02-27 11:26:49,990][00394] Environment doom_deadly_corridor already registered, overwriting... [2023-02-27 11:26:49,992][00394] Environment doom_defend_the_center already registered, overwriting... [2023-02-27 11:26:49,994][00394] Environment doom_defend_the_line already registered, overwriting... [2023-02-27 11:26:49,996][00394] Environment doom_health_gathering already registered, overwriting... [2023-02-27 11:26:50,000][00394] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-27 11:26:50,002][00394] Environment doom_battle already registered, overwriting... [2023-02-27 11:26:50,003][00394] Environment doom_battle2 already registered, overwriting... [2023-02-27 11:26:50,004][00394] Environment doom_duel_bots already registered, overwriting... [2023-02-27 11:26:50,006][00394] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-27 11:26:50,009][00394] Environment doom_duel already registered, overwriting... [2023-02-27 11:26:50,011][00394] Environment doom_deathmatch_full already registered, overwriting... [2023-02-27 11:26:50,013][00394] Environment doom_benchmark already registered, overwriting... [2023-02-27 11:26:50,016][00394] register_encoder_factory: [2023-02-27 11:26:50,057][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 11:26:50,058][00394] Overriding arg 'train_for_env_steps' with value 8000000 passed from command line [2023-02-27 11:26:50,065][00394] Experiment dir /content/train_dir/default_experiment already exists! [2023-02-27 11:26:50,070][00394] Resuming existing experiment from /content/train_dir/default_experiment... 
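The 'train_for_env_steps' override to 8000000 just above is how this run continues training past the 4M env steps already collected: with restart_behavior=resume (see the config dump below), the latest checkpoint in the experiment dir is picked up automatically. A minimal sketch of an equivalent launch, assuming Sample Factory 2.x's sf_examples VizDoom entry point (train_vizdoom.main is the assumed launcher here; the flags mirror the command_line recorded in the config):

    import sys
    from sf_examples.vizdoom.train_vizdoom import main

    # Same flags as the original run, with the env-step budget doubled.
    sys.argv = [
        "train_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=8000000",  # raised from 4000000 to keep training
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
    ]
    main()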
[2023-02-27 11:26:50,072][00394] Weights and Biases integration disabled [2023-02-27 11:26:50,076][00394] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-02-27 11:26:53,716][00394] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=8000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-02-27 11:26:53,720][00394] Saving configuration to /content/train_dir/default_experiment/config.json... 
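The configuration above trains APPO with gamma=0.99 and gae_lambda=0.95. As a worked illustration of what those two numbers control, here is a minimal, self-contained sketch of Generalized Advantage Estimation on toy values (illustrative only; not Sample Factory's internal implementation, which additionally normalizes returns per normalize_returns=True):

    def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
        # Backward recursion: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t),
        # then A_t = delta_t + gamma * lam * A_{t+1}.
        advantages = [0.0] * len(rewards)
        next_value, next_adv = last_value, 0.0
        for t in reversed(range(len(rewards))):
            delta = rewards[t] + gamma * next_value - values[t]
            next_adv = delta + gamma * lam * next_adv
            advantages[t] = next_adv
            next_value = values[t]
        return advantages

    # Toy 3-step rollout; value-function targets would be advantages + values.
    print(gae([1.0, 0.0, 1.0], [0.5, 0.4, 0.6], last_value=0.3))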
[2023-02-27 11:26:53,724][00394] Rollout worker 0 uses device cpu [2023-02-27 11:26:53,727][00394] Rollout worker 1 uses device cpu [2023-02-27 11:26:53,728][00394] Rollout worker 2 uses device cpu [2023-02-27 11:26:53,729][00394] Rollout worker 3 uses device cpu [2023-02-27 11:26:53,731][00394] Rollout worker 4 uses device cpu [2023-02-27 11:26:53,734][00394] Rollout worker 5 uses device cpu [2023-02-27 11:26:53,735][00394] Rollout worker 6 uses device cpu [2023-02-27 11:26:53,737][00394] Rollout worker 7 uses device cpu [2023-02-27 11:26:53,862][00394] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:26:53,864][00394] InferenceWorker_p0-w0: min num requests: 2 [2023-02-27 11:26:53,905][00394] Starting all processes... [2023-02-27 11:26:53,907][00394] Starting process learner_proc0 [2023-02-27 11:26:54,060][00394] Starting all processes... [2023-02-27 11:26:54,072][00394] Starting process inference_proc0-0 [2023-02-27 11:26:54,078][00394] Starting process rollout_proc0 [2023-02-27 11:26:54,078][00394] Starting process rollout_proc1 [2023-02-27 11:26:54,078][00394] Starting process rollout_proc2 [2023-02-27 11:26:54,079][00394] Starting process rollout_proc3 [2023-02-27 11:26:54,079][00394] Starting process rollout_proc4 [2023-02-27 11:26:54,079][00394] Starting process rollout_proc5 [2023-02-27 11:26:54,079][00394] Starting process rollout_proc6 [2023-02-27 11:26:54,079][00394] Starting process rollout_proc7 [2023-02-27 11:27:01,608][28704] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:27:01,612][28704] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-27 11:27:01,687][28704] Num visible devices: 1 [2023-02-27 11:27:01,722][28704] Starting seed is not provided [2023-02-27 11:27:01,723][28704] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:27:01,724][28704] Initializing actor-critic model on device cuda:0 [2023-02-27 11:27:01,725][28704] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:27:01,729][28704] RunningMeanStd input shape: (1,) [2023-02-27 11:27:01,909][28704] ConvEncoder: input_channels=3 [2023-02-27 11:27:03,458][28704] Conv encoder output size: 512 [2023-02-27 11:27:03,464][28704] Policy head output size: 512 [2023-02-27 11:27:03,656][28704] Created Actor Critic model with architecture: [2023-02-27 11:27:03,667][28704] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, 
out_features=5, bias=True) ) ) [2023-02-27 11:27:04,348][28718] Worker 0 uses CPU cores [0] [2023-02-27 11:27:05,196][28719] Worker 1 uses CPU cores [1] [2023-02-27 11:27:05,782][28720] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:27:05,782][28720] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-02-27 11:27:05,886][28720] Num visible devices: 1 [2023-02-27 11:27:06,144][28728] Worker 2 uses CPU cores [0] [2023-02-27 11:27:06,284][28723] Worker 3 uses CPU cores [1] [2023-02-27 11:27:06,567][28731] Worker 5 uses CPU cores [1] [2023-02-27 11:27:06,881][28735] Worker 7 uses CPU cores [1] [2023-02-27 11:27:06,981][28741] Worker 6 uses CPU cores [0] [2023-02-27 11:27:07,024][28733] Worker 4 uses CPU cores [0] [2023-02-27 11:27:13,278][28704] Using optimizer [2023-02-27 11:27:13,280][28704] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2023-02-27 11:27:13,318][28704] Loading model from checkpoint [2023-02-27 11:27:13,323][28704] Loaded experiment state at self.train_step=980, self.env_steps=4014080 [2023-02-27 11:27:13,324][28704] Initialized policy 0 weights for model version 980 [2023-02-27 11:27:13,328][28704] LearnerWorker_p0 finished initialization! [2023-02-27 11:27:13,331][28704] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:27:13,534][28720] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:27:13,535][28720] RunningMeanStd input shape: (1,) [2023-02-27 11:27:13,547][28720] ConvEncoder: input_channels=3 [2023-02-27 11:27:13,657][28720] Conv encoder output size: 512 [2023-02-27 11:27:13,657][28720] Policy head output size: 512 [2023-02-27 11:27:13,855][00394] Heartbeat connected on Batcher_0 [2023-02-27 11:27:13,862][00394] Heartbeat connected on LearnerWorker_p0 [2023-02-27 11:27:13,877][00394] Heartbeat connected on RolloutWorker_w0 [2023-02-27 11:27:13,879][00394] Heartbeat connected on RolloutWorker_w1 [2023-02-27 11:27:13,885][00394] Heartbeat connected on RolloutWorker_w2 [2023-02-27 11:27:13,889][00394] Heartbeat connected on RolloutWorker_w3 [2023-02-27 11:27:13,893][00394] Heartbeat connected on RolloutWorker_w4 [2023-02-27 11:27:13,895][00394] Heartbeat connected on RolloutWorker_w5 [2023-02-27 11:27:13,899][00394] Heartbeat connected on RolloutWorker_w6 [2023-02-27 11:27:13,911][00394] Heartbeat connected on RolloutWorker_w7 [2023-02-27 11:27:15,076][00394] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4014080. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:27:16,066][00394] Inference worker 0-0 is ready! [2023-02-27 11:27:16,068][00394] All inference workers are ready! Signal rollout workers to start! 
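The repeated "RunningMeanStd input shape: (3, 72, 128)" lines refer to the observation normalizer in the model printed above (normalize_input=True in the config): pixel observations are standardized by running statistics before reaching the conv encoder. A minimal NumPy sketch of the standard parallel-variance update such a module performs (per-element; Sample Factory's actual module is an in-place TorchScript version):

    import numpy as np

    class RunningMeanStd:
        def __init__(self, shape, eps=1e-4):
            self.mean = np.zeros(shape, np.float64)
            self.var = np.ones(shape, np.float64)
            self.count = eps

        def update(self, batch):
            # Chan et al. combination of batch stats with running stats.
            b_mean, b_var, b_count = batch.mean(0), batch.var(0), batch.shape[0]
            delta = b_mean - self.mean
            tot = self.count + b_count
            self.mean = self.mean + delta * b_count / tot
            m_a = self.var * self.count
            m_b = b_var * b_count
            self.var = (m_a + m_b + delta ** 2 * self.count * b_count / tot) / tot
            self.count = tot

        def normalize(self, x, eps=1e-8):
            return (x - self.mean) / np.sqrt(self.var + eps)

    rms = RunningMeanStd((3, 72, 128))
    rms.update(np.random.rand(16, 3, 72, 128))  # a batch of observations
    obs = rms.normalize(np.random.rand(3, 72, 128))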
[2023-02-27 11:27:16,070][00394] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-27 11:27:16,204][28719] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:27:16,207][28735] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:27:16,213][28723] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:27:16,209][28731] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:27:16,237][28733] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:27:16,243][28741] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:27:16,295][28728] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:27:16,297][28718] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:27:17,599][28735] Decorrelating experience for 0 frames... [2023-02-27 11:27:17,601][28723] Decorrelating experience for 0 frames... [2023-02-27 11:27:17,790][28741] Decorrelating experience for 0 frames... [2023-02-27 11:27:17,796][28718] Decorrelating experience for 0 frames... [2023-02-27 11:27:17,794][28733] Decorrelating experience for 0 frames... [2023-02-27 11:27:18,259][28719] Decorrelating experience for 0 frames... [2023-02-27 11:27:18,924][28719] Decorrelating experience for 32 frames... [2023-02-27 11:27:19,435][28733] Decorrelating experience for 32 frames... [2023-02-27 11:27:19,458][28718] Decorrelating experience for 32 frames... [2023-02-27 11:27:19,479][28728] Decorrelating experience for 0 frames... [2023-02-27 11:27:19,910][28741] Decorrelating experience for 32 frames... [2023-02-27 11:27:20,076][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:27:20,534][28719] Decorrelating experience for 64 frames... [2023-02-27 11:27:20,898][28723] Decorrelating experience for 32 frames... [2023-02-27 11:27:21,236][28728] Decorrelating experience for 32 frames... [2023-02-27 11:27:21,497][28733] Decorrelating experience for 64 frames... [2023-02-27 11:27:22,039][28719] Decorrelating experience for 96 frames... [2023-02-27 11:27:22,052][28741] Decorrelating experience for 64 frames... [2023-02-27 11:27:22,447][28731] Decorrelating experience for 0 frames... [2023-02-27 11:27:22,566][28723] Decorrelating experience for 64 frames... [2023-02-27 11:27:23,219][28731] Decorrelating experience for 32 frames... [2023-02-27 11:27:23,415][28723] Decorrelating experience for 96 frames... [2023-02-27 11:27:23,634][28718] Decorrelating experience for 64 frames... [2023-02-27 11:27:23,963][28731] Decorrelating experience for 64 frames... [2023-02-27 11:27:24,524][28741] Decorrelating experience for 96 frames... [2023-02-27 11:27:24,736][28733] Decorrelating experience for 96 frames... [2023-02-27 11:27:25,031][28728] Decorrelating experience for 64 frames... [2023-02-27 11:27:25,077][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:27:25,515][28735] Decorrelating experience for 32 frames... [2023-02-27 11:27:25,797][28731] Decorrelating experience for 96 frames... [2023-02-27 11:27:26,284][28735] Decorrelating experience for 64 frames... [2023-02-27 11:27:26,719][28735] Decorrelating experience for 96 frames... [2023-02-27 11:27:26,826][28728] Decorrelating experience for 96 frames... [2023-02-27 11:27:27,020][28718] Decorrelating experience for 96 frames... 
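The "Decorrelating experience for 0/32/64/96 frames" lines above are the warm-up phase: each parallel env is stepped a different number of frames with throwaway actions before rollouts begin, so the envs do not all hit episode boundaries in lockstep (cf. decorrelate_envs_on_one_worker=True in the config). A toy sketch of the idea against a gym-style env interface (hypothetical helper; not Sample Factory's scheduling code):

    def decorrelate(envs, chunk=32, max_chunks=3):
        # env i warms up for (i mod 4) * 32 frames: 0, 32, 64, 96, ...
        for i, env in enumerate(envs):
            env.reset()
            for _ in range((i % (max_chunks + 1)) * chunk):
                env.step(env.action_space.sample())  # discarded warm-up steps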
[2023-02-27 11:27:30,078][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 19.5. Samples: 292. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:27:30,085][00394] Avg episode reward: [(0, '1.541')] [2023-02-27 11:27:30,543][28704] Signal inference workers to stop experience collection... [2023-02-27 11:27:30,552][28720] InferenceWorker_p0-w0: stopping experience collection [2023-02-27 11:27:34,354][28704] Signal inference workers to resume experience collection... [2023-02-27 11:27:34,378][28720] InferenceWorker_p0-w0: resuming experience collection [2023-02-27 11:27:35,089][00394] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4018176. Throughput: 0: 115.1. Samples: 2302. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-02-27 11:27:35,159][00394] Avg episode reward: [(0, '2.274')] [2023-02-27 11:27:40,077][00394] Fps is (10 sec: 1638.4, 60 sec: 655.4, 300 sec: 655.4). Total num frames: 4030464. Throughput: 0: 135.7. Samples: 3392. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-02-27 11:27:40,106][00394] Avg episode reward: [(0, '3.792')] [2023-02-27 11:27:45,077][00394] Fps is (10 sec: 2457.4, 60 sec: 955.7, 300 sec: 955.7). Total num frames: 4042752. Throughput: 0: 233.6. Samples: 7008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:27:45,080][00394] Avg episode reward: [(0, '4.209')] [2023-02-27 11:27:48,189][28720] Updated weights for policy 0, policy_version 990 (0.0041) [2023-02-27 11:27:50,076][00394] Fps is (10 sec: 2867.2, 60 sec: 1287.3, 300 sec: 1287.3). Total num frames: 4059136. Throughput: 0: 323.3. Samples: 11316. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2023-02-27 11:27:50,084][00394] Avg episode reward: [(0, '4.616')] [2023-02-27 11:27:55,076][00394] Fps is (10 sec: 3686.7, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 4079616. Throughput: 0: 441.3. Samples: 17652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:27:55,084][00394] Avg episode reward: [(0, '4.680')] [2023-02-27 11:27:58,256][28720] Updated weights for policy 0, policy_version 1000 (0.0024) [2023-02-27 11:28:00,077][00394] Fps is (10 sec: 4096.0, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 4100096. Throughput: 0: 463.9. Samples: 20876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:28:00,082][00394] Avg episode reward: [(0, '4.666')] [2023-02-27 11:28:05,076][00394] Fps is (10 sec: 3276.8, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 4112384. Throughput: 0: 553.2. Samples: 24896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2023-02-27 11:28:05,081][00394] Avg episode reward: [(0, '4.756')] [2023-02-27 11:28:10,076][00394] Fps is (10 sec: 2457.6, 60 sec: 2010.8, 300 sec: 2010.8). Total num frames: 4124672. Throughput: 0: 598.0. Samples: 26908. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2023-02-27 11:28:10,085][00394] Avg episode reward: [(0, '4.509')] [2023-02-27 11:28:12,058][28720] Updated weights for policy 0, policy_version 1010 (0.0015) [2023-02-27 11:28:15,076][00394] Fps is (10 sec: 3686.4, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 4149248. Throughput: 0: 710.5. Samples: 32266. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:28:15,082][00394] Avg episode reward: [(0, '4.530')] [2023-02-27 11:28:20,077][00394] Fps is (10 sec: 4096.0, 60 sec: 2525.9, 300 sec: 2331.6). Total num frames: 4165632. Throughput: 0: 806.1. Samples: 38578. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:28:20,079][00394] Avg episode reward: [(0, '4.674')] [2023-02-27 11:28:22,958][28720] Updated weights for policy 0, policy_version 1020 (0.0017) [2023-02-27 11:28:25,079][00394] Fps is (10 sec: 3275.9, 60 sec: 2798.8, 300 sec: 2399.0). Total num frames: 4182016. Throughput: 0: 831.4. Samples: 40806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:28:25,084][00394] Avg episode reward: [(0, '4.655')] [2023-02-27 11:28:30,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2403.0). Total num frames: 4194304. Throughput: 0: 836.5. Samples: 44652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:28:30,080][00394] Avg episode reward: [(0, '4.734')] [2023-02-27 11:28:35,076][00394] Fps is (10 sec: 3277.7, 60 sec: 3276.8, 300 sec: 2508.8). Total num frames: 4214784. Throughput: 0: 856.4. Samples: 49852. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:28:35,085][00394] Avg episode reward: [(0, '4.729')] [2023-02-27 11:28:35,862][28720] Updated weights for policy 0, policy_version 1030 (0.0042) [2023-02-27 11:28:40,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 2602.2). Total num frames: 4235264. Throughput: 0: 787.0. Samples: 53068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:28:40,086][00394] Avg episode reward: [(0, '4.711')] [2023-02-27 11:28:45,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2639.6). Total num frames: 4251648. Throughput: 0: 840.8. Samples: 58712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:28:45,080][00394] Avg episode reward: [(0, '4.692')] [2023-02-27 11:28:47,994][28720] Updated weights for policy 0, policy_version 1040 (0.0017) [2023-02-27 11:28:50,077][00394] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 2630.1). Total num frames: 4263936. Throughput: 0: 842.2. Samples: 62796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:28:50,084][00394] Avg episode reward: [(0, '4.736')] [2023-02-27 11:28:50,096][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001041_4263936.pth... [2023-02-27 11:28:50,497][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2023-02-27 11:28:55,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2662.4). Total num frames: 4280320. Throughput: 0: 840.6. Samples: 64736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:28:55,087][00394] Avg episode reward: [(0, '4.644')] [2023-02-27 11:28:59,598][28720] Updated weights for policy 0, policy_version 1050 (0.0035) [2023-02-27 11:29:00,076][00394] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 2730.7). Total num frames: 4300800. Throughput: 0: 851.4. Samples: 70580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:29:00,079][00394] Avg episode reward: [(0, '4.745')] [2023-02-27 11:29:05,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2755.5). Total num frames: 4317184. Throughput: 0: 842.9. Samples: 76510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:29:05,080][00394] Avg episode reward: [(0, '4.610')] [2023-02-27 11:29:10,077][00394] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 2778.1). Total num frames: 4333568. Throughput: 0: 838.6. Samples: 78542. 
[2023-02-27 11:29:12,996][28720] Updated weights for policy 0, policy_version 1060 (0.0017)
[2023-02-27 11:29:15,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2764.8). Total num frames: 4345856. Throughput: 0: 841.2. Samples: 82504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:29:15,080][00394] Avg episode reward: [(0, '4.668')]
[2023-02-27 11:29:20,077][00394] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 2818.0). Total num frames: 4366336. Throughput: 0: 856.7. Samples: 88402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:29:20,087][00394] Avg episode reward: [(0, '4.717')]
[2023-02-27 11:29:23,231][28720] Updated weights for policy 0, policy_version 1070 (0.0013)
[2023-02-27 11:29:25,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 2867.2). Total num frames: 4386816. Throughput: 0: 851.3. Samples: 91376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:29:25,080][00394] Avg episode reward: [(0, '4.644')]
[2023-02-27 11:29:30,076][00394] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 2852.0). Total num frames: 4399104. Throughput: 0: 833.1. Samples: 96202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:29:30,081][00394] Avg episode reward: [(0, '4.565')]
[2023-02-27 11:29:35,077][00394] Fps is (10 sec: 2457.4, 60 sec: 3276.8, 300 sec: 2837.9). Total num frames: 4411392. Throughput: 0: 830.0. Samples: 100146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:29:35,085][00394] Avg episode reward: [(0, '4.619')]
[2023-02-27 11:29:37,494][28720] Updated weights for policy 0, policy_version 1080 (0.0017)
[2023-02-27 11:29:40,080][00394] Fps is (10 sec: 3275.6, 60 sec: 3276.6, 300 sec: 2881.3). Total num frames: 4431872. Throughput: 0: 910.9. Samples: 105730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:29:40,087][00394] Avg episode reward: [(0, '4.631')]
[2023-02-27 11:29:45,076][00394] Fps is (10 sec: 4096.3, 60 sec: 3345.1, 300 sec: 2921.8). Total num frames: 4452352. Throughput: 0: 852.9. Samples: 108962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:29:45,081][00394] Avg episode reward: [(0, '4.656')]
[2023-02-27 11:29:47,803][28720] Updated weights for policy 0, policy_version 1090 (0.0012)
[2023-02-27 11:29:50,076][00394] Fps is (10 sec: 3687.7, 60 sec: 3413.4, 300 sec: 2933.3). Total num frames: 4468736. Throughput: 0: 834.9. Samples: 114082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:29:50,083][00394] Avg episode reward: [(0, '4.699')]
[2023-02-27 11:29:55,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2918.4). Total num frames: 4481024. Throughput: 0: 835.8. Samples: 116152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:29:55,087][00394] Avg episode reward: [(0, '4.580')]
[2023-02-27 11:30:00,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2929.3). Total num frames: 4497408. Throughput: 0: 838.5. Samples: 120238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:30:00,087][00394] Avg episode reward: [(0, '4.658')]
[2023-02-27 11:30:01,370][28720] Updated weights for policy 0, policy_version 1100 (0.0012)
[2023-02-27 11:30:05,077][00394] Fps is (10 sec: 3686.3, 60 sec: 3345.0, 300 sec: 2963.6). Total num frames: 4517888. Throughput: 0: 846.9. Samples: 126512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:30:05,082][00394] Avg episode reward: [(0, '4.944')]
[2023-02-27 11:30:10,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 2995.9). Total num frames: 4538368. Throughput: 0: 904.4. Samples: 132074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:30:10,084][00394] Avg episode reward: [(0, '4.777')]
[2023-02-27 11:30:12,704][28720] Updated weights for policy 0, policy_version 1110 (0.0014)
[2023-02-27 11:30:15,076][00394] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 2981.0). Total num frames: 4550656. Throughput: 0: 839.5. Samples: 133980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:30:15,080][00394] Avg episode reward: [(0, '4.716')]
[2023-02-27 11:30:20,077][00394] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 2966.8). Total num frames: 4562944. Throughput: 0: 836.7. Samples: 137798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:30:20,085][00394] Avg episode reward: [(0, '4.711')]
[2023-02-27 11:30:25,077][00394] Fps is (10 sec: 3276.5, 60 sec: 3276.8, 300 sec: 2996.5). Total num frames: 4583424. Throughput: 0: 846.1. Samples: 143804. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:30:25,079][00394] Avg episode reward: [(0, '4.960')]
[2023-02-27 11:30:25,518][28720] Updated weights for policy 0, policy_version 1120 (0.0019)
[2023-02-27 11:30:30,077][00394] Fps is (10 sec: 4095.7, 60 sec: 3413.3, 300 sec: 3024.7). Total num frames: 4603904. Throughput: 0: 842.4. Samples: 146872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:30:30,081][00394] Avg episode reward: [(0, '4.738')]
[2023-02-27 11:30:35,076][00394] Fps is (10 sec: 3686.7, 60 sec: 3481.7, 300 sec: 3031.0). Total num frames: 4620288. Throughput: 0: 832.3. Samples: 151536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:30:35,079][00394] Avg episode reward: [(0, '4.826')]
[2023-02-27 11:30:37,951][28720] Updated weights for policy 0, policy_version 1130 (0.0016)
[2023-02-27 11:30:40,076][00394] Fps is (10 sec: 2867.4, 60 sec: 3345.3, 300 sec: 3017.1). Total num frames: 4632576. Throughput: 0: 877.5. Samples: 155638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:30:40,099][00394] Avg episode reward: [(0, '4.553')]
[2023-02-27 11:30:45,077][00394] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3042.7). Total num frames: 4653056. Throughput: 0: 845.4. Samples: 158280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:30:45,080][00394] Avg episode reward: [(0, '4.617')]
[2023-02-27 11:30:49,067][28720] Updated weights for policy 0, policy_version 1140 (0.0028)
[2023-02-27 11:30:50,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3067.2). Total num frames: 4673536. Throughput: 0: 845.1. Samples: 164540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:30:50,079][00394] Avg episode reward: [(0, '4.763')]
[2023-02-27 11:30:50,090][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001141_4673536.pth...
[2023-02-27 11:30:50,356][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth
[2023-02-27 11:30:55,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3053.4). Total num frames: 4685824. Throughput: 0: 781.7. Samples: 167252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:30:55,088][00394] Avg episode reward: [(0, '4.723')]
[2023-02-27 11:31:00,076][00394] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3040.1). Total num frames: 4698112. Throughput: 0: 826.7. Samples: 171182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:31:00,087][00394] Avg episode reward: [(0, '4.513')]
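The three numbers in each `Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)` entry are frame-rate averages over sliding windows of roughly 10, 60, and 300 seconds, derived from the cumulative `Total num frames` counter. One way such a multi-horizon readout could be computed (a sketch, not the exact implementation):

```python
import time
from collections import deque

class FpsTracker:
    """Sliding-window FPS over several horizons; a sketch of how the
    '(10 sec: ..., 60 sec: ..., 300 sec: ...)' readout could be produced."""

    def __init__(self, horizons=(10, 60, 300)):
        self.horizons = horizons
        self.samples = deque()  # (timestamp, total_num_frames) pairs

    def record(self, total_frames, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, total_frames))
        while now - self.samples[0][0] > max(self.horizons):
            self.samples.popleft()  # keep only what the longest window needs

    def fps(self, horizon):
        now, frames = self.samples[-1]
        # the earliest sample still inside this window is the baseline
        t0, f0 = next((t, f) for t, f in self.samples if now - t <= horizon)
        return (frames - f0) / max(now - t0, 1e-9)
```

Calling `record` once per report interval and then `fps(10)`, `fps(60)`, `fps(300)` yields the three columns; the 10-second figures in this log are exact multiples of 409.6, consistent with whole 4096-frame batches landing inside each 10-second window.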
[2023-02-27 11:31:03,436][28720] Updated weights for policy 0, policy_version 1150 (0.0017)
[2023-02-27 11:31:05,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3045.3). Total num frames: 4714496. Throughput: 0: 836.1. Samples: 175424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:31:05,078][00394] Avg episode reward: [(0, '4.453')]
[2023-02-27 11:31:10,077][00394] Fps is (10 sec: 4095.9, 60 sec: 3345.1, 300 sec: 3085.1). Total num frames: 4739072. Throughput: 0: 774.7. Samples: 178664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:31:10,080][00394] Avg episode reward: [(0, '4.617')]
[2023-02-27 11:31:12,965][28720] Updated weights for policy 0, policy_version 1160 (0.0012)
[2023-02-27 11:31:15,081][00394] Fps is (10 sec: 4094.3, 60 sec: 3413.1, 300 sec: 3089.0). Total num frames: 4755456. Throughput: 0: 850.1. Samples: 185128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:31:15,089][00394] Avg episode reward: [(0, '4.722')]
[2023-02-27 11:31:20,076][00394] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3076.2). Total num frames: 4767744. Throughput: 0: 834.4. Samples: 189084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-27 11:31:20,083][00394] Avg episode reward: [(0, '4.642')]
[2023-02-27 11:31:25,076][00394] Fps is (10 sec: 2868.4, 60 sec: 3345.1, 300 sec: 3080.2). Total num frames: 4784128. Throughput: 0: 832.2. Samples: 193088. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-02-27 11:31:25,079][00394] Avg episode reward: [(0, '4.671')]
[2023-02-27 11:31:26,956][28720] Updated weights for policy 0, policy_version 1170 (0.0024)
[2023-02-27 11:31:30,077][00394] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 3100.1). Total num frames: 4804608. Throughput: 0: 844.0. Samples: 196258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:31:30,082][00394] Avg episode reward: [(0, '4.659')]
[2023-02-27 11:31:35,077][00394] Fps is (10 sec: 4095.5, 60 sec: 3413.3, 300 sec: 3119.2). Total num frames: 4825088. Throughput: 0: 850.8. Samples: 202826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:31:35,085][00394] Avg episode reward: [(0, '4.703')]
[2023-02-27 11:31:37,747][28720] Updated weights for policy 0, policy_version 1180 (0.0020)
[2023-02-27 11:31:40,076][00394] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3106.8). Total num frames: 4837376. Throughput: 0: 840.4. Samples: 205068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:31:40,083][00394] Avg episode reward: [(0, '4.487')]
[2023-02-27 11:31:45,078][00394] Fps is (10 sec: 2867.1, 60 sec: 3345.0, 300 sec: 3109.9). Total num frames: 4853760. Throughput: 0: 844.1. Samples: 209168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:31:45,085][00394] Avg episode reward: [(0, '4.583')]
[2023-02-27 11:31:50,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3113.0). Total num frames: 4870144. Throughput: 0: 867.4. Samples: 214456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:31:50,085][00394] Avg episode reward: [(0, '4.684')]
[2023-02-27 11:31:50,154][28720] Updated weights for policy 0, policy_version 1190 (0.0033)
[2023-02-27 11:31:55,081][00394] Fps is (10 sec: 4094.5, 60 sec: 3481.3, 300 sec: 3145.1). Total num frames: 4894720. Throughput: 0: 936.7. Samples: 220822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:31:55,089][00394] Avg episode reward: [(0, '4.705')]
[2023-02-27 11:32:00,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3133.1). Total num frames: 4907008. Throughput: 0: 845.7. Samples: 223182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:00,082][00394] Avg episode reward: [(0, '4.726')]
[2023-02-27 11:32:02,021][28720] Updated weights for policy 0, policy_version 1200 (0.0032)
[2023-02-27 11:32:05,077][00394] Fps is (10 sec: 2868.5, 60 sec: 3481.6, 300 sec: 3135.6). Total num frames: 4923392. Throughput: 0: 848.7. Samples: 227274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:32:05,080][00394] Avg episode reward: [(0, '4.733')]
[2023-02-27 11:32:10,077][00394] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3138.0). Total num frames: 4939776. Throughput: 0: 803.6. Samples: 229248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:10,083][00394] Avg episode reward: [(0, '4.694')]
[2023-02-27 11:32:13,417][28720] Updated weights for policy 0, policy_version 1210 (0.0022)
[2023-02-27 11:32:15,076][00394] Fps is (10 sec: 3686.5, 60 sec: 3413.6, 300 sec: 3207.4). Total num frames: 4960256. Throughput: 0: 879.2. Samples: 235822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:32:15,078][00394] Avg episode reward: [(0, '4.683')]
[2023-02-27 11:32:20,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3262.9). Total num frames: 4976640. Throughput: 0: 859.5. Samples: 241502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:20,081][00394] Avg episode reward: [(0, '4.701')]
[2023-02-27 11:32:25,095][00394] Fps is (10 sec: 3270.6, 60 sec: 3480.5, 300 sec: 3318.2). Total num frames: 4993024. Throughput: 0: 895.9. Samples: 245400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:32:25,098][00394] Avg episode reward: [(0, '4.740')]
[2023-02-27 11:32:26,727][28720] Updated weights for policy 0, policy_version 1220 (0.0012)
[2023-02-27 11:32:30,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 5005312. Throughput: 0: 850.9. Samples: 247458. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:32:30,079][00394] Avg episode reward: [(0, '4.675')]
[2023-02-27 11:32:35,077][00394] Fps is (10 sec: 3693.4, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 5029888. Throughput: 0: 870.1. Samples: 253610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:32:35,080][00394] Avg episode reward: [(0, '4.383')]
[2023-02-27 11:32:36,903][28720] Updated weights for policy 0, policy_version 1230 (0.0021)
[2023-02-27 11:32:40,079][00394] Fps is (10 sec: 4094.9, 60 sec: 3481.5, 300 sec: 3401.7). Total num frames: 5046272. Throughput: 0: 798.8. Samples: 256764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:40,081][00394] Avg episode reward: [(0, '4.505')]
[2023-02-27 11:32:45,077][00394] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 5062656. Throughput: 0: 852.2. Samples: 261530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:45,080][00394] Avg episode reward: [(0, '4.537')]
[2023-02-27 11:32:50,078][00394] Fps is (10 sec: 2867.4, 60 sec: 3413.2, 300 sec: 3374.0). Total num frames: 5074944. Throughput: 0: 849.7. Samples: 265514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:50,090][00394] Avg episode reward: [(0, '4.521')]
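`Policy #0 lag` summarizes how stale the experience consumed by the learner is: for each slice of a training batch, the lag is the learner's current policy_version minus the version that generated it, reported as min/avg/max. A small illustration (hypothetical helper, NumPy assumed):

```python
import numpy as np

def policy_lag_stats(learner_version, batch_versions):
    """Min/avg/max staleness of a training batch, as in 'Policy #0 lag: (...)'.
    batch_versions: the policy_version that produced each trajectory slice."""
    lags = learner_version - np.asarray(batch_versions, dtype=np.float64)
    return lags.min(), lags.mean(), lags.max()

# policy_lag_stats(1030, [1030, 1030, 1029, 1028]) -> (0.0, 0.75, 2.0)
```

The -1.0 values right after the restart above fall out naturally when no batch has been consumed yet and the stats are initialized to a sentinel.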
[2023-02-27 11:32:50,104][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001239_5074944.pth...
[2023-02-27 11:32:50,409][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001041_4263936.pth
[2023-02-27 11:32:50,835][28720] Updated weights for policy 0, policy_version 1240 (0.0022)
[2023-02-27 11:32:55,077][00394] Fps is (10 sec: 3277.1, 60 sec: 3345.3, 300 sec: 3374.0). Total num frames: 5095424. Throughput: 0: 865.6. Samples: 268202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:32:55,083][00394] Avg episode reward: [(0, '4.492')]
[2023-02-27 11:33:00,076][00394] Fps is (10 sec: 4096.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 5115904. Throughput: 0: 855.4. Samples: 274316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:33:00,078][00394] Avg episode reward: [(0, '4.708')]
[2023-02-27 11:33:00,795][28720] Updated weights for policy 0, policy_version 1250 (0.0028)
[2023-02-27 11:33:05,079][00394] Fps is (10 sec: 3275.8, 60 sec: 3413.2, 300 sec: 3401.7). Total num frames: 5128192. Throughput: 0: 840.1. Samples: 279308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:33:05,088][00394] Avg episode reward: [(0, '4.976')]
[2023-02-27 11:33:10,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 5144576. Throughput: 0: 799.4. Samples: 281360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:33:10,084][00394] Avg episode reward: [(0, '4.947')]
[2023-02-27 11:33:14,284][28720] Updated weights for policy 0, policy_version 1260 (0.0019)
[2023-02-27 11:33:15,076][00394] Fps is (10 sec: 3277.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5160960. Throughput: 0: 856.1. Samples: 285984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:33:15,084][00394] Avg episode reward: [(0, '4.783')]
[2023-02-27 11:33:20,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 5185536. Throughput: 0: 863.6. Samples: 292474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:33:20,085][00394] Avg episode reward: [(0, '4.546')]
[2023-02-27 11:33:25,077][00394] Fps is (10 sec: 3686.0, 60 sec: 3414.4, 300 sec: 3401.8). Total num frames: 5197824. Throughput: 0: 858.3. Samples: 295386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:33:25,083][00394] Avg episode reward: [(0, '4.616')]
[2023-02-27 11:33:25,510][28720] Updated weights for policy 0, policy_version 1270 (0.0023)
[2023-02-27 11:33:30,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 5214208. Throughput: 0: 840.6. Samples: 299358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:33:30,084][00394] Avg episode reward: [(0, '4.631')]
[2023-02-27 11:33:35,076][00394] Fps is (10 sec: 3277.1, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5230592. Throughput: 0: 851.0. Samples: 303808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:33:35,086][00394] Avg episode reward: [(0, '4.687')]
[2023-02-27 11:33:37,723][28720] Updated weights for policy 0, policy_version 1280 (0.0017)
[2023-02-27 11:33:40,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3387.9). Total num frames: 5251072. Throughput: 0: 937.1. Samples: 310372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:33:40,079][00394] Avg episode reward: [(0, '4.622')]
[2023-02-27 11:33:45,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3415.7). Total num frames: 5271552. Throughput: 0: 873.0. Samples: 313600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:33:45,084][00394] Avg episode reward: [(0, '4.765')]
[2023-02-27 11:33:49,544][28720] Updated weights for policy 0, policy_version 1290 (0.0018)
[2023-02-27 11:33:50,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3401.8). Total num frames: 5283840. Throughput: 0: 851.2. Samples: 317608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:33:50,083][00394] Avg episode reward: [(0, '4.712')]
[2023-02-27 11:33:55,077][00394] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5296128. Throughput: 0: 850.7. Samples: 319640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:33:55,085][00394] Avg episode reward: [(0, '4.666')]
[2023-02-27 11:34:00,077][00394] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 5316608. Throughput: 0: 867.6. Samples: 325026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:34:00,084][00394] Avg episode reward: [(0, '4.550')]
[2023-02-27 11:34:01,218][28720] Updated weights for policy 0, policy_version 1300 (0.0012)
[2023-02-27 11:34:05,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.8, 300 sec: 3401.8). Total num frames: 5337088. Throughput: 0: 867.8. Samples: 331524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:34:05,082][00394] Avg episode reward: [(0, '4.653')]
[2023-02-27 11:34:10,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 5353472. Throughput: 0: 847.5. Samples: 333522. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:34:10,079][00394] Avg episode reward: [(0, '4.766')]
[2023-02-27 11:34:13,902][28720] Updated weights for policy 0, policy_version 1310 (0.0013)
[2023-02-27 11:34:15,077][00394] Fps is (10 sec: 2867.0, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 5365760. Throughput: 0: 852.6. Samples: 337724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:34:15,087][00394] Avg episode reward: [(0, '4.831')]
[2023-02-27 11:34:20,077][00394] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 5386240. Throughput: 0: 878.4. Samples: 343334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:34:20,087][00394] Avg episode reward: [(0, '4.664')]
[2023-02-27 11:34:24,267][28720] Updated weights for policy 0, policy_version 1320 (0.0024)
[2023-02-27 11:34:25,078][00394] Fps is (10 sec: 4095.7, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 5406720. Throughput: 0: 874.5. Samples: 349724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:34:25,084][00394] Avg episode reward: [(0, '4.430')]
[2023-02-27 11:34:30,079][00394] Fps is (10 sec: 3685.6, 60 sec: 3481.5, 300 sec: 3429.5). Total num frames: 5423104. Throughput: 0: 848.0. Samples: 351760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:34:30,083][00394] Avg episode reward: [(0, '4.522')]
[2023-02-27 11:34:35,077][00394] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 5435392. Throughput: 0: 849.6. Samples: 355842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:34:35,079][00394] Avg episode reward: [(0, '4.574')]
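Each `Updated weights for policy 0, policy_version N (...)` entry marks the inference worker pulling the learner's latest parameters, here roughly every ten versions; the parenthesized number is plausibly the time the update took in seconds. A hedged sketch of such a sync step (the `shared_state` layout below is an assumption, not Sample Factory's actual mechanism, which shares tensors in GPU memory):

```python
import time

def maybe_update_weights(inference_model, shared_state, local_version):
    """Sketch of an inference worker syncing to the learner's newest weights.
    `shared_state` is assumed to be a dict exposing the learner's current
    'policy_version' and a 'state_dict'."""
    latest = shared_state["policy_version"]
    if latest <= local_version:
        return local_version  # already up to date
    t0 = time.time()
    inference_model.load_state_dict(shared_state["state_dict"])
    print(f"Updated weights for policy 0, policy_version {latest} ({time.time() - t0:.4f})")
    return latest
```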
[2023-02-27 11:34:38,069][28720] Updated weights for policy 0, policy_version 1330 (0.0017)
[2023-02-27 11:34:40,076][00394] Fps is (10 sec: 3277.5, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 5455872. Throughput: 0: 925.1. Samples: 361270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:34:40,079][00394] Avg episode reward: [(0, '4.510')]
[2023-02-27 11:34:45,077][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 5476352. Throughput: 0: 876.4. Samples: 364462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:34:45,084][00394] Avg episode reward: [(0, '4.484')]
[2023-02-27 11:34:48,202][28720] Updated weights for policy 0, policy_version 1340 (0.0014)
[2023-02-27 11:34:50,077][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 5492736. Throughput: 0: 852.7. Samples: 369896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:34:50,081][00394] Avg episode reward: [(0, '4.566')]
[2023-02-27 11:34:50,097][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001341_5492736.pth...
[2023-02-27 11:34:50,439][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001141_4673536.pth
[2023-02-27 11:34:55,083][00394] Fps is (10 sec: 2865.4, 60 sec: 3481.2, 300 sec: 3415.6). Total num frames: 5505024. Throughput: 0: 852.6. Samples: 371894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:34:55,088][00394] Avg episode reward: [(0, '4.740')]
[2023-02-27 11:35:00,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 5521408. Throughput: 0: 846.3. Samples: 375806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:00,079][00394] Avg episode reward: [(0, '4.383')]
[2023-02-27 11:35:01,519][28720] Updated weights for policy 0, policy_version 1350 (0.0019)
[2023-02-27 11:35:05,076][00394] Fps is (10 sec: 3688.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 5541888. Throughput: 0: 864.9. Samples: 382254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:05,084][00394] Avg episode reward: [(0, '4.639')]
[2023-02-27 11:35:10,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 5562368. Throughput: 0: 795.3. Samples: 385512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:10,079][00394] Avg episode reward: [(0, '4.791')]
[2023-02-27 11:35:12,630][28720] Updated weights for policy 0, policy_version 1360 (0.0014)
[2023-02-27 11:35:15,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 5574656. Throughput: 0: 848.6. Samples: 389946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:35:15,092][00394] Avg episode reward: [(0, '4.585')]
[2023-02-27 11:35:20,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 5591040. Throughput: 0: 844.5. Samples: 393844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:20,087][00394] Avg episode reward: [(0, '4.543')]
[2023-02-27 11:35:24,910][28720] Updated weights for policy 0, policy_version 1370 (0.0029)
[2023-02-27 11:35:25,077][00394] Fps is (10 sec: 3686.2, 60 sec: 3413.4, 300 sec: 3415.7). Total num frames: 5611520. Throughput: 0: 862.8. Samples: 400096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:35:25,085][00394] Avg episode reward: [(0, '4.536')]
[2023-02-27 11:35:30,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3429.5). Total num frames: 5632000. Throughput: 0: 862.0. Samples: 403254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:30,085][00394] Avg episode reward: [(0, '4.557')]
[2023-02-27 11:35:35,076][00394] Fps is (10 sec: 3277.0, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 5644288. Throughput: 0: 841.2. Samples: 407748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:35:35,082][00394] Avg episode reward: [(0, '4.731')]
[2023-02-27 11:35:37,573][28720] Updated weights for policy 0, policy_version 1380 (0.0021)
[2023-02-27 11:35:40,077][00394] Fps is (10 sec: 2457.5, 60 sec: 3345.0, 300 sec: 3401.8). Total num frames: 5656576. Throughput: 0: 843.4. Samples: 409844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:35:40,090][00394] Avg episode reward: [(0, '4.603')]
[2023-02-27 11:35:45,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 5677056. Throughput: 0: 871.5. Samples: 415022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:45,080][00394] Avg episode reward: [(0, '4.480')]
[2023-02-27 11:35:48,196][28720] Updated weights for policy 0, policy_version 1390 (0.0018)
[2023-02-27 11:35:50,076][00394] Fps is (10 sec: 4505.9, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 5701632. Throughput: 0: 870.8. Samples: 421442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:50,079][00394] Avg episode reward: [(0, '4.531')]
[2023-02-27 11:35:55,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3482.0, 300 sec: 3443.4). Total num frames: 5713920. Throughput: 0: 852.7. Samples: 423884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:35:55,086][00394] Avg episode reward: [(0, '4.624')]
[2023-02-27 11:36:00,077][00394] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 5726208. Throughput: 0: 846.2. Samples: 428024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:36:00,083][00394] Avg episode reward: [(0, '4.870')]
[2023-02-27 11:36:01,747][28720] Updated weights for policy 0, policy_version 1400 (0.0038)
[2023-02-27 11:36:05,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 5746688. Throughput: 0: 871.1. Samples: 433044. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:36:05,082][00394] Avg episode reward: [(0, '5.034')]
[2023-02-27 11:36:10,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 5767168. Throughput: 0: 802.9. Samples: 436228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:36:10,079][00394] Avg episode reward: [(0, '4.836')]
[2023-02-27 11:36:11,278][28720] Updated weights for policy 0, policy_version 1410 (0.0012)
[2023-02-27 11:36:15,079][00394] Fps is (10 sec: 3685.4, 60 sec: 3481.4, 300 sec: 3443.4). Total num frames: 5783552. Throughput: 0: 865.5. Samples: 442204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:15,082][00394] Avg episode reward: [(0, '4.974')]
[2023-02-27 11:36:20,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 5799936. Throughput: 0: 857.8. Samples: 446348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:20,083][00394] Avg episode reward: [(0, '4.792')]
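To turn entries like these back into training curves, the FPS and reward fields can be scraped with a couple of regular expressions; the log path below is hypothetical:

```python
import re

FPS_RE = re.compile(r"Fps is \(10 sec: ([\d.]+).*?Total num frames: (\d+)")
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([-\d.]+)'\)\]")

frames, fps_10s, rewards = [], [], []
with open("train.log") as fh:  # hypothetical path to this log
    for line in fh:
        if m := FPS_RE.search(line):
            fps_10s.append(float(m.group(1)))
            frames.append(int(m.group(2)))
        elif m := REWARD_RE.search(line):
            rewards.append(float(m.group(1)))
```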
[2023-02-27 11:36:25,040][28720] Updated weights for policy 0, policy_version 1420 (0.0013)
[2023-02-27 11:36:25,076][00394] Fps is (10 sec: 3277.7, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 5816320. Throughput: 0: 918.5. Samples: 451178. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:36:25,082][00394] Avg episode reward: [(0, '4.773')]
[2023-02-27 11:36:30,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 5836800. Throughput: 0: 873.7. Samples: 454340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:30,079][00394] Avg episode reward: [(0, '4.719')]
[2023-02-27 11:36:35,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 5853184. Throughput: 0: 867.1. Samples: 460462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:36:35,080][00394] Avg episode reward: [(0, '4.703')]
[2023-02-27 11:36:35,416][28720] Updated weights for policy 0, policy_version 1430 (0.0016)
[2023-02-27 11:36:40,095][00394] Fps is (10 sec: 3270.7, 60 sec: 3548.8, 300 sec: 3443.2). Total num frames: 5869568. Throughput: 0: 856.8. Samples: 462456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:36:40,098][00394] Avg episode reward: [(0, '4.611')]
[2023-02-27 11:36:45,086][00394] Fps is (10 sec: 2864.6, 60 sec: 3412.8, 300 sec: 3429.4). Total num frames: 5881856. Throughput: 0: 855.2. Samples: 466516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:36:45,088][00394] Avg episode reward: [(0, '4.491')]
[2023-02-27 11:36:48,421][28720] Updated weights for policy 0, policy_version 1440 (0.0012)
[2023-02-27 11:36:50,077][00394] Fps is (10 sec: 3282.8, 60 sec: 3345.1, 300 sec: 3415.7). Total num frames: 5902336. Throughput: 0: 872.8. Samples: 472318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:36:50,080][00394] Avg episode reward: [(0, '4.489')]
[2023-02-27 11:36:50,097][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001441_5902336.pth...
[2023-02-27 11:36:50,394][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001239_5074944.pth
[2023-02-27 11:36:55,076][00394] Fps is (10 sec: 4099.7, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 5922816. Throughput: 0: 933.0. Samples: 478212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:36:55,079][00394] Avg episode reward: [(0, '4.634')]
[2023-02-27 11:37:00,079][00394] Fps is (10 sec: 3276.0, 60 sec: 3481.5, 300 sec: 3429.5). Total num frames: 5935104. Throughput: 0: 844.2. Samples: 480192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:37:00,082][00394] Avg episode reward: [(0, '4.715')]
[2023-02-27 11:37:00,515][28720] Updated weights for policy 0, policy_version 1450 (0.0042)
[2023-02-27 11:37:05,078][00394] Fps is (10 sec: 2457.2, 60 sec: 3345.0, 300 sec: 3415.6). Total num frames: 5947392. Throughput: 0: 842.7. Samples: 484272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:37:05,081][00394] Avg episode reward: [(0, '4.802')]
[2023-02-27 11:37:10,076][00394] Fps is (10 sec: 3687.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 5971968. Throughput: 0: 791.8. Samples: 486808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:37:10,079][00394] Avg episode reward: [(0, '4.684')]
[2023-02-27 11:37:11,963][28720] Updated weights for policy 0, policy_version 1460 (0.0020)
[2023-02-27 11:37:15,076][00394] Fps is (10 sec: 4506.3, 60 sec: 3481.8, 300 sec: 3443.4). Total num frames: 5992448. Throughput: 0: 865.8. Samples: 493302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:37:15,084][00394] Avg episode reward: [(0, '4.776')]
[2023-02-27 11:37:20,085][00394] Fps is (10 sec: 3274.0, 60 sec: 3412.8, 300 sec: 3429.7). Total num frames: 6004736. Throughput: 0: 843.6. Samples: 498430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:37:20,090][00394] Avg episode reward: [(0, '4.756')]
[2023-02-27 11:37:24,817][28720] Updated weights for policy 0, policy_version 1470 (0.0019)
[2023-02-27 11:37:25,077][00394] Fps is (10 sec: 2867.0, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6021120. Throughput: 0: 844.1. Samples: 500426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:37:25,084][00394] Avg episode reward: [(0, '4.636')]
[2023-02-27 11:37:30,077][00394] Fps is (10 sec: 3279.5, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 6037504. Throughput: 0: 852.7. Samples: 504882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:37:30,079][00394] Avg episode reward: [(0, '4.713')]
[2023-02-27 11:37:35,076][00394] Fps is (10 sec: 3686.7, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 6057984. Throughput: 0: 870.4. Samples: 511488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:37:35,079][00394] Avg episode reward: [(0, '4.634')]
[2023-02-27 11:37:35,245][28720] Updated weights for policy 0, policy_version 1480 (0.0015)
[2023-02-27 11:37:40,077][00394] Fps is (10 sec: 4096.1, 60 sec: 3482.7, 300 sec: 3443.4). Total num frames: 6078464. Throughput: 0: 811.6. Samples: 514732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:37:40,079][00394] Avg episode reward: [(0, '4.674')]
[2023-02-27 11:37:45,080][00394] Fps is (10 sec: 3275.6, 60 sec: 3481.9, 300 sec: 3443.4). Total num frames: 6090752. Throughput: 0: 857.5. Samples: 518782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:37:45,083][00394] Avg episode reward: [(0, '4.676')]
[2023-02-27 11:37:48,818][28720] Updated weights for policy 0, policy_version 1490 (0.0019)
[2023-02-27 11:37:50,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 6107136. Throughput: 0: 865.1. Samples: 523200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:37:50,079][00394] Avg episode reward: [(0, '4.782')]
[2023-02-27 11:37:55,076][00394] Fps is (10 sec: 3687.7, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 6127616. Throughput: 0: 881.2. Samples: 526462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:37:55,079][00394] Avg episode reward: [(0, '4.692')]
[2023-02-27 11:37:58,232][28720] Updated weights for policy 0, policy_version 1500 (0.0026)
[2023-02-27 11:38:00,081][00394] Fps is (10 sec: 4094.1, 60 sec: 3549.7, 300 sec: 3457.3). Total num frames: 6148096. Throughput: 0: 880.1. Samples: 532910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:38:00,083][00394] Avg episode reward: [(0, '4.593')]
[2023-02-27 11:38:05,083][00394] Fps is (10 sec: 3274.6, 60 sec: 3549.6, 300 sec: 3443.3). Total num frames: 6160384. Throughput: 0: 859.8. Samples: 537120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:38:05,093][00394] Avg episode reward: [(0, '4.639')]
[2023-02-27 11:38:10,076][00394] Fps is (10 sec: 2868.5, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6176768. Throughput: 0: 861.7. Samples: 539200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:38:10,083][00394] Avg episode reward: [(0, '4.565')]
[2023-02-27 11:38:11,852][28720] Updated weights for policy 0, policy_version 1510 (0.0022)
[2023-02-27 11:38:15,076][00394] Fps is (10 sec: 3688.9, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 6197248. Throughput: 0: 885.2. Samples: 544716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:38:15,079][00394] Avg episode reward: [(0, '4.348')]
[2023-02-27 11:38:20,078][00394] Fps is (10 sec: 4095.9, 60 sec: 3550.4, 300 sec: 3457.3). Total num frames: 6217728. Throughput: 0: 883.1. Samples: 551230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:38:20,085][00394] Avg episode reward: [(0, '4.572')]
[2023-02-27 11:38:22,040][28720] Updated weights for policy 0, policy_version 1520 (0.0020)
[2023-02-27 11:38:25,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 6230016. Throughput: 0: 858.8. Samples: 553376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:38:25,086][00394] Avg episode reward: [(0, '4.750')]
[2023-02-27 11:38:30,077][00394] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 6246400. Throughput: 0: 855.6. Samples: 557280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:38:30,083][00394] Avg episode reward: [(0, '4.910')]
[2023-02-27 11:38:35,068][28720] Updated weights for policy 0, policy_version 1530 (0.0042)
[2023-02-27 11:38:35,082][00394] Fps is (10 sec: 3684.3, 60 sec: 3481.3, 300 sec: 3443.4). Total num frames: 6266880. Throughput: 0: 876.9. Samples: 562666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:38:35,089][00394] Avg episode reward: [(0, '4.823')]
[2023-02-27 11:38:40,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 6287360. Throughput: 0: 876.5. Samples: 565906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:38:40,083][00394] Avg episode reward: [(0, '4.687')]
[2023-02-27 11:38:45,077][00394] Fps is (10 sec: 3688.2, 60 sec: 3550.0, 300 sec: 3457.3). Total num frames: 6303744. Throughput: 0: 861.5. Samples: 571676. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:38:45,083][00394] Avg episode reward: [(0, '4.761')]
[2023-02-27 11:38:46,402][28720] Updated weights for policy 0, policy_version 1540 (0.0023)
[2023-02-27 11:38:50,077][00394] Fps is (10 sec: 2867.0, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 6316032. Throughput: 0: 859.2. Samples: 575780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:38:50,086][00394] Avg episode reward: [(0, '4.670')]
[2023-02-27 11:38:50,102][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001542_6316032.pth...
[2023-02-27 11:38:50,446][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001341_5492736.pth
[2023-02-27 11:38:55,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6332416. Throughput: 0: 927.2. Samples: 580924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:38:55,081][00394] Avg episode reward: [(0, '4.694')]
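Each checkpoint name encodes the policy version and the cumulative environment frame count, e.g. checkpoint_000001542_6316032.pth. Throughout this log the two are locked together as frames = version × 4096 (1542 × 4096 = 6316032), i.e. one 4096-frame training batch per policy version; that is an observation from these filenames, not a documented guarantee. A parser sketch:

```python
import re

def parse_checkpoint_name(name):
    """checkpoint_000001542_6316032.pth -> (policy_version, env_frames)."""
    m = re.match(r"checkpoint_(\d+)_(\d+)\.pth$", name)
    version, frames = int(m.group(1)), int(m.group(2))
    # Empirically true for every checkpoint in this log: 4096 frames per version.
    assert frames == version * 4096
    return version, frames
```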
[2023-02-27 11:38:58,255][28720] Updated weights for policy 0, policy_version 1550 (0.0030)
[2023-02-27 11:39:00,076][00394] Fps is (10 sec: 3686.7, 60 sec: 3413.6, 300 sec: 3443.4). Total num frames: 6352896. Throughput: 0: 874.1. Samples: 584052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:39:00,080][00394] Avg episode reward: [(0, '4.717')]
[2023-02-27 11:39:05,076][00394] Fps is (10 sec: 4096.4, 60 sec: 3550.3, 300 sec: 3457.3). Total num frames: 6373376. Throughput: 0: 859.6. Samples: 589910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:39:05,084][00394] Avg episode reward: [(0, '4.546')]
[2023-02-27 11:39:10,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6385664. Throughput: 0: 858.8. Samples: 592022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:39:10,084][00394] Avg episode reward: [(0, '4.630')]
[2023-02-27 11:39:10,434][28720] Updated weights for policy 0, policy_version 1560 (0.0012)
[2023-02-27 11:39:15,077][00394] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6402048. Throughput: 0: 863.9. Samples: 596158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:39:15,091][00394] Avg episode reward: [(0, '4.526')]
[2023-02-27 11:39:20,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6422528. Throughput: 0: 884.3. Samples: 602454. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:39:20,081][00394] Avg episode reward: [(0, '4.591')]
[2023-02-27 11:39:21,284][28720] Updated weights for policy 0, policy_version 1570 (0.0012)
[2023-02-27 11:39:25,076][00394] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 6443008. Throughput: 0: 883.4. Samples: 605658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:39:25,079][00394] Avg episode reward: [(0, '4.671')]
[2023-02-27 11:39:30,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6455296. Throughput: 0: 857.8. Samples: 610278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:39:30,088][00394] Avg episode reward: [(0, '4.647')]
[2023-02-27 11:39:35,026][28720] Updated weights for policy 0, policy_version 1580 (0.0029)
[2023-02-27 11:39:35,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.7, 300 sec: 3443.4). Total num frames: 6471680. Throughput: 0: 855.3. Samples: 614270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:39:35,084][00394] Avg episode reward: [(0, '4.691')]
[2023-02-27 11:39:40,077][00394] Fps is (10 sec: 3686.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6492160. Throughput: 0: 804.6. Samples: 617130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:39:40,079][00394] Avg episode reward: [(0, '4.726')]
[2023-02-27 11:39:44,571][28720] Updated weights for policy 0, policy_version 1590 (0.0013)
[2023-02-27 11:39:45,079][00394] Fps is (10 sec: 4094.9, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 6512640. Throughput: 0: 882.3. Samples: 623756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:39:45,095][00394] Avg episode reward: [(0, '4.731')]
[2023-02-27 11:39:50,077][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.4). Total num frames: 6524928. Throughput: 0: 856.7. Samples: 628460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:39:50,086][00394] Avg episode reward: [(0, '4.881')]
[2023-02-27 11:39:55,087][00394] Fps is (10 sec: 2865.1, 60 sec: 3481.1, 300 sec: 3457.2). Total num frames: 6541312. Throughput: 0: 854.8. Samples: 630496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:39:55,095][00394] Avg episode reward: [(0, '4.934')]
[2023-02-27 11:39:58,254][28720] Updated weights for policy 0, policy_version 1600 (0.0028)
[2023-02-27 11:40:00,077][00394] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6557696. Throughput: 0: 868.9. Samples: 635258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:40:00,086][00394] Avg episode reward: [(0, '4.927')]
[2023-02-27 11:40:05,076][00394] Fps is (10 sec: 4100.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6582272. Throughput: 0: 872.3. Samples: 641708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:40:05,078][00394] Avg episode reward: [(0, '4.567')]
[2023-02-27 11:40:08,505][28720] Updated weights for policy 0, policy_version 1610 (0.0015)
[2023-02-27 11:40:10,081][00394] Fps is (10 sec: 4094.0, 60 sec: 3549.6, 300 sec: 3471.1). Total num frames: 6598656. Throughput: 0: 912.7. Samples: 646732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:40:10,089][00394] Avg episode reward: [(0, '4.613')]
[2023-02-27 11:40:15,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6610944. Throughput: 0: 858.0. Samples: 648888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:40:15,086][00394] Avg episode reward: [(0, '4.720')]
[2023-02-27 11:40:20,076][00394] Fps is (10 sec: 2868.6, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6627328. Throughput: 0: 875.3. Samples: 653658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:40:20,083][00394] Avg episode reward: [(0, '4.892')]
[2023-02-27 11:40:21,184][28720] Updated weights for policy 0, policy_version 1620 (0.0017)
[2023-02-27 11:40:25,076][00394] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6651904. Throughput: 0: 957.2. Samples: 660202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:40:25,084][00394] Avg episode reward: [(0, '4.741')]
[2023-02-27 11:40:30,077][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 6668288. Throughput: 0: 873.2. Samples: 663046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:40:30,085][00394] Avg episode reward: [(0, '4.600')]
[2023-02-27 11:40:32,646][28720] Updated weights for policy 0, policy_version 1630 (0.0023)
[2023-02-27 11:40:35,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 6680576. Throughput: 0: 860.0. Samples: 667158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:40:35,082][00394] Avg episode reward: [(0, '4.566')]
[2023-02-27 11:40:40,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3457.3). Total num frames: 6696960. Throughput: 0: 861.0. Samples: 669234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:40:40,079][00394] Avg episode reward: [(0, '4.445')]
[2023-02-27 11:40:44,037][28720] Updated weights for policy 0, policy_version 1640 (0.0017)
[2023-02-27 11:40:45,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.8, 300 sec: 3457.3). Total num frames: 6721536. Throughput: 0: 889.3. Samples: 675276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:40:45,078][00394] Avg episode reward: [(0, '4.677')]
[2023-02-27 11:40:50,077][00394] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 6737920. Throughput: 0: 884.9. Samples: 681528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:40:50,083][00394] Avg episode reward: [(0, '4.826')]
[2023-02-27 11:40:50,093][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001645_6737920.pth...
[2023-02-27 11:40:50,410][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001441_5902336.pth
[2023-02-27 11:40:55,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3482.2, 300 sec: 3471.2). Total num frames: 6750208. Throughput: 0: 863.3. Samples: 685574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:40:55,084][00394] Avg episode reward: [(0, '4.844')]
[2023-02-27 11:40:56,720][28720] Updated weights for policy 0, policy_version 1650 (0.0027)
[2023-02-27 11:41:00,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6766592. Throughput: 0: 861.1. Samples: 687636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:41:00,080][00394] Avg episode reward: [(0, '4.546')]
[2023-02-27 11:41:05,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 6787072. Throughput: 0: 886.9. Samples: 693568. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:41:05,089][00394] Avg episode reward: [(0, '4.725')]
[2023-02-27 11:41:07,052][28720] Updated weights for policy 0, policy_version 1660 (0.0013)
[2023-02-27 11:41:10,077][00394] Fps is (10 sec: 4095.8, 60 sec: 3481.9, 300 sec: 3471.2). Total num frames: 6807552. Throughput: 0: 812.8. Samples: 696778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:41:10,086][00394] Avg episode reward: [(0, '4.788')]
[2023-02-27 11:41:15,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 6823936. Throughput: 0: 866.4. Samples: 702032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:41:15,090][00394] Avg episode reward: [(0, '4.888')]
[2023-02-27 11:41:20,077][00394] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6836224. Throughput: 0: 865.6. Samples: 706112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:41:20,084][00394] Avg episode reward: [(0, '4.878')]
[2023-02-27 11:41:20,761][28720] Updated weights for policy 0, policy_version 1670 (0.0033)
[2023-02-27 11:41:25,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 6856704. Throughput: 0: 875.2. Samples: 708616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:41:25,098][00394] Avg episode reward: [(0, '4.636')]
[2023-02-27 11:41:30,079][00394] Fps is (10 sec: 4095.0, 60 sec: 3481.5, 300 sec: 3471.2). Total num frames: 6877184. Throughput: 0: 882.6. Samples: 714994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:41:30,082][00394] Avg episode reward: [(0, '4.630')]
[2023-02-27 11:41:30,204][28720] Updated weights for policy 0, policy_version 1680 (0.0014)
[2023-02-27 11:41:35,077][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.4). Total num frames: 6893568. Throughput: 0: 857.5. Samples: 720116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:41:35,082][00394] Avg episode reward: [(0, '4.545')]
[2023-02-27 11:41:40,077][00394] Fps is (10 sec: 2867.9, 60 sec: 3481.6, 300 sec: 3471.3). Total num frames: 6905856. Throughput: 0: 811.9. Samples: 722108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:41:40,083][00394] Avg episode reward: [(0, '4.724')]
[2023-02-27 11:41:43,760][28720] Updated weights for policy 0, policy_version 1690 (0.0014)
[2023-02-27 11:41:45,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 6926336. Throughput: 0: 870.0. Samples: 726786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:41:45,081][00394] Avg episode reward: [(0, '4.553')]
[2023-02-27 11:41:50,076][00394] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 6946816. Throughput: 0: 886.4. Samples: 733456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:41:50,078][00394] Avg episode reward: [(0, '4.512')]
[2023-02-27 11:41:53,787][28720] Updated weights for policy 0, policy_version 1700 (0.0013)
[2023-02-27 11:41:55,078][00394] Fps is (10 sec: 3686.0, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 6963200. Throughput: 0: 885.9. Samples: 736646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:41:55,081][00394] Avg episode reward: [(0, '4.643')]
[2023-02-27 11:42:00,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 6979584. Throughput: 0: 861.9. Samples: 740818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:42:00,083][00394] Avg episode reward: [(0, '4.595')]
[2023-02-27 11:42:05,076][00394] Fps is (10 sec: 3277.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 6995968. Throughput: 0: 869.2. Samples: 745228. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:42:05,085][00394] Avg episode reward: [(0, '4.417')]
[2023-02-27 11:42:06,766][28720] Updated weights for policy 0, policy_version 1710 (0.0016)
[2023-02-27 11:42:10,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 7016448. Throughput: 0: 885.6. Samples: 748470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:42:10,079][00394] Avg episode reward: [(0, '4.591')]
[2023-02-27 11:42:15,082][00394] Fps is (10 sec: 4093.8, 60 sec: 3549.5, 300 sec: 3499.0). Total num frames: 7036928. Throughput: 0: 891.1. Samples: 755096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-27 11:42:15,084][00394] Avg episode reward: [(0, '4.621')]
[2023-02-27 11:42:18,072][28720] Updated weights for policy 0, policy_version 1720 (0.0027)
[2023-02-27 11:42:20,079][00394] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3485.0). Total num frames: 7049216. Throughput: 0: 866.2. Samples: 759096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:42:20,082][00394] Avg episode reward: [(0, '4.649')]
[2023-02-27 11:42:25,076][00394] Fps is (10 sec: 2458.9, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7061504. Throughput: 0: 918.7. Samples: 763448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:42:25,079][00394] Avg episode reward: [(0, '4.756')]
[2023-02-27 11:42:29,928][28720] Updated weights for policy 0, policy_version 1730 (0.0030)
[2023-02-27 11:42:30,076][00394] Fps is (10 sec: 3687.4, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 7086080. Throughput: 0: 884.5. Samples: 766590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:42:30,079][00394] Avg episode reward: [(0, '4.815')]
[2023-02-27 11:42:35,076][00394] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7106560. Throughput: 0: 883.9. Samples: 773232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:42:35,079][00394] Avg episode reward: [(0, '4.763')]
[2023-02-27 11:42:40,078][00394] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 7118848. Throughput: 0: 858.0. Samples: 775254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:42:40,081][00394] Avg episode reward: [(0, '4.646')]
[2023-02-27 11:42:42,429][28720] Updated weights for policy 0, policy_version 1740 (0.0019)
[2023-02-27 11:42:45,076][00394] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7131136. Throughput: 0: 857.5. Samples: 779404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:42:45,085][00394] Avg episode reward: [(0, '4.664')]
[2023-02-27 11:42:50,076][00394] Fps is (10 sec: 3277.2, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7151616. Throughput: 0: 883.5. Samples: 784984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:42:50,082][00394] Avg episode reward: [(0, '4.687')]
[2023-02-27 11:42:50,152][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001747_7155712.pth...
[2023-02-27 11:42:50,392][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001542_6316032.pth
[2023-02-27 11:42:53,004][28720] Updated weights for policy 0, policy_version 1750 (0.0015)
[2023-02-27 11:42:55,077][00394] Fps is (10 sec: 4505.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7176192. Throughput: 0: 950.4. Samples: 791240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:42:55,081][00394] Avg episode reward: [(0, '4.506')]
[2023-02-27 11:43:00,079][00394] Fps is (10 sec: 3685.5, 60 sec: 3481.4, 300 sec: 3485.1). Total num frames: 7188480. Throughput: 0: 851.4. Samples: 793406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:43:00,082][00394] Avg episode reward: [(0, '4.650')]
[2023-02-27 11:43:05,077][00394] Fps is (10 sec: 2867.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7204864. Throughput: 0: 854.6. Samples: 797550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:43:05,092][00394] Avg episode reward: [(0, '4.636')]
[2023-02-27 11:43:06,682][28720] Updated weights for policy 0, policy_version 1760 (0.0025)
[2023-02-27 11:43:10,076][00394] Fps is (10 sec: 3277.7, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7221248. Throughput: 0: 881.2. Samples: 803104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:43:10,083][00394] Avg episode reward: [(0, '4.596')]
[2023-02-27 11:43:15,076][00394] Fps is (10 sec: 3686.5, 60 sec: 3413.6, 300 sec: 3471.2). Total num frames: 7241728. Throughput: 0: 884.7. Samples: 806402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:43:15,079][00394] Avg episode reward: [(0, '4.406')]
[2023-02-27 11:43:15,909][28720] Updated weights for policy 0, policy_version 1770 (0.0017)
[2023-02-27 11:43:20,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3499.0). Total num frames: 7262208. Throughput: 0: 862.8. Samples: 812058. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:43:20,087][00394] Avg episode reward: [(0, '4.423')]
[2023-02-27 11:43:25,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7274496. Throughput: 0: 906.7. Samples: 816052. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:43:25,080][00394] Avg episode reward: [(0, '4.571')]
[2023-02-27 11:43:29,656][28720] Updated weights for policy 0, policy_version 1780 (0.0021)
[2023-02-27 11:43:30,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3471.3). Total num frames: 7290880. Throughput: 0: 857.3. Samples: 817984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:43:30,082][00394] Avg episode reward: [(0, '4.623')]
[2023-02-27 11:43:35,077][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7311360. Throughput: 0: 878.4. Samples: 824514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:43:35,084][00394] Avg episode reward: [(0, '4.758')]
[2023-02-27 11:43:39,808][28720] Updated weights for policy 0, policy_version 1790 (0.0031)
[2023-02-27 11:43:40,077][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7331840. Throughput: 0: 812.4. Samples: 827796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:43:40,086][00394] Avg episode reward: [(0, '4.808')]
[2023-02-27 11:43:45,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7344128. Throughput: 0: 866.0. Samples: 832374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:43:45,079][00394] Avg episode reward: [(0, '4.810')]
[2023-02-27 11:43:50,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7360512. Throughput: 0: 867.7. Samples: 836596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:43:50,080][00394] Avg episode reward: [(0, '4.702')]
[2023-02-27 11:43:52,502][28720] Updated weights for policy 0, policy_version 1800 (0.0025)
[2023-02-27 11:43:55,081][00394] Fps is (10 sec: 3684.7, 60 sec: 3413.1, 300 sec: 3485.0). Total num frames: 7380992. Throughput: 0: 887.0. Samples: 843022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:43:55,087][00394] Avg episode reward: [(0, '4.738')]
[2023-02-27 11:44:00,078][00394] Fps is (10 sec: 4095.2, 60 sec: 3549.9, 300 sec: 3485.0). Total num frames: 7401472. Throughput: 0: 885.2. Samples: 846236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:44:00,081][00394] Avg episode reward: [(0, '4.637')]
[2023-02-27 11:44:03,866][28720] Updated weights for policy 0, policy_version 1810 (0.0020)
[2023-02-27 11:44:05,077][00394] Fps is (10 sec: 3278.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7413760. Throughput: 0: 861.9. Samples: 850844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:44:05,087][00394] Avg episode reward: [(0, '4.650')]
[2023-02-27 11:44:10,077][00394] Fps is (10 sec: 2867.7, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7430144. Throughput: 0: 820.9. Samples: 852994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:44:10,081][00394] Avg episode reward: [(0, '4.627')]
[2023-02-27 11:44:15,076][00394] Fps is (10 sec: 3686.7, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7450624. Throughput: 0: 889.0. Samples: 857990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:44:15,078][00394] Avg episode reward: [(0, '4.553')]
[2023-02-27 11:44:15,543][28720] Updated weights for policy 0, policy_version 1820 (0.0017)
[2023-02-27 11:44:20,076][00394] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7471104. Throughput: 0: 892.2. Samples: 864662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:44:20,081][00394] Avg episode reward: [(0, '4.733')] [2023-02-27 11:44:25,077][00394] Fps is (10 sec: 3686.2, 60 sec: 3549.8, 300 sec: 3499.0). Total num frames: 7487488. Throughput: 0: 926.1. Samples: 869470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:44:25,083][00394] Avg episode reward: [(0, '4.828')] [2023-02-27 11:44:27,349][28720] Updated weights for policy 0, policy_version 1830 (0.0013) [2023-02-27 11:44:30,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7499776. Throughput: 0: 869.4. Samples: 871498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:44:30,090][00394] Avg episode reward: [(0, '4.551')] [2023-02-27 11:44:35,080][00394] Fps is (10 sec: 3275.7, 60 sec: 3481.4, 300 sec: 3485.0). Total num frames: 7520256. Throughput: 0: 879.7. Samples: 876186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:44:35,090][00394] Avg episode reward: [(0, '4.766')] [2023-02-27 11:44:38,583][28720] Updated weights for policy 0, policy_version 1840 (0.0013) [2023-02-27 11:44:40,079][00394] Fps is (10 sec: 4095.0, 60 sec: 3481.5, 300 sec: 3485.1). Total num frames: 7540736. Throughput: 0: 885.7. Samples: 882878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:44:40,082][00394] Avg episode reward: [(0, '4.626')] [2023-02-27 11:44:45,076][00394] Fps is (10 sec: 3687.9, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 7557120. Throughput: 0: 878.3. Samples: 885760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:44:45,079][00394] Avg episode reward: [(0, '4.383')] [2023-02-27 11:44:50,076][00394] Fps is (10 sec: 3277.6, 60 sec: 3549.9, 300 sec: 3499.1). Total num frames: 7573504. Throughput: 0: 869.4. Samples: 889966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:44:50,084][00394] Avg episode reward: [(0, '4.530')] [2023-02-27 11:44:50,105][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001849_7573504.pth... [2023-02-27 11:44:50,532][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001645_6737920.pth [2023-02-27 11:44:51,507][28720] Updated weights for policy 0, policy_version 1850 (0.0035) [2023-02-27 11:44:55,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.6, 300 sec: 3485.1). Total num frames: 7585792. Throughput: 0: 865.1. Samples: 891924. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:44:55,079][00394] Avg episode reward: [(0, '4.588')] [2023-02-27 11:45:00,077][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 7610368. Throughput: 0: 882.9. Samples: 897720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:45:00,081][00394] Avg episode reward: [(0, '4.450')] [2023-02-27 11:45:01,792][28720] Updated weights for policy 0, policy_version 1860 (0.0020) [2023-02-27 11:45:05,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7626752. Throughput: 0: 874.5. Samples: 904014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:45:05,081][00394] Avg episode reward: [(0, '4.608')] [2023-02-27 11:45:10,077][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 7643136. Throughput: 0: 813.3. Samples: 906066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:45:10,086][00394] Avg episode reward: [(0, '4.638')] [2023-02-27 11:45:15,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3485.1). 
Total num frames: 7655424. Throughput: 0: 860.6. Samples: 910226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:45:15,079][00394] Avg episode reward: [(0, '4.559')] [2023-02-27 11:45:15,547][28720] Updated weights for policy 0, policy_version 1870 (0.0037) [2023-02-27 11:45:20,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7675904. Throughput: 0: 881.9. Samples: 915870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:45:20,085][00394] Avg episode reward: [(0, '4.603')] [2023-02-27 11:45:24,815][28720] Updated weights for policy 0, policy_version 1880 (0.0015) [2023-02-27 11:45:25,077][00394] Fps is (10 sec: 4505.5, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 7700480. Throughput: 0: 806.7. Samples: 919178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:45:25,081][00394] Avg episode reward: [(0, '4.621')] [2023-02-27 11:45:30,079][00394] Fps is (10 sec: 3685.4, 60 sec: 3549.7, 300 sec: 3498.9). Total num frames: 7712768. Throughput: 0: 857.3. Samples: 924342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:45:30,090][00394] Avg episode reward: [(0, '4.505')] [2023-02-27 11:45:35,077][00394] Fps is (10 sec: 2457.5, 60 sec: 3413.5, 300 sec: 3485.1). Total num frames: 7725056. Throughput: 0: 854.2. Samples: 928406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:45:35,081][00394] Avg episode reward: [(0, '4.492')] [2023-02-27 11:45:38,676][28720] Updated weights for policy 0, policy_version 1890 (0.0012) [2023-02-27 11:45:40,076][00394] Fps is (10 sec: 3277.6, 60 sec: 3413.5, 300 sec: 3471.2). Total num frames: 7745536. Throughput: 0: 935.1. Samples: 934002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:45:40,078][00394] Avg episode reward: [(0, '4.602')] [2023-02-27 11:45:45,076][00394] Fps is (10 sec: 4096.3, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7766016. Throughput: 0: 880.2. Samples: 937328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:45:45,079][00394] Avg episode reward: [(0, '4.650')] [2023-02-27 11:45:48,908][28720] Updated weights for policy 0, policy_version 1900 (0.0012) [2023-02-27 11:45:50,081][00394] Fps is (10 sec: 3684.7, 60 sec: 3481.3, 300 sec: 3498.9). Total num frames: 7782400. Throughput: 0: 860.3. Samples: 942732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:45:50,083][00394] Avg episode reward: [(0, '4.519')] [2023-02-27 11:45:55,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 7798784. Throughput: 0: 861.7. Samples: 944844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:45:55,080][00394] Avg episode reward: [(0, '4.551')] [2023-02-27 11:46:00,076][00394] Fps is (10 sec: 3278.3, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 7815168. Throughput: 0: 864.0. Samples: 949104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:46:00,084][00394] Avg episode reward: [(0, '4.654')] [2023-02-27 11:46:01,613][28720] Updated weights for policy 0, policy_version 1910 (0.0038) [2023-02-27 11:46:05,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7835648. Throughput: 0: 886.8. Samples: 955776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:46:05,079][00394] Avg episode reward: [(0, '4.678')] [2023-02-27 11:46:10,082][00394] Fps is (10 sec: 4093.6, 60 sec: 3549.5, 300 sec: 3498.9). Total num frames: 7856128. Throughput: 0: 937.0. Samples: 961346. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:46:10,086][00394] Avg episode reward: [(0, '4.685')] [2023-02-27 11:46:12,528][28720] Updated weights for policy 0, policy_version 1920 (0.0022) [2023-02-27 11:46:15,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 7868416. Throughput: 0: 869.1. Samples: 963450. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:46:15,081][00394] Avg episode reward: [(0, '4.930')] [2023-02-27 11:46:20,076][00394] Fps is (10 sec: 2868.9, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7884800. Throughput: 0: 870.1. Samples: 967558. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:46:20,083][00394] Avg episode reward: [(0, '5.040')] [2023-02-27 11:46:24,585][28720] Updated weights for policy 0, policy_version 1930 (0.0019) [2023-02-27 11:46:25,077][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 7905280. Throughput: 0: 818.7. Samples: 970844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:46:25,082][00394] Avg episode reward: [(0, '4.886')] [2023-02-27 11:46:30,082][00394] Fps is (10 sec: 4093.9, 60 sec: 3549.7, 300 sec: 3498.9). Total num frames: 7925760. Throughput: 0: 888.8. Samples: 977330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:46:30,086][00394] Avg episode reward: [(0, '4.829')] [2023-02-27 11:46:35,078][00394] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 7938048. Throughput: 0: 867.2. Samples: 981752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:46:35,081][00394] Avg episode reward: [(0, '4.901')] [2023-02-27 11:46:36,906][28720] Updated weights for policy 0, policy_version 1940 (0.0017) [2023-02-27 11:46:40,077][00394] Fps is (10 sec: 2868.4, 60 sec: 3481.5, 300 sec: 3485.1). Total num frames: 7954432. Throughput: 0: 913.0. Samples: 985932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 11:46:40,089][00394] Avg episode reward: [(0, '5.057')] [2023-02-27 11:46:45,076][00394] Fps is (10 sec: 3686.9, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7974912. Throughput: 0: 888.9. Samples: 989106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:46:45,086][00394] Avg episode reward: [(0, '5.047')] [2023-02-27 11:46:47,432][28720] Updated weights for policy 0, policy_version 1950 (0.0015) [2023-02-27 11:46:50,076][00394] Fps is (10 sec: 4096.4, 60 sec: 3550.1, 300 sec: 3499.0). Total num frames: 7995392. Throughput: 0: 887.5. Samples: 995714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:46:50,081][00394] Avg episode reward: [(0, '4.649')] [2023-02-27 11:46:50,158][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001953_7999488.pth... [2023-02-27 11:46:50,460][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001747_7155712.pth [2023-02-27 11:46:53,432][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2023-02-27 11:46:53,443][00394] Component Batcher_0 stopped! [2023-02-27 11:46:53,442][28704] Stopping Batcher_0... [2023-02-27 11:46:53,454][28704] Loop batcher_evt_loop terminating... [2023-02-27 11:46:53,557][28720] Weights refcount: 2 0 [2023-02-27 11:46:53,611][00394] Component InferenceWorker_p0-w0 stopped! [2023-02-27 11:46:53,617][28720] Stopping InferenceWorker_p0-w0... [2023-02-27 11:46:53,618][28720] Loop inference_proc0-0_evt_loop terminating... 
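The save/remove pairs in the run above show the checkpoint rotation policy: each new checkpoint_<train_step>_<env_steps>.pth is written, then the oldest files beyond the retention limit are deleted (the resumed configuration later in this log shows keep_checkpoints=2). A minimal sketch of that policy, assuming a torch-serializable state dict; the function name and checkpoint layout here are illustrative, not Sample Factory's actual code:

```python
from pathlib import Path

import torch


def save_and_rotate(ckpt_dir, train_step, env_steps, state, keep_checkpoints=2):
    """Write checkpoint_<train_step>_<env_steps>.pth, then drop the oldest
    checkpoints beyond the retention limit (keep_checkpoints=2 in this run)."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{train_step:09d}_{env_steps}.pth"
    torch.save(state, path)
    # Zero-padded train_step makes lexicographic order match chronological order.
    for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_checkpoints]:
        old.unlink()
    return path
```

With a retention limit of 2, saving checkpoint_000001953 deletes checkpoint_000001747 while checkpoint_000001849 survives, which is exactly the pattern logged above.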
[2023-02-27 11:46:53,639][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001849_7573504.pth [2023-02-27 11:46:53,648][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2023-02-27 11:46:53,860][00394] Component LearnerWorker_p0 stopped! [2023-02-27 11:46:53,869][00394] Component RolloutWorker_w4 stopped! [2023-02-27 11:46:53,888][00394] Component RolloutWorker_w0 stopped! [2023-02-27 11:46:53,868][28733] Stopping RolloutWorker_w4... [2023-02-27 11:46:53,887][28718] Stopping RolloutWorker_w0... [2023-02-27 11:46:53,892][28728] Stopping RolloutWorker_w2... [2023-02-27 11:46:53,895][28728] Loop rollout_proc2_evt_loop terminating... [2023-02-27 11:46:53,891][28733] Loop rollout_proc4_evt_loop terminating... [2023-02-27 11:46:53,891][28704] Stopping LearnerWorker_p0... [2023-02-27 11:46:53,896][28704] Loop learner_proc0_evt_loop terminating... [2023-02-27 11:46:53,894][28718] Loop rollout_proc0_evt_loop terminating... [2023-02-27 11:46:53,898][28741] Stopping RolloutWorker_w6... [2023-02-27 11:46:53,899][28741] Loop rollout_proc6_evt_loop terminating... [2023-02-27 11:46:53,893][00394] Component RolloutWorker_w2 stopped! [2023-02-27 11:46:53,900][00394] Component RolloutWorker_w6 stopped! [2023-02-27 11:46:53,923][00394] Component RolloutWorker_w7 stopped! [2023-02-27 11:46:53,927][28735] Stopping RolloutWorker_w7... [2023-02-27 11:46:53,928][28735] Loop rollout_proc7_evt_loop terminating... [2023-02-27 11:46:53,943][00394] Component RolloutWorker_w5 stopped! [2023-02-27 11:46:53,950][28731] Stopping RolloutWorker_w5... [2023-02-27 11:46:53,950][28731] Loop rollout_proc5_evt_loop terminating... [2023-02-27 11:46:53,963][00394] Component RolloutWorker_w1 stopped! [2023-02-27 11:46:53,964][28719] Stopping RolloutWorker_w1... [2023-02-27 11:46:53,965][28719] Loop rollout_proc1_evt_loop terminating... [2023-02-27 11:46:53,993][00394] Component RolloutWorker_w3 stopped! [2023-02-27 11:46:53,996][00394] Waiting for process learner_proc0 to stop... [2023-02-27 11:46:54,000][28723] Stopping RolloutWorker_w3... [2023-02-27 11:46:54,027][28723] Loop rollout_proc3_evt_loop terminating... [2023-02-27 11:46:58,284][00394] Waiting for process inference_proc0-0 to join... [2023-02-27 11:46:58,286][00394] Waiting for process rollout_proc0 to join... [2023-02-27 11:46:58,288][00394] Waiting for process rollout_proc1 to join... [2023-02-27 11:46:58,293][00394] Waiting for process rollout_proc2 to join... [2023-02-27 11:46:58,294][00394] Waiting for process rollout_proc3 to join... [2023-02-27 11:46:58,295][00394] Waiting for process rollout_proc4 to join... [2023-02-27 11:46:58,296][00394] Waiting for process rollout_proc5 to join... [2023-02-27 11:46:58,298][00394] Waiting for process rollout_proc6 to join... [2023-02-27 11:46:58,299][00394] Waiting for process rollout_proc7 to join... 
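The teardown above follows the usual async-RL choreography: the runner signals every component to stop, each worker's event loop drains and terminates, and the parent process then joins the children. A minimal sketch of that pattern with multiprocessing; the worker body and names are placeholders rather than the library's implementation:

```python
import multiprocessing as mp
import time


def rollout_worker(stop, name):
    # Event loop: collect experience until the stop flag is set, then exit.
    while not stop.is_set():
        time.sleep(0.01)  # ... env.step(), enqueue trajectories ...
    print(f"Stopping {name}... loop terminating")


if __name__ == "__main__":
    stop = mp.Event()
    workers = [mp.Process(target=rollout_worker, args=(stop, f"RolloutWorker_w{i}"))
               for i in range(8)]
    for w in workers:
        w.start()
    stop.set()       # "Component ... stopped!" is logged as each loop exits
    for w in workers:
        w.join()     # "Waiting for process rollout_procN to join..."
```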
[2023-02-27 11:46:58,302][00394] Batcher 0 profile tree view:
batching: 29.2540, releasing_batches: 0.0524
[2023-02-27 11:46:58,312][00394] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 547.5281
update_model: 8.4902
  weight_update: 0.0014
one_step: 0.0234
  handle_policy_step: 570.4642
    deserialize: 17.2066, stack: 3.6083, obs_to_device_normalize: 126.4224, forward: 275.2417, send_messages: 29.7232
    prepare_outputs: 88.1017
      to_cpu: 54.0219
[2023-02-27 11:46:58,315][00394] Learner 0 profile tree view:
misc: 0.0055, prepare_batch: 19.1094
train: 85.8167
  epoch_init: 0.0118, minibatch_init: 0.0064, losses_postprocess: 0.5940, kl_divergence: 0.6677, after_optimizer: 3.9929
  calculate_losses: 29.3184
    losses_init: 0.0035, forward_head: 2.1917, bptt_initial: 18.8199, tail: 1.3381, advantages_returns: 0.3063, losses: 3.6391
    bptt: 2.6715
      bptt_forward_core: 2.5638
  update: 50.4395
    clip: 1.6576
[2023-02-27 11:46:58,317][00394] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4058, enqueue_policy_requests: 143.4978, env_step: 871.1819, overhead: 29.5023, complete_rollouts: 7.7036
save_policy_outputs: 25.6963
  split_output_tensors: 12.4900
[2023-02-27 11:46:58,319][00394] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4221, enqueue_policy_requests: 166.1125, env_step: 847.2251, overhead: 29.6019, complete_rollouts: 6.4290
save_policy_outputs: 25.4015
  split_output_tensors: 12.0902
[2023-02-27 11:46:58,326][00394] Loop Runner_EvtLoop terminating...
[2023-02-27 11:46:58,328][00394] Runner profile tree view:
main_loop: 1204.4236
[2023-02-27 11:46:58,331][00394] Collected {0: 8007680}, FPS: 3315.8
[2023-02-27 11:46:58,539][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-27 11:46:58,541][00394] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-27 11:46:58,544][00394] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-27 11:46:58,546][00394] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-27 11:46:58,548][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-27 11:46:58,550][00394] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-27 11:46:58,552][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-27 11:46:58,554][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-27 11:46:58,555][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-27 11:46:58,557][00394] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-27 11:46:58,558][00394] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-27 11:46:58,560][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-27 11:46:58,563][00394] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-27 11:46:58,564][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file!
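The profile tree views above come from a nested wall-clock profiler: each labelled scope accumulates elapsed time under its parent, and the report indents children beneath their parents. A self-contained approximation of the mechanism (a sketch only; Sample Factory's real timing utility differs in detail):

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TimingTree:
    """Accumulate wall-clock time per nested scope and print an indented tree."""

    def __init__(self):
        self.totals = defaultdict(float)
        self._stack = []

    @contextmanager
    def timeit(self, name):
        self._stack.append(name)
        key = "/".join(self._stack)
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[key] += time.perf_counter() - start
            self._stack.pop()

    def report(self):
        # Lexicographic sort places each parent scope before its children.
        for key, total in sorted(self.totals.items()):
            depth = key.count("/")
            print(f"{'  ' * depth}{key.rsplit('/', 1)[-1]}: {total:.4f}")


t = TimingTree()
with t.timeit("train"):
    with t.timeit("calculate_losses"):
        time.sleep(0.01)
    with t.timeit("update"):
        time.sleep(0.02)
t.report()  # prints a miniature tree in the same shape as the views above
```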
[2023-02-27 11:46:58,566][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 11:46:58,611][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:46:58,622][00394] RunningMeanStd input shape: (1,) [2023-02-27 11:46:58,645][00394] ConvEncoder: input_channels=3 [2023-02-27 11:46:58,843][00394] Conv encoder output size: 512 [2023-02-27 11:46:58,845][00394] Policy head output size: 512 [2023-02-27 11:46:58,974][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2023-02-27 11:47:00,352][00394] Num frames 100... [2023-02-27 11:47:00,468][00394] Num frames 200... [2023-02-27 11:47:00,601][00394] Num frames 300... [2023-02-27 11:47:00,759][00394] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-27 11:47:00,762][00394] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-27 11:47:00,787][00394] Num frames 400... [2023-02-27 11:47:00,901][00394] Num frames 500... [2023-02-27 11:47:01,019][00394] Num frames 600... [2023-02-27 11:47:01,130][00394] Num frames 700... [2023-02-27 11:47:01,251][00394] Avg episode rewards: #0: 4.295, true rewards: #0: 3.795 [2023-02-27 11:47:01,252][00394] Avg episode reward: 4.295, avg true_objective: 3.795 [2023-02-27 11:47:01,303][00394] Num frames 800... [2023-02-27 11:47:01,419][00394] Num frames 900... [2023-02-27 11:47:01,535][00394] Num frames 1000... [2023-02-27 11:47:01,647][00394] Num frames 1100... [2023-02-27 11:47:01,768][00394] Num frames 1200... [2023-02-27 11:47:01,834][00394] Avg episode rewards: #0: 4.690, true rewards: #0: 4.023 [2023-02-27 11:47:01,835][00394] Avg episode reward: 4.690, avg true_objective: 4.023 [2023-02-27 11:47:01,943][00394] Num frames 1300... [2023-02-27 11:47:02,058][00394] Num frames 1400... [2023-02-27 11:47:02,175][00394] Num frames 1500... [2023-02-27 11:47:02,291][00394] Num frames 1600... [2023-02-27 11:47:02,419][00394] Avg episode rewards: #0: 4.888, true rewards: #0: 4.137 [2023-02-27 11:47:02,421][00394] Avg episode reward: 4.888, avg true_objective: 4.137 [2023-02-27 11:47:02,474][00394] Num frames 1700... [2023-02-27 11:47:02,593][00394] Num frames 1800... [2023-02-27 11:47:02,705][00394] Num frames 1900... [2023-02-27 11:47:02,831][00394] Num frames 2000... [2023-02-27 11:47:02,930][00394] Avg episode rewards: #0: 4.678, true rewards: #0: 4.078 [2023-02-27 11:47:02,932][00394] Avg episode reward: 4.678, avg true_objective: 4.078 [2023-02-27 11:47:03,014][00394] Num frames 2100... [2023-02-27 11:47:03,128][00394] Num frames 2200... [2023-02-27 11:47:03,251][00394] Num frames 2300... [2023-02-27 11:47:03,371][00394] Num frames 2400... [2023-02-27 11:47:03,492][00394] Num frames 2500... [2023-02-27 11:47:03,642][00394] Avg episode rewards: #0: 5.138, true rewards: #0: 4.305 [2023-02-27 11:47:03,644][00394] Avg episode reward: 5.138, avg true_objective: 4.305 [2023-02-27 11:47:03,675][00394] Num frames 2600... [2023-02-27 11:47:03,797][00394] Num frames 2700... [2023-02-27 11:47:03,924][00394] Num frames 2800... [2023-02-27 11:47:04,043][00394] Num frames 2900... [2023-02-27 11:47:04,162][00394] Num frames 3000... [2023-02-27 11:47:04,292][00394] Num frames 3100... [2023-02-27 11:47:04,381][00394] Avg episode rewards: #0: 5.467, true rewards: #0: 4.467 [2023-02-27 11:47:04,383][00394] Avg episode reward: 5.467, avg true_objective: 4.467 [2023-02-27 11:47:04,471][00394] Num frames 3200... [2023-02-27 11:47:04,586][00394] Num frames 3300... 
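(A quick aside on the "Avg episode rewards" entries interleaved with the frame counters: each value is the mean over all episodes finished so far. The first episode above scored 3.840, and the next report of 4.295 implies a second episode of 4.750, since (3.840 + 4.750) / 2 = 4.295. A minimal sketch of that bookkeeping, with an illustrative function name:)

```python
def running_avg_rewards(episode_rewards):
    """Mean over all episodes completed so far, reported after each episode."""
    out, total = [], 0.0
    for n, r in enumerate(episode_rewards, start=1):
        total += r
        out.append(total / n)
    return out


print(running_avg_rewards([3.84, 4.75]))  # [3.84, 4.295] -- matches the log
```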
[2023-02-27 11:47:04,702][00394] Num frames 3400... [2023-02-27 11:47:04,820][00394] Num frames 3500... [2023-02-27 11:47:04,960][00394] Avg episode rewards: #0: 5.469, true rewards: #0: 4.469 [2023-02-27 11:47:04,962][00394] Avg episode reward: 5.469, avg true_objective: 4.469 [2023-02-27 11:47:04,995][00394] Num frames 3600... [2023-02-27 11:47:05,108][00394] Num frames 3700... [2023-02-27 11:47:05,228][00394] Num frames 3800... [2023-02-27 11:47:05,339][00394] Num frames 3900... [2023-02-27 11:47:05,462][00394] Num frames 4000... [2023-02-27 11:47:05,614][00394] Avg episode rewards: #0: 5.652, true rewards: #0: 4.541 [2023-02-27 11:47:05,615][00394] Avg episode reward: 5.652, avg true_objective: 4.541 [2023-02-27 11:47:05,635][00394] Num frames 4100... [2023-02-27 11:47:05,752][00394] Num frames 4200... [2023-02-27 11:47:05,882][00394] Num frames 4300... [2023-02-27 11:47:05,996][00394] Num frames 4400... [2023-02-27 11:47:06,122][00394] Num frames 4500... [2023-02-27 11:47:06,220][00394] Avg episode rewards: #0: 5.635, true rewards: #0: 4.535 [2023-02-27 11:47:06,221][00394] Avg episode reward: 5.635, avg true_objective: 4.535 [2023-02-27 11:47:31,462][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 11:47:31,608][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 11:47:31,610][00394] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 11:47:31,613][00394] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 11:47:31,615][00394] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 11:47:31,617][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 11:47:31,618][00394] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 11:47:31,620][00394] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-27 11:47:31,621][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-27 11:47:31,622][00394] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-27 11:47:31,624][00394] Adding new argument 'hf_repository'='Clawoo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-27 11:47:31,625][00394] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 11:47:31,626][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 11:47:31,628][00394] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 11:47:31,629][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-27 11:47:31,630][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 11:47:31,655][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:47:31,657][00394] RunningMeanStd input shape: (1,) [2023-02-27 11:47:31,674][00394] ConvEncoder: input_channels=3 [2023-02-27 11:47:31,735][00394] Conv encoder output size: 512 [2023-02-27 11:47:31,739][00394] Policy head output size: 512 [2023-02-27 11:47:31,766][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2023-02-27 11:47:32,495][00394] Num frames 100... [2023-02-27 11:47:32,668][00394] Num frames 200... 
[2023-02-27 11:47:32,812][00394] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560 [2023-02-27 11:47:32,815][00394] Avg episode reward: 2.560, avg true_objective: 2.560 [2023-02-27 11:47:32,891][00394] Num frames 300... [2023-02-27 11:47:33,042][00394] Num frames 400... [2023-02-27 11:47:33,206][00394] Num frames 500... [2023-02-27 11:47:33,364][00394] Num frames 600... [2023-02-27 11:47:33,502][00394] Avg episode rewards: #0: 3.860, true rewards: #0: 3.360 [2023-02-27 11:47:33,504][00394] Avg episode reward: 3.860, avg true_objective: 3.360 [2023-02-27 11:47:33,540][00394] Num frames 700... [2023-02-27 11:47:33,669][00394] Num frames 800... [2023-02-27 11:47:33,793][00394] Num frames 900... [2023-02-27 11:47:33,907][00394] Num frames 1000... [2023-02-27 11:47:34,027][00394] Avg episode rewards: #0: 4.520, true rewards: #0: 3.520 [2023-02-27 11:47:34,034][00394] Avg episode reward: 4.520, avg true_objective: 3.520 [2023-02-27 11:47:34,085][00394] Num frames 1100... [2023-02-27 11:47:34,204][00394] Num frames 1200... [2023-02-27 11:47:34,327][00394] Num frames 1300... [2023-02-27 11:47:34,443][00394] Num frames 1400... [2023-02-27 11:47:34,512][00394] Avg episode rewards: #0: 4.275, true rewards: #0: 3.525 [2023-02-27 11:47:34,513][00394] Avg episode reward: 4.275, avg true_objective: 3.525 [2023-02-27 11:47:34,624][00394] Num frames 1500... [2023-02-27 11:47:34,739][00394] Num frames 1600... [2023-02-27 11:47:34,868][00394] Num frames 1700... [2023-02-27 11:47:35,036][00394] Avg episode rewards: #0: 4.188, true rewards: #0: 3.588 [2023-02-27 11:47:35,037][00394] Avg episode reward: 4.188, avg true_objective: 3.588 [2023-02-27 11:47:35,049][00394] Num frames 1800... [2023-02-27 11:47:35,168][00394] Num frames 1900... [2023-02-27 11:47:35,283][00394] Num frames 2000... [2023-02-27 11:47:35,398][00394] Num frames 2100... [2023-02-27 11:47:35,541][00394] Avg episode rewards: #0: 4.130, true rewards: #0: 3.630 [2023-02-27 11:47:35,543][00394] Avg episode reward: 4.130, avg true_objective: 3.630 [2023-02-27 11:47:35,572][00394] Num frames 2200... [2023-02-27 11:47:35,696][00394] Num frames 2300... [2023-02-27 11:47:35,815][00394] Num frames 2400... [2023-02-27 11:47:35,929][00394] Num frames 2500... [2023-02-27 11:47:36,062][00394] Avg episode rewards: #0: 4.089, true rewards: #0: 3.660 [2023-02-27 11:47:36,065][00394] Avg episode reward: 4.089, avg true_objective: 3.660 [2023-02-27 11:47:36,115][00394] Num frames 2600... [2023-02-27 11:47:36,230][00394] Num frames 2700... [2023-02-27 11:47:36,347][00394] Num frames 2800... [2023-02-27 11:47:36,462][00394] Num frames 2900... [2023-02-27 11:47:36,570][00394] Avg episode rewards: #0: 4.058, true rewards: #0: 3.682 [2023-02-27 11:47:36,576][00394] Avg episode reward: 4.058, avg true_objective: 3.682 [2023-02-27 11:47:36,641][00394] Num frames 3000... [2023-02-27 11:47:36,765][00394] Num frames 3100... [2023-02-27 11:47:36,885][00394] Num frames 3200... [2023-02-27 11:47:36,999][00394] Num frames 3300... [2023-02-27 11:47:37,090][00394] Avg episode rewards: #0: 4.033, true rewards: #0: 3.700 [2023-02-27 11:47:37,093][00394] Avg episode reward: 4.033, avg true_objective: 3.700 [2023-02-27 11:47:37,177][00394] Num frames 3400... [2023-02-27 11:47:37,294][00394] Num frames 3500... [2023-02-27 11:47:37,416][00394] Num frames 3600... [2023-02-27 11:47:37,539][00394] Num frames 3700... 
[2023-02-27 11:47:37,612][00394] Avg episode rewards: #0: 4.014, true rewards: #0: 3.714 [2023-02-27 11:47:37,613][00394] Avg episode reward: 4.014, avg true_objective: 3.714 [2023-02-27 11:47:58,118][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 11:48:01,600][00394] The model has been pushed to https://huggingface.co./Clawoo/rl_course_vizdoom_health_gathering_supreme [2023-02-27 11:49:57,137][00394] Environment doom_basic already registered, overwriting... [2023-02-27 11:49:57,140][00394] Environment doom_two_colors_easy already registered, overwriting... [2023-02-27 11:49:57,142][00394] Environment doom_two_colors_hard already registered, overwriting... [2023-02-27 11:49:57,145][00394] Environment doom_dm already registered, overwriting... [2023-02-27 11:49:57,146][00394] Environment doom_dwango5 already registered, overwriting... [2023-02-27 11:49:57,150][00394] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-02-27 11:49:57,151][00394] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-02-27 11:49:57,152][00394] Environment doom_my_way_home already registered, overwriting... [2023-02-27 11:49:57,154][00394] Environment doom_deadly_corridor already registered, overwriting... [2023-02-27 11:49:57,155][00394] Environment doom_defend_the_center already registered, overwriting... [2023-02-27 11:49:57,157][00394] Environment doom_defend_the_line already registered, overwriting... [2023-02-27 11:49:57,159][00394] Environment doom_health_gathering already registered, overwriting... [2023-02-27 11:49:57,160][00394] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-27 11:49:57,162][00394] Environment doom_battle already registered, overwriting... [2023-02-27 11:49:57,163][00394] Environment doom_battle2 already registered, overwriting... [2023-02-27 11:49:57,165][00394] Environment doom_duel_bots already registered, overwriting... [2023-02-27 11:49:57,166][00394] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-27 11:49:57,168][00394] Environment doom_duel already registered, overwriting... [2023-02-27 11:49:57,170][00394] Environment doom_deathmatch_full already registered, overwriting... [2023-02-27 11:49:57,171][00394] Environment doom_benchmark already registered, overwriting... [2023-02-27 11:49:57,172][00394] register_encoder_factory: [2023-02-27 11:49:57,208][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 11:49:57,210][00394] Overriding arg 'train_for_env_steps' with value 12000000 passed from command line [2023-02-27 11:49:57,217][00394] Experiment dir /content/train_dir/default_experiment already exists! [2023-02-27 11:49:57,218][00394] Resuming existing experiment from /content/train_dir/default_experiment... 
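The final step of the evaluation run uploads the experiment directory (checkpoints, config.json, and the rendered replay.mp4) to the Hub repository named above. A minimal sketch of the upload with huggingface_hub, assuming you are already authenticated; Sample Factory's --push_to_hub path presumably wraps similar calls:

```python
from huggingface_hub import HfApi


def push_experiment(train_dir, repo_id):
    """Upload a trained experiment directory to the Hugging Face Hub."""
    api = HfApi()
    api.create_repo(repo_id=repo_id, exist_ok=True)  # no-op if it already exists
    api.upload_folder(folder_path=train_dir, repo_id=repo_id)


push_experiment("/content/train_dir/default_experiment",
                "Clawoo/rl_course_vizdoom_health_gathering_supreme")
```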
[2023-02-27 11:49:57,221][00394] Weights and Biases integration disabled [2023-02-27 11:49:57,229][00394] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-02-27 11:50:00,950][00394] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=12000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-02-27 11:50:00,956][00394] Saving configuration to /content/train_dir/default_experiment/config.json... 
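Resuming re-reads the saved config.json and layers command-line overrides on top; the two message shapes seen here distinguish keys that already existed ("Overriding arg ...") from keys that did not ("Adding new argument ..."). A sketch of that merge, with hypothetical names:

```python
import json


def load_and_override(config_path, cli_overrides):
    """Reload a saved experiment config and apply CLI overrides on top (sketch)."""
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_overrides.items():
        if key not in cfg:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        elif cfg[key] != value:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        cfg[key] = value
    return cfg
```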
[2023-02-27 11:50:00,961][00394] Rollout worker 0 uses device cpu [2023-02-27 11:50:00,963][00394] Rollout worker 1 uses device cpu [2023-02-27 11:50:00,966][00394] Rollout worker 2 uses device cpu [2023-02-27 11:50:00,967][00394] Rollout worker 3 uses device cpu [2023-02-27 11:50:00,968][00394] Rollout worker 4 uses device cpu [2023-02-27 11:50:00,971][00394] Rollout worker 5 uses device cpu [2023-02-27 11:50:00,973][00394] Rollout worker 6 uses device cpu [2023-02-27 11:50:00,974][00394] Rollout worker 7 uses device cpu [2023-02-27 11:50:01,132][00394] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:50:01,133][00394] InferenceWorker_p0-w0: min num requests: 2 [2023-02-27 11:50:01,175][00394] Starting all processes... [2023-02-27 11:50:01,177][00394] Starting process learner_proc0 [2023-02-27 11:50:01,386][00394] Starting all processes... [2023-02-27 11:50:01,398][00394] Starting process inference_proc0-0 [2023-02-27 11:50:01,399][00394] Starting process rollout_proc0 [2023-02-27 11:50:01,399][00394] Starting process rollout_proc1 [2023-02-27 11:50:01,422][00394] Starting process rollout_proc2 [2023-02-27 11:50:01,422][00394] Starting process rollout_proc3 [2023-02-27 11:50:01,422][00394] Starting process rollout_proc4 [2023-02-27 11:50:01,422][00394] Starting process rollout_proc5 [2023-02-27 11:50:01,422][00394] Starting process rollout_proc6 [2023-02-27 11:50:01,422][00394] Starting process rollout_proc7 [2023-02-27 11:50:10,855][37536] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:50:10,862][37536] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-27 11:50:10,936][37536] Num visible devices: 1 [2023-02-27 11:50:10,962][37536] Starting seed is not provided [2023-02-27 11:50:10,963][37536] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:50:10,963][37536] Initializing actor-critic model on device cuda:0 [2023-02-27 11:50:10,964][37536] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:50:10,976][37536] RunningMeanStd input shape: (1,) [2023-02-27 11:50:11,097][37536] ConvEncoder: input_channels=3 [2023-02-27 11:50:11,686][37554] Worker 0 uses CPU cores [0] [2023-02-27 11:50:11,860][37536] Conv encoder output size: 512 [2023-02-27 11:50:11,872][37536] Policy head output size: 512 [2023-02-27 11:50:11,970][37536] Created Actor Critic model with architecture: [2023-02-27 11:50:11,982][37536] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): 
ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-02-27 11:50:12,372][37555] Worker 1 uses CPU cores [1] [2023-02-27 11:50:12,827][37566] Worker 2 uses CPU cores [0] [2023-02-27 11:50:13,010][37564] Worker 4 uses CPU cores [0] [2023-02-27 11:50:13,112][37556] Worker 3 uses CPU cores [1] [2023-02-27 11:50:13,130][37558] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:50:13,134][37558] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-02-27 11:50:13,195][37558] Num visible devices: 1 [2023-02-27 11:50:13,342][37569] Worker 5 uses CPU cores [1] [2023-02-27 11:50:13,518][37577] Worker 6 uses CPU cores [0] [2023-02-27 11:50:13,569][37576] Worker 7 uses CPU cores [1] [2023-02-27 11:50:20,445][37536] Using optimizer [2023-02-27 11:50:20,446][37536] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2023-02-27 11:50:20,483][37536] Loading model from checkpoint [2023-02-27 11:50:20,488][37536] Loaded experiment state at self.train_step=1955, self.env_steps=8007680 [2023-02-27 11:50:20,489][37536] Initialized policy 0 weights for model version 1955 [2023-02-27 11:50:20,493][37536] LearnerWorker_p0 finished initialization! [2023-02-27 11:50:20,494][37536] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 11:50:20,693][37558] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 11:50:20,694][37558] RunningMeanStd input shape: (1,) [2023-02-27 11:50:20,706][37558] ConvEncoder: input_channels=3 [2023-02-27 11:50:20,815][37558] Conv encoder output size: 512 [2023-02-27 11:50:20,816][37558] Policy head output size: 512 [2023-02-27 11:50:21,121][00394] Heartbeat connected on Batcher_0 [2023-02-27 11:50:21,129][00394] Heartbeat connected on LearnerWorker_p0 [2023-02-27 11:50:21,142][00394] Heartbeat connected on RolloutWorker_w0 [2023-02-27 11:50:21,147][00394] Heartbeat connected on RolloutWorker_w1 [2023-02-27 11:50:21,151][00394] Heartbeat connected on RolloutWorker_w2 [2023-02-27 11:50:21,156][00394] Heartbeat connected on RolloutWorker_w3 [2023-02-27 11:50:21,160][00394] Heartbeat connected on RolloutWorker_w4 [2023-02-27 11:50:21,165][00394] Heartbeat connected on RolloutWorker_w5 [2023-02-27 11:50:21,170][00394] Heartbeat connected on RolloutWorker_w6 [2023-02-27 11:50:21,175][00394] Heartbeat connected on RolloutWorker_w7 [2023-02-27 11:50:22,230][00394] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8007680. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:50:23,256][00394] Inference worker 0-0 is ready! [2023-02-27 11:50:23,258][00394] All inference workers are ready! Signal rollout workers to start! 
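Note that the resume restores more than network weights: optimizer state and the progress counters come back too, which is why training continues from train_step=1955 / env_steps=8007680 rather than from zero. A sketch of the load side, assuming a checkpoint dict with the keys below (the key names are guesses for illustration):

```python
import torch


def load_checkpoint(model, optimizer, ckpt_path):
    """Restore weights, optimizer state, and progress counters from a .pth file."""
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state["model"])          # policy/value network weights
    optimizer.load_state_dict(state["optimizer"])  # Adam moments, step counts
    return state.get("train_step", 0), state.get("env_steps", 0)
```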
[2023-02-27 11:50:23,262][00394] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-27 11:50:23,404][37576] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:50:23,401][37566] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:50:23,407][37555] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:50:23,408][37554] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:50:23,421][37564] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:50:23,425][37577] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:50:23,431][37556] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:50:23,429][37569] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 11:50:24,404][37555] Decorrelating experience for 0 frames... [2023-02-27 11:50:24,405][37576] Decorrelating experience for 0 frames... [2023-02-27 11:50:24,754][37566] Decorrelating experience for 0 frames... [2023-02-27 11:50:24,756][37564] Decorrelating experience for 0 frames... [2023-02-27 11:50:24,761][37554] Decorrelating experience for 0 frames... [2023-02-27 11:50:25,350][37576] Decorrelating experience for 32 frames... [2023-02-27 11:50:25,365][37569] Decorrelating experience for 0 frames... [2023-02-27 11:50:25,601][37555] Decorrelating experience for 32 frames... [2023-02-27 11:50:25,967][37564] Decorrelating experience for 32 frames... [2023-02-27 11:50:25,979][37554] Decorrelating experience for 32 frames... [2023-02-27 11:50:25,985][37566] Decorrelating experience for 32 frames... [2023-02-27 11:50:26,201][37555] Decorrelating experience for 64 frames... [2023-02-27 11:50:26,736][37577] Decorrelating experience for 0 frames... [2023-02-27 11:50:27,153][37564] Decorrelating experience for 64 frames... [2023-02-27 11:50:27,192][37554] Decorrelating experience for 64 frames... [2023-02-27 11:50:27,230][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:50:27,248][37569] Decorrelating experience for 32 frames... [2023-02-27 11:50:27,266][37556] Decorrelating experience for 0 frames... [2023-02-27 11:50:27,708][37555] Decorrelating experience for 96 frames... [2023-02-27 11:50:27,997][37577] Decorrelating experience for 32 frames... [2023-02-27 11:50:28,085][37566] Decorrelating experience for 64 frames... [2023-02-27 11:50:28,540][37576] Decorrelating experience for 64 frames... [2023-02-27 11:50:28,959][37554] Decorrelating experience for 96 frames... [2023-02-27 11:50:29,157][37577] Decorrelating experience for 64 frames... [2023-02-27 11:50:29,847][37564] Decorrelating experience for 96 frames... [2023-02-27 11:50:30,018][37577] Decorrelating experience for 96 frames... [2023-02-27 11:50:30,086][37556] Decorrelating experience for 32 frames... [2023-02-27 11:50:30,379][37569] Decorrelating experience for 64 frames... [2023-02-27 11:50:30,835][37566] Decorrelating experience for 96 frames... [2023-02-27 11:50:31,622][37576] Decorrelating experience for 96 frames... [2023-02-27 11:50:32,098][37556] Decorrelating experience for 64 frames... [2023-02-27 11:50:32,188][37569] Decorrelating experience for 96 frames... [2023-02-27 11:50:32,231][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:50:32,795][37556] Decorrelating experience for 96 frames... 
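The "Decorrelating experience for N frames..." lines are a staggered warm-up: before real collection begins, each env split is stepped for 0, 32, 64, or 96 frames (multiples of the rollout length, rollout=32 in this config) so that parallel workers do not hit episode boundaries in lockstep. A rough sketch of the idea against a Gym-style env, not the library's exact scheme:

```python
def decorrelate_experience(envs, frames_per_split=32):
    """Step each env split a different number of random-action frames
    (0, 32, 64, 96, ...) so workers start real rollouts out of phase."""
    for split_idx, env in enumerate(envs):
        budget = split_idx * frames_per_split
        print(f"Decorrelating experience for {budget} frames...")
        env.reset()
        for _ in range(budget):
            obs, reward, done, info = env.step(env.action_space.sample())
            if done:
                env.reset()
```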
[2023-02-27 11:50:37,230][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 65.1. Samples: 976. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:50:37,238][00394] Avg episode reward: [(0, '2.115')] [2023-02-27 11:50:38,312][37536] Signal inference workers to stop experience collection... [2023-02-27 11:50:38,355][37558] InferenceWorker_p0-w0: stopping experience collection [2023-02-27 11:50:42,231][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 130.7. Samples: 2614. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 11:50:42,312][00394] Avg episode reward: [(0, '2.333')] [2023-02-27 11:50:44,270][37536] Signal inference workers to resume experience collection... [2023-02-27 11:50:44,292][37558] InferenceWorker_p0-w0: resuming experience collection [2023-02-27 11:50:47,274][00394] Fps is (10 sec: 409.5, 60 sec: 163.8, 300 sec: 163.8). Total num frames: 8011776. Throughput: 0: 104.6. Samples: 2614. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-02-27 11:50:47,444][00394] Avg episode reward: [(0, '3.152')] [2023-02-27 11:50:52,266][00394] Fps is (10 sec: 819.2, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 8015872. Throughput: 0: 120.7. Samples: 3620. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-02-27 11:50:52,470][00394] Avg episode reward: [(0, '3.152')] [2023-02-27 11:50:57,240][00394] Fps is (10 sec: 1228.2, 60 sec: 468.0, 300 sec: 468.0). Total num frames: 8024064. Throughput: 0: 131.6. Samples: 4608. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-02-27 11:50:57,301][00394] Avg episode reward: [(0, '3.550')] [2023-02-27 11:51:02,232][00394] Fps is (10 sec: 1638.3, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 8032256. Throughput: 0: 148.6. Samples: 5946. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-27 11:51:02,275][00394] Avg episode reward: [(0, '4.166')] [2023-02-27 11:51:06,448][37558] Updated weights for policy 0, policy_version 1965 (0.0128) [2023-02-27 11:51:07,230][00394] Fps is (10 sec: 2459.2, 60 sec: 910.2, 300 sec: 910.2). Total num frames: 8048640. Throughput: 0: 212.8. Samples: 9578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-27 11:51:07,253][00394] Avg episode reward: [(0, '4.601')] [2023-02-27 11:51:12,230][00394] Fps is (10 sec: 4096.4, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 8073216. Throughput: 0: 358.1. Samples: 16114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:51:12,232][00394] Avg episode reward: [(0, '5.238')] [2023-02-27 11:51:17,235][00394] Fps is (10 sec: 3686.4, 60 sec: 1415.0, 300 sec: 1415.0). Total num frames: 8085504. Throughput: 0: 429.0. Samples: 19304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:51:17,408][00394] Avg episode reward: [(0, '5.073')] [2023-02-27 11:51:18,826][37558] Updated weights for policy 0, policy_version 1975 (0.0037) [2023-02-27 11:51:22,234][00394] Fps is (10 sec: 2457.6, 60 sec: 1501.9, 300 sec: 1501.9). Total num frames: 8097792. Throughput: 0: 482.6. Samples: 22694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:51:22,329][00394] Avg episode reward: [(0, '5.126')] [2023-02-27 11:51:27,250][00394] Fps is (10 sec: 2047.5, 60 sec: 1638.3, 300 sec: 1512.3). Total num frames: 8105984. Throughput: 0: 518.4. Samples: 25942. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-02-27 11:51:27,394][00394] Avg episode reward: [(0, '4.843')] [2023-02-27 11:51:32,232][00394] Fps is (10 sec: 2048.0, 60 sec: 1843.2, 300 sec: 1579.9). Total num frames: 8118272. Throughput: 0: 553.7. Samples: 27530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2023-02-27 11:51:32,300][00394] Avg episode reward: [(0, '4.578')] [2023-02-27 11:51:34,950][37558] Updated weights for policy 0, policy_version 1985 (0.0178) [2023-02-27 11:51:37,230][00394] Fps is (10 sec: 3277.6, 60 sec: 2184.5, 300 sec: 1747.6). Total num frames: 8138752. Throughput: 0: 618.9. Samples: 31472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:51:37,252][00394] Avg episode reward: [(0, '4.323')] [2023-02-27 11:51:42,230][00394] Fps is (10 sec: 4096.1, 60 sec: 2525.9, 300 sec: 1894.4). Total num frames: 8159232. Throughput: 0: 739.6. Samples: 37884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:51:42,232][00394] Avg episode reward: [(0, '4.572')] [2023-02-27 11:51:47,233][00394] Fps is (10 sec: 2866.2, 60 sec: 2594.1, 300 sec: 1879.3). Total num frames: 8167424. Throughput: 0: 774.5. Samples: 40800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:51:47,345][00394] Avg episode reward: [(0, '4.701')] [2023-02-27 11:51:47,383][37558] Updated weights for policy 0, policy_version 1995 (0.0046) [2023-02-27 11:51:52,242][00394] Fps is (10 sec: 2047.9, 60 sec: 2730.7, 300 sec: 1911.5). Total num frames: 8179712. Throughput: 0: 761.4. Samples: 43842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:51:52,411][00394] Avg episode reward: [(0, '4.798')] [2023-02-27 11:51:57,235][00394] Fps is (10 sec: 2458.3, 60 sec: 2799.2, 300 sec: 1940.2). Total num frames: 8192000. Throughput: 0: 687.5. Samples: 47052. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-02-27 11:51:57,341][00394] Avg episode reward: [(0, '4.876')] [2023-02-27 11:51:58,806][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002001_8196096.pth... [2023-02-27 11:51:59,372][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001953_7999488.pth [2023-02-27 11:52:02,230][00394] Fps is (10 sec: 2867.3, 60 sec: 2935.5, 300 sec: 2007.0). Total num frames: 8208384. Throughput: 0: 652.8. Samples: 48678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-27 11:52:02,257][00394] Avg episode reward: [(0, '4.689')] [2023-02-27 11:52:03,041][37558] Updated weights for policy 0, policy_version 2005 (0.0133) [2023-02-27 11:52:07,230][00394] Fps is (10 sec: 3686.5, 60 sec: 3003.7, 300 sec: 2106.5). Total num frames: 8228864. Throughput: 0: 698.0. Samples: 54106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:52:07,239][00394] Avg episode reward: [(0, '4.837')] [2023-02-27 11:52:12,230][00394] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2159.7). Total num frames: 8245248. Throughput: 0: 764.6. Samples: 60346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:52:12,249][00394] Avg episode reward: [(0, '4.879')] [2023-02-27 11:52:16,160][37558] Updated weights for policy 0, policy_version 2015 (0.0046) [2023-02-27 11:52:17,233][00394] Fps is (10 sec: 2457.7, 60 sec: 2798.9, 300 sec: 2137.0). Total num frames: 8253440. Throughput: 0: 764.9. Samples: 61952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 11:52:17,329][00394] Avg episode reward: [(0, '4.743')] [2023-02-27 11:52:22,237][00394] Fps is (10 sec: 2048.0, 60 sec: 2798.9, 300 sec: 2150.4). Total num frames: 8265728. 
Throughput: 0: 743.7. Samples: 64938. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-02-27 11:52:22,350][00394] Avg episode reward: [(0, '4.757')] [2023-02-27 11:52:27,252][00394] Fps is (10 sec: 2457.1, 60 sec: 2867.2, 300 sec: 2162.7). Total num frames: 8278016. Throughput: 0: 680.2. Samples: 68496. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2023-02-27 11:52:27,389][00394] Avg episode reward: [(0, '4.860')] [2023-02-27 11:52:30,626][37558] Updated weights for policy 0, policy_version 2025 (0.0181) [2023-02-27 11:52:32,230][00394] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2237.0). Total num frames: 8298496. Throughput: 0: 665.0. Samples: 70724. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2023-02-27 11:52:32,250][00394] Avg episode reward: [(0, '5.047')] [2023-02-27 11:52:37,230][00394] Fps is (10 sec: 4096.8, 60 sec: 3003.7, 300 sec: 2305.9). Total num frames: 8318976. Throughput: 0: 736.5. Samples: 76984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:52:37,238][00394] Avg episode reward: [(0, '5.365')] [2023-02-27 11:52:42,230][00394] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2282.1). Total num frames: 8327168. Throughput: 0: 775.1. Samples: 81930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:52:42,278][00394] Avg episode reward: [(0, '5.340')] [2023-02-27 11:52:43,872][37558] Updated weights for policy 0, policy_version 2035 (0.0046) [2023-02-27 11:52:47,237][00394] Fps is (10 sec: 2455.8, 60 sec: 2935.3, 300 sec: 2316.2). Total num frames: 8343552. Throughput: 0: 770.2. Samples: 83342. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-02-27 11:52:47,299][00394] Avg episode reward: [(0, '5.256')] [2023-02-27 11:52:52,236][00394] Fps is (10 sec: 2456.9, 60 sec: 2867.1, 300 sec: 2293.7). Total num frames: 8351744. Throughput: 0: 722.3. Samples: 86612. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-27 11:52:52,316][00394] Avg episode reward: [(0, '5.291')] [2023-02-27 11:52:57,230][00394] Fps is (10 sec: 2869.3, 60 sec: 3003.8, 300 sec: 2351.9). Total num frames: 8372224. Throughput: 0: 685.7. Samples: 91204. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2023-02-27 11:52:57,258][00394] Avg episode reward: [(0, '5.007')] [2023-02-27 11:52:57,821][37558] Updated weights for policy 0, policy_version 2045 (0.0145) [2023-02-27 11:53:02,230][00394] Fps is (10 sec: 4097.1, 60 sec: 3072.0, 300 sec: 2406.4). Total num frames: 8392704. Throughput: 0: 722.0. Samples: 94444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 11:53:02,233][00394] Avg episode reward: [(0, '5.105')] [2023-02-27 11:53:07,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2408.0). Total num frames: 8404992. Throughput: 0: 777.8. Samples: 99938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:53:07,304][00394] Avg episode reward: [(0, '5.097')] [2023-02-27 11:53:11,764][37558] Updated weights for policy 0, policy_version 2055 (0.0137) [2023-02-27 11:53:12,230][00394] Fps is (10 sec: 2457.5, 60 sec: 2867.2, 300 sec: 2409.4). Total num frames: 8417280. Throughput: 0: 766.3. Samples: 102978. Policy #0 lag: (min: 0.0, avg: 0.9, max: 1.0) [2023-02-27 11:53:12,281][00394] Avg episode reward: [(0, '5.363')] [2023-02-27 11:53:17,251][00394] Fps is (10 sec: 2046.8, 60 sec: 2866.9, 300 sec: 2387.3). Total num frames: 8425472. Throughput: 0: 751.4. Samples: 104540. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-02-27 11:53:17,350][00394] Avg episode reward: [(0, '5.359')] [2023-02-27 11:53:22,235][00394] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2412.1). Total num frames: 8441856. Throughput: 0: 697.3. Samples: 108362. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2023-02-27 11:53:22,286][00394] Avg episode reward: [(0, '5.245')] [2023-02-27 11:53:25,593][37558] Updated weights for policy 0, policy_version 2065 (0.0094) [2023-02-27 11:53:27,230][00394] Fps is (10 sec: 3688.4, 60 sec: 3072.1, 300 sec: 2457.6). Total num frames: 8462336. Throughput: 0: 713.1. Samples: 114018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:53:27,233][00394] Avg episode reward: [(0, '5.286')] [2023-02-27 11:53:32,237][00394] Fps is (10 sec: 3683.8, 60 sec: 3003.4, 300 sec: 2479.1). Total num frames: 8478720. Throughput: 0: 753.2. Samples: 117234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 11:53:32,423][00394] Avg episode reward: [(0, '5.225')] [2023-02-27 11:53:37,251][00394] Fps is (10 sec: 2867.0, 60 sec: 2867.1, 300 sec: 2478.6). Total num frames: 8491008. Throughput: 0: 758.5. Samples: 120742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 11:53:37,329][00394] Avg episode reward: [(0, '5.297')] [2023-02-27 11:53:41,060][37558] Updated weights for policy 0, policy_version 2075 (0.0106) [2023-02-27 11:53:42,245][00394] Fps is (10 sec: 2049.1, 60 sec: 2867.1, 300 sec: 2457.6). Total num frames: 8499200. Throughput: 0: 731.6. Samples: 124128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2023-02-27 11:53:42,352][00394] Avg episode reward: [(0, '5.096')] [2023-02-27 11:53:47,235][00394] Fps is (10 sec: 2048.2, 60 sec: 2799.3, 300 sec: 2457.6). Total num frames: 8511488. Throughput: 0: 702.3. Samples: 126048. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0) [2023-02-27 11:53:47,361][00394] Avg episode reward: [(0, '5.220')] [2023-02-27 11:53:52,230][00394] Fps is (10 sec: 3687.1, 60 sec: 3072.1, 300 sec: 2516.1). Total num frames: 8536064. Throughput: 0: 681.2. Samples: 130590. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-02-27 11:53:52,257][00394] Avg episode reward: [(0, '5.050')] [2023-02-27 11:53:53,055][37558] Updated weights for policy 0, policy_version 2085 (0.0122) [2023-02-27 11:53:57,235][00394] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 2533.8). Total num frames: 8552448. Throughput: 0: 755.9. Samples: 136994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 11:53:57,256][00394] Avg episode reward: [(0, '4.961')] [2023-02-27 11:53:59,074][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002089_8556544.pth... [2023-02-27 11:53:59,613][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth [2023-02-27 11:54:02,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2513.5). Total num frames: 8560640. Throughput: 0: 754.9. Samples: 138506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 11:54:02,328][00394] Avg episode reward: [(0, '4.816')] [2023-02-27 11:54:07,241][00394] Fps is (10 sec: 2048.1, 60 sec: 2798.9, 300 sec: 2512.2). Total num frames: 8572928. Throughput: 0: 733.8. Samples: 141382. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-27 11:54:07,330][00394] Avg episode reward: [(0, '4.872')] [2023-02-27 11:54:11,016][37558] Updated weights for policy 0, policy_version 2095 (0.0122) [2023-02-27 11:54:12,255][00394] Fps is (10 sec: 2047.9, 60 sec: 2730.6, 300 sec: 2493.2). 
[2023-02-27 11:54:12,494][00394] Avg episode reward: [(0, '4.778')]
[2023-02-27 11:54:17,230][00394] Fps is (10 sec: 2867.1, 60 sec: 2935.7, 300 sec: 2527.3). Total num frames: 8601600. Throughput: 0: 654.9. Samples: 146700. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 11:54:17,300][00394] Avg episode reward: [(0, '4.929')]
[2023-02-27 11:54:21,778][37558] Updated weights for policy 0, policy_version 2105 (0.0039)
[2023-02-27 11:54:22,230][00394] Fps is (10 sec: 4096.3, 60 sec: 3003.8, 300 sec: 2560.0). Total num frames: 8622080. Throughput: 0: 706.0. Samples: 152512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:54:22,235][00394] Avg episode reward: [(0, '5.038')]
[2023-02-27 11:54:27,250][00394] Fps is (10 sec: 3276.9, 60 sec: 2867.2, 300 sec: 2557.9). Total num frames: 8634368. Throughput: 0: 758.3. Samples: 158252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:54:27,418][00394] Avg episode reward: [(0, '4.775')]
[2023-02-27 11:54:32,233][00394] Fps is (10 sec: 1638.3, 60 sec: 2662.7, 300 sec: 2523.1). Total num frames: 8638464. Throughput: 0: 738.1. Samples: 159264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:54:32,312][00394] Avg episode reward: [(0, '4.764')]
[2023-02-27 11:54:37,236][00394] Fps is (10 sec: 1638.4, 60 sec: 2662.4, 300 sec: 2521.8). Total num frames: 8650752. Throughput: 0: 676.2. Samples: 161020. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0)
[2023-02-27 11:54:37,294][00394] Avg episode reward: [(0, '4.841')]
[2023-02-27 11:54:41,781][37558] Updated weights for policy 0, policy_version 2115 (0.0143)
[2023-02-27 11:54:42,233][00394] Fps is (10 sec: 2457.7, 60 sec: 2730.8, 300 sec: 2520.6). Total num frames: 8663040. Throughput: 0: 616.0. Samples: 164712. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2023-02-27 11:54:42,301][00394] Avg episode reward: [(0, '4.694')]
[2023-02-27 11:54:47,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2550.3). Total num frames: 8683520. Throughput: 0: 635.1. Samples: 167086. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 11:54:47,258][00394] Avg episode reward: [(0, '4.942')]
[2023-02-27 11:54:51,176][37558] Updated weights for policy 0, policy_version 2125 (0.0063)
[2023-02-27 11:54:52,230][00394] Fps is (10 sec: 4505.6, 60 sec: 2867.2, 300 sec: 2594.1). Total num frames: 8708096. Throughput: 0: 714.8. Samples: 173546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:54:52,235][00394] Avg episode reward: [(0, '4.947')]
[2023-02-27 11:54:57,246][00394] Fps is (10 sec: 2867.1, 60 sec: 2662.4, 300 sec: 2561.9). Total num frames: 8712192. Throughput: 0: 730.8. Samples: 177658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:54:57,354][00394] Avg episode reward: [(0, '4.804')]
[2023-02-27 11:55:02,234][00394] Fps is (10 sec: 1638.2, 60 sec: 2730.6, 300 sec: 2560.0). Total num frames: 8724480. Throughput: 0: 721.4. Samples: 179162. Policy #0 lag: (min: 1.0, avg: 1.0, max: 2.0)
[2023-02-27 11:55:02,334][00394] Avg episode reward: [(0, '4.853')]
[2023-02-27 11:55:07,235][00394] Fps is (10 sec: 2457.7, 60 sec: 2730.7, 300 sec: 2558.2). Total num frames: 8736768. Throughput: 0: 666.1. Samples: 182488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
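The "Policy #0 lag" triple measures the staleness of the experience entering each training batch: per sample, the lag is the learner's current policy version minus the version that generated the action, and the log reports the min/avg/max across the batch. A small illustration (the function name is ours, not the framework's):

def lag_stats(sample_versions, learner_version):
    """Policy lag per sample = learner version - version that collected it."""
    lags = [learner_version - v for v in sample_versions]
    return {"min": float(min(lags)),
            "avg": sum(lags) / len(lags),
            "max": float(max(lags))}

# e.g. lag_stats([2024, 2025, 2023], learner_version=2025)
# -> {'min': 0.0, 'avg': 1.0, 'max': 2.0}

Small lags (mostly 0-3 versions here) indicate the rollout workers are acting on near-current weights, keeping the updates close to on-policy.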
[2023-02-27 11:55:07,296][00394] Avg episode reward: [(0, '4.786')]
[2023-02-27 11:55:09,829][37558] Updated weights for policy 0, policy_version 2135 (0.0122)
[2023-02-27 11:55:12,230][00394] Fps is (10 sec: 3277.2, 60 sec: 2935.5, 300 sec: 2584.7). Total num frames: 8757248. Throughput: 0: 636.6. Samples: 186898. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
[2023-02-27 11:55:12,252][00394] Avg episode reward: [(0, '4.689')]
[2023-02-27 11:55:17,230][00394] Fps is (10 sec: 4096.0, 60 sec: 2935.5, 300 sec: 2610.3). Total num frames: 8777728. Throughput: 0: 685.2. Samples: 190098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:55:17,240][00394] Avg episode reward: [(0, '4.512')]
[2023-02-27 11:55:18,917][37558] Updated weights for policy 0, policy_version 2145 (0.0030)
[2023-02-27 11:55:22,244][00394] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2638.1). Total num frames: 8785920. Throughput: 0: 781.0. Samples: 196164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:55:22,449][00394] Avg episode reward: [(0, '4.689')]
[2023-02-27 11:55:27,236][00394] Fps is (10 sec: 1638.4, 60 sec: 2662.4, 300 sec: 2665.9). Total num frames: 8794112. Throughput: 0: 732.6. Samples: 197678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:55:27,337][00394] Avg episode reward: [(0, '4.732')]
[2023-02-27 11:55:32,251][00394] Fps is (10 sec: 1638.4, 60 sec: 2730.7, 300 sec: 2693.6). Total num frames: 8802304. Throughput: 0: 713.8. Samples: 199206. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0)
[2023-02-27 11:55:32,404][00394] Avg episode reward: [(0, '4.780')]
[2023-02-27 11:55:37,250][00394] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 8814592. Throughput: 0: 636.0. Samples: 202166. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2023-02-27 11:55:37,346][00394] Avg episode reward: [(0, '4.881')]
[2023-02-27 11:55:40,154][37558] Updated weights for policy 0, policy_version 2155 (0.0291)
[2023-02-27 11:55:42,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2790.9). Total num frames: 8835072. Throughput: 0: 639.5. Samples: 206434. Policy #0 lag: (min: 1.0, avg: 1.2, max: 3.0)
[2023-02-27 11:55:42,254][00394] Avg episode reward: [(0, '5.001')]
[2023-02-27 11:55:47,230][00394] Fps is (10 sec: 4096.1, 60 sec: 2867.2, 300 sec: 2846.4). Total num frames: 8855552. Throughput: 0: 677.1. Samples: 209630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:55:47,232][00394] Avg episode reward: [(0, '4.742')]
[2023-02-27 11:55:52,230][00394] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2846.4). Total num frames: 8863744. Throughput: 0: 732.9. Samples: 215468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:55:52,370][00394] Avg episode reward: [(0, '4.640')]
[2023-02-27 11:55:52,638][37558] Updated weights for policy 0, policy_version 2165 (0.0036)
[2023-02-27 11:55:57,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2874.2). Total num frames: 8880128. Throughput: 0: 704.1. Samples: 218582. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 11:55:57,279][00394] Avg episode reward: [(0, '4.524')]
[2023-02-27 11:55:57,301][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002168_8880128.pth...
[2023-02-27 11:55:57,954][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002001_8196096.pth
[2023-02-27 11:56:02,232][00394] Fps is (10 sec: 2457.5, 60 sec: 2730.7, 300 sec: 2846.4). Total num frames: 8888320. Throughput: 0: 667.3. Samples: 220126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 1.0)
[2023-02-27 11:56:02,311][00394] Avg episode reward: [(0, '4.785')]
[2023-02-27 11:56:07,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2818.6). Total num frames: 8904704. Throughput: 0: 612.4. Samples: 223724. Policy #0 lag: (min: 0.0, avg: 1.5, max: 2.0)
[2023-02-27 11:56:07,281][00394] Avg episode reward: [(0, '4.787')]
[2023-02-27 11:56:07,994][37558] Updated weights for policy 0, policy_version 2175 (0.0141)
[2023-02-27 11:56:12,230][00394] Fps is (10 sec: 3686.5, 60 sec: 2798.9, 300 sec: 2846.4). Total num frames: 8925184. Throughput: 0: 705.7. Samples: 229434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 11:56:12,232][00394] Avg episode reward: [(0, '5.033')]
[2023-02-27 11:56:17,231][00394] Fps is (10 sec: 3685.8, 60 sec: 2730.6, 300 sec: 2860.2). Total num frames: 8941568. Throughput: 0: 743.0. Samples: 232642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:56:17,305][00394] Avg episode reward: [(0, '4.802')]
[2023-02-27 11:56:20,838][37558] Updated weights for policy 0, policy_version 2185 (0.0056)
[2023-02-27 11:56:22,240][00394] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2860.3). Total num frames: 8949760. Throughput: 0: 761.2. Samples: 236420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:56:22,400][00394] Avg episode reward: [(0, '4.790')]
[2023-02-27 11:56:27,236][00394] Fps is (10 sec: 1637.6, 60 sec: 2730.4, 300 sec: 2846.3). Total num frames: 8957952. Throughput: 0: 734.2. Samples: 239478. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 11:56:27,364][00394] Avg episode reward: [(0, '4.595')]
[2023-02-27 11:56:32,232][00394] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2832.5). Total num frames: 8974336. Throughput: 0: 700.3. Samples: 241142. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0)
[2023-02-27 11:56:32,289][00394] Avg episode reward: [(0, '4.468')]
[2023-02-27 11:56:36,291][37558] Updated weights for policy 0, policy_version 2195 (0.0199)
[2023-02-27 11:56:37,230][00394] Fps is (10 sec: 3278.9, 60 sec: 2935.5, 300 sec: 2818.6). Total num frames: 8990720. Throughput: 0: 669.3. Samples: 245586. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 11:56:37,262][00394] Avg episode reward: [(0, '4.389')]
[2023-02-27 11:56:42,230][00394] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 2874.2). Total num frames: 9015296. Throughput: 0: 735.6. Samples: 251686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:56:42,240][00394] Avg episode reward: [(0, '4.435')]
[2023-02-27 11:56:47,230][00394] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2846.4). Total num frames: 9019392. Throughput: 0: 750.0. Samples: 253876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:56:47,289][00394] Avg episode reward: [(0, '4.552')]
[2023-02-27 11:56:52,252][00394] Fps is (10 sec: 1228.8, 60 sec: 2730.7, 300 sec: 2832.5). Total num frames: 9027584. Throughput: 0: 709.8. Samples: 255666. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-27 11:56:52,379][00394] Avg episode reward: [(0, '4.743')]
[2023-02-27 11:56:53,925][37558] Updated weights for policy 0, policy_version 2205 (0.0066)
[2023-02-27 11:56:57,237][00394] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2818.6). Total num frames: 9039872. Throughput: 0: 654.3. Samples: 258876. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 11:56:57,297][00394] Avg episode reward: [(0, '4.831')]
[2023-02-27 11:57:02,230][00394] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2804.7). Total num frames: 9056256. Throughput: 0: 621.7. Samples: 260618. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2023-02-27 11:57:02,264][00394] Avg episode reward: [(0, '4.978')]
[2023-02-27 11:57:06,002][37558] Updated weights for policy 0, policy_version 2215 (0.0098)
[2023-02-27 11:57:07,230][00394] Fps is (10 sec: 3686.5, 60 sec: 2867.2, 300 sec: 2818.6). Total num frames: 9076736. Throughput: 0: 658.1. Samples: 266036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:57:07,234][00394] Avg episode reward: [(0, '5.271')]
[2023-02-27 11:57:12,231][00394] Fps is (10 sec: 3686.0, 60 sec: 2798.9, 300 sec: 2846.4). Total num frames: 9093120. Throughput: 0: 727.2. Samples: 272200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:57:12,307][00394] Avg episode reward: [(0, '5.327')]
[2023-02-27 11:57:17,267][00394] Fps is (10 sec: 2457.6, 60 sec: 2662.5, 300 sec: 2832.5). Total num frames: 9101312. Throughput: 0: 726.9. Samples: 273854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:57:17,386][00394] Avg episode reward: [(0, '5.151')]
[2023-02-27 11:57:21,701][37558] Updated weights for policy 0, policy_version 2225 (0.0094)
[2023-02-27 11:57:22,230][00394] Fps is (10 sec: 2048.2, 60 sec: 2730.7, 300 sec: 2832.5). Total num frames: 9113600. Throughput: 0: 696.5. Samples: 276928. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2023-02-27 11:57:22,305][00394] Avg episode reward: [(0, '4.904')]
[2023-02-27 11:57:27,233][00394] Fps is (10 sec: 2457.6, 60 sec: 2799.2, 300 sec: 2804.7). Total num frames: 9125888. Throughput: 0: 640.0. Samples: 280486. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2023-02-27 11:57:27,294][00394] Avg episode reward: [(0, '5.004')]
[2023-02-27 11:57:32,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2804.7). Total num frames: 9146368. Throughput: 0: 636.6. Samples: 282524. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2023-02-27 11:57:32,252][00394] Avg episode reward: [(0, '5.268')]
[2023-02-27 11:57:34,113][37558] Updated weights for policy 0, policy_version 2235 (0.0104)
[2023-02-27 11:57:37,235][00394] Fps is (10 sec: 4093.7, 60 sec: 2935.2, 300 sec: 2846.3). Total num frames: 9166848. Throughput: 0: 733.2. Samples: 288664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 11:57:37,238][00394] Avg episode reward: [(0, '5.193')]
[2023-02-27 11:57:42,234][00394] Fps is (10 sec: 2866.1, 60 sec: 2662.2, 300 sec: 2818.6). Total num frames: 9175040. Throughput: 0: 752.8. Samples: 292754. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 11:57:42,293][00394] Avg episode reward: [(0, '5.279')]
[2023-02-27 11:57:47,236][00394] Fps is (10 sec: 2047.8, 60 sec: 2798.6, 300 sec: 2832.5). Total num frames: 9187328. Throughput: 0: 749.2. Samples: 294336. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 11:57:47,334][00394] Avg episode reward: [(0, '5.469')]
[2023-02-27 11:57:51,151][37558] Updated weights for policy 0, policy_version 2245 (0.0121)
[2023-02-27 11:57:52,252][00394] Fps is (10 sec: 2048.4, 60 sec: 2798.8, 300 sec: 2790.8). Total num frames: 9195520. Throughput: 0: 704.3. Samples: 297732. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 11:57:52,426][00394] Avg episode reward: [(0, '5.653')]
[2023-02-27 11:57:53,425][37536] Saving new best policy, reward=5.653!
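The "Saving new best policy, reward=5.653!" entries implement a simple high-water mark: whenever the reported average episode reward exceeds the best value seen so far, a dedicated best-policy snapshot is written in addition to the rotating checkpoints. A sketch under that assumption (BestPolicySaver is an illustrative name, not the framework's):

class BestPolicySaver:
    """Snapshot the model whenever avg episode reward sets a new record."""

    def __init__(self, save_fn):
        self.best_reward = float("-inf")
        self.save_fn = save_fn  # callable that writes the best-policy .pth file

    def maybe_save(self, avg_episode_reward):
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            self.save_fn()
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")

From this point on the reward climbs steadily, so these saves fire frequently in the entries that follow.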
[2023-02-27 11:57:57,230][00394] Fps is (10 sec: 2459.2, 60 sec: 2867.2, 300 sec: 2776.9). Total num frames: 9211904. Throughput: 0: 640.9. Samples: 301042. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
[2023-02-27 11:57:57,301][00394] Avg episode reward: [(0, '5.969')]
[2023-02-27 11:57:57,315][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002249_9211904.pth...
[2023-02-27 11:57:57,738][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002089_8556544.pth
[2023-02-27 11:57:57,746][37536] Saving new best policy, reward=5.969!
[2023-02-27 11:58:02,230][00394] Fps is (10 sec: 3277.5, 60 sec: 2867.2, 300 sec: 2790.8). Total num frames: 9228288. Throughput: 0: 654.5. Samples: 303308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:58:02,254][00394] Avg episode reward: [(0, '5.989')]
[2023-02-27 11:58:02,257][37536] Saving new best policy, reward=5.989!
[2023-02-27 11:58:04,218][37558] Updated weights for policy 0, policy_version 2255 (0.0099)
[2023-02-27 11:58:07,236][00394] Fps is (10 sec: 3276.4, 60 sec: 2798.9, 300 sec: 2804.7). Total num frames: 9244672. Throughput: 0: 708.6. Samples: 308818. Policy #0 lag: (min: 1.0, avg: 1.0, max: 2.0)
[2023-02-27 11:58:07,325][00394] Avg episode reward: [(0, '5.814')]
[2023-02-27 11:58:12,236][00394] Fps is (10 sec: 2456.4, 60 sec: 2662.2, 300 sec: 2804.7). Total num frames: 9252864. Throughput: 0: 696.7. Samples: 311842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 11:58:12,324][00394] Avg episode reward: [(0, '6.029')]
[2023-02-27 11:58:12,338][37536] Saving new best policy, reward=6.029!
[2023-02-27 11:58:17,239][00394] Fps is (10 sec: 1638.6, 60 sec: 2662.4, 300 sec: 2777.0). Total num frames: 9261056. Throughput: 0: 685.1. Samples: 313354. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 11:58:17,392][00394] Avg episode reward: [(0, '5.923')]
[2023-02-27 11:58:22,249][00394] Fps is (10 sec: 1639.0, 60 sec: 2594.1, 300 sec: 2735.3). Total num frames: 9269248. Throughput: 0: 615.8. Samples: 316374. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 11:58:22,438][00394] Avg episode reward: [(0, '5.821')]
[2023-02-27 11:58:24,377][37558] Updated weights for policy 0, policy_version 2265 (0.0133)
[2023-02-27 11:58:27,248][00394] Fps is (10 sec: 2048.0, 60 sec: 2594.1, 300 sec: 2721.5). Total num frames: 9281536. Throughput: 0: 602.9. Samples: 319884. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2023-02-27 11:58:27,436][00394] Avg episode reward: [(0, '5.828')]
[2023-02-27 11:58:32,232][00394] Fps is (10 sec: 3277.1, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 9302016. Throughput: 0: 614.7. Samples: 321992. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 11:58:32,256][00394] Avg episode reward: [(0, '5.925')]
[2023-02-27 11:58:35,452][37558] Updated weights for policy 0, policy_version 2275 (0.0071)
[2023-02-27 11:58:37,234][00394] Fps is (10 sec: 3686.2, 60 sec: 2526.1, 300 sec: 2777.0). Total num frames: 9318400. Throughput: 0: 670.2. Samples: 327890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 11:58:37,337][00394] Avg episode reward: [(0, '6.120')]
[2023-02-27 11:58:37,429][37536] Saving new best policy, reward=6.120!
[2023-02-27 11:58:42,236][00394] Fps is (10 sec: 2457.6, 60 sec: 2526.0, 300 sec: 2763.1). Total num frames: 9326592. Throughput: 0: 661.1. Samples: 330790. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-27 11:58:42,319][00394] Avg episode reward: [(0, '6.365')]
[2023-02-27 11:58:43,862][37536] Saving new best policy, reward=6.365!
[2023-02-27 11:58:47,230][00394] Fps is (10 sec: 1638.5, 60 sec: 2457.9, 300 sec: 2707.5). Total num frames: 9334784. Throughput: 0: 641.4. Samples: 332170. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 11:58:47,358][00394] Avg episode reward: [(0, '6.538')]
[2023-02-27 11:58:48,761][37536] Saving new best policy, reward=6.538!
[2023-02-27 11:58:52,233][00394] Fps is (10 sec: 2048.0, 60 sec: 2526.0, 300 sec: 2693.6). Total num frames: 9347072. Throughput: 0: 586.4. Samples: 335206. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2023-02-27 11:58:52,285][00394] Avg episode reward: [(0, '6.896')]
[2023-02-27 11:58:52,300][37536] Saving new best policy, reward=6.896!
[2023-02-27 11:58:55,168][37558] Updated weights for policy 0, policy_version 2285 (0.0098)
[2023-02-27 11:58:57,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2735.3). Total num frames: 9367552. Throughput: 0: 614.4. Samples: 339488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 11:58:57,250][00394] Avg episode reward: [(0, '7.330')]
[2023-02-27 11:58:57,266][37536] Saving new best policy, reward=7.330!
[2023-02-27 11:59:02,230][00394] Fps is (10 sec: 4096.0, 60 sec: 2662.4, 300 sec: 2763.1). Total num frames: 9388032. Throughput: 0: 648.8. Samples: 342548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-27 11:59:02,235][00394] Avg episode reward: [(0, '7.778')]
[2023-02-27 11:59:02,243][37536] Saving new best policy, reward=7.778!
[2023-02-27 11:59:07,240][00394] Fps is (10 sec: 2457.0, 60 sec: 2457.5, 300 sec: 2749.2). Total num frames: 9392128. Throughput: 0: 691.9. Samples: 347510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:59:07,384][00394] Avg episode reward: [(0, '7.842')]
[2023-02-27 11:59:07,795][37536] Saving new best policy, reward=7.842!
[2023-02-27 11:59:09,928][37558] Updated weights for policy 0, policy_version 2295 (0.0084)
[2023-02-27 11:59:12,235][00394] Fps is (10 sec: 1638.4, 60 sec: 2526.1, 300 sec: 2721.4). Total num frames: 9404416. Throughput: 0: 671.8. Samples: 350116. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2023-02-27 11:59:12,337][00394] Avg episode reward: [(0, '7.795')]
[2023-02-27 11:59:17,244][00394] Fps is (10 sec: 2048.3, 60 sec: 2525.8, 300 sec: 2679.7). Total num frames: 9412608. Throughput: 0: 659.4. Samples: 351666. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 11:59:17,331][00394] Avg episode reward: [(0, '8.083')]
[2023-02-27 11:59:17,432][37536] Saving new best policy, reward=8.083!
[2023-02-27 11:59:22,252][00394] Fps is (10 sec: 2047.9, 60 sec: 2594.2, 300 sec: 2679.8). Total num frames: 9424896. Throughput: 0: 601.7. Samples: 354968. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-02-27 11:59:22,342][00394] Avg episode reward: [(0, '8.314')]
[2023-02-27 11:59:22,353][37536] Saving new best policy, reward=8.314!
[2023-02-27 11:59:25,874][37558] Updated weights for policy 0, policy_version 2305 (0.0266)
[2023-02-27 11:59:27,230][00394] Fps is (10 sec: 3277.2, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 9445376. Throughput: 0: 634.1. Samples: 359324. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 11:59:27,260][00394] Avg episode reward: [(0, '8.033')]
[2023-02-27 11:59:32,239][00394] Fps is (10 sec: 4096.2, 60 sec: 2730.7, 300 sec: 2763.1). Total num frames: 9465856. Throughput: 0: 668.7. Samples: 362262. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 11:59:32,311][00394] Avg episode reward: [(0, '7.705')]
[2023-02-27 11:59:37,237][00394] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2735.3). Total num frames: 9469952. Throughput: 0: 702.9. Samples: 366838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:59:37,358][00394] Avg episode reward: [(0, '7.461')]
[2023-02-27 11:59:42,233][00394] Fps is (10 sec: 819.1, 60 sec: 2457.6, 300 sec: 2679.7). Total num frames: 9474048. Throughput: 0: 614.1. Samples: 367122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 11:59:42,277][00394] Avg episode reward: [(0, '7.403')]
[2023-02-27 11:59:46,707][37558] Updated weights for policy 0, policy_version 2315 (0.0096)
[2023-02-27 11:59:47,233][00394] Fps is (10 sec: 1228.4, 60 sec: 2457.5, 300 sec: 2624.2). Total num frames: 9482240. Throughput: 0: 575.1. Samples: 368430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-27 11:59:47,303][00394] Avg episode reward: [(0, '7.469')]
[2023-02-27 11:59:52,240][00394] Fps is (10 sec: 2867.4, 60 sec: 2594.1, 300 sec: 2679.8). Total num frames: 9502720. Throughput: 0: 558.7. Samples: 372650. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0)
[2023-02-27 11:59:52,288][00394] Avg episode reward: [(0, '7.763')]
[2023-02-27 11:59:57,233][00394] Fps is (10 sec: 3687.1, 60 sec: 2525.8, 300 sec: 2693.6). Total num frames: 9519104. Throughput: 0: 632.8. Samples: 378592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 3.0)
[2023-02-27 11:59:57,277][00394] Avg episode reward: [(0, '8.084')]
[2023-02-27 11:59:57,495][37558] Updated weights for policy 0, policy_version 2325 (0.0129)
[2023-02-27 11:59:57,553][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002325_9523200.pth...
[2023-02-27 11:59:58,292][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002168_8880128.pth
[2023-02-27 12:00:02,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2321.1, 300 sec: 2679.8). Total num frames: 9527296. Throughput: 0: 630.6. Samples: 380044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:00:02,311][00394] Avg episode reward: [(0, '8.314')]
[2023-02-27 12:00:07,234][00394] Fps is (10 sec: 1638.6, 60 sec: 2389.4, 300 sec: 2638.1). Total num frames: 9535488. Throughput: 0: 622.0. Samples: 382956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:00:07,322][00394] Avg episode reward: [(0, '8.245')]
[2023-02-27 12:00:12,230][00394] Fps is (10 sec: 2048.0, 60 sec: 2389.3, 300 sec: 2610.3). Total num frames: 9547776. Throughput: 0: 594.3. Samples: 386066. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2023-02-27 12:00:12,318][00394] Avg episode reward: [(0, '8.171')]
[2023-02-27 12:00:17,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2624.2). Total num frames: 9560064. Throughput: 0: 565.5. Samples: 387708. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 12:00:17,309][00394] Avg episode reward: [(0, '7.626')]
[2023-02-27 12:00:18,127][37536] Signal inference workers to stop experience collection... (50 times)
[2023-02-27 12:00:18,163][37558] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2023-02-27 12:00:18,246][37536] Signal inference workers to resume experience collection... (50 times)
[2023-02-27 12:00:18,247][37558] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2023-02-27 12:00:18,289][37558] Updated weights for policy 0, policy_version 2335 (0.0195)
[2023-02-27 12:00:22,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2594.2, 300 sec: 2665.9). Total num frames: 9580544. Throughput: 0: 561.9. Samples: 392124. Policy #0 lag: (min: 0.0, avg: 1.5, max: 2.0)
[2023-02-27 12:00:22,272][00394] Avg episode reward: [(0, '8.279')]
[2023-02-27 12:00:27,268][00394] Fps is (10 sec: 3275.3, 60 sec: 2457.4, 300 sec: 2679.7). Total num frames: 9592832. Throughput: 0: 679.0. Samples: 397678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 12:00:27,583][00394] Avg episode reward: [(0, '8.496')]
[2023-02-27 12:00:30,093][37536] Saving new best policy, reward=8.496!
[2023-02-27 12:00:32,233][00394] Fps is (10 sec: 1638.4, 60 sec: 2184.5, 300 sec: 2652.0). Total num frames: 9596928. Throughput: 0: 677.7. Samples: 398926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 12:00:32,365][00394] Avg episode reward: [(0, '8.810')]
[2023-02-27 12:00:37,238][00394] Fps is (10 sec: 819.6, 60 sec: 2184.5, 300 sec: 2596.4). Total num frames: 9601024. Throughput: 0: 609.3. Samples: 400068. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-02-27 12:00:37,344][00394] Avg episode reward: [(0, '8.768')]
[2023-02-27 12:00:38,715][37536] Saving new best policy, reward=8.810!
[2023-02-27 12:00:39,024][37558] Updated weights for policy 0, policy_version 2345 (0.0177)
[2023-02-27 12:00:42,230][00394] Fps is (10 sec: 819.2, 60 sec: 2184.6, 300 sec: 2540.9). Total num frames: 9605120. Throughput: 0: 480.3. Samples: 400204. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-02-27 12:00:42,271][00394] Avg episode reward: [(0, '8.897')]
[2023-02-27 12:00:42,274][37536] Saving new best policy, reward=8.897!
[2023-02-27 12:00:47,246][00394] Fps is (10 sec: 1638.4, 60 sec: 2252.9, 300 sec: 2554.8). Total num frames: 9617408. Throughput: 0: 490.0. Samples: 402092. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-02-27 12:00:47,317][00394] Avg episode reward: [(0, '9.288')]
[2023-02-27 12:00:47,389][37536] Saving new best policy, reward=9.288!
[2023-02-27 12:00:52,232][00394] Fps is (10 sec: 3276.8, 60 sec: 2252.8, 300 sec: 2568.7). Total num frames: 9637888. Throughput: 0: 523.0. Samples: 406490. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 12:00:52,265][00394] Avg episode reward: [(0, '9.842')]
[2023-02-27 12:00:52,269][37536] Saving new best policy, reward=9.842!
[2023-02-27 12:00:56,658][37558] Updated weights for policy 0, policy_version 2355 (0.0124)
[2023-02-27 12:00:57,272][00394] Fps is (10 sec: 2866.9, 60 sec: 2116.3, 300 sec: 2568.7). Total num frames: 9646080. Throughput: 0: 551.6. Samples: 410888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-27 12:00:57,651][00394] Avg episode reward: [(0, '9.967')]
[2023-02-27 12:01:01,797][37536] Saving new best policy, reward=9.967!
[2023-02-27 12:01:02,240][00394] Fps is (10 sec: 1228.5, 60 sec: 2047.9, 300 sec: 2527.0). Total num frames: 9650176. Throughput: 0: 525.9. Samples: 411376. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 12:01:02,446][00394] Avg episode reward: [(0, '10.146')]
[2023-02-27 12:01:02,467][37536] Saving new best policy, reward=10.146!
[2023-02-27 12:01:07,231][00394] Fps is (10 sec: 1228.8, 60 sec: 2048.0, 300 sec: 2485.4). Total num frames: 9658368. Throughput: 0: 468.1. Samples: 413188. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
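The paired "Signal inference workers to stop/resume experience collection... (50 times)" entries show backpressure at work: when the learner falls behind on queued experience, the runner pauses the inference workers, then resumes them once the backlog clears, and the counter records how often this has happened so far. A hedged sketch of such a gate using a shared event; the class is illustrative, and the real coordination lives in Sample Factory's runner and worker processes:

from multiprocessing import Event

class CollectionGate:
    """Cooperative pause/resume between a learner and inference workers."""

    def __init__(self):
        self.collecting = Event()
        self.collecting.set()  # start in the collecting state
        self.pauses = 0

    def stop(self):
        self.pauses += 1
        self.collecting.clear()
        print(f"Signal inference workers to stop experience collection... ({self.pauses} times)")

    def resume(self):
        self.collecting.set()
        print(f"Signal inference workers to resume experience collection... ({self.pauses} times)")

# an inference worker would call gate.collecting.wait() before each policy step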
[2023-02-27 12:01:07,318][00394] Avg episode reward: [(0, '10.575')]
[2023-02-27 12:01:07,364][37536] Saving new best policy, reward=10.575!
[2023-02-27 12:01:12,243][00394] Fps is (10 sec: 1638.8, 60 sec: 1979.7, 300 sec: 2457.6). Total num frames: 9666560. Throughput: 0: 415.2. Samples: 416362. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2023-02-27 12:01:12,401][00394] Avg episode reward: [(0, '10.165')]
[2023-02-27 12:01:17,259][00394] Fps is (10 sec: 2457.7, 60 sec: 2048.0, 300 sec: 2485.4). Total num frames: 9682944. Throughput: 0: 425.4. Samples: 418068. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2023-02-27 12:01:17,360][00394] Avg episode reward: [(0, '10.283')]
[2023-02-27 12:01:18,668][37558] Updated weights for policy 0, policy_version 2365 (0.0195)
[2023-02-27 12:01:22,237][00394] Fps is (10 sec: 3276.8, 60 sec: 1979.7, 300 sec: 2513.2). Total num frames: 9699328. Throughput: 0: 500.4. Samples: 422584. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2023-02-27 12:01:22,294][00394] Avg episode reward: [(0, '10.206')]
[2023-02-27 12:01:27,285][00394] Fps is (10 sec: 2867.4, 60 sec: 1979.9, 300 sec: 2499.3). Total num frames: 9711616. Throughput: 0: 611.3. Samples: 427712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 12:01:27,497][00394] Avg episode reward: [(0, '10.040')]
[2023-02-27 12:01:32,237][00394] Fps is (10 sec: 1638.4, 60 sec: 1979.7, 300 sec: 2457.6). Total num frames: 9715712. Throughput: 0: 585.5. Samples: 428440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 12:01:32,348][00394] Avg episode reward: [(0, '10.124')]
[2023-02-27 12:01:36,533][37558] Updated weights for policy 0, policy_version 2375 (0.0141)
[2023-02-27 12:01:37,230][00394] Fps is (10 sec: 1638.4, 60 sec: 2116.3, 300 sec: 2415.9). Total num frames: 9728000. Throughput: 0: 536.4. Samples: 430626. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 12:01:37,252][00394] Avg episode reward: [(0, '10.079')]
[2023-02-27 12:01:42,233][00394] Fps is (10 sec: 2047.9, 60 sec: 2184.5, 300 sec: 2429.8). Total num frames: 9736192. Throughput: 0: 512.8. Samples: 433966. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:01:42,334][00394] Avg episode reward: [(0, '10.419')]
[2023-02-27 12:01:47,231][00394] Fps is (10 sec: 2867.2, 60 sec: 2321.1, 300 sec: 2471.5). Total num frames: 9756672. Throughput: 0: 547.7. Samples: 436020. Policy #0 lag: (min: 1.0, avg: 1.2, max: 3.0)
[2023-02-27 12:01:47,261][00394] Avg episode reward: [(0, '10.740')]
[2023-02-27 12:01:47,488][37536] Saving new best policy, reward=10.740!
[2023-02-27 12:01:50,831][37558] Updated weights for policy 0, policy_version 2385 (0.0119)
[2023-02-27 12:01:52,230][00394] Fps is (10 sec: 3686.7, 60 sec: 2252.8, 300 sec: 2485.4). Total num frames: 9773056. Throughput: 0: 621.3. Samples: 441144. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 12:01:52,257][00394] Avg episode reward: [(0, '11.248')]
[2023-02-27 12:01:52,262][37536] Saving new best policy, reward=11.248!
[2023-02-27 12:01:57,243][00394] Fps is (10 sec: 2048.0, 60 sec: 2184.6, 300 sec: 2443.7). Total num frames: 9777152. Throughput: 0: 626.0. Samples: 444530. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 12:01:57,476][00394] Avg episode reward: [(0, '11.475')]
[2023-02-27 12:02:00,731][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002389_9785344.pth...
[2023-02-27 12:02:01,253][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002249_9211904.pth
[2023-02-27 12:02:01,261][37536] Saving new best policy, reward=11.475!
[2023-02-27 12:02:02,235][00394] Fps is (10 sec: 1228.6, 60 sec: 2252.8, 300 sec: 2402.0). Total num frames: 9785344. Throughput: 0: 597.8. Samples: 444970. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0)
[2023-02-27 12:02:02,405][00394] Avg episode reward: [(0, '11.799')]
[2023-02-27 12:02:04,246][37536] Saving new best policy, reward=11.799!
[2023-02-27 12:02:07,238][00394] Fps is (10 sec: 2048.0, 60 sec: 2321.1, 300 sec: 2388.2). Total num frames: 9797632. Throughput: 0: 552.9. Samples: 447464. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0)
[2023-02-27 12:02:07,290][00394] Avg episode reward: [(0, '11.905')]
[2023-02-27 12:02:07,363][37536] Saving new best policy, reward=11.905!
[2023-02-27 12:02:10,808][37558] Updated weights for policy 0, policy_version 2395 (0.0140)
[2023-02-27 12:02:12,234][00394] Fps is (10 sec: 2867.6, 60 sec: 2457.6, 300 sec: 2415.9). Total num frames: 9814016. Throughput: 0: 533.2. Samples: 451704. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-02-27 12:02:12,260][00394] Avg episode reward: [(0, '12.516')]
[2023-02-27 12:02:12,262][37536] Saving new best policy, reward=12.516!
[2023-02-27 12:02:17,230][00394] Fps is (10 sec: 3686.4, 60 sec: 2525.9, 300 sec: 2443.7). Total num frames: 9834496. Throughput: 0: 576.1. Samples: 454366. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:02:17,256][00394] Avg episode reward: [(0, '13.247')]
[2023-02-27 12:02:17,273][37536] Saving new best policy, reward=13.247!
[2023-02-27 12:02:22,243][00394] Fps is (10 sec: 2867.2, 60 sec: 2389.3, 300 sec: 2429.8). Total num frames: 9842688. Throughput: 0: 649.1. Samples: 459834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:02:22,400][00394] Avg episode reward: [(0, '13.052')]
[2023-02-27 12:02:24,545][37558] Updated weights for policy 0, policy_version 2405 (0.0057)
[2023-02-27 12:02:27,249][00394] Fps is (10 sec: 2047.3, 60 sec: 2389.2, 300 sec: 2402.0). Total num frames: 9854976. Throughput: 0: 640.7. Samples: 462798. Policy #0 lag: (min: 1.0, avg: 1.0, max: 2.0)
[2023-02-27 12:02:27,308][00394] Avg episode reward: [(0, '13.197')]
[2023-02-27 12:02:32,237][00394] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2374.3). Total num frames: 9867264. Throughput: 0: 627.8. Samples: 464270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:02:32,336][00394] Avg episode reward: [(0, '14.223')]
[2023-02-27 12:02:32,346][37536] Saving new best policy, reward=14.223!
[2023-02-27 12:02:37,243][00394] Fps is (10 sec: 2458.4, 60 sec: 2525.9, 300 sec: 2388.2). Total num frames: 9879552. Throughput: 0: 590.0. Samples: 467694. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-02-27 12:02:37,284][00394] Avg episode reward: [(0, '14.761')]
[2023-02-27 12:02:37,378][37536] Saving new best policy, reward=14.761!
[2023-02-27 12:02:40,336][37558] Updated weights for policy 0, policy_version 2415 (0.0148)
[2023-02-27 12:02:42,230][00394] Fps is (10 sec: 3276.7, 60 sec: 2730.7, 300 sec: 2416.0). Total num frames: 9900032. Throughput: 0: 625.5. Samples: 472678. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:02:42,261][00394] Avg episode reward: [(0, '14.452')]
[2023-02-27 12:02:47,245][00394] Fps is (10 sec: 3686.4, 60 sec: 2662.4, 300 sec: 2443.7). Total num frames: 9916416. Throughput: 0: 683.9. Samples: 475746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 12:02:47,390][00394] Avg episode reward: [(0, '15.438')]
[2023-02-27 12:02:47,527][37536] Saving new best policy, reward=15.438!
[2023-02-27 12:02:52,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2415.9). Total num frames: 9924608. Throughput: 0: 717.5. Samples: 479750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:02:52,258][00394] Avg episode reward: [(0, '14.882')]
[2023-02-27 12:02:56,185][37558] Updated weights for policy 0, policy_version 2425 (0.0077)
[2023-02-27 12:02:57,302][00394] Fps is (10 sec: 1637.7, 60 sec: 2594.0, 300 sec: 2388.1). Total num frames: 9932800. Throughput: 0: 688.0. Samples: 482668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-02-27 12:02:57,603][00394] Avg episode reward: [(0, '14.558')]
[2023-02-27 12:03:02,230][00394] Fps is (10 sec: 2048.0, 60 sec: 2662.5, 300 sec: 2374.3). Total num frames: 9945088. Throughput: 0: 661.1. Samples: 484116. Policy #0 lag: (min: 0.0, avg: 1.5, max: 2.0)
[2023-02-27 12:03:02,300][00394] Avg episode reward: [(0, '13.884')]
[2023-02-27 12:03:07,236][00394] Fps is (10 sec: 2458.6, 60 sec: 2662.4, 300 sec: 2388.2). Total num frames: 9957376. Throughput: 0: 626.5. Samples: 488028. Policy #0 lag: (min: 0.0, avg: 1.6, max: 2.0)
[2023-02-27 12:03:07,327][00394] Avg episode reward: [(0, '14.439')]
[2023-02-27 12:03:10,355][37558] Updated weights for policy 0, policy_version 2435 (0.0188)
[2023-02-27 12:03:12,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2429.8). Total num frames: 9977856. Throughput: 0: 673.5. Samples: 493102. Policy #0 lag: (min: 0.0, avg: 1.5, max: 2.0)
[2023-02-27 12:03:12,252][00394] Avg episode reward: [(0, '14.500')]
[2023-02-27 12:03:17,230][00394] Fps is (10 sec: 3686.5, 60 sec: 2662.4, 300 sec: 2457.6). Total num frames: 9994240. Throughput: 0: 710.4. Samples: 496240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-27 12:03:17,322][00394] Avg episode reward: [(0, '13.808')]
[2023-02-27 12:03:22,251][00394] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2443.7). Total num frames: 10002432. Throughput: 0: 711.6. Samples: 499716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:03:22,452][00394] Avg episode reward: [(0, '14.568')]
[2023-02-27 12:03:27,080][37558] Updated weights for policy 0, policy_version 2445 (0.0139)
[2023-02-27 12:03:27,230][00394] Fps is (10 sec: 2048.0, 60 sec: 2662.5, 300 sec: 2415.9). Total num frames: 10014720. Throughput: 0: 668.5. Samples: 502762. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-02-27 12:03:27,349][00394] Avg episode reward: [(0, '14.516')]
[2023-02-27 12:03:32,231][00394] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2402.1). Total num frames: 10027008. Throughput: 0: 636.8. Samples: 504404. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2023-02-27 12:03:32,277][00394] Avg episode reward: [(0, '15.343')]
[2023-02-27 12:03:37,233][00394] Fps is (10 sec: 2867.3, 60 sec: 2730.7, 300 sec: 2429.8). Total num frames: 10043392. Throughput: 0: 637.1. Samples: 508418. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-02-27 12:03:37,275][00394] Avg episode reward: [(0, '15.607')]
[2023-02-27 12:03:37,339][37536] Saving new best policy, reward=15.607!
[2023-02-27 12:03:40,064][37558] Updated weights for policy 0, policy_version 2455 (0.0139)
[2023-02-27 12:03:42,230][00394] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2471.5). Total num frames: 10063872. Throughput: 0: 691.5. Samples: 513784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:03:42,254][00394] Avg episode reward: [(0, '15.320')]
[2023-02-27 12:03:47,244][00394] Fps is (10 sec: 2866.2, 60 sec: 2594.0, 300 sec: 2457.6). Total num frames: 10072064. Throughput: 0: 712.3. Samples: 516172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:03:47,340][00394] Avg episode reward: [(0, '15.720')]
[2023-02-27 12:03:47,419][37536] Saving new best policy, reward=15.720!
[2023-02-27 12:03:52,250][00394] Fps is (10 sec: 1638.4, 60 sec: 2594.1, 300 sec: 2415.9). Total num frames: 10080256. Throughput: 0: 688.8. Samples: 519024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 12:03:52,332][00394] Avg episode reward: [(0, '16.569')]
[2023-02-27 12:03:52,335][37536] Saving new best policy, reward=16.569!
[2023-02-27 12:03:57,236][00394] Fps is (10 sec: 2048.7, 60 sec: 2662.6, 300 sec: 2388.2). Total num frames: 10092544. Throughput: 0: 646.1. Samples: 522178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 12:03:57,365][00394] Avg episode reward: [(0, '16.209')]
[2023-02-27 12:03:59,103][37558] Updated weights for policy 0, policy_version 2465 (0.0175)
[2023-02-27 12:03:59,103][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002465_10096640.pth...
[2023-02-27 12:03:59,619][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002325_9523200.pth
[2023-02-27 12:04:02,241][00394] Fps is (10 sec: 2047.2, 60 sec: 2594.0, 300 sec: 2402.0). Total num frames: 10100736. Throughput: 0: 611.7. Samples: 523768. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2023-02-27 12:04:02,334][00394] Avg episode reward: [(0, '16.357')]
[2023-02-27 12:04:07,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2443.7). Total num frames: 10125312. Throughput: 0: 625.7. Samples: 527874. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2023-02-27 12:04:07,287][00394] Avg episode reward: [(0, '16.120')]
[2023-02-27 12:04:10,249][37558] Updated weights for policy 0, policy_version 2475 (0.0086)
[2023-02-27 12:04:12,230][00394] Fps is (10 sec: 4507.1, 60 sec: 2798.9, 300 sec: 2485.4). Total num frames: 10145792. Throughput: 0: 700.2. Samples: 534272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:04:12,237][00394] Avg episode reward: [(0, '15.801')]
[2023-02-27 12:04:17,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2457.6). Total num frames: 10149888. Throughput: 0: 709.7. Samples: 536340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:04:17,300][00394] Avg episode reward: [(0, '15.357')]
[2023-02-27 12:04:22,261][00394] Fps is (10 sec: 819.2, 60 sec: 2525.9, 300 sec: 2402.1). Total num frames: 10153984. Throughput: 0: 637.5. Samples: 537106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:04:22,394][00394] Avg episode reward: [(0, '15.393')]
[2023-02-27 12:04:27,248][00394] Fps is (10 sec: 409.5, 60 sec: 2321.0, 300 sec: 2332.6). Total num frames: 10153984. Throughput: 0: 552.3. Samples: 538638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:04:27,340][00394] Avg episode reward: [(0, '14.948')]
[2023-02-27 12:04:32,240][00394] Fps is (10 sec: 1638.4, 60 sec: 2389.3, 300 sec: 2374.3). Total num frames: 10170368. Throughput: 0: 539.1. Samples: 540432. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
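Each "Updated weights for policy 0, policy_version N (…)" line is emitted by the inference worker when it pulls the learner's freshly optimized parameters; the version advances by roughly ten between entries, and the parenthesized figure is consistent with the wall-clock seconds spent applying the update. A sketch under those assumptions (the function is illustrative and presumes a torch.nn.Module-style model):

import time

def update_inference_weights(model, latest_state_dict, policy_id, policy_version):
    """Copy the learner's newest parameters into the inference-side model."""
    start = time.time()
    model.load_state_dict(latest_state_dict)  # torch-style parameter copy
    elapsed = time.time() - start
    print(f"Updated weights for policy {policy_id}, "
          f"policy_version {policy_version} ({elapsed:.4f})")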
[2023-02-27 12:04:32,377][00394] Avg episode reward: [(0, '15.998')]
[2023-02-27 12:04:33,995][37558] Updated weights for policy 0, policy_version 2485 (0.0191)
[2023-02-27 12:04:37,243][00394] Fps is (10 sec: 3686.8, 60 sec: 2457.6, 300 sec: 2429.8). Total num frames: 10190848. Throughput: 0: 573.8. Samples: 544844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:04:37,295][00394] Avg episode reward: [(0, '14.705')]
[2023-02-27 12:04:42,255][00394] Fps is (10 sec: 2457.6, 60 sec: 2184.5, 300 sec: 2416.0). Total num frames: 10194944. Throughput: 0: 599.2. Samples: 549142. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:04:42,403][00394] Avg episode reward: [(0, '14.615')]
[2023-02-27 12:04:47,336][00394] Fps is (10 sec: 812.4, 60 sec: 2113.4, 300 sec: 2359.7). Total num frames: 10199040. Throughput: 0: 568.3. Samples: 549388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 12:04:47,476][00394] Avg episode reward: [(0, '14.615')]
[2023-02-27 12:04:52,258][00394] Fps is (10 sec: 819.2, 60 sec: 2048.0, 300 sec: 2318.8). Total num frames: 10203136. Throughput: 0: 480.3. Samples: 549488. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-02-27 12:04:52,415][00394] Avg episode reward: [(0, '15.000')]
[2023-02-27 12:04:56,881][37558] Updated weights for policy 0, policy_version 2495 (0.0100)
[2023-02-27 12:04:57,230][00394] Fps is (10 sec: 2065.3, 60 sec: 2116.3, 300 sec: 2346.5). Total num frames: 10219520. Throughput: 0: 424.0. Samples: 553350. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
[2023-02-27 12:04:57,263][00394] Avg episode reward: [(0, '16.605')]
[2023-02-27 12:04:57,315][37536] Saving new best policy, reward=16.605!
[2023-02-27 12:05:02,230][00394] Fps is (10 sec: 3686.4, 60 sec: 2321.2, 300 sec: 2388.2). Total num frames: 10240000. Throughput: 0: 429.3. Samples: 555658. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:05:02,256][00394] Avg episode reward: [(0, '17.507')]
[2023-02-27 12:05:02,258][37536] Saving new best policy, reward=17.507!
[2023-02-27 12:05:07,269][00394] Fps is (10 sec: 2867.2, 60 sec: 2048.0, 300 sec: 2374.3). Total num frames: 10248192. Throughput: 0: 530.2. Samples: 560966. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 12:05:07,490][00394] Avg episode reward: [(0, '17.060')]
[2023-02-27 12:05:12,230][00394] Fps is (10 sec: 1638.4, 60 sec: 1843.2, 300 sec: 2360.4). Total num frames: 10256384. Throughput: 0: 540.9. Samples: 562980. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:05:12,287][00394] Avg episode reward: [(0, '16.625')]
[2023-02-27 12:05:14,759][37558] Updated weights for policy 0, policy_version 2505 (0.0128)
[2023-02-27 12:05:17,239][00394] Fps is (10 sec: 1228.8, 60 sec: 1843.2, 300 sec: 2304.9). Total num frames: 10260480. Throughput: 0: 536.5. Samples: 564576. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0)
[2023-02-27 12:05:17,319][00394] Avg episode reward: [(0, '16.303')]
[2023-02-27 12:05:22,242][00394] Fps is (10 sec: 1638.4, 60 sec: 1979.7, 300 sec: 2304.9). Total num frames: 10272768. Throughput: 0: 494.4. Samples: 567090. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2023-02-27 12:05:22,441][00394] Avg episode reward: [(0, '16.844')]
[2023-02-27 12:05:27,238][00394] Fps is (10 sec: 2867.2, 60 sec: 2252.8, 300 sec: 2346.5). Total num frames: 10289152. Throughput: 0: 482.7. Samples: 570864. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2023-02-27 12:05:27,295][00394] Avg episode reward: [(0, '16.600')]
[2023-02-27 12:05:31,385][37558] Updated weights for policy 0, policy_version 2515 (0.0203)
[2023-02-27 12:05:32,251][00394] Fps is (10 sec: 2867.1, 60 sec: 2184.5, 300 sec: 2374.3). Total num frames: 10301440. Throughput: 0: 529.7. Samples: 573180. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-02-27 12:05:32,450][00394] Avg episode reward: [(0, '14.703')]
[2023-02-27 12:05:37,251][00394] Fps is (10 sec: 2045.2, 60 sec: 1979.3, 300 sec: 2388.1). Total num frames: 10309632. Throughput: 0: 600.4. Samples: 576516. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0)
[2023-02-27 12:05:37,343][00394] Avg episode reward: [(0, '14.857')]
[2023-02-27 12:05:42,233][00394] Fps is (10 sec: 1228.6, 60 sec: 1979.7, 300 sec: 2360.4). Total num frames: 10313728. Throughput: 0: 535.1. Samples: 577432. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0)
[2023-02-27 12:05:42,280][00394] Avg episode reward: [(0, '14.719')]
[2023-02-27 12:05:47,230][00394] Fps is (10 sec: 1230.5, 60 sec: 2050.9, 300 sec: 2318.8). Total num frames: 10321920. Throughput: 0: 518.0. Samples: 578966. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:05:47,344][00394] Avg episode reward: [(0, '15.214')]
[2023-02-27 12:05:52,288][00394] Fps is (10 sec: 1227.9, 60 sec: 2047.7, 300 sec: 2304.8). Total num frames: 10326016. Throughput: 0: 450.3. Samples: 581234. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:05:52,480][00394] Avg episode reward: [(0, '14.633')]
[2023-02-27 12:05:57,251][00394] Fps is (10 sec: 819.1, 60 sec: 1843.2, 300 sec: 2304.9). Total num frames: 10330112. Throughput: 0: 420.7. Samples: 581910. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2023-02-27 12:05:57,392][00394] Avg episode reward: [(0, '14.383')]
[2023-02-27 12:05:57,708][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002523_10334208.pth...
[2023-02-27 12:05:58,062][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002389_9785344.pth
[2023-02-27 12:06:00,961][37558] Updated weights for policy 0, policy_version 2525 (0.0203)
[2023-02-27 12:06:02,244][00394] Fps is (10 sec: 1639.8, 60 sec: 1706.7, 300 sec: 2318.8). Total num frames: 10342400. Throughput: 0: 434.0. Samples: 584106. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-02-27 12:06:02,314][00394] Avg episode reward: [(0, '14.580')]
[2023-02-27 12:06:07,272][00394] Fps is (10 sec: 2867.4, 60 sec: 1843.2, 300 sec: 2346.5). Total num frames: 10358784. Throughput: 0: 468.8. Samples: 588186. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0)
[2023-02-27 12:06:07,528][00394] Avg episode reward: [(0, '13.426')]
[2023-02-27 12:06:12,231][00394] Fps is (10 sec: 2047.9, 60 sec: 1774.9, 300 sec: 2304.9). Total num frames: 10362880. Throughput: 0: 444.7. Samples: 590874. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0)
[2023-02-27 12:06:12,385][00394] Avg episode reward: [(0, '13.658')]
[2023-02-27 12:06:17,266][00394] Fps is (10 sec: 1637.3, 60 sec: 1911.2, 300 sec: 2290.9). Total num frames: 10375168. Throughput: 0: 416.4. Samples: 591920. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2023-02-27 12:06:17,476][00394] Avg episode reward: [(0, '13.850')]
[2023-02-27 12:06:21,965][37558] Updated weights for policy 0, policy_version 2535 (0.0217)
[2023-02-27 12:06:22,235][00394] Fps is (10 sec: 2047.9, 60 sec: 1843.2, 300 sec: 2277.1). Total num frames: 10383360. Throughput: 0: 397.3. Samples: 594390. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-02-27 12:06:22,288][00394] Avg episode reward: [(0, '13.919')]
[2023-02-27 12:06:27,232][00394] Fps is (10 sec: 1639.6, 60 sec: 1706.7, 300 sec: 2291.0). Total num frames: 10391552. Throughput: 0: 437.0. Samples: 597098. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2023-02-27 12:06:27,346][00394] Avg episode reward: [(0, '14.468')]
[2023-02-27 12:06:32,231][00394] Fps is (10 sec: 2048.2, 60 sec: 1706.7, 300 sec: 2291.0). Total num frames: 10403840. Throughput: 0: 444.1. Samples: 598950. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
[2023-02-27 12:06:32,386][00394] Avg episode reward: [(0, '15.074')]
[2023-02-27 12:06:37,000][37536] Signal inference workers to stop experience collection... (100 times)
[2023-02-27 12:06:37,038][37558] InferenceWorker_p0-w0: stopping experience collection (100 times)
[2023-02-27 12:06:37,217][37536] Signal inference workers to resume experience collection... (100 times)
[2023-02-27 12:06:37,217][37558] InferenceWorker_p0-w0: resuming experience collection (100 times)
[2023-02-27 12:06:37,353][00394] Fps is (10 sec: 2832.5, 60 sec: 1839.8, 300 sec: 2317.8). Total num frames: 10420224. Throughput: 0: 495.5. Samples: 603586. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:06:37,407][00394] Avg episode reward: [(0, '16.229')]
[2023-02-27 12:06:39,286][37558] Updated weights for policy 0, policy_version 2545 (0.0243)
[2023-02-27 12:06:42,230][00394] Fps is (10 sec: 2457.6, 60 sec: 1911.5, 300 sec: 2277.1). Total num frames: 10428416. Throughput: 0: 547.6. Samples: 606552. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-02-27 12:06:42,451][00394] Avg episode reward: [(0, '17.067')]
[2023-02-27 12:06:47,234][00394] Fps is (10 sec: 1658.6, 60 sec: 1911.4, 300 sec: 2249.3). Total num frames: 10436608. Throughput: 0: 526.0. Samples: 607776. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
[2023-02-27 12:06:47,440][00394] Avg episode reward: [(0, '16.943')]
[2023-02-27 12:06:52,252][00394] Fps is (10 sec: 1228.8, 60 sec: 1911.7, 300 sec: 2249.3). Total num frames: 10440704. Throughput: 0: 478.4. Samples: 609712. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
[2023-02-27 12:06:52,359][00394] Avg episode reward: [(0, '17.516')]
[2023-02-27 12:06:54,590][37536] Saving new best policy, reward=17.516!
[2023-02-27 12:06:57,254][00394] Fps is (10 sec: 819.3, 60 sec: 1911.5, 300 sec: 2235.5). Total num frames: 10444800. Throughput: 0: 438.0. Samples: 610584. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
[2023-02-27 12:06:57,366][00394] Avg episode reward: [(0, '17.310')]
[2023-02-27 12:07:01,855][37558] Updated weights for policy 0, policy_version 2555 (0.0217)
[2023-02-27 12:07:02,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2048.0, 300 sec: 2263.2). Total num frames: 10465280. Throughput: 0: 458.3. Samples: 612540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2023-02-27 12:07:02,249][00394] Avg episode reward: [(0, '18.727')]
[2023-02-27 12:07:02,262][37536] Saving new best policy, reward=18.727!
[2023-02-27 12:07:07,232][00394] Fps is (10 sec: 4095.1, 60 sec: 2116.2, 300 sec: 2277.1). Total num frames: 10485760. Throughput: 0: 530.3. Samples: 618256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:07:07,258][00394] Avg episode reward: [(0, '17.309')]
[2023-02-27 12:07:12,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2252.8, 300 sec: 2249.3). Total num frames: 10498048. Throughput: 0: 586.3. Samples: 623482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 12:07:12,288][00394] Avg episode reward: [(0, '17.390')]
[2023-02-27 12:07:15,359][37558] Updated weights for policy 0, policy_version 2565 (0.0058)
[2023-02-27 12:07:17,276][00394] Fps is (10 sec: 2048.2, 60 sec: 2184.7, 300 sec: 2249.3). Total num frames: 10506240. Throughput: 0: 577.5. Samples: 624940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 12:07:17,419][00394] Avg episode reward: [(0, '17.988')]
[2023-02-27 12:07:22,237][00394] Fps is (10 sec: 2048.0, 60 sec: 2252.8, 300 sec: 2249.4). Total num frames: 10518528. Throughput: 0: 543.1. Samples: 627960. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 12:07:22,321][00394] Avg episode reward: [(0, '17.709')]
[2023-02-27 12:07:27,236][00394] Fps is (10 sec: 2457.9, 60 sec: 2321.1, 300 sec: 2249.3). Total num frames: 10530816. Throughput: 0: 557.8. Samples: 631654. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2023-02-27 12:07:27,294][00394] Avg episode reward: [(0, '18.021')]
[2023-02-27 12:07:30,911][37558] Updated weights for policy 0, policy_version 2575 (0.0187)
[2023-02-27 12:07:32,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2277.1). Total num frames: 10551296. Throughput: 0: 578.9. Samples: 633826. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 12:07:32,253][00394] Avg episode reward: [(0, '17.359')]
[2023-02-27 12:07:37,230][00394] Fps is (10 sec: 4096.1, 60 sec: 2531.0, 300 sec: 2277.1). Total num frames: 10571776. Throughput: 0: 672.1. Samples: 639956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-27 12:07:37,233][00394] Avg episode reward: [(0, '17.855')]
[2023-02-27 12:07:42,230][00394] Fps is (10 sec: 3276.7, 60 sec: 2594.1, 300 sec: 2263.2). Total num frames: 10584064. Throughput: 0: 764.8. Samples: 645000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 12:07:42,299][00394] Avg episode reward: [(0, '17.475')]
[2023-02-27 12:07:43,880][37558] Updated weights for policy 0, policy_version 2585 (0.0061)
[2023-02-27 12:07:47,232][00394] Fps is (10 sec: 2457.6, 60 sec: 2662.5, 300 sec: 2277.1). Total num frames: 10596352. Throughput: 0: 759.0. Samples: 646694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-27 12:07:47,285][00394] Avg episode reward: [(0, '16.812')]
[2023-02-27 12:07:52,261][00394] Fps is (10 sec: 2047.5, 60 sec: 2730.6, 300 sec: 2277.1). Total num frames: 10604544. Throughput: 0: 706.1. Samples: 650032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-27 12:07:52,426][00394] Avg episode reward: [(0, '17.453')]
[2023-02-27 12:07:57,230][00394] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2304.9). Total num frames: 10625024. Throughput: 0: 675.6. Samples: 653886. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
[2023-02-27 12:07:57,257][00394] Avg episode reward: [(0, '16.878')]
[2023-02-27 12:07:57,339][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002594_10625024.pth...
[2023-02-27 12:07:57,679][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002465_10096640.pth
[2023-02-27 12:07:59,213][37558] Updated weights for policy 0, policy_version 2595 (0.0163)
[2023-02-27 12:08:02,230][00394] Fps is (10 sec: 3687.3, 60 sec: 2935.5, 300 sec: 2318.8). Total num frames: 10641408. Throughput: 0: 702.4. Samples: 656548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:08:02,263][00394] Avg episode reward: [(0, '17.431')]
[2023-02-27 12:08:07,237][00394] Fps is (10 sec: 3276.8, 60 sec: 2867.3, 300 sec: 2304.9). Total num frames: 10657792. Throughput: 0: 761.5. Samples: 662228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-27 12:08:07,316][00394] Avg episode reward: [(0, '17.632')]
[2023-02-27 12:08:11,560][37558] Updated weights for policy 0, policy_version 2605 (0.0043)
[2023-02-27 12:08:12,230][00394] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2291.0). Total num frames: 10670080. Throughput: 0: 760.9. Samples: 665892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:08:12,277][00394] Avg episode reward: [(0, '17.761')]
[2023-02-27 12:08:17,247][00394] Fps is (10 sec: 2048.0, 60 sec: 2867.3, 300 sec: 2291.0). Total num frames: 10678272. Throughput: 0: 755.9. Samples: 667840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-27 12:08:17,406][00394] Avg episode reward: [(0, '18.309')]
[2023-02-27 12:08:22,237][00394] Fps is (10 sec: 2048.0, 60 sec: 2867.2, 300 sec: 2291.0). Total num frames: 10690560. Throughput: 0: 704.3. Samples: 671648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 12:08:22,335][00394] Avg episode reward: [(0, '17.736')]
[2023-02-27 12:08:26,763][37558] Updated weights for policy 0, policy_version 2615 (0.0100)
[2023-02-27 12:08:27,230][00394] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2318.8). Total num frames: 10711040. Throughput: 0: 688.1. Samples: 675966. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2023-02-27 12:08:27,252][00394] Avg episode reward: [(0, '17.348')]
[2023-02-27 12:08:32,230][00394] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 2332.6). Total num frames: 10731520. Throughput: 0: 720.8. Samples: 679128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-27 12:08:32,239][00394] Avg episode reward: [(0, '17.840')]
[2023-02-27 12:08:37,233][00394] Fps is (10 sec: 3275.9, 60 sec: 2867.1, 300 sec: 2304.8). Total num frames: 10743808. Throughput: 0: 764.7. Samples: 684444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-27 12:08:37,320][00394] Avg episode reward: [(0, '17.914')]
[2023-02-27 12:08:39,723][37558] Updated weights for policy 0, policy_version 2625 (0.0048)
[2023-02-27 12:08:42,234][00394] Fps is (10 sec: 2456.5, 60 sec: 2867.0, 300 sec: 2318.7). Total num frames: 10756096. Throughput: 0: 752.6. Samples: 687758. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-27 12:08:42,291][00394] Avg episode reward: [(0, '17.345')]
[2023-02-27 12:08:47,246][00394] Fps is (10 sec: 2458.3, 60 sec: 2867.2, 300 sec: 2332.6). Total num frames: 10768384. Throughput: 0: 735.1. Samples: 689626. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-02-27 12:08:47,327][00394] Avg episode reward: [(0, '16.830')]
[2023-02-27 12:08:52,230][00394] Fps is (10 sec: 2868.5, 60 sec: 3003.9, 300 sec: 2346.5). Total num frames: 10784768. Throughput: 0: 700.1. Samples: 693734. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
[2023-02-27 12:08:52,248][00394] Avg episode reward: [(0, '18.350')]
[2023-02-27 12:08:53,457][37558] Updated weights for policy 0, policy_version 2635 (0.0101)
[2023-02-27 12:08:57,230][00394] Fps is (10 sec: 4096.1, 60 sec: 3072.0, 300 sec: 2402.1). Total num frames: 10809344. Throughput: 0: 758.1. Samples: 700006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:08:57,237][00394] Avg episode reward: [(0, '19.344')] [2023-02-27 12:08:57,249][37536] Saving new best policy, reward=19.344! [2023-02-27 12:09:02,244][00394] Fps is (10 sec: 3684.0, 60 sec: 3003.4, 300 sec: 2360.4). Total num frames: 10821632. Throughput: 0: 783.1. Samples: 703086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:09:02,429][00394] Avg episode reward: [(0, '19.916')] [2023-02-27 12:09:04,015][37536] Saving new best policy, reward=19.916! [2023-02-27 12:09:07,052][37558] Updated weights for policy 0, policy_version 2645 (0.0054) [2023-02-27 12:09:07,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2332.6). Total num frames: 10833920. Throughput: 0: 770.9. Samples: 706338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:09:07,289][00394] Avg episode reward: [(0, '19.043')] [2023-02-27 12:09:12,237][00394] Fps is (10 sec: 2459.1, 60 sec: 2935.4, 300 sec: 2360.4). Total num frames: 10846208. Throughput: 0: 752.4. Samples: 709824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-27 12:09:12,324][00394] Avg episode reward: [(0, '19.273')] [2023-02-27 12:09:17,234][00394] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2402.1). Total num frames: 10862592. Throughput: 0: 723.9. Samples: 711702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-27 12:09:17,272][00394] Avg episode reward: [(0, '19.708')] [2023-02-27 12:09:20,132][37558] Updated weights for policy 0, policy_version 2655 (0.0072) [2023-02-27 12:09:22,230][00394] Fps is (10 sec: 3686.6, 60 sec: 3208.5, 300 sec: 2471.5). Total num frames: 10883072. Throughput: 0: 735.1. Samples: 717522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:09:22,233][00394] Avg episode reward: [(0, '19.667')] [2023-02-27 12:09:27,235][00394] Fps is (10 sec: 3275.1, 60 sec: 3071.7, 300 sec: 2457.6). Total num frames: 10895360. Throughput: 0: 787.5. Samples: 723196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:09:27,517][00394] Avg episode reward: [(0, '18.430')] [2023-02-27 12:09:32,232][00394] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2429.8). Total num frames: 10907648. Throughput: 0: 775.2. Samples: 724512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:09:32,267][00394] Avg episode reward: [(0, '18.702')] [2023-02-27 12:09:35,710][37558] Updated weights for policy 0, policy_version 2665 (0.0043) [2023-02-27 12:09:37,244][00394] Fps is (10 sec: 2049.0, 60 sec: 2867.3, 300 sec: 2443.7). Total num frames: 10915840. Throughput: 0: 761.6. Samples: 728006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 3.0) [2023-02-27 12:09:37,412][00394] Avg episode reward: [(0, '18.008')] [2023-02-27 12:09:42,230][00394] Fps is (10 sec: 2867.2, 60 sec: 3004.0, 300 sec: 2500.0). Total num frames: 10936320. Throughput: 0: 702.9. Samples: 731636. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-02-27 12:09:42,256][00394] Avg episode reward: [(0, '18.610')] [2023-02-27 12:09:47,065][37558] Updated weights for policy 0, policy_version 2675 (0.0112) [2023-02-27 12:09:47,230][00394] Fps is (10 sec: 4096.1, 60 sec: 3140.3, 300 sec: 2554.8). Total num frames: 10956800. Throughput: 0: 703.0. Samples: 734716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:09:47,233][00394] Avg episode reward: [(0, '19.662')] [2023-02-27 12:09:52,235][00394] Fps is (10 sec: 3686.0, 60 sec: 3140.2, 300 sec: 2554.8). Total num frames: 10973184. Throughput: 0: 777.0. Samples: 741302. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:09:52,332][00394] Avg episode reward: [(0, '19.234')] [2023-02-27 12:09:57,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2513.1). Total num frames: 10981376. Throughput: 0: 771.5. Samples: 744542. Policy #0 lag: (min: 1.0, avg: 1.0, max: 2.0) [2023-02-27 12:09:57,265][00394] Avg episode reward: [(0, '20.313')] [2023-02-27 12:09:58,628][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002682_10985472.pth... [2023-02-27 12:09:59,229][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002523_10334208.pth [2023-02-27 12:09:59,237][37536] Saving new best policy, reward=20.313! [2023-02-27 12:10:02,236][00394] Fps is (10 sec: 1638.5, 60 sec: 2799.2, 300 sec: 2513.1). Total num frames: 10989568. Throughput: 0: 759.6. Samples: 745882. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-02-27 12:10:02,404][00394] Avg episode reward: [(0, '20.217')] [2023-02-27 12:10:05,263][37558] Updated weights for policy 0, policy_version 2685 (0.0140) [2023-02-27 12:10:07,237][00394] Fps is (10 sec: 2048.0, 60 sec: 2798.9, 300 sec: 2527.0). Total num frames: 11001856. Throughput: 0: 692.2. Samples: 748672. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-02-27 12:10:07,295][00394] Avg episode reward: [(0, '20.475')] [2023-02-27 12:10:07,374][37536] Saving new best policy, reward=20.475! [2023-02-27 12:10:12,230][00394] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2568.7). Total num frames: 11018240. Throughput: 0: 652.8. Samples: 752570. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-02-27 12:10:12,254][00394] Avg episode reward: [(0, '21.306')] [2023-02-27 12:10:12,258][37536] Saving new best policy, reward=21.306! [2023-02-27 12:10:17,139][37558] Updated weights for policy 0, policy_version 2695 (0.0051) [2023-02-27 12:10:17,231][00394] Fps is (10 sec: 3686.3, 60 sec: 2935.5, 300 sec: 2596.4). Total num frames: 11038720. Throughput: 0: 681.8. Samples: 755192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:10:17,260][00394] Avg episode reward: [(0, '22.054')] [2023-02-27 12:10:17,287][37536] Saving new best policy, reward=22.054! [2023-02-27 12:10:22,231][00394] Fps is (10 sec: 2457.3, 60 sec: 2662.3, 300 sec: 2554.8). Total num frames: 11042816. Throughput: 0: 713.9. Samples: 760130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:10:22,399][00394] Avg episode reward: [(0, '21.832')] [2023-02-27 12:10:27,234][00394] Fps is (10 sec: 1638.1, 60 sec: 2662.5, 300 sec: 2554.8). Total num frames: 11055104. Throughput: 0: 695.3. Samples: 762924. Policy #0 lag: (min: 0.0, avg: 1.5, max: 2.0) [2023-02-27 12:10:27,266][00394] Avg episode reward: [(0, '22.201')] [2023-02-27 12:10:27,361][37536] Saving new best policy, reward=22.201! [2023-02-27 12:10:32,235][00394] Fps is (10 sec: 2047.2, 60 sec: 2593.9, 300 sec: 2554.9). Total num frames: 11063296. Throughput: 0: 658.6. Samples: 764356. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-02-27 12:10:32,402][00394] Avg episode reward: [(0, '21.700')] [2023-02-27 12:10:37,244][00394] Fps is (10 sec: 2048.4, 60 sec: 2662.4, 300 sec: 2582.6). Total num frames: 11075584. Throughput: 0: 588.6. Samples: 767788. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-27 12:10:37,421][00394] Avg episode reward: [(0, '22.517')] [2023-02-27 12:10:37,845][37536] Saving new best policy, reward=22.517! 
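The Saving/Removing pairs above show checkpoint rotation at work: with keep_checkpoints=2 (see the configuration dump later in this log), each new checkpoint_<train_step>_<env_steps>.pth evicts the oldest regular checkpoint, while best-policy checkpoints are written separately whenever the average reward improves. A minimal sketch of that rotation, assuming only the filename scheme visible in these lines (this is not Sample Factory's actual implementation):

```python
import os
import re

# Matches names like checkpoint_000002682_10985472.pth
CHECKPOINT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth")

def prune_checkpoints(checkpoint_dir: str, keep: int = 2) -> None:
    """Delete all but the `keep` newest checkpoints, ordered by the
    train step encoded in the filename."""
    found = []
    for name in os.listdir(checkpoint_dir):
        match = CHECKPOINT_RE.fullmatch(name)
        if match:
            found.append((int(match.group(1)), name))  # (train_step, name)
    found.sort()  # ascending train step: oldest first
    for _, name in found[:-keep]:
        path = os.path.join(checkpoint_dir, name)
        print(f"Removing {path}")
        os.remove(path)
```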
[2023-02-27 12:10:37,848][37558] Updated weights for policy 0, policy_version 2705 (0.0213) [2023-02-27 12:10:42,230][00394] Fps is (10 sec: 3278.5, 60 sec: 2662.4, 300 sec: 2624.2). Total num frames: 11096064. Throughput: 0: 607.6. Samples: 771882. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2023-02-27 12:10:42,255][00394] Avg episode reward: [(0, '21.448')] [2023-02-27 12:10:47,230][00394] Fps is (10 sec: 4096.1, 60 sec: 2662.4, 300 sec: 2679.8). Total num frames: 11116544. Throughput: 0: 645.2. Samples: 774914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:10:47,236][00394] Avg episode reward: [(0, '20.766')] [2023-02-27 12:10:47,964][37558] Updated weights for policy 0, policy_version 2715 (0.0050) [2023-02-27 12:10:52,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2594.2, 300 sec: 2707.5). Total num frames: 11128832. Throughput: 0: 708.7. Samples: 780562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:10:52,287][00394] Avg episode reward: [(0, '20.186')] [2023-02-27 12:10:57,255][00394] Fps is (10 sec: 2047.7, 60 sec: 2594.1, 300 sec: 2693.6). Total num frames: 11137024. Throughput: 0: 692.0. Samples: 783710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:10:57,577][00394] Avg episode reward: [(0, '20.531')] [2023-02-27 12:11:02,230][00394] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2693.6). Total num frames: 11153408. Throughput: 0: 664.2. Samples: 785080. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0) [2023-02-27 12:11:02,292][00394] Avg episode reward: [(0, '20.263')] [2023-02-27 12:11:05,077][37558] Updated weights for policy 0, policy_version 2725 (0.0135) [2023-02-27 12:11:07,230][00394] Fps is (10 sec: 3277.3, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 11169792. Throughput: 0: 647.6. Samples: 789272. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-27 12:11:07,250][00394] Avg episode reward: [(0, '20.119')] [2023-02-27 12:11:12,230][00394] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 11190272. Throughput: 0: 729.7. Samples: 795758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:11:12,239][00394] Avg episode reward: [(0, '21.085')] [2023-02-27 12:11:16,954][37558] Updated weights for policy 0, policy_version 2735 (0.0043) [2023-02-27 12:11:17,231][00394] Fps is (10 sec: 3276.3, 60 sec: 2730.6, 300 sec: 2776.9). Total num frames: 11202560. Throughput: 0: 760.5. Samples: 798576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:11:17,291][00394] Avg episode reward: [(0, '21.750')] [2023-02-27 12:11:22,245][00394] Fps is (10 sec: 2456.8, 60 sec: 2867.1, 300 sec: 2790.8). Total num frames: 11214848. Throughput: 0: 755.1. Samples: 801770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-02-27 12:11:22,295][00394] Avg episode reward: [(0, '22.248')] [2023-02-27 12:11:27,238][00394] Fps is (10 sec: 2048.3, 60 sec: 2799.0, 300 sec: 2776.9). Total num frames: 11223040. Throughput: 0: 737.6. Samples: 805074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:11:27,339][00394] Avg episode reward: [(0, '22.463')] [2023-02-27 12:11:32,231][00394] Fps is (10 sec: 2458.5, 60 sec: 2935.7, 300 sec: 2778.1). Total num frames: 11239424. Throughput: 0: 708.9. Samples: 806814. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-02-27 12:11:32,260][00394] Avg episode reward: [(0, '23.549')] [2023-02-27 12:11:32,275][37536] Saving new best policy, reward=23.549! 
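Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry reports throughput averaged over sliding windows of roughly the last 10, 60, and 300 seconds of wall time. A sketch of that bookkeeping, assuming a simple timestamped history of cumulative frame counts (illustrative only, not the library's code):

```python
import time
from collections import deque

class FpsMeter:
    """Windowed throughput, mimicking the 'Fps is (10 sec/60 sec/300 sec)'
    entries: record the cumulative frame count at each report, then query
    any window."""

    def __init__(self, max_window: float = 300.0):
        self.max_window = max_window
        self.samples = deque()  # (timestamp, total_frames) pairs

    def record(self, total_frames: int) -> None:
        now = time.monotonic()
        self.samples.append((now, total_frames))
        # Drop history older than the largest window we care about.
        while now - self.samples[0][0] > self.max_window:
            self.samples.popleft()

    def fps(self, window: float) -> float:
        if len(self.samples) < 2:
            return float("nan")
        t_new, f_new = self.samples[-1]
        # Oldest sample still inside the window (the newest always qualifies).
        t_old, f_old = next(s for s in self.samples if t_new - s[0] <= window)
        if t_new == t_old:
            return float("nan")
        return (f_new - f_old) / (t_new - t_old)
```

With `record()` called on every reporting tick, `fps(10)`, `fps(60)`, and `fps(300)` reproduce the three columns in the log.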
[2023-02-27 12:11:33,843][37558] Updated weights for policy 0, policy_version 2745 (0.0112) [2023-02-27 12:11:37,230][00394] Fps is (10 sec: 3276.9, 60 sec: 3003.7, 300 sec: 2804.7). Total num frames: 11255808. Throughput: 0: 686.0. Samples: 811430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:11:37,257][00394] Avg episode reward: [(0, '23.536')] [2023-02-27 12:11:42,235][00394] Fps is (10 sec: 3275.0, 60 sec: 2935.2, 300 sec: 2832.4). Total num frames: 11272192. Throughput: 0: 740.7. Samples: 817044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:11:42,279][00394] Avg episode reward: [(0, '23.766')] [2023-02-27 12:11:42,288][37536] Saving new best policy, reward=23.766! [2023-02-27 12:11:46,853][37558] Updated weights for policy 0, policy_version 2755 (0.0061) [2023-02-27 12:11:47,230][00394] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2860.3). Total num frames: 11284480. Throughput: 0: 745.7. Samples: 818636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:11:47,261][00394] Avg episode reward: [(0, '23.584')] [2023-02-27 12:11:52,238][00394] Fps is (10 sec: 2049.1, 60 sec: 2730.7, 300 sec: 2874.1). Total num frames: 11292672. Throughput: 0: 726.0. Samples: 821940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-27 12:11:52,328][00394] Avg episode reward: [(0, '22.751')] [2023-02-27 12:11:57,232][00394] Fps is (10 sec: 2048.0, 60 sec: 2799.0, 300 sec: 2846.4). Total num frames: 11304960. Throughput: 0: 663.1. Samples: 825596. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-02-27 12:11:57,321][00394] Avg episode reward: [(0, '23.142')] [2023-02-27 12:11:57,890][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002761_11309056.pth... [2023-02-27 12:11:58,228][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002594_10625024.pth [2023-02-27 12:12:01,596][37558] Updated weights for policy 0, policy_version 2765 (0.0155) [2023-02-27 12:12:02,230][00394] Fps is (10 sec: 3276.7, 60 sec: 2867.2, 300 sec: 2846.4). Total num frames: 11325440. Throughput: 0: 648.6. Samples: 827764. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2023-02-27 12:12:02,250][00394] Avg episode reward: [(0, '22.490')] [2023-02-27 12:12:07,232][00394] Fps is (10 sec: 4096.0, 60 sec: 2935.5, 300 sec: 2874.1). Total num frames: 11345920. Throughput: 0: 712.1. Samples: 833810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:12:07,272][00394] Avg episode reward: [(0, '21.696')] [2023-02-27 12:12:12,230][00394] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2874.2). Total num frames: 11354112. Throughput: 0: 721.5. Samples: 837540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:12:12,262][00394] Avg episode reward: [(0, '21.019')] [2023-02-27 12:12:17,217][37558] Updated weights for policy 0, policy_version 2775 (0.0123) [2023-02-27 12:12:17,230][00394] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2874.1). Total num frames: 11366400. Throughput: 0: 718.6. Samples: 839152. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-02-27 12:12:17,344][00394] Avg episode reward: [(0, '20.491')] [2023-02-27 12:12:22,230][00394] Fps is (10 sec: 2457.7, 60 sec: 2730.8, 300 sec: 2874.1). Total num frames: 11378688. Throughput: 0: 692.0. Samples: 842572. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-02-27 12:12:22,277][00394] Avg episode reward: [(0, '21.273')] [2023-02-27 12:12:27,230][00394] Fps is (10 sec: 3276.7, 60 sec: 2935.5, 300 sec: 2874.1). Total num frames: 11399168. 
Throughput: 0: 677.9. Samples: 847544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:12:27,244][00394] Avg episode reward: [(0, '21.987')] [2023-02-27 12:12:29,017][37558] Updated weights for policy 0, policy_version 2785 (0.0040) [2023-02-27 12:12:32,230][00394] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 2874.1). Total num frames: 11419648. Throughput: 0: 715.3. Samples: 850826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:12:32,232][00394] Avg episode reward: [(0, '21.452')] [2023-02-27 12:12:37,230][00394] Fps is (10 sec: 2867.1, 60 sec: 2867.2, 300 sec: 2860.3). Total num frames: 11427840. Throughput: 0: 755.7. Samples: 855948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:12:37,359][00394] Avg episode reward: [(0, '21.622')] [2023-02-27 12:12:42,231][00394] Fps is (10 sec: 2047.9, 60 sec: 2799.2, 300 sec: 2860.3). Total num frames: 11440128. Throughput: 0: 742.9. Samples: 859026. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-27 12:12:42,336][00394] Avg episode reward: [(0, '22.700')] [2023-02-27 12:12:46,008][37558] Updated weights for policy 0, policy_version 2795 (0.0102) [2023-02-27 12:12:47,253][00394] Fps is (10 sec: 2047.9, 60 sec: 2730.6, 300 sec: 2860.3). Total num frames: 11448320. Throughput: 0: 726.5. Samples: 860458. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-02-27 12:12:47,380][00394] Avg episode reward: [(0, '23.265')] [2023-02-27 12:12:52,232][00394] Fps is (10 sec: 2867.4, 60 sec: 2935.5, 300 sec: 2860.3). Total num frames: 11468800. Throughput: 0: 669.6. Samples: 863940. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2023-02-27 12:12:52,258][00394] Avg episode reward: [(0, '24.148')] [2023-02-27 12:12:52,261][37536] Saving new best policy, reward=24.148! [2023-02-27 12:12:57,231][00394] Fps is (10 sec: 3277.0, 60 sec: 2935.5, 300 sec: 2846.4). Total num frames: 11481088. Throughput: 0: 695.8. Samples: 868850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:12:57,265][00394] Avg episode reward: [(0, '24.112')] [2023-02-27 12:12:58,933][37558] Updated weights for policy 0, policy_version 2805 (0.0067) [2023-02-27 12:13:02,241][00394] Fps is (10 sec: 2457.5, 60 sec: 2798.9, 300 sec: 2832.5). Total num frames: 11493376. Throughput: 0: 716.6. Samples: 871400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:13:02,347][00394] Avg episode reward: [(0, '23.611')] [2023-02-27 12:13:07,237][00394] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2832.5). Total num frames: 11505664. Throughput: 0: 715.3. Samples: 874762. Policy #0 lag: (min: 1.0, avg: 1.0, max: 2.0) [2023-02-27 12:13:07,292][00394] Avg episode reward: [(0, '25.169')] [2023-02-27 12:13:07,370][37536] Saving new best policy, reward=25.169! [2023-02-27 12:13:12,230][00394] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2832.5). Total num frames: 11513856. Throughput: 0: 668.5. Samples: 877628. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-02-27 12:13:12,319][00394] Avg episode reward: [(0, '24.543')] [2023-02-27 12:13:17,243][00394] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2832.5). Total num frames: 11526144. Throughput: 0: 632.4. Samples: 879286. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-02-27 12:13:17,293][00394] Avg episode reward: [(0, '24.189')] [2023-02-27 12:13:18,746][37558] Updated weights for policy 0, policy_version 2815 (0.0221) [2023-02-27 12:13:22,230][00394] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2818.6). Total num frames: 11542528. Throughput: 0: 600.2. Samples: 882958. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-02-27 12:13:22,260][00394] Avg episode reward: [(0, '23.625')] [2023-02-27 12:13:27,230][00394] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2804.7). Total num frames: 11558912. Throughput: 0: 651.5. Samples: 888344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:13:27,236][00394] Avg episode reward: [(0, '22.528')] [2023-02-27 12:13:32,242][00394] Fps is (10 sec: 2456.0, 60 sec: 2457.3, 300 sec: 2790.8). Total num frames: 11567104. Throughput: 0: 664.5. Samples: 890364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:13:32,372][00394] Avg episode reward: [(0, '22.932')] [2023-02-27 12:13:33,528][37558] Updated weights for policy 0, policy_version 2825 (0.0043) [2023-02-27 12:13:37,243][00394] Fps is (10 sec: 2046.5, 60 sec: 2525.6, 300 sec: 2790.8). Total num frames: 11579392. Throughput: 0: 645.3. Samples: 892982. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-02-27 12:13:37,289][00394] Avg episode reward: [(0, '23.593')] [2023-02-27 12:13:42,231][00394] Fps is (10 sec: 2049.1, 60 sec: 2457.6, 300 sec: 2776.9). Total num frames: 11587584. Throughput: 0: 607.8. Samples: 896202. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-02-27 12:13:42,304][00394] Avg episode reward: [(0, '22.710')] [2023-02-27 12:13:47,230][00394] Fps is (10 sec: 2869.3, 60 sec: 2662.4, 300 sec: 2790.8). Total num frames: 11608064. Throughput: 0: 588.6. Samples: 897888. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-02-27 12:13:47,260][00394] Avg episode reward: [(0, '22.285')] [2023-02-27 12:13:48,280][37558] Updated weights for policy 0, policy_version 2835 (0.0089) [2023-02-27 12:13:52,230][00394] Fps is (10 sec: 4096.4, 60 sec: 2662.4, 300 sec: 2776.9). Total num frames: 11628544. Throughput: 0: 647.2. Samples: 903888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:13:52,232][00394] Avg episode reward: [(0, '22.378')] [2023-02-27 12:13:57,237][00394] Fps is (10 sec: 2865.1, 60 sec: 2593.8, 300 sec: 2763.1). Total num frames: 11636736. Throughput: 0: 700.9. Samples: 909174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:13:57,329][00394] Avg episode reward: [(0, '22.876')] [2023-02-27 12:13:58,221][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002842_11640832.pth... [2023-02-27 12:13:58,669][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002682_10985472.pth [2023-02-27 12:14:02,230][00394] Fps is (10 sec: 2048.0, 60 sec: 2594.1, 300 sec: 2763.1). Total num frames: 11649024. Throughput: 0: 687.6. Samples: 910228. Policy #0 lag: (min: 1.0, avg: 1.0, max: 2.0) [2023-02-27 12:14:02,302][00394] Avg episode reward: [(0, '22.813')] [2023-02-27 12:14:04,280][37558] Updated weights for policy 0, policy_version 2845 (0.0074) [2023-02-27 12:14:07,236][00394] Fps is (10 sec: 2049.4, 60 sec: 2525.9, 300 sec: 2749.2). Total num frames: 11657216. Throughput: 0: 676.0. Samples: 913380. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-02-27 12:14:07,293][00394] Avg episode reward: [(0, '22.379')] [2023-02-27 12:14:12,230][00394] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2763.1). Total num frames: 11677696. Throughput: 0: 648.5. Samples: 917528. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-02-27 12:14:12,254][00394] Avg episode reward: [(0, '22.108')] [2023-02-27 12:14:15,729][37558] Updated weights for policy 0, policy_version 2855 (0.0057) [2023-02-27 12:14:17,230][00394] Fps is (10 sec: 4096.1, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 11698176. Throughput: 0: 672.5. Samples: 920624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:14:17,234][00394] Avg episode reward: [(0, '23.253')] [2023-02-27 12:14:22,232][00394] Fps is (10 sec: 3685.6, 60 sec: 2867.1, 300 sec: 2777.0). Total num frames: 11714560. Throughput: 0: 755.5. Samples: 926974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:14:22,272][00394] Avg episode reward: [(0, '21.881')] [2023-02-27 12:14:27,243][00394] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2763.1). Total num frames: 11722752. Throughput: 0: 753.6. Samples: 930112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:14:27,354][00394] Avg episode reward: [(0, '22.076')] [2023-02-27 12:14:32,239][37558] Updated weights for policy 0, policy_version 2865 (0.0133) [2023-02-27 12:14:32,230][00394] Fps is (10 sec: 1638.7, 60 sec: 2730.9, 300 sec: 2763.1). Total num frames: 11730944. Throughput: 0: 748.7. Samples: 931580. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-02-27 12:14:32,345][00394] Avg episode reward: [(0, '22.293')] [2023-02-27 12:14:37,241][00394] Fps is (10 sec: 2457.6, 60 sec: 2799.3, 300 sec: 2749.2). Total num frames: 11747328. Throughput: 0: 685.7. Samples: 934744. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2023-02-27 12:14:37,305][00394] Avg episode reward: [(0, '22.710')] [2023-02-27 12:14:42,230][00394] Fps is (10 sec: 3686.6, 60 sec: 3003.8, 300 sec: 2749.2). Total num frames: 11767808. Throughput: 0: 678.6. Samples: 939708. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-27 12:14:42,256][00394] Avg episode reward: [(0, '21.924')] [2023-02-27 12:14:44,122][37558] Updated weights for policy 0, policy_version 2875 (0.0078) [2023-02-27 12:14:47,230][00394] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 2763.1). Total num frames: 11788288. Throughput: 0: 725.4. Samples: 942872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:14:47,232][00394] Avg episode reward: [(0, '21.549')] [2023-02-27 12:14:52,253][00394] Fps is (10 sec: 2864.2, 60 sec: 2798.5, 300 sec: 2763.0). Total num frames: 11796480. Throughput: 0: 769.0. Samples: 947992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:14:52,389][00394] Avg episode reward: [(0, '22.595')] [2023-02-27 12:14:57,246][00394] Fps is (10 sec: 2047.1, 60 sec: 2867.3, 300 sec: 2776.9). Total num frames: 11808768. Throughput: 0: 746.7. Samples: 951132. Policy #0 lag: (min: 0.0, avg: 0.9, max: 1.0) [2023-02-27 12:14:57,360][00394] Avg episode reward: [(0, '22.630')] [2023-02-27 12:15:00,340][37558] Updated weights for policy 0, policy_version 2885 (0.0085) [2023-02-27 12:15:02,239][00394] Fps is (10 sec: 2049.8, 60 sec: 2798.9, 300 sec: 2763.0). Total num frames: 11816960. Throughput: 0: 714.1. Samples: 952758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-02-27 12:15:02,345][00394] Avg episode reward: [(0, '22.280')] [2023-02-27 12:15:07,234][00394] Fps is (10 sec: 2868.5, 60 sec: 3003.8, 300 sec: 2776.9). Total num frames: 11837440. Throughput: 0: 653.3. Samples: 956370. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-02-27 12:15:07,272][00394] Avg episode reward: [(0, '23.154')] [2023-02-27 12:15:11,839][37558] Updated weights for policy 0, policy_version 2895 (0.0096) [2023-02-27 12:15:12,230][00394] Fps is (10 sec: 4096.8, 60 sec: 3003.7, 300 sec: 2777.0). Total num frames: 11857920. Throughput: 0: 721.1. Samples: 962560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:15:12,232][00394] Avg episode reward: [(0, '24.079')] [2023-02-27 12:15:17,233][00394] Fps is (10 sec: 3275.7, 60 sec: 2867.0, 300 sec: 2804.7). Total num frames: 11870208. Throughput: 0: 762.8. Samples: 965910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:15:17,343][00394] Avg episode reward: [(0, '23.756')] [2023-02-27 12:15:22,232][00394] Fps is (10 sec: 2866.6, 60 sec: 2867.2, 300 sec: 2818.6). Total num frames: 11886592. Throughput: 0: 762.5. Samples: 969058. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-27 12:15:22,284][00394] Avg episode reward: [(0, '22.848')] [2023-02-27 12:15:27,230][00394] Fps is (10 sec: 2458.4, 60 sec: 2867.2, 300 sec: 2818.7). Total num frames: 11894784. Throughput: 0: 732.0. Samples: 972650. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-02-27 12:15:27,282][00394] Avg episode reward: [(0, '21.104')] [2023-02-27 12:15:29,409][37558] Updated weights for policy 0, policy_version 2905 (0.0107) [2023-02-27 12:15:32,230][00394] Fps is (10 sec: 2458.1, 60 sec: 3003.8, 300 sec: 2832.5). Total num frames: 11911168. Throughput: 0: 701.0. Samples: 974416. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2023-02-27 12:15:32,261][00394] Avg episode reward: [(0, '21.612')] [2023-02-27 12:15:37,230][00394] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2832.5). Total num frames: 11931648. Throughput: 0: 702.6. Samples: 979602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:15:37,238][00394] Avg episode reward: [(0, '20.428')] [2023-02-27 12:15:38,923][37558] Updated weights for policy 0, policy_version 2915 (0.0036) [2023-02-27 12:15:42,251][00394] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2818.6). Total num frames: 11948032. Throughput: 0: 776.9. Samples: 986088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:15:42,400][00394] Avg episode reward: [(0, '19.594')] [2023-02-27 12:15:47,239][00394] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2804.7). Total num frames: 11956224. Throughput: 0: 773.4. Samples: 987558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:15:47,341][00394] Avg episode reward: [(0, '20.279')] [2023-02-27 12:15:52,250][00394] Fps is (10 sec: 1638.2, 60 sec: 2799.4, 300 sec: 2804.7). Total num frames: 11964416. Throughput: 0: 753.7. Samples: 990286. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-02-27 12:15:52,455][00394] Avg episode reward: [(0, '20.935')] [2023-02-27 12:15:57,249][00394] Fps is (10 sec: 2047.5, 60 sec: 2799.0, 300 sec: 2790.8). Total num frames: 11976704. Throughput: 0: 692.0. Samples: 993700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-02-27 12:15:57,425][00394] Avg episode reward: [(0, '20.689')] [2023-02-27 12:15:57,648][37558] Updated weights for policy 0, policy_version 2925 (0.0175) [2023-02-27 12:15:57,647][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002925_11980800.pth... 
[2023-02-27 12:15:58,077][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002761_11309056.pth [2023-02-27 12:16:02,230][00394] Fps is (10 sec: 2867.5, 60 sec: 2935.6, 300 sec: 2790.8). Total num frames: 11993088. Throughput: 0: 658.8. Samples: 995552. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-02-27 12:16:02,267][00394] Avg episode reward: [(0, '22.434')] [2023-02-27 12:16:04,345][37536] Stopping Batcher_0... [2023-02-27 12:16:04,346][37536] Loop batcher_evt_loop terminating... [2023-02-27 12:16:04,352][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... [2023-02-27 12:16:04,424][37558] Weights refcount: 2 0 [2023-02-27 12:16:04,437][00394] Component Batcher_0 stopped! [2023-02-27 12:16:04,451][37558] Stopping InferenceWorker_p0-w0... [2023-02-27 12:16:04,451][37558] Loop inference_proc0-0_evt_loop terminating... [2023-02-27 12:16:04,451][00394] Component InferenceWorker_p0-w0 stopped! [2023-02-27 12:16:04,648][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002842_11640832.pth [2023-02-27 12:16:04,651][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... [2023-02-27 12:16:04,811][37536] Stopping LearnerWorker_p0... [2023-02-27 12:16:04,812][00394] Component LearnerWorker_p0 stopped! [2023-02-27 12:16:04,815][37536] Loop learner_proc0_evt_loop terminating... [2023-02-27 12:16:04,850][00394] Component RolloutWorker_w1 stopped! [2023-02-27 12:16:04,853][37555] Stopping RolloutWorker_w1... [2023-02-27 12:16:04,857][37555] Loop rollout_proc1_evt_loop terminating... [2023-02-27 12:16:04,860][00394] Component RolloutWorker_w5 stopped! [2023-02-27 12:16:04,863][00394] Component RolloutWorker_w0 stopped! [2023-02-27 12:16:04,866][00394] Component RolloutWorker_w3 stopped! [2023-02-27 12:16:04,864][37569] Stopping RolloutWorker_w5... [2023-02-27 12:16:04,866][37556] Stopping RolloutWorker_w3... [2023-02-27 12:16:04,870][37556] Loop rollout_proc3_evt_loop terminating... [2023-02-27 12:16:04,870][00394] Component RolloutWorker_w6 stopped! [2023-02-27 12:16:04,872][37569] Loop rollout_proc5_evt_loop terminating... [2023-02-27 12:16:04,874][37576] Stopping RolloutWorker_w7... [2023-02-27 12:16:04,875][37576] Loop rollout_proc7_evt_loop terminating... [2023-02-27 12:16:04,874][00394] Component RolloutWorker_w7 stopped! [2023-02-27 12:16:04,888][37566] Stopping RolloutWorker_w2... [2023-02-27 12:16:04,886][00394] Component RolloutWorker_w4 stopped! [2023-02-27 12:16:04,893][00394] Component RolloutWorker_w2 stopped! [2023-02-27 12:16:04,897][00394] Waiting for process learner_proc0 to stop... [2023-02-27 12:16:04,873][37577] Stopping RolloutWorker_w6... [2023-02-27 12:16:04,901][37577] Loop rollout_proc6_evt_loop terminating... [2023-02-27 12:16:04,886][37564] Stopping RolloutWorker_w4... [2023-02-27 12:16:04,909][37564] Loop rollout_proc4_evt_loop terminating... [2023-02-27 12:16:04,861][37554] Stopping RolloutWorker_w0... [2023-02-27 12:16:04,910][37554] Loop rollout_proc0_evt_loop terminating... [2023-02-27 12:16:04,889][37566] Loop rollout_proc2_evt_loop terminating... [2023-02-27 12:16:08,387][00394] Waiting for process inference_proc0-0 to join... [2023-02-27 12:16:08,389][00394] Waiting for process rollout_proc0 to join... [2023-02-27 12:16:08,393][00394] Waiting for process rollout_proc1 to join... [2023-02-27 12:16:08,401][00394] Waiting for process rollout_proc2 to join... 
[2023-02-27 12:16:08,402][00394] Waiting for process rollout_proc3 to join...
[2023-02-27 12:16:08,403][00394] Waiting for process rollout_proc4 to join...
[2023-02-27 12:16:08,408][00394] Waiting for process rollout_proc5 to join...
[2023-02-27 12:16:08,413][00394] Waiting for process rollout_proc6 to join...
[2023-02-27 12:16:08,415][00394] Waiting for process rollout_proc7 to join...
[2023-02-27 12:16:08,417][00394] Batcher 0 profile tree view:
batching: 42.4838, releasing_batches: 0.1596
[2023-02-27 12:16:08,420][00394] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0048
  wait_policy_total: 440.8138
update_model: 25.7197
  weight_update: 0.0126
one_step: 0.0091
  handle_policy_step: 940.2608
    deserialize: 28.1715, stack: 6.2853, obs_to_device_normalize: 193.5817, forward: 497.4783, send_messages: 49.5014
    prepare_outputs: 118.2842
      to_cpu: 66.6327
[2023-02-27 12:16:08,422][00394] Learner 0 profile tree view:
misc: 0.0056, prepare_batch: 256.8947
train: 593.2102
  epoch_init: 0.0087, minibatch_init: 0.0101, losses_postprocess: 1.9653, kl_divergence: 5.0174, after_optimizer: 26.6448
  calculate_losses: 239.1089
    losses_init: 0.0090, forward_head: 26.4547, bptt_initial: 125.7768, tail: 22.5763, advantages_returns: 7.1498, losses: 48.1635
    bptt: 8.2512
      bptt_forward_core: 8.1628
  update: 295.4818
    clip: 27.4529
[2023-02-27 12:16:08,424][00394] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.5821, enqueue_policy_requests: 138.7012, env_step: 934.1600, overhead: 66.7653, complete_rollouts: 8.0681
save_policy_outputs: 52.9794
  split_output_tensors: 25.8993
[2023-02-27 12:16:08,426][00394] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.8920, enqueue_policy_requests: 149.2018, env_step: 940.7048, overhead: 67.3373, complete_rollouts: 5.6190
save_policy_outputs: 55.9852
  split_output_tensors: 26.3858
[2023-02-27 12:16:08,434][00394] Loop Runner_EvtLoop terminating...
[2023-02-27 12:16:08,438][00394] Runner profile tree view:
main_loop: 1567.2632
[2023-02-27 12:16:08,439][00394] Collected {0: 12005376}, FPS: 2550.7
[2023-02-27 12:16:08,544][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-27 12:16:08,546][00394] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-27 12:16:08,548][00394] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-27 12:16:08,552][00394] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-27 12:16:08,553][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-27 12:16:08,556][00394] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-27 12:16:08,558][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-27 12:16:08,563][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-27 12:16:08,564][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-27 12:16:08,566][00394] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-27 12:16:08,568][00394] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-27 12:16:08,570][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-27 12:16:08,571][00394] Adding new argument 'train_script'=None that is not in the saved config file!
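The closing summary is worth a cross-check: the reported FPS appears to be averaged over this session only, not over the whole experiment (illustrative arithmetic below; the per-session interpretation is an inference from the numbers):

```python
# Cross-checking 'Collected {0: 12005376}, FPS: 2550.7' against the
# Runner profile's main_loop time.
main_loop_seconds = 1567.2632
reported_fps = 2550.7
frames_this_session = reported_fps * main_loop_seconds  # ~3.998e6 frames
print(f"{frames_this_session:,.0f}")
# 12,005,376 total env frames minus ~4.0M collected here suggests this
# session itself resumed from a checkpoint at roughly 8M env frames.
```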
[2023-02-27 12:16:08,573][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-27 12:16:08,574][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 12:16:08,615][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 12:16:08,626][00394] RunningMeanStd input shape: (1,) [2023-02-27 12:16:08,659][00394] ConvEncoder: input_channels=3 [2023-02-27 12:16:08,810][00394] Conv encoder output size: 512 [2023-02-27 12:16:08,813][00394] Policy head output size: 512 [2023-02-27 12:16:08,926][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... [2023-02-27 12:16:10,405][00394] Num frames 100... [2023-02-27 12:16:10,577][00394] Num frames 200... [2023-02-27 12:16:10,760][00394] Num frames 300... [2023-02-27 12:16:10,947][00394] Num frames 400... [2023-02-27 12:16:11,132][00394] Num frames 500... [2023-02-27 12:16:11,309][00394] Num frames 600... [2023-02-27 12:16:11,492][00394] Num frames 700... [2023-02-27 12:16:11,675][00394] Num frames 800... [2023-02-27 12:16:11,847][00394] Num frames 900... [2023-02-27 12:16:12,028][00394] Num frames 1000... [2023-02-27 12:16:12,203][00394] Num frames 1100... [2023-02-27 12:16:12,387][00394] Num frames 1200... [2023-02-27 12:16:12,547][00394] Num frames 1300... [2023-02-27 12:16:12,681][00394] Avg episode rewards: #0: 27.440, true rewards: #0: 13.440 [2023-02-27 12:16:12,683][00394] Avg episode reward: 27.440, avg true_objective: 13.440 [2023-02-27 12:16:12,776][00394] Num frames 1400... [2023-02-27 12:16:12,936][00394] Num frames 1500... [2023-02-27 12:16:13,103][00394] Num frames 1600... [2023-02-27 12:16:13,262][00394] Num frames 1700... [2023-02-27 12:16:13,437][00394] Num frames 1800... [2023-02-27 12:16:13,601][00394] Num frames 1900... [2023-02-27 12:16:13,763][00394] Num frames 2000... [2023-02-27 12:16:13,925][00394] Num frames 2100... [2023-02-27 12:16:14,095][00394] Num frames 2200... [2023-02-27 12:16:14,271][00394] Num frames 2300... [2023-02-27 12:16:14,476][00394] Avg episode rewards: #0: 26.405, true rewards: #0: 11.905 [2023-02-27 12:16:14,479][00394] Avg episode reward: 26.405, avg true_objective: 11.905 [2023-02-27 12:16:14,512][00394] Num frames 2400... [2023-02-27 12:16:14,680][00394] Num frames 2500... [2023-02-27 12:16:14,819][00394] Num frames 2600... [2023-02-27 12:16:14,954][00394] Num frames 2700... [2023-02-27 12:16:15,080][00394] Num frames 2800... [2023-02-27 12:16:15,210][00394] Num frames 2900... [2023-02-27 12:16:15,331][00394] Num frames 3000... [2023-02-27 12:16:15,467][00394] Num frames 3100... [2023-02-27 12:16:15,589][00394] Num frames 3200... [2023-02-27 12:16:15,717][00394] Num frames 3300... [2023-02-27 12:16:15,845][00394] Num frames 3400... [2023-02-27 12:16:15,974][00394] Num frames 3500... [2023-02-27 12:16:16,120][00394] Num frames 3600... [2023-02-27 12:16:16,244][00394] Num frames 3700... [2023-02-27 12:16:16,364][00394] Num frames 3800... [2023-02-27 12:16:16,495][00394] Num frames 3900... [2023-02-27 12:16:16,617][00394] Num frames 4000... [2023-02-27 12:16:16,737][00394] Num frames 4100... [2023-02-27 12:16:16,871][00394] Num frames 4200... [2023-02-27 12:16:16,993][00394] Num frames 4300... [2023-02-27 12:16:17,090][00394] Avg episode rewards: #0: 33.443, true rewards: #0: 14.443 [2023-02-27 12:16:17,093][00394] Avg episode reward: 33.443, avg true_objective: 14.443 [2023-02-27 12:16:17,175][00394] Num frames 4400... [2023-02-27 12:16:17,298][00394] Num frames 4500... 
[2023-02-27 12:16:17,429][00394] Num frames 4600... [2023-02-27 12:16:17,557][00394] Num frames 4700... [2023-02-27 12:16:17,677][00394] Num frames 4800... [2023-02-27 12:16:17,804][00394] Num frames 4900... [2023-02-27 12:16:17,926][00394] Num frames 5000... [2023-02-27 12:16:18,051][00394] Num frames 5100... [2023-02-27 12:16:18,174][00394] Num frames 5200... [2023-02-27 12:16:18,262][00394] Avg episode rewards: #0: 30.297, true rewards: #0: 13.047 [2023-02-27 12:16:18,264][00394] Avg episode reward: 30.297, avg true_objective: 13.047 [2023-02-27 12:16:18,368][00394] Num frames 5300... [2023-02-27 12:16:18,503][00394] Num frames 5400... [2023-02-27 12:16:18,623][00394] Num frames 5500... [2023-02-27 12:16:18,748][00394] Num frames 5600... [2023-02-27 12:16:18,866][00394] Num frames 5700... [2023-02-27 12:16:18,992][00394] Num frames 5800... [2023-02-27 12:16:19,109][00394] Avg episode rewards: #0: 26.500, true rewards: #0: 11.700 [2023-02-27 12:16:19,111][00394] Avg episode reward: 26.500, avg true_objective: 11.700 [2023-02-27 12:16:19,174][00394] Num frames 5900... [2023-02-27 12:16:19,299][00394] Num frames 6000... [2023-02-27 12:16:19,421][00394] Num frames 6100... [2023-02-27 12:16:19,552][00394] Num frames 6200... [2023-02-27 12:16:19,675][00394] Num frames 6300... [2023-02-27 12:16:19,797][00394] Num frames 6400... [2023-02-27 12:16:19,921][00394] Num frames 6500... [2023-02-27 12:16:20,041][00394] Num frames 6600... [2023-02-27 12:16:20,166][00394] Num frames 6700... [2023-02-27 12:16:20,288][00394] Num frames 6800... [2023-02-27 12:16:20,413][00394] Num frames 6900... [2023-02-27 12:16:20,561][00394] Avg episode rewards: #0: 26.117, true rewards: #0: 11.617 [2023-02-27 12:16:20,563][00394] Avg episode reward: 26.117, avg true_objective: 11.617 [2023-02-27 12:16:20,605][00394] Num frames 7000... [2023-02-27 12:16:20,725][00394] Num frames 7100... [2023-02-27 12:16:20,854][00394] Num frames 7200... [2023-02-27 12:16:20,974][00394] Num frames 7300... [2023-02-27 12:16:21,101][00394] Num frames 7400... [2023-02-27 12:16:21,257][00394] Avg episode rewards: #0: 24.117, true rewards: #0: 10.689 [2023-02-27 12:16:21,259][00394] Avg episode reward: 24.117, avg true_objective: 10.689 [2023-02-27 12:16:21,285][00394] Num frames 7500... [2023-02-27 12:16:21,436][00394] Num frames 7600... [2023-02-27 12:16:21,570][00394] Num frames 7700... [2023-02-27 12:16:21,691][00394] Num frames 7800... [2023-02-27 12:16:21,818][00394] Num frames 7900... [2023-02-27 12:16:21,937][00394] Num frames 8000... [2023-02-27 12:16:22,067][00394] Num frames 8100... [2023-02-27 12:16:22,190][00394] Num frames 8200... [2023-02-27 12:16:22,318][00394] Num frames 8300... [2023-02-27 12:16:22,445][00394] Num frames 8400... [2023-02-27 12:16:22,557][00394] Avg episode rewards: #0: 23.303, true rewards: #0: 10.552 [2023-02-27 12:16:22,559][00394] Avg episode reward: 23.303, avg true_objective: 10.552 [2023-02-27 12:16:22,633][00394] Num frames 8500... [2023-02-27 12:16:22,760][00394] Num frames 8600... [2023-02-27 12:16:22,880][00394] Num frames 8700... [2023-02-27 12:16:23,007][00394] Num frames 8800... [2023-02-27 12:16:23,133][00394] Num frames 8900... [2023-02-27 12:16:23,260][00394] Num frames 9000... [2023-02-27 12:16:23,385][00394] Num frames 9100... [2023-02-27 12:16:23,505][00394] Num frames 9200... [2023-02-27 12:16:23,637][00394] Num frames 9300... [2023-02-27 12:16:23,758][00394] Num frames 9400... [2023-02-27 12:16:23,893][00394] Num frames 9500... 
[2023-02-27 12:16:23,964][00394] Avg episode rewards: #0: 23.233, true rewards: #0: 10.567 [2023-02-27 12:16:23,966][00394] Avg episode reward: 23.233, avg true_objective: 10.567 [2023-02-27 12:16:24,080][00394] Num frames 9600... [2023-02-27 12:16:24,212][00394] Num frames 9700... [2023-02-27 12:16:24,336][00394] Num frames 9800... [2023-02-27 12:16:24,464][00394] Num frames 9900... [2023-02-27 12:16:24,592][00394] Num frames 10000... [2023-02-27 12:16:24,714][00394] Num frames 10100... [2023-02-27 12:16:24,884][00394] Num frames 10200... [2023-02-27 12:16:25,085][00394] Num frames 10300... [2023-02-27 12:16:25,254][00394] Num frames 10400... [2023-02-27 12:16:25,460][00394] Num frames 10500... [2023-02-27 12:16:25,633][00394] Num frames 10600... [2023-02-27 12:16:25,805][00394] Num frames 10700... [2023-02-27 12:16:25,972][00394] Num frames 10800... [2023-02-27 12:16:26,184][00394] Avg episode rewards: #0: 23.986, true rewards: #0: 10.886 [2023-02-27 12:16:26,187][00394] Avg episode reward: 23.986, avg true_objective: 10.886 [2023-02-27 12:17:36,340][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 12:17:36,956][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 12:17:36,958][00394] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 12:17:36,960][00394] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 12:17:36,962][00394] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 12:17:36,964][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 12:17:36,966][00394] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 12:17:36,968][00394] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-27 12:17:36,969][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-27 12:17:36,970][00394] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-27 12:17:36,971][00394] Adding new argument 'hf_repository'='Clawoo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-27 12:17:36,972][00394] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 12:17:36,973][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 12:17:36,974][00394] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 12:17:36,976][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-27 12:17:36,979][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 12:17:36,999][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 12:17:37,002][00394] RunningMeanStd input shape: (1,) [2023-02-27 12:17:37,026][00394] ConvEncoder: input_channels=3 [2023-02-27 12:17:37,095][00394] Conv encoder output size: 512 [2023-02-27 12:17:37,097][00394] Policy head output size: 512 [2023-02-27 12:17:37,125][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... [2023-02-27 12:17:37,974][00394] Num frames 100... [2023-02-27 12:17:38,186][00394] Num frames 200... [2023-02-27 12:17:38,421][00394] Num frames 300... [2023-02-27 12:17:38,603][00394] Num frames 400... 
[2023-02-27 12:17:38,816][00394] Num frames 500... [2023-02-27 12:17:38,990][00394] Num frames 600... [2023-02-27 12:17:39,174][00394] Num frames 700... [2023-02-27 12:17:39,359][00394] Num frames 800... [2023-02-27 12:17:39,534][00394] Num frames 900... [2023-02-27 12:17:39,699][00394] Num frames 1000... [2023-02-27 12:17:39,876][00394] Num frames 1100... [2023-02-27 12:17:40,061][00394] Num frames 1200... [2023-02-27 12:17:40,244][00394] Num frames 1300... [2023-02-27 12:17:40,416][00394] Num frames 1400... [2023-02-27 12:17:40,581][00394] Num frames 1500... [2023-02-27 12:17:40,748][00394] Num frames 1600... [2023-02-27 12:17:40,948][00394] Num frames 1700... [2023-02-27 12:17:41,161][00394] Avg episode rewards: #0: 42.749, true rewards: #0: 17.750 [2023-02-27 12:17:41,163][00394] Avg episode reward: 42.749, avg true_objective: 17.750 [2023-02-27 12:17:41,219][00394] Num frames 1800... [2023-02-27 12:17:41,426][00394] Num frames 1900... [2023-02-27 12:17:41,647][00394] Num frames 2000... [2023-02-27 12:17:41,853][00394] Num frames 2100... [2023-02-27 12:17:42,064][00394] Num frames 2200... [2023-02-27 12:17:42,291][00394] Num frames 2300... [2023-02-27 12:17:42,514][00394] Num frames 2400... [2023-02-27 12:17:42,714][00394] Num frames 2500... [2023-02-27 12:17:42,930][00394] Num frames 2600... [2023-02-27 12:17:43,146][00394] Num frames 2700... [2023-02-27 12:17:43,329][00394] Num frames 2800... [2023-02-27 12:17:43,507][00394] Num frames 2900... [2023-02-27 12:17:43,695][00394] Num frames 3000... [2023-02-27 12:17:43,792][00394] Avg episode rewards: #0: 33.115, true rewards: #0: 15.115 [2023-02-27 12:17:43,794][00394] Avg episode reward: 33.115, avg true_objective: 15.115 [2023-02-27 12:17:43,971][00394] Num frames 3100... [2023-02-27 12:17:44,176][00394] Num frames 3200... [2023-02-27 12:17:44,379][00394] Num frames 3300... [2023-02-27 12:17:44,558][00394] Num frames 3400... [2023-02-27 12:17:44,731][00394] Num frames 3500... [2023-02-27 12:17:44,896][00394] Num frames 3600... [2023-02-27 12:17:45,071][00394] Num frames 3700... [2023-02-27 12:17:45,244][00394] Num frames 3800... [2023-02-27 12:17:45,409][00394] Num frames 3900... [2023-02-27 12:17:45,493][00394] Avg episode rewards: #0: 28.383, true rewards: #0: 13.050 [2023-02-27 12:17:45,495][00394] Avg episode reward: 28.383, avg true_objective: 13.050 [2023-02-27 12:17:45,632][00394] Num frames 4000... [2023-02-27 12:17:45,801][00394] Num frames 4100... [2023-02-27 12:17:45,922][00394] Num frames 4200... [2023-02-27 12:17:46,040][00394] Num frames 4300... [2023-02-27 12:17:46,147][00394] Avg episode rewards: #0: 23.602, true rewards: #0: 10.852 [2023-02-27 12:17:46,149][00394] Avg episode reward: 23.602, avg true_objective: 10.852 [2023-02-27 12:17:46,221][00394] Num frames 4400... [2023-02-27 12:17:46,339][00394] Num frames 4500... [2023-02-27 12:17:46,466][00394] Num frames 4600... [2023-02-27 12:17:46,590][00394] Num frames 4700... [2023-02-27 12:17:46,709][00394] Num frames 4800... [2023-02-27 12:17:46,828][00394] Avg episode rewards: #0: 20.906, true rewards: #0: 9.706 [2023-02-27 12:17:46,830][00394] Avg episode reward: 20.906, avg true_objective: 9.706 [2023-02-27 12:17:46,889][00394] Num frames 4900... [2023-02-27 12:17:47,007][00394] Num frames 5000... [2023-02-27 12:17:47,116][00394] Avg episode rewards: #0: 17.908, true rewards: #0: 8.408 [2023-02-27 12:17:47,119][00394] Avg episode reward: 17.908, avg true_objective: 8.408 [2023-02-27 12:17:47,194][00394] Num frames 5100... 
[2023-02-27 12:17:47,317][00394] Num frames 5200... [2023-02-27 12:17:47,441][00394] Num frames 5300... [2023-02-27 12:17:47,572][00394] Num frames 5400... [2023-02-27 12:17:47,697][00394] Num frames 5500... [2023-02-27 12:17:47,817][00394] Num frames 5600... [2023-02-27 12:17:47,939][00394] Num frames 5700... [2023-02-27 12:17:48,058][00394] Num frames 5800... [2023-02-27 12:17:48,191][00394] Num frames 5900... [2023-02-27 12:17:48,313][00394] Num frames 6000... [2023-02-27 12:17:48,434][00394] Num frames 6100... [2023-02-27 12:17:48,554][00394] Num frames 6200... [2023-02-27 12:17:48,677][00394] Num frames 6300... [2023-02-27 12:17:48,799][00394] Num frames 6400... [2023-02-27 12:17:48,920][00394] Num frames 6500... [2023-02-27 12:17:49,093][00394] Avg episode rewards: #0: 21.136, true rewards: #0: 9.421 [2023-02-27 12:17:49,094][00394] Avg episode reward: 21.136, avg true_objective: 9.421 [2023-02-27 12:17:49,108][00394] Num frames 6600... [2023-02-27 12:17:49,235][00394] Num frames 6700... [2023-02-27 12:17:49,358][00394] Num frames 6800... [2023-02-27 12:17:49,483][00394] Num frames 6900... [2023-02-27 12:17:49,607][00394] Num frames 7000... [2023-02-27 12:17:49,715][00394] Avg episode rewards: #0: 19.554, true rewards: #0: 8.804 [2023-02-27 12:17:49,717][00394] Avg episode reward: 19.554, avg true_objective: 8.804 [2023-02-27 12:17:49,791][00394] Num frames 7100... [2023-02-27 12:17:49,921][00394] Num frames 7200... [2023-02-27 12:17:50,044][00394] Num frames 7300... [2023-02-27 12:17:50,166][00394] Num frames 7400... [2023-02-27 12:17:50,293][00394] Num frames 7500... [2023-02-27 12:17:50,409][00394] Num frames 7600... [2023-02-27 12:17:50,530][00394] Num frames 7700... [2023-02-27 12:17:50,652][00394] Num frames 7800... [2023-02-27 12:17:50,773][00394] Num frames 7900... [2023-02-27 12:17:50,893][00394] Num frames 8000... [2023-02-27 12:17:51,011][00394] Num frames 8100... [2023-02-27 12:17:51,156][00394] Avg episode rewards: #0: 20.194, true rewards: #0: 9.083 [2023-02-27 12:17:51,158][00394] Avg episode reward: 20.194, avg true_objective: 9.083 [2023-02-27 12:17:51,191][00394] Num frames 8200... [2023-02-27 12:17:51,315][00394] Num frames 8300... [2023-02-27 12:17:51,432][00394] Num frames 8400... [2023-02-27 12:17:51,549][00394] Num frames 8500... [2023-02-27 12:17:51,686][00394] Num frames 8600... [2023-02-27 12:17:51,806][00394] Avg episode rewards: #0: 18.855, true rewards: #0: 8.655 [2023-02-27 12:17:51,808][00394] Avg episode reward: 18.855, avg true_objective: 8.655 [2023-02-27 12:18:46,740][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 12:18:50,489][00394] The model has been pushed to https://huggingface.co./Clawoo/rl_course_vizdoom_health_gathering_supreme [2023-02-27 12:22:07,509][00394] Environment doom_basic already registered, overwriting... [2023-02-27 12:22:07,512][00394] Environment doom_two_colors_easy already registered, overwriting... [2023-02-27 12:22:07,514][00394] Environment doom_two_colors_hard already registered, overwriting... [2023-02-27 12:22:07,515][00394] Environment doom_dm already registered, overwriting... [2023-02-27 12:22:07,517][00394] Environment doom_dwango5 already registered, overwriting... [2023-02-27 12:22:07,519][00394] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-02-27 12:22:07,521][00394] Environment doom_defend_the_center_flat_actions already registered, overwriting... 
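During evaluation, the "Avg episode rewards" entries are running means over the episodes completed so far, and the counters reset between the two evaluation runs above. A sketch of that bookkeeping (not Sample Factory's code; the 23.481 below is inferred from the printed averages, not read from the log):

```python
episode_rewards: list[float] = []
episode_true_rewards: list[float] = []

def on_episode_end(reward: float, true_reward: float) -> None:
    # Mirrors 'Avg episode rewards: #0: ..., true rewards: #0: ...'
    episode_rewards.append(reward)
    episode_true_rewards.append(true_reward)
    avg = sum(episode_rewards) / len(episode_rewards)
    avg_true = sum(episode_true_rewards) / len(episode_true_rewards)
    print(f"Avg episode rewards: #0: {avg:.3f}, true rewards: #0: {avg_true:.3f}")

# Second run above: 42.749 after episode 1, 33.115 after episode 2,
# implying episode 2 scored (2 * 33.115) - 42.749 = 23.481.
```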
[2023-02-27 12:22:07,528][00394] Environment doom_my_way_home already registered, overwriting... [2023-02-27 12:22:07,532][00394] Environment doom_deadly_corridor already registered, overwriting... [2023-02-27 12:22:07,533][00394] Environment doom_defend_the_center already registered, overwriting... [2023-02-27 12:22:07,535][00394] Environment doom_defend_the_line already registered, overwriting... [2023-02-27 12:22:07,536][00394] Environment doom_health_gathering already registered, overwriting... [2023-02-27 12:22:07,539][00394] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-27 12:22:07,541][00394] Environment doom_battle already registered, overwriting... [2023-02-27 12:22:07,543][00394] Environment doom_battle2 already registered, overwriting... [2023-02-27 12:22:07,545][00394] Environment doom_duel_bots already registered, overwriting... [2023-02-27 12:22:07,547][00394] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-27 12:22:07,548][00394] Environment doom_duel already registered, overwriting... [2023-02-27 12:22:07,550][00394] Environment doom_deathmatch_full already registered, overwriting... [2023-02-27 12:22:07,552][00394] Environment doom_benchmark already registered, overwriting... [2023-02-27 12:22:07,559][00394] register_encoder_factory: [2023-02-27 12:22:07,584][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 12:22:07,587][00394] Overriding arg 'train_for_env_steps' with value 16000000 passed from command line [2023-02-27 12:22:07,592][00394] Experiment dir /content/train_dir/default_experiment already exists! [2023-02-27 12:22:07,594][00394] Resuming existing experiment from /content/train_dir/default_experiment... 
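Resuming reuses the saved config.json and layers command-line overrides on top, which is exactly what the "Loading existing experiment configuration", "Overriding arg", and "Adding new argument" messages trace. A simplified sketch of that merge (the function below is hypothetical, not Sample Factory's config code):

```python
import json

def resume_config(config_path: str, cli_overrides: dict) -> dict:
    """Load a saved experiment config and apply CLI overrides on top."""
    print(f"Loading existing experiment configuration from {config_path}")
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_overrides.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value} that is not in the saved config file!")
        cfg[key] = value
    return cfg

# e.g. the resume above, raising the frame budget from 4000000 to 16000000:
# cfg = resume_config("/content/train_dir/default_experiment/config.json",
#                     {"train_for_env_steps": 16000000})
```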
[2023-02-27 12:22:07,596][00394] Weights and Biases integration disabled [2023-02-27 12:22:07,603][00394] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-02-27 12:22:11,230][00394] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=16000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-02-27 12:22:11,234][00394] Saving configuration to /content/train_dir/default_experiment/config.json... 
[2023-02-27 12:22:11,236][00394] Rollout worker 0 uses device cpu [2023-02-27 12:22:11,239][00394] Rollout worker 1 uses device cpu [2023-02-27 12:22:11,240][00394] Rollout worker 2 uses device cpu [2023-02-27 12:22:11,242][00394] Rollout worker 3 uses device cpu [2023-02-27 12:22:11,243][00394] Rollout worker 4 uses device cpu [2023-02-27 12:22:11,244][00394] Rollout worker 5 uses device cpu [2023-02-27 12:22:11,246][00394] Rollout worker 6 uses device cpu [2023-02-27 12:22:11,247][00394] Rollout worker 7 uses device cpu [2023-02-27 12:22:11,409][00394] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 12:22:11,414][00394] InferenceWorker_p0-w0: min num requests: 2 [2023-02-27 12:22:11,463][00394] Starting all processes... [2023-02-27 12:22:11,465][00394] Starting process learner_proc0 [2023-02-27 12:22:11,685][00394] Starting all processes... [2023-02-27 12:22:11,698][00394] Starting process inference_proc0-0 [2023-02-27 12:22:11,698][00394] Starting process rollout_proc0 [2023-02-27 12:22:11,702][00394] Starting process rollout_proc1 [2023-02-27 12:22:11,706][00394] Starting process rollout_proc2 [2023-02-27 12:22:11,706][00394] Starting process rollout_proc3 [2023-02-27 12:22:11,706][00394] Starting process rollout_proc4 [2023-02-27 12:22:11,706][00394] Starting process rollout_proc5 [2023-02-27 12:22:11,706][00394] Starting process rollout_proc6 [2023-02-27 12:22:11,860][00394] Starting process rollout_proc7 [2023-02-27 12:22:21,684][47447] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 12:22:21,684][47447] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-27 12:22:21,750][47447] Num visible devices: 1 [2023-02-27 12:22:21,788][47447] Starting seed is not provided [2023-02-27 12:22:21,788][47447] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 12:22:21,789][47447] Initializing actor-critic model on device cuda:0 [2023-02-27 12:22:21,789][47447] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 12:22:21,795][47447] RunningMeanStd input shape: (1,) [2023-02-27 12:22:21,911][47447] ConvEncoder: input_channels=3 [2023-02-27 12:22:22,815][47465] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 12:22:22,817][47465] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-02-27 12:22:22,914][47466] Worker 1 uses CPU cores [1] [2023-02-27 12:22:22,932][47465] Num visible devices: 1 [2023-02-27 12:22:23,107][47447] Conv encoder output size: 512 [2023-02-27 12:22:23,111][47447] Policy head output size: 512 [2023-02-27 12:22:23,198][47447] Created Actor Critic model with architecture: [2023-02-27 12:22:23,200][47447] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): 
RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-02-27 12:22:23,562][47470] Worker 0 uses CPU cores [0] [2023-02-27 12:22:23,767][47467] Worker 2 uses CPU cores [0] [2023-02-27 12:22:24,208][47476] Worker 3 uses CPU cores [1] [2023-02-27 12:22:24,305][47494] Worker 6 uses CPU cores [0] [2023-02-27 12:22:24,454][47480] Worker 4 uses CPU cores [0] [2023-02-27 12:22:24,550][47486] Worker 7 uses CPU cores [1] [2023-02-27 12:22:24,658][47488] Worker 5 uses CPU cores [1] [2023-02-27 12:22:31,358][47447] Using optimizer [2023-02-27 12:22:31,359][47447] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth... [2023-02-27 12:22:31,397][47447] Loading model from checkpoint [2023-02-27 12:22:31,399][00394] Heartbeat connected on Batcher_0 [2023-02-27 12:22:31,407][47447] Loaded experiment state at self.train_step=2931, self.env_steps=12005376 [2023-02-27 12:22:31,408][47447] Initialized policy 0 weights for model version 2931 [2023-02-27 12:22:31,409][00394] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-27 12:22:31,416][47447] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-27 12:22:31,423][47447] LearnerWorker_p0 finished initialization! [2023-02-27 12:22:31,426][00394] Heartbeat connected on LearnerWorker_p0 [2023-02-27 12:22:31,432][00394] Heartbeat connected on RolloutWorker_w0 [2023-02-27 12:22:31,441][00394] Heartbeat connected on RolloutWorker_w2 [2023-02-27 12:22:31,449][00394] Heartbeat connected on RolloutWorker_w1 [2023-02-27 12:22:31,459][00394] Heartbeat connected on RolloutWorker_w4 [2023-02-27 12:22:31,470][00394] Heartbeat connected on RolloutWorker_w3 [2023-02-27 12:22:31,480][00394] Heartbeat connected on RolloutWorker_w6 [2023-02-27 12:22:31,545][00394] Heartbeat connected on RolloutWorker_w5 [2023-02-27 12:22:31,550][00394] Heartbeat connected on RolloutWorker_w7 [2023-02-27 12:22:31,624][47465] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 12:22:31,626][47465] RunningMeanStd input shape: (1,) [2023-02-27 12:22:31,643][47465] ConvEncoder: input_channels=3 [2023-02-27 12:22:31,743][47465] Conv encoder output size: 512 [2023-02-27 12:22:31,744][47465] Policy head output size: 512 [2023-02-27 12:22:32,603][00394] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 12005376. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 12:22:34,158][00394] Inference worker 0-0 is ready! [2023-02-27 12:22:34,161][00394] All inference workers are ready! Signal rollout workers to start! 
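The learner resumes from checkpoint_000002931_12005376.pth, i.e. policy version 2931 at 12,005,376 environment frames, which is why the FPS lines below count up from "Total num frames: 12005376" rather than zero. The two numbers in the filename are consistent with batch_size=1024 samples per policy version times env_frameskip=4, i.e. 4096 frames per version (2931 × 4096 = 12,005,376); the same relation holds for every checkpoint saved later in this log. A small sketch that decodes the checkpoint_{train_step}_{env_steps}.pth naming visible above (the helper is hypothetical, not part of Sample Factory):

    import re

    def parse_checkpoint_name(name):
        m = re.fullmatch(r"checkpoint_(\d+)_(\d+)\.pth", name)
        if m is None:
            raise ValueError(f"unexpected checkpoint name: {name}")
        return int(m.group(1)), int(m.group(2))

    train_step, env_steps = parse_checkpoint_name("checkpoint_000002931_12005376.pth")
    # 1024 samples per version * frameskip 4 = 4096 env frames per version:
    assert env_steps == train_step * 1024 * 4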
[2023-02-27 12:22:34,292][47466] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:22:34,301][47488] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:22:34,298][47486] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:22:34,311][47476] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:22:34,338][47470] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:22:34,333][47480] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:22:34,336][47467] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:22:34,339][47494] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-27 12:22:34,976][47480] Decorrelating experience for 0 frames... [2023-02-27 12:22:35,639][47466] Decorrelating experience for 0 frames... [2023-02-27 12:22:35,641][47486] Decorrelating experience for 0 frames... [2023-02-27 12:22:35,643][47476] Decorrelating experience for 0 frames... [2023-02-27 12:22:35,646][47488] Decorrelating experience for 0 frames... [2023-02-27 12:22:36,025][47494] Decorrelating experience for 0 frames... [2023-02-27 12:22:36,761][47494] Decorrelating experience for 32 frames... [2023-02-27 12:22:36,805][47480] Decorrelating experience for 32 frames... [2023-02-27 12:22:37,061][47488] Decorrelating experience for 32 frames... [2023-02-27 12:22:37,063][47466] Decorrelating experience for 32 frames... [2023-02-27 12:22:37,065][47486] Decorrelating experience for 32 frames... [2023-02-27 12:22:37,069][47476] Decorrelating experience for 32 frames... [2023-02-27 12:22:37,603][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 12005376. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 12:22:37,735][47486] Decorrelating experience for 64 frames... [2023-02-27 12:22:37,856][47470] Decorrelating experience for 0 frames... [2023-02-27 12:22:38,211][47480] Decorrelating experience for 64 frames... [2023-02-27 12:22:38,226][47467] Decorrelating experience for 0 frames... [2023-02-27 12:22:38,496][47476] Decorrelating experience for 64 frames... [2023-02-27 12:22:38,731][47488] Decorrelating experience for 64 frames... [2023-02-27 12:22:38,930][47470] Decorrelating experience for 32 frames... [2023-02-27 12:22:39,256][47466] Decorrelating experience for 64 frames... [2023-02-27 12:22:39,395][47494] Decorrelating experience for 64 frames... [2023-02-27 12:22:39,780][47488] Decorrelating experience for 96 frames... [2023-02-27 12:22:39,995][47480] Decorrelating experience for 96 frames... [2023-02-27 12:22:40,201][47476] Decorrelating experience for 96 frames... [2023-02-27 12:22:40,600][47467] Decorrelating experience for 32 frames... [2023-02-27 12:22:41,058][47466] Decorrelating experience for 96 frames... [2023-02-27 12:22:41,352][47486] Decorrelating experience for 96 frames... [2023-02-27 12:22:41,900][47470] Decorrelating experience for 64 frames... [2023-02-27 12:22:42,077][47494] Decorrelating experience for 96 frames... [2023-02-27 12:22:42,606][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 12005376. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 12:22:42,896][47467] Decorrelating experience for 64 frames... [2023-02-27 12:22:43,392][47470] Decorrelating experience for 96 frames... [2023-02-27 12:22:43,868][47467] Decorrelating experience for 96 frames... 
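Before real collection starts, each rollout worker warms up its environments for a staggered number of frames (0, 32, 64, 96 above), so the workers' 32-step rollouts reach the learner out of phase rather than in lockstep; while this runs, the FPS lines stay at 0.0 and the frame counter stays at 12005376 because no samples have been trained on yet. The logged offsets are consistent with each env on a worker being advanced by env_index × rollout frames (rollout=32, num_envs_per_worker=4 from the config); a tiny sketch under that assumption:

    # Assumed staggering rule: the exact decorrelation schedule is Sample Factory's,
    # but the multiples of rollout=32 in the log match env_index * rollout.
    rollout = 32
    num_envs_per_worker = 4
    offsets = [env_index * rollout for env_index in range(num_envs_per_worker)]
    print(offsets)  # [0, 32, 64, 96] -- the frame counts logged above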
[2023-02-27 12:22:47,604][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 12005376. Throughput: 0: 67.3. Samples: 1010. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 12:22:47,612][00394] Avg episode reward: [(0, '2.160')] [2023-02-27 12:22:49,596][47447] Signal inference workers to stop experience collection... [2023-02-27 12:22:49,633][47465] InferenceWorker_p0-w0: stopping experience collection [2023-02-27 12:22:52,608][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 12005376. Throughput: 0: 128.9. Samples: 2578. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-27 12:22:52,655][00394] Avg episode reward: [(0, '4.700')] [2023-02-27 12:22:52,871][47447] Signal inference workers to resume experience collection... [2023-02-27 12:22:52,873][47465] InferenceWorker_p0-w0: resuming experience collection [2023-02-27 12:22:57,603][00394] Fps is (10 sec: 2048.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12025856. Throughput: 0: 154.1. Samples: 3852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 3.0) [2023-02-27 12:22:57,606][00394] Avg episode reward: [(0, '8.121')] [2023-02-27 12:23:02,606][00394] Fps is (10 sec: 3685.3, 60 sec: 1228.7, 300 sec: 1228.7). Total num frames: 12042240. Throughput: 0: 318.6. Samples: 9560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:23:02,616][00394] Avg episode reward: [(0, '12.387')] [2023-02-27 12:23:02,884][47465] Updated weights for policy 0, policy_version 2941 (0.0022) [2023-02-27 12:23:07,608][00394] Fps is (10 sec: 3275.3, 60 sec: 1521.2, 300 sec: 1521.2). Total num frames: 12058624. Throughput: 0: 388.8. Samples: 13610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:23:07,611][00394] Avg episode reward: [(0, '13.765')] [2023-02-27 12:23:12,603][00394] Fps is (10 sec: 2868.0, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 12070912. Throughput: 0: 392.0. Samples: 15682. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:23:12,606][00394] Avg episode reward: [(0, '17.453')] [2023-02-27 12:23:15,603][47465] Updated weights for policy 0, policy_version 2951 (0.0012) [2023-02-27 12:23:17,603][00394] Fps is (10 sec: 3688.1, 60 sec: 2002.5, 300 sec: 2002.5). Total num frames: 12095488. Throughput: 0: 483.4. Samples: 21754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:23:17,610][00394] Avg episode reward: [(0, '18.932')] [2023-02-27 12:23:22,603][00394] Fps is (10 sec: 4096.0, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 12111872. Throughput: 0: 616.2. Samples: 27728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:23:22,610][00394] Avg episode reward: [(0, '19.905')] [2023-02-27 12:23:27,132][47465] Updated weights for policy 0, policy_version 2961 (0.0013) [2023-02-27 12:23:27,603][00394] Fps is (10 sec: 3276.8, 60 sec: 2234.2, 300 sec: 2234.2). Total num frames: 12128256. Throughput: 0: 661.9. Samples: 29782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:23:27,612][00394] Avg episode reward: [(0, '22.022')] [2023-02-27 12:23:32,603][00394] Fps is (10 sec: 2867.2, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 12140544. Throughput: 0: 727.1. Samples: 33728. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:23:32,609][00394] Avg episode reward: [(0, '24.064')] [2023-02-27 12:23:37,603][00394] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2394.6). Total num frames: 12161024. Throughput: 0: 828.9. Samples: 39880. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:23:37,611][00394] Avg episode reward: [(0, '25.580')] [2023-02-27 12:23:37,653][47447] Saving new best policy, reward=25.580! [2023-02-27 12:23:38,651][47465] Updated weights for policy 0, policy_version 2971 (0.0014) [2023-02-27 12:23:42,607][00394] Fps is (10 sec: 4094.3, 60 sec: 2935.4, 300 sec: 2516.0). Total num frames: 12181504. Throughput: 0: 870.5. Samples: 43028. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:23:42,610][00394] Avg episode reward: [(0, '24.474')] [2023-02-27 12:23:47,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3208.6, 300 sec: 2566.8). Total num frames: 12197888. Throughput: 0: 849.9. Samples: 47802. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-02-27 12:23:47,606][00394] Avg episode reward: [(0, '23.995')] [2023-02-27 12:23:51,890][47465] Updated weights for policy 0, policy_version 2981 (0.0029) [2023-02-27 12:23:52,604][00394] Fps is (10 sec: 2868.1, 60 sec: 3413.3, 300 sec: 2560.0). Total num frames: 12210176. Throughput: 0: 851.7. Samples: 51932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:23:52,614][00394] Avg episode reward: [(0, '23.079')] [2023-02-27 12:23:57,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2650.4). Total num frames: 12230656. Throughput: 0: 870.2. Samples: 54840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:23:57,605][00394] Avg episode reward: [(0, '22.356')] [2023-02-27 12:24:01,698][47465] Updated weights for policy 0, policy_version 2991 (0.0014) [2023-02-27 12:24:02,603][00394] Fps is (10 sec: 4096.4, 60 sec: 3481.8, 300 sec: 2730.7). Total num frames: 12251136. Throughput: 0: 881.3. Samples: 61412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:24:02,605][00394] Avg episode reward: [(0, '20.398')] [2023-02-27 12:24:07,609][00394] Fps is (10 sec: 3684.1, 60 sec: 3481.5, 300 sec: 2759.2). Total num frames: 12267520. Throughput: 0: 853.7. Samples: 66150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:24:07,619][00394] Avg episode reward: [(0, '20.733')] [2023-02-27 12:24:07,633][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002995_12267520.pth... [2023-02-27 12:24:07,862][47447] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002925_11980800.pth [2023-02-27 12:24:12,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 2744.3). Total num frames: 12279808. Throughput: 0: 852.8. Samples: 68158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:24:12,606][00394] Avg episode reward: [(0, '20.286')] [2023-02-27 12:24:15,352][47465] Updated weights for policy 0, policy_version 3001 (0.0014) [2023-02-27 12:24:17,603][00394] Fps is (10 sec: 3278.7, 60 sec: 3413.3, 300 sec: 2808.7). Total num frames: 12300288. Throughput: 0: 876.1. Samples: 73154. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:24:17,606][00394] Avg episode reward: [(0, '20.217')] [2023-02-27 12:24:22,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2867.2). Total num frames: 12320768. Throughput: 0: 885.6. Samples: 79732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:24:22,606][00394] Avg episode reward: [(0, '21.842')] [2023-02-27 12:24:25,218][47465] Updated weights for policy 0, policy_version 3011 (0.0015) [2023-02-27 12:24:27,603][00394] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 2885.0). Total num frames: 12337152. Throughput: 0: 876.8. Samples: 82480. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:24:27,607][00394] Avg episode reward: [(0, '22.466')] [2023-02-27 12:24:32,604][00394] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 2901.3). Total num frames: 12353536. Throughput: 0: 860.8. Samples: 86538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:24:32,608][00394] Avg episode reward: [(0, '23.469')] [2023-02-27 12:24:37,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2916.4). Total num frames: 12369920. Throughput: 0: 882.9. Samples: 91662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:24:37,609][00394] Avg episode reward: [(0, '22.642')] [2023-02-27 12:24:38,131][47465] Updated weights for policy 0, policy_version 3021 (0.0012) [2023-02-27 12:24:42,603][00394] Fps is (10 sec: 3686.7, 60 sec: 3481.8, 300 sec: 2961.7). Total num frames: 12390400. Throughput: 0: 891.2. Samples: 94944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:24:42,609][00394] Avg episode reward: [(0, '23.363')] [2023-02-27 12:24:47,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2973.4). Total num frames: 12406784. Throughput: 0: 874.7. Samples: 100772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:24:47,610][00394] Avg episode reward: [(0, '23.392')] [2023-02-27 12:24:49,609][47465] Updated weights for policy 0, policy_version 3031 (0.0022) [2023-02-27 12:24:52,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 2984.2). Total num frames: 12423168. Throughput: 0: 860.2. Samples: 104854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:24:52,610][00394] Avg episode reward: [(0, '22.012')] [2023-02-27 12:24:57,603][00394] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 2994.3). Total num frames: 12439552. Throughput: 0: 863.5. Samples: 107016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:24:57,606][00394] Avg episode reward: [(0, '21.782')] [2023-02-27 12:25:00,789][47465] Updated weights for policy 0, policy_version 3041 (0.0019) [2023-02-27 12:25:02,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3031.0). Total num frames: 12460032. Throughput: 0: 899.9. Samples: 113650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:25:02,611][00394] Avg episode reward: [(0, '22.628')] [2023-02-27 12:25:07,603][00394] Fps is (10 sec: 4096.1, 60 sec: 3550.2, 300 sec: 3065.4). Total num frames: 12480512. Throughput: 0: 877.6. Samples: 119222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:25:07,611][00394] Avg episode reward: [(0, '22.852')] [2023-02-27 12:25:12,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3046.4). Total num frames: 12492800. Throughput: 0: 863.2. Samples: 121322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:25:12,607][00394] Avg episode reward: [(0, '22.165')] [2023-02-27 12:25:13,414][47465] Updated weights for policy 0, policy_version 3051 (0.0024) [2023-02-27 12:25:17,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3053.4). Total num frames: 12509184. Throughput: 0: 867.8. Samples: 125588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:25:17,609][00394] Avg episode reward: [(0, '22.673')] [2023-02-27 12:25:22,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3108.1). Total num frames: 12533760. Throughput: 0: 902.4. Samples: 132268. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:25:22,609][00394] Avg episode reward: [(0, '22.113')] [2023-02-27 12:25:23,308][47465] Updated weights for policy 0, policy_version 3061 (0.0013) [2023-02-27 12:25:27,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3113.0). Total num frames: 12550144. Throughput: 0: 903.4. Samples: 135596. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:25:27,611][00394] Avg episode reward: [(0, '22.524')] [2023-02-27 12:25:32,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3117.5). Total num frames: 12566528. Throughput: 0: 867.4. Samples: 139804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:25:32,608][00394] Avg episode reward: [(0, '23.094')] [2023-02-27 12:25:36,473][47465] Updated weights for policy 0, policy_version 3071 (0.0019) [2023-02-27 12:25:37,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3121.8). Total num frames: 12582912. Throughput: 0: 883.2. Samples: 144598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:25:37,606][00394] Avg episode reward: [(0, '23.718')] [2023-02-27 12:25:42,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3147.5). Total num frames: 12603392. Throughput: 0: 908.2. Samples: 147884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:25:42,609][00394] Avg episode reward: [(0, '23.177')] [2023-02-27 12:25:45,898][47465] Updated weights for policy 0, policy_version 3081 (0.0018) [2023-02-27 12:25:47,609][00394] Fps is (10 sec: 4093.5, 60 sec: 3617.8, 300 sec: 3171.7). Total num frames: 12623872. Throughput: 0: 904.0. Samples: 154336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:25:47,616][00394] Avg episode reward: [(0, '23.834')] [2023-02-27 12:25:52,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3153.9). Total num frames: 12636160. Throughput: 0: 870.2. Samples: 158380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:25:52,607][00394] Avg episode reward: [(0, '25.568')] [2023-02-27 12:25:57,603][00394] Fps is (10 sec: 2868.9, 60 sec: 3549.9, 300 sec: 3156.9). Total num frames: 12652544. Throughput: 0: 868.6. Samples: 160410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:25:57,609][00394] Avg episode reward: [(0, '23.959')] [2023-02-27 12:25:59,128][47465] Updated weights for policy 0, policy_version 3091 (0.0019) [2023-02-27 12:26:02,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3179.3). Total num frames: 12673024. Throughput: 0: 915.1. Samples: 166768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:26:02,608][00394] Avg episode reward: [(0, '23.388')] [2023-02-27 12:26:07,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3200.6). Total num frames: 12693504. Throughput: 0: 899.4. Samples: 172742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:26:07,607][00394] Avg episode reward: [(0, '23.461')] [2023-02-27 12:26:07,623][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003099_12693504.pth... [2023-02-27 12:26:07,876][47447] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002931_12005376.pth [2023-02-27 12:26:10,022][47465] Updated weights for policy 0, policy_version 3101 (0.0012) [2023-02-27 12:26:12,607][00394] Fps is (10 sec: 3275.5, 60 sec: 3549.6, 300 sec: 3183.7). Total num frames: 12705792. Throughput: 0: 869.5. Samples: 174726. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:26:12,610][00394] Avg episode reward: [(0, '23.324')] [2023-02-27 12:26:17,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3185.8). Total num frames: 12722176. Throughput: 0: 868.9. Samples: 178906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:26:17,606][00394] Avg episode reward: [(0, '23.863')] [2023-02-27 12:26:21,609][47465] Updated weights for policy 0, policy_version 3111 (0.0025) [2023-02-27 12:26:22,603][00394] Fps is (10 sec: 3687.9, 60 sec: 3481.6, 300 sec: 3205.6). Total num frames: 12742656. Throughput: 0: 907.6. Samples: 185440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:26:22,606][00394] Avg episode reward: [(0, '23.294')] [2023-02-27 12:26:27,605][00394] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3224.5). Total num frames: 12763136. Throughput: 0: 908.2. Samples: 188754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:26:27,610][00394] Avg episode reward: [(0, '23.808')] [2023-02-27 12:26:32,603][00394] Fps is (10 sec: 3686.3, 60 sec: 3549.8, 300 sec: 3225.6). Total num frames: 12779520. Throughput: 0: 864.6. Samples: 193236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:26:32,606][00394] Avg episode reward: [(0, '24.100')] [2023-02-27 12:26:33,812][47465] Updated weights for policy 0, policy_version 3121 (0.0012) [2023-02-27 12:26:37,603][00394] Fps is (10 sec: 2867.8, 60 sec: 3481.6, 300 sec: 3209.9). Total num frames: 12791808. Throughput: 0: 871.0. Samples: 197574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:26:37,606][00394] Avg episode reward: [(0, '25.155')] [2023-02-27 12:26:42,603][00394] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3244.0). Total num frames: 12816384. Throughput: 0: 899.4. Samples: 200882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:26:42,606][00394] Avg episode reward: [(0, '25.320')] [2023-02-27 12:26:44,289][47465] Updated weights for policy 0, policy_version 3131 (0.0023) [2023-02-27 12:26:47,603][00394] Fps is (10 sec: 4505.6, 60 sec: 3550.2, 300 sec: 3260.7). Total num frames: 12836864. Throughput: 0: 905.3. Samples: 207508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:26:47,610][00394] Avg episode reward: [(0, '26.214')] [2023-02-27 12:26:47,626][47447] Saving new best policy, reward=26.214! [2023-02-27 12:26:52,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3245.3). Total num frames: 12849152. Throughput: 0: 866.9. Samples: 211754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:26:52,606][00394] Avg episode reward: [(0, '26.207')] [2023-02-27 12:26:57,506][47465] Updated weights for policy 0, policy_version 3141 (0.0031) [2023-02-27 12:26:57,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3245.9). Total num frames: 12865536. Throughput: 0: 868.7. Samples: 213816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:26:57,606][00394] Avg episode reward: [(0, '25.927')] [2023-02-27 12:27:02,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3261.6). Total num frames: 12886016. Throughput: 0: 905.7. Samples: 219662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:27:02,606][00394] Avg episode reward: [(0, '25.486')] [2023-02-27 12:27:06,758][47465] Updated weights for policy 0, policy_version 3151 (0.0016) [2023-02-27 12:27:07,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 12906496. Throughput: 0: 908.4. Samples: 226320. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:07,610][00394] Avg episode reward: [(0, '24.556')] [2023-02-27 12:27:12,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3276.8). Total num frames: 12922880. Throughput: 0: 881.5. Samples: 228418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:27:12,608][00394] Avg episode reward: [(0, '24.172')] [2023-02-27 12:27:17,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3262.4). Total num frames: 12935168. Throughput: 0: 875.6. Samples: 232636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:27:17,609][00394] Avg episode reward: [(0, '22.179')] [2023-02-27 12:27:19,937][47465] Updated weights for policy 0, policy_version 3161 (0.0022) [2023-02-27 12:27:22,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3290.9). Total num frames: 12959744. Throughput: 0: 915.3. Samples: 238762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:22,608][00394] Avg episode reward: [(0, '22.699')] [2023-02-27 12:27:27,603][00394] Fps is (10 sec: 4505.6, 60 sec: 3618.3, 300 sec: 3304.6). Total num frames: 12980224. Throughput: 0: 914.8. Samples: 242048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:27,609][00394] Avg episode reward: [(0, '24.003')] [2023-02-27 12:27:29,989][47465] Updated weights for policy 0, policy_version 3171 (0.0021) [2023-02-27 12:27:32,610][00394] Fps is (10 sec: 3274.5, 60 sec: 3549.5, 300 sec: 3346.1). Total num frames: 12992512. Throughput: 0: 878.1. Samples: 247028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:27:32,613][00394] Avg episode reward: [(0, '24.111')] [2023-02-27 12:27:37,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3401.8). Total num frames: 13008896. Throughput: 0: 877.9. Samples: 251258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:37,610][00394] Avg episode reward: [(0, '23.771')] [2023-02-27 12:27:42,134][47465] Updated weights for policy 0, policy_version 3181 (0.0023) [2023-02-27 12:27:42,603][00394] Fps is (10 sec: 3689.0, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 13029376. Throughput: 0: 902.3. Samples: 254420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:42,610][00394] Avg episode reward: [(0, '24.889')] [2023-02-27 12:27:47,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 13049856. Throughput: 0: 918.4. Samples: 260988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:47,608][00394] Avg episode reward: [(0, '24.477')] [2023-02-27 12:27:52,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 13066240. Throughput: 0: 876.1. Samples: 265744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:27:52,609][00394] Avg episode reward: [(0, '24.669')] [2023-02-27 12:27:54,018][47465] Updated weights for policy 0, policy_version 3191 (0.0021) [2023-02-27 12:27:57,604][00394] Fps is (10 sec: 2867.0, 60 sec: 3549.8, 300 sec: 3512.9). Total num frames: 13078528. Throughput: 0: 874.5. Samples: 267772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-27 12:27:57,609][00394] Avg episode reward: [(0, '23.978')] [2023-02-27 12:28:02,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 13099008. Throughput: 0: 902.2. Samples: 273236. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:28:02,608][00394] Avg episode reward: [(0, '24.067')] [2023-02-27 12:28:04,583][47465] Updated weights for policy 0, policy_version 3201 (0.0025) [2023-02-27 12:28:07,603][00394] Fps is (10 sec: 4505.9, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 13123584. Throughput: 0: 916.8. Samples: 280018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:28:07,610][00394] Avg episode reward: [(0, '23.944')] [2023-02-27 12:28:07,621][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003204_13123584.pth... [2023-02-27 12:28:07,872][47447] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002995_12267520.pth [2023-02-27 12:28:12,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 13135872. Throughput: 0: 891.1. Samples: 282148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:28:12,607][00394] Avg episode reward: [(0, '23.433')] [2023-02-27 12:28:17,597][47465] Updated weights for policy 0, policy_version 3211 (0.0024) [2023-02-27 12:28:17,603][00394] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 13148160. Throughput: 0: 873.9. Samples: 286346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:28:17,609][00394] Avg episode reward: [(0, '24.024')] [2023-02-27 12:28:22,603][00394] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 13172736. Throughput: 0: 907.8. Samples: 292110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:28:22,611][00394] Avg episode reward: [(0, '24.609')] [2023-02-27 12:28:27,274][47465] Updated weights for policy 0, policy_version 3221 (0.0012) [2023-02-27 12:28:27,603][00394] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 13193216. Throughput: 0: 910.8. Samples: 295406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:28:27,610][00394] Avg episode reward: [(0, '24.000')] [2023-02-27 12:28:32,603][00394] Fps is (10 sec: 3686.5, 60 sec: 3618.6, 300 sec: 3554.5). Total num frames: 13209600. Throughput: 0: 881.9. Samples: 300672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:28:32,607][00394] Avg episode reward: [(0, '23.523')] [2023-02-27 12:28:37,604][00394] Fps is (10 sec: 2867.0, 60 sec: 3549.8, 300 sec: 3526.8). Total num frames: 13221888. Throughput: 0: 869.9. Samples: 304890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:28:37,612][00394] Avg episode reward: [(0, '22.908')] [2023-02-27 12:28:40,202][47465] Updated weights for policy 0, policy_version 3231 (0.0019) [2023-02-27 12:28:42,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 13242368. Throughput: 0: 886.3. Samples: 307654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:28:42,614][00394] Avg episode reward: [(0, '21.978')] [2023-02-27 12:28:47,603][00394] Fps is (10 sec: 4096.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 13262848. Throughput: 0: 910.2. Samples: 314194. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:28:47,605][00394] Avg episode reward: [(0, '22.774')] [2023-02-27 12:28:50,325][47465] Updated weights for policy 0, policy_version 3241 (0.0012) [2023-02-27 12:28:52,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 13279232. Throughput: 0: 873.2. Samples: 319312. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:28:52,605][00394] Avg episode reward: [(0, '22.993')] [2023-02-27 12:28:57,609][00394] Fps is (10 sec: 3274.9, 60 sec: 3617.8, 300 sec: 3540.5). Total num frames: 13295616. Throughput: 0: 871.3. Samples: 321362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:28:57,612][00394] Avg episode reward: [(0, '24.697')] [2023-02-27 12:29:02,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.7). Total num frames: 13312000. Throughput: 0: 889.4. Samples: 326370. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:29:02,605][00394] Avg episode reward: [(0, '23.877')] [2023-02-27 12:29:02,899][47465] Updated weights for policy 0, policy_version 3251 (0.0013) [2023-02-27 12:29:07,603][00394] Fps is (10 sec: 4098.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 13336576. Throughput: 0: 909.1. Samples: 333020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:29:07,605][00394] Avg episode reward: [(0, '25.253')] [2023-02-27 12:29:12,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 13348864. Throughput: 0: 895.2. Samples: 335690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:29:12,618][00394] Avg episode reward: [(0, '25.601')] [2023-02-27 12:29:14,229][47465] Updated weights for policy 0, policy_version 3261 (0.0012) [2023-02-27 12:29:17,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 13365248. Throughput: 0: 868.8. Samples: 339768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:29:17,609][00394] Avg episode reward: [(0, '26.585')] [2023-02-27 12:29:17,624][47447] Saving new best policy, reward=26.585! [2023-02-27 12:29:22,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 13381632. Throughput: 0: 893.3. Samples: 345088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:29:22,614][00394] Avg episode reward: [(0, '27.246')] [2023-02-27 12:29:22,616][47447] Saving new best policy, reward=27.246! [2023-02-27 12:29:25,650][47465] Updated weights for policy 0, policy_version 3271 (0.0021) [2023-02-27 12:29:27,603][00394] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 13406208. Throughput: 0: 903.9. Samples: 348330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:29:27,605][00394] Avg episode reward: [(0, '27.334')] [2023-02-27 12:29:27,620][47447] Saving new best policy, reward=27.334! [2023-02-27 12:29:32,604][00394] Fps is (10 sec: 4095.5, 60 sec: 3549.8, 300 sec: 3568.4). Total num frames: 13422592. Throughput: 0: 885.1. Samples: 354024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:29:32,611][00394] Avg episode reward: [(0, '27.011')] [2023-02-27 12:29:37,604][00394] Fps is (10 sec: 2867.1, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 13434880. Throughput: 0: 863.5. Samples: 358168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:29:37,609][00394] Avg episode reward: [(0, '26.117')] [2023-02-27 12:29:38,222][47465] Updated weights for policy 0, policy_version 3281 (0.0013) [2023-02-27 12:29:42,603][00394] Fps is (10 sec: 3277.1, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 13455360. Throughput: 0: 869.8. Samples: 360498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:29:42,605][00394] Avg episode reward: [(0, '25.807')] [2023-02-27 12:29:47,603][00394] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 13475840. 
Throughput: 0: 904.3. Samples: 367062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:29:47,609][00394] Avg episode reward: [(0, '25.822')] [2023-02-27 12:29:48,233][47465] Updated weights for policy 0, policy_version 3291 (0.0022) [2023-02-27 12:29:52,603][00394] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 13492224. Throughput: 0: 880.0. Samples: 372622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:29:52,610][00394] Avg episode reward: [(0, '24.253')] [2023-02-27 12:29:57,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3550.2, 300 sec: 3554.5). Total num frames: 13508608. Throughput: 0: 868.5. Samples: 374772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:29:57,617][00394] Avg episode reward: [(0, '24.582')] [2023-02-27 12:30:01,326][47465] Updated weights for policy 0, policy_version 3301 (0.0028) [2023-02-27 12:30:02,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 13524992. Throughput: 0: 878.5. Samples: 379302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:30:02,611][00394] Avg episode reward: [(0, '24.098')] [2023-02-27 12:30:07,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 13545472. Throughput: 0: 904.8. Samples: 385802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:30:07,606][00394] Avg episode reward: [(0, '24.046')] [2023-02-27 12:30:07,618][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003307_13545472.pth... [2023-02-27 12:30:07,815][47447] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003099_12693504.pth [2023-02-27 12:30:11,136][47465] Updated weights for policy 0, policy_version 3311 (0.0012) [2023-02-27 12:30:12,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 13561856. Throughput: 0: 904.0. Samples: 389008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:30:12,610][00394] Avg episode reward: [(0, '23.434')] [2023-02-27 12:30:17,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 13578240. Throughput: 0: 868.1. Samples: 393086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:30:17,611][00394] Avg episode reward: [(0, '22.720')] [2023-02-27 12:30:22,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 13594624. Throughput: 0: 878.6. Samples: 397704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:30:22,610][00394] Avg episode reward: [(0, '22.343')] [2023-02-27 12:30:24,219][47465] Updated weights for policy 0, policy_version 3321 (0.0016) [2023-02-27 12:30:27,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 13615104. Throughput: 0: 900.4. Samples: 401014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:30:27,610][00394] Avg episode reward: [(0, '23.183')] [2023-02-27 12:30:32,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 13635584. Throughput: 0: 899.3. Samples: 407530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:30:32,606][00394] Avg episode reward: [(0, '23.832')] [2023-02-27 12:30:35,026][47465] Updated weights for policy 0, policy_version 3331 (0.0034) [2023-02-27 12:30:37,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 13647872. Throughput: 0: 866.8. Samples: 411628. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:30:37,610][00394] Avg episode reward: [(0, '24.471')] [2023-02-27 12:30:42,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3526.8). Total num frames: 13664256. Throughput: 0: 865.2. Samples: 413704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:30:42,606][00394] Avg episode reward: [(0, '25.189')] [2023-02-27 12:30:46,933][47465] Updated weights for policy 0, policy_version 3341 (0.0027) [2023-02-27 12:30:47,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 13684736. Throughput: 0: 896.8. Samples: 419656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:30:47,606][00394] Avg episode reward: [(0, '25.283')] [2023-02-27 12:30:52,609][00394] Fps is (10 sec: 4093.5, 60 sec: 3549.5, 300 sec: 3568.3). Total num frames: 13705216. Throughput: 0: 891.7. Samples: 425936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:30:52,611][00394] Avg episode reward: [(0, '24.673')] [2023-02-27 12:30:57,605][00394] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 13721600. Throughput: 0: 868.1. Samples: 428072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:30:57,613][00394] Avg episode reward: [(0, '23.926')] [2023-02-27 12:30:58,979][47465] Updated weights for policy 0, policy_version 3351 (0.0026) [2023-02-27 12:31:02,603][00394] Fps is (10 sec: 2869.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 13733888. Throughput: 0: 872.3. Samples: 432338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:31:02,610][00394] Avg episode reward: [(0, '22.356')] [2023-02-27 12:31:07,603][00394] Fps is (10 sec: 3686.8, 60 sec: 3549.8, 300 sec: 3568.4). Total num frames: 13758464. Throughput: 0: 909.1. Samples: 438614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:31:07,605][00394] Avg episode reward: [(0, '20.339')] [2023-02-27 12:31:09,188][47465] Updated weights for policy 0, policy_version 3361 (0.0021) [2023-02-27 12:31:12,603][00394] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 13778944. Throughput: 0: 910.5. Samples: 441986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:31:12,610][00394] Avg episode reward: [(0, '20.288')] [2023-02-27 12:31:17,608][00394] Fps is (10 sec: 3275.2, 60 sec: 3549.6, 300 sec: 3554.4). Total num frames: 13791232. Throughput: 0: 878.3. Samples: 447060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:31:17,618][00394] Avg episode reward: [(0, '19.104')] [2023-02-27 12:31:22,150][47465] Updated weights for policy 0, policy_version 3371 (0.0024) [2023-02-27 12:31:22,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 13807616. Throughput: 0: 876.7. Samples: 451078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:31:22,609][00394] Avg episode reward: [(0, '20.681')] [2023-02-27 12:31:27,603][00394] Fps is (10 sec: 3688.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 13828096. Throughput: 0: 900.6. Samples: 454232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:31:27,611][00394] Avg episode reward: [(0, '22.663')] [2023-02-27 12:31:31,398][47465] Updated weights for policy 0, policy_version 3381 (0.0012) [2023-02-27 12:31:32,612][00394] Fps is (10 sec: 4505.1, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 13852672. Throughput: 0: 919.7. Samples: 461042. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:31:32,615][00394] Avg episode reward: [(0, '23.490')] [2023-02-27 12:31:37,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 13864960. Throughput: 0: 881.3. Samples: 465588. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:31:37,608][00394] Avg episode reward: [(0, '24.674')] [2023-02-27 12:31:42,603][00394] Fps is (10 sec: 2457.9, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 13877248. Throughput: 0: 878.8. Samples: 467618. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:31:42,611][00394] Avg episode reward: [(0, '25.741')] [2023-02-27 12:31:44,635][47465] Updated weights for policy 0, policy_version 3391 (0.0018) [2023-02-27 12:31:47,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 13901824. Throughput: 0: 907.6. Samples: 473182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:31:47,610][00394] Avg episode reward: [(0, '27.413')] [2023-02-27 12:31:47,625][47447] Saving new best policy, reward=27.413! [2023-02-27 12:31:52,603][00394] Fps is (10 sec: 4505.7, 60 sec: 3618.5, 300 sec: 3582.3). Total num frames: 13922304. Throughput: 0: 912.6. Samples: 479682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:31:52,609][00394] Avg episode reward: [(0, '28.543')] [2023-02-27 12:31:52,613][47447] Saving new best policy, reward=28.543! [2023-02-27 12:31:55,373][47465] Updated weights for policy 0, policy_version 3401 (0.0016) [2023-02-27 12:31:57,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3554.5). Total num frames: 13934592. Throughput: 0: 884.5. Samples: 481788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:31:57,608][00394] Avg episode reward: [(0, '26.938')] [2023-02-27 12:32:02,604][00394] Fps is (10 sec: 2867.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 13950976. Throughput: 0: 864.5. Samples: 485960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:32:02,615][00394] Avg episode reward: [(0, '26.518')] [2023-02-27 12:32:07,580][47465] Updated weights for policy 0, policy_version 3411 (0.0014) [2023-02-27 12:32:07,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 13971456. Throughput: 0: 902.1. Samples: 491672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:32:07,607][00394] Avg episode reward: [(0, '26.794')] [2023-02-27 12:32:07,622][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003411_13971456.pth... [2023-02-27 12:32:07,794][47447] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003204_13123584.pth [2023-02-27 12:32:12,603][00394] Fps is (10 sec: 4096.3, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 13991936. Throughput: 0: 902.7. Samples: 494854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:32:12,611][00394] Avg episode reward: [(0, '25.678')] [2023-02-27 12:32:17,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3550.2, 300 sec: 3540.6). Total num frames: 14004224. Throughput: 0: 871.9. Samples: 500276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:32:17,609][00394] Avg episode reward: [(0, '24.496')] [2023-02-27 12:32:19,251][47465] Updated weights for policy 0, policy_version 3421 (0.0020) [2023-02-27 12:32:22,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 14020608. Throughput: 0: 862.7. Samples: 504408. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:32:22,610][00394] Avg episode reward: [(0, '23.926')] [2023-02-27 12:32:27,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.6). Total num frames: 14041088. Throughput: 0: 874.0. Samples: 506946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:32:27,608][00394] Avg episode reward: [(0, '24.789')] [2023-02-27 12:32:30,463][47465] Updated weights for policy 0, policy_version 3431 (0.0021) [2023-02-27 12:32:32,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3568.4). Total num frames: 14061568. Throughput: 0: 898.0. Samples: 513590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:32:32,609][00394] Avg episode reward: [(0, '24.318')] [2023-02-27 12:32:37,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 14077952. Throughput: 0: 869.1. Samples: 518790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:32:37,611][00394] Avg episode reward: [(0, '24.150')] [2023-02-27 12:32:42,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 14090240. Throughput: 0: 868.0. Samples: 520846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:32:42,607][00394] Avg episode reward: [(0, '24.832')] [2023-02-27 12:32:43,163][47465] Updated weights for policy 0, policy_version 3441 (0.0035) [2023-02-27 12:32:47,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 14110720. Throughput: 0: 882.0. Samples: 525650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:32:47,605][00394] Avg episode reward: [(0, '25.614')] [2023-02-27 12:32:52,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 14131200. Throughput: 0: 903.3. Samples: 532322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:32:52,610][00394] Avg episode reward: [(0, '26.365')] [2023-02-27 12:32:52,875][47465] Updated weights for policy 0, policy_version 3451 (0.0014) [2023-02-27 12:32:57,605][00394] Fps is (10 sec: 3685.7, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 14147584. Throughput: 0: 899.0. Samples: 535310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:32:57,610][00394] Avg episode reward: [(0, '24.379')] [2023-02-27 12:33:02,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 14163968. Throughput: 0: 870.8. Samples: 539462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:33:02,607][00394] Avg episode reward: [(0, '25.694')] [2023-02-27 12:33:06,043][47465] Updated weights for policy 0, policy_version 3461 (0.0012) [2023-02-27 12:33:07,603][00394] Fps is (10 sec: 3277.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 14180352. Throughput: 0: 891.3. Samples: 544518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:33:07,605][00394] Avg episode reward: [(0, '25.695')] [2023-02-27 12:33:12,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 14204928. Throughput: 0: 908.0. Samples: 547804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:33:12,606][00394] Avg episode reward: [(0, '24.153')] [2023-02-27 12:33:15,466][47465] Updated weights for policy 0, policy_version 3471 (0.0012) [2023-02-27 12:33:17,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 14221312. Throughput: 0: 896.8. Samples: 553948. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:33:17,609][00394] Avg episode reward: [(0, '22.463')] [2023-02-27 12:33:22,606][00394] Fps is (10 sec: 2866.3, 60 sec: 3549.7, 300 sec: 3526.7). Total num frames: 14233600. Throughput: 0: 874.5. Samples: 558144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:33:22,609][00394] Avg episode reward: [(0, '22.374')] [2023-02-27 12:33:27,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 14254080. Throughput: 0: 873.8. Samples: 560168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:33:27,606][00394] Avg episode reward: [(0, '24.747')] [2023-02-27 12:33:28,525][47465] Updated weights for policy 0, policy_version 3481 (0.0026) [2023-02-27 12:33:32,603][00394] Fps is (10 sec: 4097.3, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 14274560. Throughput: 0: 911.2. Samples: 566652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:33:32,610][00394] Avg episode reward: [(0, '23.463')] [2023-02-27 12:33:37,603][00394] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 14295040. Throughput: 0: 896.8. Samples: 572678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:33:37,608][00394] Avg episode reward: [(0, '23.962')] [2023-02-27 12:33:38,951][47465] Updated weights for policy 0, policy_version 3491 (0.0020) [2023-02-27 12:33:42,608][00394] Fps is (10 sec: 3275.2, 60 sec: 3617.8, 300 sec: 3540.6). Total num frames: 14307328. Throughput: 0: 875.9. Samples: 574728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:33:42,611][00394] Avg episode reward: [(0, '23.450')] [2023-02-27 12:33:47,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 14323712. Throughput: 0: 880.4. Samples: 579080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:33:47,611][00394] Avg episode reward: [(0, '23.769')] [2023-02-27 12:33:50,759][47465] Updated weights for policy 0, policy_version 3501 (0.0016) [2023-02-27 12:33:52,603][00394] Fps is (10 sec: 3688.2, 60 sec: 3549.9, 300 sec: 3554.6). Total num frames: 14344192. Throughput: 0: 916.1. Samples: 585742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:33:52,611][00394] Avg episode reward: [(0, '23.897')] [2023-02-27 12:33:57,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3568.4). Total num frames: 14364672. Throughput: 0: 916.7. Samples: 589056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:33:57,610][00394] Avg episode reward: [(0, '23.325')] [2023-02-27 12:34:02,579][47465] Updated weights for policy 0, policy_version 3511 (0.0019) [2023-02-27 12:34:02,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 14381056. Throughput: 0: 876.8. Samples: 593404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:34:02,610][00394] Avg episode reward: [(0, '23.122')] [2023-02-27 12:34:07,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 14397440. Throughput: 0: 885.5. Samples: 597990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:34:07,606][00394] Avg episode reward: [(0, '22.917')] [2023-02-27 12:34:07,620][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003515_14397440.pth... 
[2023-02-27 12:34:07,798][47447] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003307_13545472.pth [2023-02-27 12:34:12,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 14417920. Throughput: 0: 913.7. Samples: 601284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:34:12,606][00394] Avg episode reward: [(0, '22.576')] [2023-02-27 12:34:13,319][47465] Updated weights for policy 0, policy_version 3521 (0.0033) [2023-02-27 12:34:17,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 14438400. Throughput: 0: 917.7. Samples: 607948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:34:17,608][00394] Avg episode reward: [(0, '22.725')] [2023-02-27 12:34:22,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3540.6). Total num frames: 14450688. Throughput: 0: 876.4. Samples: 612118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:34:22,606][00394] Avg episode reward: [(0, '22.287')] [2023-02-27 12:34:26,489][47465] Updated weights for policy 0, policy_version 3531 (0.0044) [2023-02-27 12:34:27,603][00394] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 14462976. Throughput: 0: 875.4. Samples: 614118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:34:27,612][00394] Avg episode reward: [(0, '23.469')] [2023-02-27 12:34:32,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 14487552. Throughput: 0: 906.6. Samples: 619876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:34:32,610][00394] Avg episode reward: [(0, '23.504')] [2023-02-27 12:34:35,985][47465] Updated weights for policy 0, policy_version 3541 (0.0025) [2023-02-27 12:34:37,603][00394] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 14508032. Throughput: 0: 903.2. Samples: 626384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:34:37,612][00394] Avg episode reward: [(0, '23.613')] [2023-02-27 12:34:42,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3554.5). Total num frames: 14524416. Throughput: 0: 875.9. Samples: 628472. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:34:42,608][00394] Avg episode reward: [(0, '23.367')] [2023-02-27 12:34:47,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 14536704. Throughput: 0: 870.8. Samples: 632588. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:34:47,611][00394] Avg episode reward: [(0, '22.671')] [2023-02-27 12:34:49,233][47465] Updated weights for policy 0, policy_version 3551 (0.0017) [2023-02-27 12:34:52,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 14557184. Throughput: 0: 902.2. Samples: 638590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:34:52,612][00394] Avg episode reward: [(0, '22.909')] [2023-02-27 12:34:57,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 14577664. Throughput: 0: 902.7. Samples: 641904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:34:57,605][00394] Avg episode reward: [(0, '22.056')] [2023-02-27 12:34:58,983][47465] Updated weights for policy 0, policy_version 3561 (0.0012) [2023-02-27 12:35:02,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 14594048. Throughput: 0: 870.8. Samples: 647134. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:35:02,611][00394] Avg episode reward: [(0, '22.705')] [2023-02-27 12:35:07,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 14606336. Throughput: 0: 870.6. Samples: 651294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:35:07,609][00394] Avg episode reward: [(0, '23.382')] [2023-02-27 12:35:11,518][47465] Updated weights for policy 0, policy_version 3571 (0.0017) [2023-02-27 12:35:12,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 14630912. Throughput: 0: 895.3. Samples: 654406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:35:12,606][00394] Avg episode reward: [(0, '24.156')] [2023-02-27 12:35:17,603][00394] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 14651392. Throughput: 0: 913.2. Samples: 660970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:35:17,608][00394] Avg episode reward: [(0, '24.364')] [2023-02-27 12:35:22,603][00394] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 14663680. Throughput: 0: 877.0. Samples: 665848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:35:22,610][00394] Avg episode reward: [(0, '24.373')] [2023-02-27 12:35:22,750][47465] Updated weights for policy 0, policy_version 3581 (0.0014) [2023-02-27 12:35:27,603][00394] Fps is (10 sec: 2867.1, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 14680064. Throughput: 0: 877.1. Samples: 667940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:35:27,614][00394] Avg episode reward: [(0, '25.920')] [2023-02-27 12:35:32,603][00394] Fps is (10 sec: 3686.6, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 14700544. Throughput: 0: 907.7. Samples: 673436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:35:32,606][00394] Avg episode reward: [(0, '25.193')] [2023-02-27 12:35:33,984][47465] Updated weights for policy 0, policy_version 3591 (0.0020) [2023-02-27 12:35:37,603][00394] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 14721024. Throughput: 0: 919.3. Samples: 679958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:35:37,606][00394] Avg episode reward: [(0, '23.773')] [2023-02-27 12:35:42,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 14737408. Throughput: 0: 900.9. Samples: 682446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:35:42,611][00394] Avg episode reward: [(0, '22.873')] [2023-02-27 12:35:46,094][47465] Updated weights for policy 0, policy_version 3601 (0.0015) [2023-02-27 12:35:47,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.7). Total num frames: 14749696. Throughput: 0: 876.0. Samples: 686554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:35:47,616][00394] Avg episode reward: [(0, '23.253')] [2023-02-27 12:35:52,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 14770176. Throughput: 0: 908.0. Samples: 692152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:35:52,605][00394] Avg episode reward: [(0, '23.321')] [2023-02-27 12:35:56,514][47465] Updated weights for policy 0, policy_version 3611 (0.0024) [2023-02-27 12:35:57,603][00394] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 14794752. Throughput: 0: 913.0. Samples: 695492. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:35:57,606][00394] Avg episode reward: [(0, '21.411')] [2023-02-27 12:36:02,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 14807040. Throughput: 0: 888.0. Samples: 700932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:36:02,611][00394] Avg episode reward: [(0, '22.540')] [2023-02-27 12:36:07,604][00394] Fps is (10 sec: 2867.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 14823424. Throughput: 0: 871.9. Samples: 705084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:36:07,606][00394] Avg episode reward: [(0, '23.299')] [2023-02-27 12:36:07,625][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003619_14823424.pth... [2023-02-27 12:36:07,841][47447] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003411_13971456.pth [2023-02-27 12:36:09,817][47465] Updated weights for policy 0, policy_version 3621 (0.0023) [2023-02-27 12:36:12,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 14843904. Throughput: 0: 880.3. Samples: 707554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:36:12,606][00394] Avg episode reward: [(0, '22.745')] [2023-02-27 12:36:17,603][00394] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 14864384. Throughput: 0: 907.0. Samples: 714252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:36:17,611][00394] Avg episode reward: [(0, '22.282')] [2023-02-27 12:36:19,116][47465] Updated weights for policy 0, policy_version 3631 (0.0025) [2023-02-27 12:36:22,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 14880768. Throughput: 0: 878.8. Samples: 719502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:36:22,613][00394] Avg episode reward: [(0, '23.255')] [2023-02-27 12:36:27,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 14893056. Throughput: 0: 870.1. Samples: 721600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:36:27,605][00394] Avg episode reward: [(0, '23.309')] [2023-02-27 12:36:32,292][47465] Updated weights for policy 0, policy_version 3641 (0.0031) [2023-02-27 12:36:32,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 14913536. Throughput: 0: 885.7. Samples: 726410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:36:32,616][00394] Avg episode reward: [(0, '23.409')] [2023-02-27 12:36:37,603][00394] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 14934016. Throughput: 0: 906.8. Samples: 732960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:36:37,611][00394] Avg episode reward: [(0, '23.854')] [2023-02-27 12:36:42,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 14950400. Throughput: 0: 901.0. Samples: 736038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:36:42,610][00394] Avg episode reward: [(0, '25.768')] [2023-02-27 12:36:42,847][47465] Updated weights for policy 0, policy_version 3651 (0.0022) [2023-02-27 12:36:47,606][00394] Fps is (10 sec: 3275.8, 60 sec: 3617.9, 300 sec: 3540.6). Total num frames: 14966784. Throughput: 0: 872.8. Samples: 740210. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:36:47,612][00394] Avg episode reward: [(0, '26.055')] [2023-02-27 12:36:52,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 14983168. Throughput: 0: 889.3. Samples: 745100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:36:52,613][00394] Avg episode reward: [(0, '27.087')] [2023-02-27 12:36:54,938][47465] Updated weights for policy 0, policy_version 3661 (0.0032) [2023-02-27 12:36:57,607][00394] Fps is (10 sec: 3686.1, 60 sec: 3481.4, 300 sec: 3568.3). Total num frames: 15003648. Throughput: 0: 906.1. Samples: 748334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:36:57,610][00394] Avg episode reward: [(0, '27.212')] [2023-02-27 12:37:02,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 15024128. Throughput: 0: 891.7. Samples: 754376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:37:02,606][00394] Avg episode reward: [(0, '26.920')] [2023-02-27 12:37:06,724][47465] Updated weights for policy 0, policy_version 3671 (0.0020) [2023-02-27 12:37:07,603][00394] Fps is (10 sec: 3278.1, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 15036416. Throughput: 0: 868.0. Samples: 758560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:37:07,606][00394] Avg episode reward: [(0, '27.278')] [2023-02-27 12:37:12,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 15052800. Throughput: 0: 866.9. Samples: 760612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:37:12,606][00394] Avg episode reward: [(0, '26.393')] [2023-02-27 12:37:17,538][47465] Updated weights for policy 0, policy_version 3681 (0.0014) [2023-02-27 12:37:17,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 15077376. Throughput: 0: 904.0. Samples: 767092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:37:17,610][00394] Avg episode reward: [(0, '26.899')] [2023-02-27 12:37:22,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 15093760. Throughput: 0: 891.2. Samples: 773062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:37:22,607][00394] Avg episode reward: [(0, '26.427')] [2023-02-27 12:37:27,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 15106048. Throughput: 0: 867.3. Samples: 775068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:37:27,606][00394] Avg episode reward: [(0, '26.840')] [2023-02-27 12:37:30,628][47465] Updated weights for policy 0, policy_version 3691 (0.0018) [2023-02-27 12:37:32,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 15126528. Throughput: 0: 869.2. Samples: 779320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-27 12:37:32,606][00394] Avg episode reward: [(0, '25.015')] [2023-02-27 12:37:37,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 15147008. Throughput: 0: 908.7. Samples: 785992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:37:37,606][00394] Avg episode reward: [(0, '25.030')] [2023-02-27 12:37:39,826][47465] Updated weights for policy 0, policy_version 3701 (0.0013) [2023-02-27 12:37:42,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 15167488. Throughput: 0: 912.4. Samples: 789388. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:37:42,607][00394] Avg episode reward: [(0, '23.222')] [2023-02-27 12:37:47,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3554.5). Total num frames: 15179776. Throughput: 0: 875.3. Samples: 793766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:37:47,607][00394] Avg episode reward: [(0, '22.401')] [2023-02-27 12:37:52,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 15196160. Throughput: 0: 881.0. Samples: 798204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:37:52,608][00394] Avg episode reward: [(0, '23.037')] [2023-02-27 12:37:53,191][47465] Updated weights for policy 0, policy_version 3711 (0.0020) [2023-02-27 12:37:57,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3568.4). Total num frames: 15216640. Throughput: 0: 907.1. Samples: 801432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:37:57,606][00394] Avg episode reward: [(0, '23.523')] [2023-02-27 12:38:02,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 15237120. Throughput: 0: 908.3. Samples: 807966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:38:02,610][00394] Avg episode reward: [(0, '24.048')] [2023-02-27 12:38:03,347][47465] Updated weights for policy 0, policy_version 3721 (0.0014) [2023-02-27 12:38:07,603][00394] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 15249408. Throughput: 0: 867.7. Samples: 812108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:38:07,620][00394] Avg episode reward: [(0, '24.317')] [2023-02-27 12:38:07,645][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003723_15249408.pth... [2023-02-27 12:38:07,947][47447] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003515_14397440.pth [2023-02-27 12:38:12,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 15265792. Throughput: 0: 867.9. Samples: 814124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:38:12,606][00394] Avg episode reward: [(0, '25.960')] [2023-02-27 12:38:15,748][47465] Updated weights for policy 0, policy_version 3731 (0.0041) [2023-02-27 12:38:17,603][00394] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 15286272. Throughput: 0: 909.0. Samples: 820224. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:38:17,606][00394] Avg episode reward: [(0, '26.885')] [2023-02-27 12:38:22,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 15306752. Throughput: 0: 901.3. Samples: 826550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:38:22,614][00394] Avg episode reward: [(0, '26.914')] [2023-02-27 12:38:27,041][47465] Updated weights for policy 0, policy_version 3741 (0.0016) [2023-02-27 12:38:27,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 15323136. Throughput: 0: 871.2. Samples: 828592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:38:27,610][00394] Avg episode reward: [(0, '26.841')] [2023-02-27 12:38:32,603][00394] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 15335424. Throughput: 0: 865.7. Samples: 832724. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:38:32,607][00394] Avg episode reward: [(0, '26.532')] [2023-02-27 12:38:37,603][00394] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 15360000. Throughput: 0: 908.4. Samples: 839082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:38:37,610][00394] Avg episode reward: [(0, '27.558')] [2023-02-27 12:38:38,340][47465] Updated weights for policy 0, policy_version 3751 (0.0024) [2023-02-27 12:38:42,603][00394] Fps is (10 sec: 4505.8, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 15380480. Throughput: 0: 910.8. Samples: 842418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:38:42,611][00394] Avg episode reward: [(0, '26.173')] [2023-02-27 12:38:47,606][00394] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3554.5). Total num frames: 15392768. Throughput: 0: 872.8. Samples: 847244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:38:47,613][00394] Avg episode reward: [(0, '26.044')] [2023-02-27 12:38:50,850][47465] Updated weights for policy 0, policy_version 3761 (0.0023) [2023-02-27 12:38:52,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 15409152. Throughput: 0: 873.7. Samples: 851426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:38:52,612][00394] Avg episode reward: [(0, '26.236')] [2023-02-27 12:38:57,603][00394] Fps is (10 sec: 3687.5, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 15429632. Throughput: 0: 900.7. Samples: 854656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:38:57,605][00394] Avg episode reward: [(0, '27.420')] [2023-02-27 12:39:00,649][47465] Updated weights for policy 0, policy_version 3771 (0.0012) [2023-02-27 12:39:02,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 15450112. Throughput: 0: 910.2. Samples: 861182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:39:02,605][00394] Avg episode reward: [(0, '26.789')] [2023-02-27 12:39:07,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 15466496. Throughput: 0: 869.7. Samples: 865688. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:39:07,606][00394] Avg episode reward: [(0, '27.136')] [2023-02-27 12:39:12,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 15478784. Throughput: 0: 869.7. Samples: 867728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:39:12,606][00394] Avg episode reward: [(0, '26.613')] [2023-02-27 12:39:14,085][47465] Updated weights for policy 0, policy_version 3781 (0.0024) [2023-02-27 12:39:17,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 15499264. Throughput: 0: 900.5. Samples: 873248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:39:17,610][00394] Avg episode reward: [(0, '28.290')] [2023-02-27 12:39:22,604][00394] Fps is (10 sec: 4505.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 15523840. Throughput: 0: 907.3. Samples: 879910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:39:22,614][00394] Avg episode reward: [(0, '28.558')] [2023-02-27 12:39:22,617][47447] Saving new best policy, reward=28.558! [2023-02-27 12:39:23,974][47465] Updated weights for policy 0, policy_version 3791 (0.0014) [2023-02-27 12:39:27,603][00394] Fps is (10 sec: 3686.3, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 15536128. Throughput: 0: 879.9. Samples: 882016. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:39:27,606][00394] Avg episode reward: [(0, '28.321')] [2023-02-27 12:39:32,604][00394] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 15548416. Throughput: 0: 863.6. Samples: 886102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:39:32,611][00394] Avg episode reward: [(0, '27.980')] [2023-02-27 12:39:36,937][47465] Updated weights for policy 0, policy_version 3801 (0.0017) [2023-02-27 12:39:37,603][00394] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 15568896. Throughput: 0: 897.7. Samples: 891824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:39:37,606][00394] Avg episode reward: [(0, '28.866')] [2023-02-27 12:39:37,615][47447] Saving new best policy, reward=28.866! [2023-02-27 12:39:42,603][00394] Fps is (10 sec: 4505.9, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 15593472. Throughput: 0: 898.7. Samples: 895098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:39:42,605][00394] Avg episode reward: [(0, '29.020')] [2023-02-27 12:39:42,612][47447] Saving new best policy, reward=29.020! [2023-02-27 12:39:47,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3554.5). Total num frames: 15605760. Throughput: 0: 868.7. Samples: 900272. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:39:47,611][00394] Avg episode reward: [(0, '28.330')] [2023-02-27 12:39:47,958][47465] Updated weights for policy 0, policy_version 3811 (0.0021) [2023-02-27 12:39:52,603][00394] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 15618048. Throughput: 0: 859.7. Samples: 904374. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-02-27 12:39:52,612][00394] Avg episode reward: [(0, '28.337')] [2023-02-27 12:39:57,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 15638528. Throughput: 0: 873.9. Samples: 907054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-27 12:39:57,611][00394] Avg episode reward: [(0, '27.669')] [2023-02-27 12:39:59,683][47465] Updated weights for policy 0, policy_version 3821 (0.0012) [2023-02-27 12:40:02,603][00394] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 15663104. Throughput: 0: 898.9. Samples: 913700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:40:02,608][00394] Avg episode reward: [(0, '26.112')] [2023-02-27 12:40:07,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 15675392. Throughput: 0: 862.1. Samples: 918702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:40:07,609][00394] Avg episode reward: [(0, '24.931')] [2023-02-27 12:40:07,625][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003827_15675392.pth... [2023-02-27 12:40:07,897][47447] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003619_14823424.pth [2023-02-27 12:40:12,491][47465] Updated weights for policy 0, policy_version 3831 (0.0012) [2023-02-27 12:40:12,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 15691776. Throughput: 0: 861.2. Samples: 920770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:40:12,606][00394] Avg episode reward: [(0, '24.978')] [2023-02-27 12:40:17,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 15712256. Throughput: 0: 881.5. Samples: 925770. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:40:17,606][00394] Avg episode reward: [(0, '25.431')] [2023-02-27 12:40:22,055][47465] Updated weights for policy 0, policy_version 3841 (0.0018) [2023-02-27 12:40:22,603][00394] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 15732736. Throughput: 0: 901.5. Samples: 932394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:40:22,607][00394] Avg episode reward: [(0, '25.530')] [2023-02-27 12:40:27,603][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 15749120. Throughput: 0: 893.5. Samples: 935304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:40:27,606][00394] Avg episode reward: [(0, '26.860')] [2023-02-27 12:40:32,603][00394] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 15761408. Throughput: 0: 871.5. Samples: 939490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-27 12:40:32,609][00394] Avg episode reward: [(0, '27.033')] [2023-02-27 12:40:35,258][47465] Updated weights for policy 0, policy_version 3851 (0.0017) [2023-02-27 12:40:37,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 15781888. Throughput: 0: 893.6. Samples: 944586. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:40:37,605][00394] Avg episode reward: [(0, '26.288')] [2023-02-27 12:40:42,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 15802368. Throughput: 0: 907.9. Samples: 947908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:40:42,606][00394] Avg episode reward: [(0, '27.333')] [2023-02-27 12:40:44,509][47465] Updated weights for policy 0, policy_version 3861 (0.0012) [2023-02-27 12:40:47,606][00394] Fps is (10 sec: 4094.7, 60 sec: 3617.9, 300 sec: 3568.3). Total num frames: 15822848. Throughput: 0: 896.0. Samples: 954022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:40:47,609][00394] Avg episode reward: [(0, '27.727')] [2023-02-27 12:40:52,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 15835136. Throughput: 0: 876.2. Samples: 958130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:40:52,612][00394] Avg episode reward: [(0, '28.085')] [2023-02-27 12:40:57,603][00394] Fps is (10 sec: 2868.1, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 15851520. Throughput: 0: 877.0. Samples: 960234. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-27 12:40:57,605][00394] Avg episode reward: [(0, '26.525')] [2023-02-27 12:40:57,743][47465] Updated weights for policy 0, policy_version 3871 (0.0025) [2023-02-27 12:41:02,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 15876096. Throughput: 0: 915.0. Samples: 966946. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:41:02,611][00394] Avg episode reward: [(0, '25.830')] [2023-02-27 12:41:07,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 15892480. Throughput: 0: 896.6. Samples: 972742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:41:07,616][00394] Avg episode reward: [(0, '26.253')] [2023-02-27 12:41:07,952][47465] Updated weights for policy 0, policy_version 3881 (0.0016) [2023-02-27 12:41:12,604][00394] Fps is (10 sec: 3276.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 15908864. Throughput: 0: 877.5. Samples: 974792. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:41:12,606][00394] Avg episode reward: [(0, '25.187')] [2023-02-27 12:41:17,603][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 15925248. Throughput: 0: 884.4. Samples: 979288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-27 12:41:17,609][00394] Avg episode reward: [(0, '25.891')] [2023-02-27 12:41:20,144][47465] Updated weights for policy 0, policy_version 3891 (0.0031) [2023-02-27 12:41:22,603][00394] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 15945728. Throughput: 0: 916.6. Samples: 985834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-27 12:41:22,606][00394] Avg episode reward: [(0, '25.764')] [2023-02-27 12:41:27,603][00394] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 15966208. Throughput: 0: 916.2. Samples: 989136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-27 12:41:27,607][00394] Avg episode reward: [(0, '26.755')] [2023-02-27 12:41:31,716][47465] Updated weights for policy 0, policy_version 3901 (0.0016) [2023-02-27 12:41:32,607][00394] Fps is (10 sec: 3275.5, 60 sec: 3617.9, 300 sec: 3540.6). Total num frames: 15978496. Throughput: 0: 875.5. Samples: 993420. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:41:32,615][00394] Avg episode reward: [(0, '26.698')] [2023-02-27 12:41:37,603][00394] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 15994880. Throughput: 0: 890.3. Samples: 998194. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-27 12:41:37,606][00394] Avg episode reward: [(0, '26.868')] [2023-02-27 12:41:39,710][47447] Stopping Batcher_0... [2023-02-27 12:41:39,711][47447] Loop batcher_evt_loop terminating... [2023-02-27 12:41:39,714][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... [2023-02-27 12:41:39,711][00394] Component Batcher_0 stopped! [2023-02-27 12:41:39,805][47465] Weights refcount: 2 0 [2023-02-27 12:41:39,841][00394] Component InferenceWorker_p0-w0 stopped! [2023-02-27 12:41:39,843][47465] Stopping InferenceWorker_p0-w0... [2023-02-27 12:41:39,844][47465] Loop inference_proc0-0_evt_loop terminating... [2023-02-27 12:41:39,920][47447] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003723_15249408.pth [2023-02-27 12:41:39,925][47447] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... [2023-02-27 12:41:40,066][47447] Stopping LearnerWorker_p0... [2023-02-27 12:41:40,068][47447] Loop learner_proc0_evt_loop terminating... [2023-02-27 12:41:40,066][00394] Component LearnerWorker_p0 stopped! [2023-02-27 12:41:40,260][47476] Stopping RolloutWorker_w3... [2023-02-27 12:41:40,261][47466] Stopping RolloutWorker_w1... [2023-02-27 12:41:40,259][00394] Component RolloutWorker_w0 stopped! [2023-02-27 12:41:40,267][47488] Stopping RolloutWorker_w5... [2023-02-27 12:41:40,268][47476] Loop rollout_proc3_evt_loop terminating... [2023-02-27 12:41:40,269][47466] Loop rollout_proc1_evt_loop terminating... [2023-02-27 12:41:40,268][00394] Component RolloutWorker_w3 stopped! [2023-02-27 12:41:40,268][47488] Loop rollout_proc5_evt_loop terminating... [2023-02-27 12:41:40,271][00394] Component RolloutWorker_w1 stopped! [2023-02-27 12:41:40,277][47486] Stopping RolloutWorker_w7... [2023-02-27 12:41:40,278][47486] Loop rollout_proc7_evt_loop terminating... [2023-02-27 12:41:40,275][00394] Component RolloutWorker_w5 stopped! 
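The "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" status lines that run through the section above report throughput over three sliding windows, computed from periodic (timestamp, total frames) samples; for example, (15994880 − 15966208) / 10 s = 2867.2, the 10-second figure reported at 12:41:37. A sketch of the windowed computation (assumed logic, not the library's exact code):

```python
import time
from collections import deque


class FpsTracker:
    """Sketch: windowed FPS like the 'Fps is (10 sec: ..., 60 sec: ...)' lines."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_env_frames)

    def record(self, total_frames, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, total_frames))
        # Keep only samples inside the largest window.
        while self.samples and now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self):
        now, frames = self.samples[-1]
        out = {}
        for w in self.windows:
            # Oldest sample still inside this window (fall back to the newest).
            past = next(((t, f) for t, f in self.samples if now - t <= w), (now, frames))
            dt = now - past[0]
            out[w] = (frames - past[1]) / dt if dt > 0 else float("nan")
        return out
```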
[2023-02-27 12:41:40,279][00394] Component RolloutWorker_w7 stopped! [2023-02-27 12:41:40,280][47470] Stopping RolloutWorker_w0... [2023-02-27 12:41:40,302][47470] Loop rollout_proc0_evt_loop terminating... [2023-02-27 12:41:40,307][00394] Component RolloutWorker_w2 stopped! [2023-02-27 12:41:40,308][47467] Stopping RolloutWorker_w2... [2023-02-27 12:41:40,309][47467] Loop rollout_proc2_evt_loop terminating... [2023-02-27 12:41:40,317][00394] Component RolloutWorker_w6 stopped! [2023-02-27 12:41:40,321][47494] Stopping RolloutWorker_w6... [2023-02-27 12:41:40,326][47480] Stopping RolloutWorker_w4... [2023-02-27 12:41:40,326][00394] Component RolloutWorker_w4 stopped! [2023-02-27 12:41:40,328][00394] Waiting for process learner_proc0 to stop... [2023-02-27 12:41:40,342][47480] Loop rollout_proc4_evt_loop terminating... [2023-02-27 12:41:40,339][47494] Loop rollout_proc6_evt_loop terminating... [2023-02-27 12:41:43,575][00394] Waiting for process inference_proc0-0 to join... [2023-02-27 12:41:43,649][00394] Waiting for process rollout_proc0 to join... [2023-02-27 12:41:43,651][00394] Waiting for process rollout_proc1 to join... [2023-02-27 12:41:43,657][00394] Waiting for process rollout_proc2 to join... [2023-02-27 12:41:43,660][00394] Waiting for process rollout_proc3 to join... [2023-02-27 12:41:43,662][00394] Waiting for process rollout_proc4 to join... [2023-02-27 12:41:43,663][00394] Waiting for process rollout_proc5 to join... [2023-02-27 12:41:43,668][00394] Waiting for process rollout_proc6 to join... [2023-02-27 12:41:43,669][00394] Waiting for process rollout_proc7 to join...
[2023-02-27 12:41:43,671][00394] Batcher 0 profile tree view:
batching: 26.0965, releasing_batches: 0.0298
[2023-02-27 12:41:43,673][00394] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0026
  wait_policy_total: 542.2606
update_model: 8.0453
  weight_update: 0.0020
one_step: 0.0034
  handle_policy_step: 547.5090
    deserialize: 16.1034, stack: 3.1577, obs_to_device_normalize: 120.7245, forward: 263.6819, send_messages: 28.3321
    prepare_outputs: 87.5433
      to_cpu: 52.8791
[2023-02-27 12:41:43,674][00394] Learner 0 profile tree view:
misc: 0.0072, prepare_batch: 18.2706
train: 81.1486
  epoch_init: 0.0088, minibatch_init: 0.0078, losses_postprocess: 0.6623, kl_divergence: 0.6086, after_optimizer: 2.8761
  calculate_losses: 26.9739
    losses_init: 0.0162, forward_head: 1.9850, bptt_initial: 17.3610, tail: 1.2509, advantages_returns: 0.2622, losses: 3.3113
    bptt: 2.4481
      bptt_forward_core: 2.3531
  update: 49.1618
    clip: 1.5064
[2023-02-27 12:41:43,676][00394] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3743, enqueue_policy_requests: 143.4421, env_step: 864.5080, overhead: 23.1713, complete_rollouts: 7.4822
save_policy_outputs: 21.7239
  split_output_tensors: 10.6965
[2023-02-27 12:41:43,677][00394] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3087, enqueue_policy_requests: 152.6712, env_step: 853.8602, overhead: 23.0619, complete_rollouts: 7.4721
save_policy_outputs: 21.9218
  split_output_tensors: 10.8747
[2023-02-27 12:41:43,680][00394] Loop Runner_EvtLoop terminating...
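The profile tree views above come from nested, named wall-clock timers: each scope accumulates elapsed time and is printed indented under its parent, so e.g. to_cpu is part of prepare_outputs, which is part of handle_policy_step. A minimal sketch of such a tree profiler (illustrative; Sample Factory's own timing utilities differ in detail):

```python
import time
from contextlib import contextmanager


class TreeProfiler:
    """Sketch: accumulate wall-clock time into a tree of named scopes."""

    def __init__(self):
        self.totals = {}  # path tuple -> accumulated seconds
        self.stack = []   # currently open scope names

    @contextmanager
    def timeit(self, name):
        self.stack.append(name)
        path = tuple(self.stack)
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[path] = self.totals.get(path, 0.0) + time.perf_counter() - start
            self.stack.pop()

    def report(self):
        # Tuples sort parents before their children; indent two spaces per level,
        # like the "profile tree view" output above.
        for path in sorted(self.totals):
            print(f"{'  ' * (len(path) - 1)}{path[-1]}: {self.totals[path]:.4f}")


prof = TreeProfiler()
with prof.timeit("train"):
    with prof.timeit("calculate_losses"):
        time.sleep(0.01)
    with prof.timeit("update"):
        time.sleep(0.02)
prof.report()
```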
[2023-02-27 12:41:43,682][00394] Runner profile tree view: main_loop: 1172.2192 [2023-02-27 12:41:43,683][00394] Collected {0: 16007168}, FPS: 3413.9 [2023-02-27 12:41:43,781][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 12:41:43,783][00394] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 12:41:43,784][00394] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 12:41:43,788][00394] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 12:41:43,790][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 12:41:43,792][00394] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 12:41:43,794][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 12:41:43,795][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-27 12:41:43,796][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-27 12:41:43,797][00394] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-27 12:41:43,798][00394] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 12:41:43,800][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 12:41:43,801][00394] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 12:41:43,802][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-27 12:41:43,803][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 12:41:43,842][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 12:41:43,851][00394] RunningMeanStd input shape: (1,) [2023-02-27 12:41:43,882][00394] ConvEncoder: input_channels=3 [2023-02-27 12:41:44,027][00394] Conv encoder output size: 512 [2023-02-27 12:41:44,029][00394] Policy head output size: 512 [2023-02-27 12:41:44,139][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... [2023-02-27 12:41:45,198][00394] Num frames 100... [2023-02-27 12:41:45,320][00394] Num frames 200... [2023-02-27 12:41:45,446][00394] Num frames 300... [2023-02-27 12:41:45,577][00394] Num frames 400... [2023-02-27 12:41:45,765][00394] Num frames 500... [2023-02-27 12:41:45,935][00394] Num frames 600... [2023-02-27 12:41:46,101][00394] Num frames 700... [2023-02-27 12:41:46,268][00394] Num frames 800... [2023-02-27 12:41:46,437][00394] Num frames 900... [2023-02-27 12:41:46,611][00394] Num frames 1000... [2023-02-27 12:41:46,781][00394] Num frames 1100... [2023-02-27 12:41:46,959][00394] Num frames 1200... [2023-02-27 12:41:47,130][00394] Num frames 1300... [2023-02-27 12:41:47,299][00394] Num frames 1400... [2023-02-27 12:41:47,467][00394] Num frames 1500... [2023-02-27 12:41:47,647][00394] Num frames 1600... [2023-02-27 12:41:47,859][00394] Avg episode rewards: #0: 41.959, true rewards: #0: 16.960 [2023-02-27 12:41:47,862][00394] Avg episode reward: 41.959, avg true_objective: 16.960 [2023-02-27 12:41:47,875][00394] Num frames 1700... [2023-02-27 12:41:48,044][00394] Num frames 1800... [2023-02-27 12:41:48,212][00394] Num frames 1900... [2023-02-27 12:41:48,392][00394] Num frames 2000... 
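The evaluation pass above rebuilds the policy from the saved config.json plus command-line overrides, loads the newest checkpoint, and then plays episodes while counting emitted frames. In sample-factory 2.x as used in the Hugging Face Deep RL course notebook, the call that produces this output looks roughly like the following; parse_vizdoom_cfg is a helper defined in that notebook (wrapping parse_sf_args/parse_full_cfg with the Doom env arguments), and the env name is inferred from the repository name, so treat both as assumptions:

```python
from sample_factory.enjoy import enjoy

cfg = parse_vizdoom_cfg(  # assumed notebook helper, see note above
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--save_video",
        "--no_render",
        "--max_num_episodes=10",
    ],
    evaluation=True,
)
status = enjoy(cfg)  # plays episodes, logs Num frames / Avg episode reward, writes replay.mp4
```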
[2023-02-27 12:41:48,570][00394] Num frames 2100... [2023-02-27 12:41:48,750][00394] Num frames 2200... [2023-02-27 12:41:48,923][00394] Num frames 2300... [2023-02-27 12:41:49,107][00394] Num frames 2400... [2023-02-27 12:41:49,277][00394] Num frames 2500... [2023-02-27 12:41:49,430][00394] Num frames 2600... [2023-02-27 12:41:49,554][00394] Num frames 2700... [2023-02-27 12:41:49,634][00394] Avg episode rewards: #0: 33.600, true rewards: #0: 13.600 [2023-02-27 12:41:49,636][00394] Avg episode reward: 33.600, avg true_objective: 13.600 [2023-02-27 12:41:49,737][00394] Num frames 2800... [2023-02-27 12:41:49,860][00394] Num frames 2900... [2023-02-27 12:41:49,978][00394] Num frames 3000... [2023-02-27 12:41:50,117][00394] Num frames 3100... [2023-02-27 12:41:50,237][00394] Num frames 3200... [2023-02-27 12:41:50,364][00394] Num frames 3300... [2023-02-27 12:41:50,490][00394] Num frames 3400... [2023-02-27 12:41:50,612][00394] Num frames 3500... [2023-02-27 12:41:50,731][00394] Num frames 3600... [2023-02-27 12:41:50,856][00394] Num frames 3700... [2023-02-27 12:41:50,978][00394] Num frames 3800... [2023-02-27 12:41:51,109][00394] Num frames 3900... [2023-02-27 12:41:51,232][00394] Num frames 4000... [2023-02-27 12:41:51,356][00394] Num frames 4100... [2023-02-27 12:41:51,480][00394] Num frames 4200... [2023-02-27 12:41:51,608][00394] Num frames 4300... [2023-02-27 12:41:51,730][00394] Num frames 4400... [2023-02-27 12:41:51,859][00394] Num frames 4500... [2023-02-27 12:41:51,981][00394] Num frames 4600... [2023-02-27 12:41:52,110][00394] Num frames 4700... [2023-02-27 12:41:52,242][00394] Num frames 4800... [2023-02-27 12:41:52,323][00394] Avg episode rewards: #0: 41.066, true rewards: #0: 16.067 [2023-02-27 12:41:52,325][00394] Avg episode reward: 41.066, avg true_objective: 16.067 [2023-02-27 12:41:52,425][00394] Num frames 4900... [2023-02-27 12:41:52,549][00394] Num frames 5000... [2023-02-27 12:41:52,674][00394] Num frames 5100... [2023-02-27 12:41:52,791][00394] Num frames 5200... [2023-02-27 12:41:52,910][00394] Num frames 5300... [2023-02-27 12:41:53,031][00394] Num frames 5400... [2023-02-27 12:41:53,167][00394] Num frames 5500... [2023-02-27 12:41:53,287][00394] Num frames 5600... [2023-02-27 12:41:53,418][00394] Num frames 5700... [2023-02-27 12:41:53,547][00394] Num frames 5800... [2023-02-27 12:41:53,672][00394] Num frames 5900... [2023-02-27 12:41:53,801][00394] Num frames 6000... [2023-02-27 12:41:53,926][00394] Num frames 6100... [2023-02-27 12:41:54,056][00394] Num frames 6200... [2023-02-27 12:41:54,232][00394] Avg episode rewards: #0: 41.480, true rewards: #0: 15.730 [2023-02-27 12:41:54,233][00394] Avg episode reward: 41.480, avg true_objective: 15.730 [2023-02-27 12:41:54,249][00394] Num frames 6300... [2023-02-27 12:41:54,372][00394] Num frames 6400... [2023-02-27 12:41:54,500][00394] Num frames 6500... [2023-02-27 12:41:54,626][00394] Num frames 6600... [2023-02-27 12:41:54,747][00394] Num frames 6700... [2023-02-27 12:41:54,870][00394] Num frames 6800... [2023-02-27 12:41:55,001][00394] Num frames 6900... [2023-02-27 12:41:55,124][00394] Num frames 7000... [2023-02-27 12:41:55,251][00394] Num frames 7100... [2023-02-27 12:41:55,343][00394] Avg episode rewards: #0: 36.848, true rewards: #0: 14.248 [2023-02-27 12:41:55,346][00394] Avg episode reward: 36.848, avg true_objective: 14.248 [2023-02-27 12:41:55,441][00394] Num frames 7200... [2023-02-27 12:41:55,574][00394] Num frames 7300... [2023-02-27 12:41:55,693][00394] Num frames 7400... 
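Each "Avg episode rewards" entry is the mean over all episodes completed so far in the run, printed next to the mean of the environment's "true" objective, which is why the figure drifts as episodes accumulate (41.959 after one episode, 33.600 after two, and so on). The bookkeeping is a plain running mean; a sketch:

```python
episode_rewards, true_objectives = [], []


def log_avg(reward, true_objective):
    """Sketch of the 'Avg episode rewards' bookkeeping: means over episodes so far."""
    episode_rewards.append(reward)
    true_objectives.append(true_objective)
    avg_r = sum(episode_rewards) / len(episode_rewards)
    avg_t = sum(true_objectives) / len(true_objectives)
    print(f"Avg episode rewards: #0: {avg_r:.3f}, true rewards: #0: {avg_t:.3f}")
```

The "true" column appears to track the environment's unshaped score, which is why it sits consistently below the shaped training reward here.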
[2023-02-27 12:41:55,817][00394] Num frames 7500... [2023-02-27 12:41:55,935][00394] Num frames 7600... [2023-02-27 12:41:56,065][00394] Num frames 7700... [2023-02-27 12:41:56,189][00394] Num frames 7800... [2023-02-27 12:41:56,315][00394] Num frames 7900... [2023-02-27 12:41:56,435][00394] Num frames 8000... [2023-02-27 12:41:56,564][00394] Num frames 8100... [2023-02-27 12:41:56,696][00394] Num frames 8200... [2023-02-27 12:41:56,823][00394] Num frames 8300... [2023-02-27 12:41:56,907][00394] Avg episode rewards: #0: 35.701, true rewards: #0: 13.868 [2023-02-27 12:41:56,909][00394] Avg episode reward: 35.701, avg true_objective: 13.868 [2023-02-27 12:41:57,019][00394] Num frames 8400... [2023-02-27 12:41:57,149][00394] Num frames 8500... [2023-02-27 12:41:57,281][00394] Num frames 8600... [2023-02-27 12:41:57,405][00394] Num frames 8700... [2023-02-27 12:41:57,533][00394] Num frames 8800... [2023-02-27 12:41:57,653][00394] Num frames 8900... [2023-02-27 12:41:57,768][00394] Num frames 9000... [2023-02-27 12:41:57,919][00394] Avg episode rewards: #0: 32.963, true rewards: #0: 12.963 [2023-02-27 12:41:57,921][00394] Avg episode reward: 32.963, avg true_objective: 12.963 [2023-02-27 12:41:57,954][00394] Num frames 9100... [2023-02-27 12:41:58,081][00394] Num frames 9200... [2023-02-27 12:41:58,203][00394] Num frames 9300... [2023-02-27 12:41:58,332][00394] Num frames 9400... [2023-02-27 12:41:58,453][00394] Num frames 9500... [2023-02-27 12:41:58,580][00394] Num frames 9600... [2023-02-27 12:41:58,709][00394] Num frames 9700... [2023-02-27 12:41:58,831][00394] Num frames 9800... [2023-02-27 12:41:58,953][00394] Num frames 9900... [2023-02-27 12:41:59,076][00394] Num frames 10000... [2023-02-27 12:41:59,204][00394] Num frames 10100... [2023-02-27 12:41:59,335][00394] Num frames 10200... [2023-02-27 12:41:59,500][00394] Num frames 10300... [2023-02-27 12:41:59,673][00394] Num frames 10400... [2023-02-27 12:41:59,842][00394] Num frames 10500... [2023-02-27 12:42:00,010][00394] Num frames 10600... [2023-02-27 12:42:00,169][00394] Num frames 10700... [2023-02-27 12:42:00,334][00394] Num frames 10800... [2023-02-27 12:42:00,497][00394] Num frames 10900... [2023-02-27 12:42:00,727][00394] Avg episode rewards: #0: 35.492, true rewards: #0: 13.742 [2023-02-27 12:42:00,734][00394] Avg episode reward: 35.492, avg true_objective: 13.742 [2023-02-27 12:42:00,749][00394] Num frames 11000... [2023-02-27 12:42:00,920][00394] Num frames 11100... [2023-02-27 12:42:01,087][00394] Num frames 11200... [2023-02-27 12:42:01,258][00394] Num frames 11300... [2023-02-27 12:42:01,437][00394] Num frames 11400... [2023-02-27 12:42:01,604][00394] Num frames 11500... [2023-02-27 12:42:01,769][00394] Num frames 11600... [2023-02-27 12:42:01,945][00394] Num frames 11700... [2023-02-27 12:42:02,111][00394] Avg episode rewards: #0: 33.069, true rewards: #0: 13.069 [2023-02-27 12:42:02,114][00394] Avg episode reward: 33.069, avg true_objective: 13.069 [2023-02-27 12:42:02,180][00394] Num frames 11800... [2023-02-27 12:42:02,356][00394] Num frames 11900... [2023-02-27 12:42:02,542][00394] Num frames 12000... [2023-02-27 12:42:02,725][00394] Num frames 12100... [2023-02-27 12:42:02,895][00394] Num frames 12200... [2023-02-27 12:42:03,072][00394] Num frames 12300... [2023-02-27 12:42:03,226][00394] Num frames 12400... [2023-02-27 12:42:03,367][00394] Num frames 12500... [2023-02-27 12:42:03,498][00394] Num frames 12600... [2023-02-27 12:42:03,636][00394] Num frames 12700... [2023-02-27 12:42:03,754][00394] Num frames 12800... 
[2023-02-27 12:42:03,882][00394] Num frames 12900... [2023-02-27 12:42:04,006][00394] Num frames 13000... [2023-02-27 12:42:04,131][00394] Num frames 13100... [2023-02-27 12:42:04,258][00394] Num frames 13200... [2023-02-27 12:42:04,398][00394] Num frames 13300... [2023-02-27 12:42:04,524][00394] Num frames 13400... [2023-02-27 12:42:04,659][00394] Num frames 13500... [2023-02-27 12:42:04,783][00394] Num frames 13600... [2023-02-27 12:42:04,913][00394] Num frames 13700... [2023-02-27 12:42:05,055][00394] Avg episode rewards: #0: 35.572, true rewards: #0: 13.772 [2023-02-27 12:42:05,057][00394] Avg episode reward: 35.572, avg true_objective: 13.772 [2023-02-27 12:43:31,744][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 12:43:32,209][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 12:43:32,212][00394] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 12:43:32,214][00394] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-27 12:43:32,216][00394] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 12:43:32,218][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 12:43:32,220][00394] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 12:43:32,222][00394] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-27 12:43:32,223][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-27 12:43:32,225][00394] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-27 12:43:32,226][00394] Adding new argument 'hf_repository'='Clawoo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-27 12:43:32,227][00394] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 12:43:32,228][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 12:43:32,230][00394] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 12:43:32,231][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-27 12:43:32,232][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 12:43:32,256][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 12:43:32,259][00394] RunningMeanStd input shape: (1,) [2023-02-27 12:43:32,280][00394] ConvEncoder: input_channels=3 [2023-02-27 12:43:32,341][00394] Conv encoder output size: 512 [2023-02-27 12:43:32,343][00394] Policy head output size: 512 [2023-02-27 12:43:32,372][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... [2023-02-27 12:43:33,070][00394] Num frames 100... [2023-02-27 12:43:33,186][00394] Num frames 200... [2023-02-27 12:43:33,315][00394] Num frames 300... [2023-02-27 12:43:33,432][00394] Num frames 400... [2023-02-27 12:43:33,547][00394] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2023-02-27 12:43:33,551][00394] Avg episode reward: 5.480, avg true_objective: 4.480 [2023-02-27 12:43:33,666][00394] Num frames 500... [2023-02-27 12:43:33,822][00394] Num frames 600... [2023-02-27 12:43:33,954][00394] Num frames 700... [2023-02-27 12:43:34,098][00394] Num frames 800... 
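This second evaluation run is the same entry point with --push_to_hub and --hf_repository added: once the episodes finish, the experiment directory (checkpoint, config, replay video) is uploaded to the named Hugging Face repository. Under the same assumed notebook helper as above, roughly:

```python
from sample_factory.enjoy import enjoy

cfg = parse_vizdoom_cfg(  # assumed notebook helper, as above
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--save_video",
        "--no_render",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=Clawoo/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)
```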
[2023-02-27 12:43:34,215][00394] Num frames 900... [2023-02-27 12:43:34,339][00394] Num frames 1000... [2023-02-27 12:43:34,453][00394] Num frames 1100... [2023-02-27 12:43:34,574][00394] Num frames 1200... [2023-02-27 12:43:34,694][00394] Num frames 1300... [2023-02-27 12:43:34,801][00394] Avg episode rewards: #0: 11.220, true rewards: #0: 6.720 [2023-02-27 12:43:34,805][00394] Avg episode reward: 11.220, avg true_objective: 6.720 [2023-02-27 12:43:34,870][00394] Num frames 1400... [2023-02-27 12:43:34,989][00394] Num frames 1500... [2023-02-27 12:43:35,119][00394] Num frames 1600... [2023-02-27 12:43:35,258][00394] Num frames 1700... [2023-02-27 12:43:35,377][00394] Num frames 1800... [2023-02-27 12:43:35,514][00394] Num frames 1900... [2023-02-27 12:43:35,649][00394] Num frames 2000... [2023-02-27 12:43:35,769][00394] Num frames 2100... [2023-02-27 12:43:35,895][00394] Num frames 2200... [2023-02-27 12:43:36,016][00394] Num frames 2300... [2023-02-27 12:43:36,140][00394] Num frames 2400... [2023-02-27 12:43:36,259][00394] Num frames 2500... [2023-02-27 12:43:36,384][00394] Num frames 2600... [2023-02-27 12:43:36,504][00394] Num frames 2700... [2023-02-27 12:43:36,626][00394] Num frames 2800... [2023-02-27 12:43:36,751][00394] Num frames 2900... [2023-02-27 12:43:36,881][00394] Num frames 3000... [2023-02-27 12:43:36,999][00394] Num frames 3100... [2023-02-27 12:43:37,121][00394] Num frames 3200... [2023-02-27 12:43:37,245][00394] Num frames 3300... [2023-02-27 12:43:37,365][00394] Num frames 3400... [2023-02-27 12:43:37,474][00394] Avg episode rewards: #0: 26.146, true rewards: #0: 11.480 [2023-02-27 12:43:37,476][00394] Avg episode reward: 26.146, avg true_objective: 11.480 [2023-02-27 12:43:37,549][00394] Num frames 3500... [2023-02-27 12:43:37,666][00394] Num frames 3600... [2023-02-27 12:43:37,787][00394] Num frames 3700... [2023-02-27 12:43:37,915][00394] Num frames 3800... [2023-02-27 12:43:38,037][00394] Num frames 3900... [2023-02-27 12:43:38,200][00394] Avg episode rewards: #0: 22.220, true rewards: #0: 9.970 [2023-02-27 12:43:38,203][00394] Avg episode reward: 22.220, avg true_objective: 9.970 [2023-02-27 12:43:38,222][00394] Num frames 4000... [2023-02-27 12:43:38,337][00394] Num frames 4100... [2023-02-27 12:43:38,464][00394] Num frames 4200... [2023-02-27 12:43:38,585][00394] Num frames 4300... [2023-02-27 12:43:38,709][00394] Num frames 4400... [2023-02-27 12:43:38,771][00394] Avg episode rewards: #0: 18.808, true rewards: #0: 8.808 [2023-02-27 12:43:38,773][00394] Avg episode reward: 18.808, avg true_objective: 8.808 [2023-02-27 12:43:38,896][00394] Num frames 4500... [2023-02-27 12:43:39,021][00394] Num frames 4600... [2023-02-27 12:43:39,146][00394] Num frames 4700... [2023-02-27 12:43:39,267][00394] Num frames 4800... [2023-02-27 12:43:39,386][00394] Num frames 4900... [2023-02-27 12:43:39,557][00394] Num frames 5000... [2023-02-27 12:43:39,726][00394] Num frames 5100... [2023-02-27 12:43:39,890][00394] Num frames 5200... [2023-02-27 12:43:40,056][00394] Num frames 5300... [2023-02-27 12:43:40,218][00394] Num frames 5400... [2023-02-27 12:43:40,389][00394] Num frames 5500... [2023-02-27 12:43:40,556][00394] Num frames 5600... [2023-02-27 12:43:40,736][00394] Num frames 5700... [2023-02-27 12:43:40,908][00394] Num frames 5800... [2023-02-27 12:43:41,077][00394] Num frames 5900... [2023-02-27 12:43:41,257][00394] Num frames 6000... [2023-02-27 12:43:41,444][00394] Num frames 6100... [2023-02-27 12:43:41,618][00394] Num frames 6200... 
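The "Num frames" ticks advance in steps of 100 rendered frames, and with "frameskip 1 and render_action_repeat=4" (noted in the config dumps above) every policy action is held for four rendered frames. A generic Gymnasium-style action-repeat wrapper sketches the idea; this is not Sample Factory's internal wrapper:

```python
import gymnasium as gym


class ActionRepeat(gym.Wrapper):
    """Repeat each policy action for `repeat` env steps, summing rewards."""

    def __init__(self, env, repeat=4):
        super().__init__(env)
        self.repeat = repeat

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info
```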
[2023-02-27 12:43:41,793][00394] Num frames 6300... [2023-02-27 12:43:41,980][00394] Num frames 6400... [2023-02-27 12:43:42,161][00394] Num frames 6500... [2023-02-27 12:43:42,226][00394] Avg episode rewards: #0: 26.006, true rewards: #0: 10.840 [2023-02-27 12:43:42,229][00394] Avg episode reward: 26.006, avg true_objective: 10.840 [2023-02-27 12:43:42,398][00394] Num frames 6600... [2023-02-27 12:43:42,531][00394] Num frames 6700... [2023-02-27 12:43:42,652][00394] Num frames 6800... [2023-02-27 12:43:42,770][00394] Num frames 6900... [2023-02-27 12:43:42,892][00394] Num frames 7000... [2023-02-27 12:43:43,013][00394] Num frames 7100... [2023-02-27 12:43:43,084][00394] Avg episode rewards: #0: 23.588, true rewards: #0: 10.160 [2023-02-27 12:43:43,086][00394] Avg episode reward: 23.588, avg true_objective: 10.160 [2023-02-27 12:43:43,210][00394] Num frames 7200... [2023-02-27 12:43:43,330][00394] Num frames 7300... [2023-02-27 12:43:43,454][00394] Num frames 7400... [2023-02-27 12:43:43,573][00394] Num frames 7500... [2023-02-27 12:43:43,691][00394] Num frames 7600... [2023-02-27 12:43:43,813][00394] Num frames 7700... [2023-02-27 12:43:43,933][00394] Num frames 7800... [2023-02-27 12:43:44,053][00394] Num frames 7900... [2023-02-27 12:43:44,178][00394] Num frames 8000... [2023-02-27 12:43:44,302][00394] Num frames 8100... [2023-02-27 12:43:44,433][00394] Num frames 8200... [2023-02-27 12:43:44,550][00394] Num frames 8300... [2023-02-27 12:43:44,676][00394] Num frames 8400... [2023-02-27 12:43:44,800][00394] Num frames 8500... [2023-02-27 12:43:44,927][00394] Num frames 8600... [2023-02-27 12:43:45,055][00394] Num frames 8700... [2023-02-27 12:43:45,179][00394] Num frames 8800... [2023-02-27 12:43:45,305][00394] Num frames 8900... [2023-02-27 12:43:45,431][00394] Num frames 9000... [2023-02-27 12:43:45,549][00394] Num frames 9100... [2023-02-27 12:43:45,683][00394] Num frames 9200... [2023-02-27 12:43:45,755][00394] Avg episode rewards: #0: 27.765, true rewards: #0: 11.515 [2023-02-27 12:43:45,756][00394] Avg episode reward: 27.765, avg true_objective: 11.515 [2023-02-27 12:43:45,869][00394] Num frames 9300... [2023-02-27 12:43:45,985][00394] Num frames 9400... [2023-02-27 12:43:46,102][00394] Num frames 9500... [2023-02-27 12:43:46,155][00394] Avg episode rewards: #0: 25.111, true rewards: #0: 10.556 [2023-02-27 12:43:46,158][00394] Avg episode reward: 25.111, avg true_objective: 10.556 [2023-02-27 12:43:46,287][00394] Num frames 9600... [2023-02-27 12:43:46,422][00394] Num frames 9700... [2023-02-27 12:43:46,543][00394] Num frames 9800... [2023-02-27 12:43:46,672][00394] Num frames 9900... [2023-02-27 12:43:46,799][00394] Num frames 10000... [2023-02-27 12:43:46,908][00394] Avg episode rewards: #0: 23.544, true rewards: #0: 10.044 [2023-02-27 12:43:46,909][00394] Avg episode reward: 23.544, avg true_objective: 10.044 [2023-02-27 12:44:46,898][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-27 12:44:51,099][00394] The model has been pushed to https://huggingface.co./Clawoo/rl_course_vizdoom_health_gathering_supreme [2023-02-27 12:45:10,860][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-27 12:45:10,863][00394] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-27 12:45:10,865][00394] Adding new argument 'no_render'=True that is not in the saved config file! 
[2023-02-27 12:45:10,868][00394] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-27 12:45:10,870][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-27 12:45:10,871][00394] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-27 12:45:10,874][00394] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-27 12:45:10,875][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-27 12:45:10,876][00394] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-27 12:45:10,878][00394] Adding new argument 'hf_repository'='Clawoo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-27 12:45:10,879][00394] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-27 12:45:10,880][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-27 12:45:10,881][00394] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-27 12:45:10,883][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-27 12:45:10,884][00394] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-27 12:45:10,907][00394] RunningMeanStd input shape: (3, 72, 128) [2023-02-27 12:45:10,910][00394] RunningMeanStd input shape: (1,) [2023-02-27 12:45:10,923][00394] ConvEncoder: input_channels=3 [2023-02-27 12:45:10,959][00394] Conv encoder output size: 512 [2023-02-27 12:45:10,963][00394] Policy head output size: 512 [2023-02-27 12:45:10,981][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... [2023-02-27 12:45:11,467][00394] Num frames 100... [2023-02-27 12:45:11,597][00394] Num frames 200... [2023-02-27 12:45:11,723][00394] Num frames 300... [2023-02-27 12:45:11,838][00394] Num frames 400... [2023-02-27 12:45:11,955][00394] Num frames 500... [2023-02-27 12:45:12,068][00394] Num frames 600... [2023-02-27 12:45:12,189][00394] Num frames 700... [2023-02-27 12:45:12,316][00394] Num frames 800... [2023-02-27 12:45:12,439][00394] Num frames 900... [2023-02-27 12:45:12,564][00394] Num frames 1000... [2023-02-27 12:45:12,691][00394] Num frames 1100... [2023-02-27 12:45:12,811][00394] Num frames 1200... [2023-02-27 12:45:12,931][00394] Num frames 1300... [2023-02-27 12:45:13,052][00394] Num frames 1400... [2023-02-27 12:45:13,185][00394] Num frames 1500... [2023-02-27 12:45:13,323][00394] Avg episode rewards: #0: 40.680, true rewards: #0: 15.680 [2023-02-27 12:45:13,328][00394] Avg episode reward: 40.680, avg true_objective: 15.680 [2023-02-27 12:45:13,368][00394] Num frames 1600... [2023-02-27 12:45:13,499][00394] Num frames 1700... [2023-02-27 12:45:13,618][00394] Num frames 1800... [2023-02-27 12:45:13,794][00394] Num frames 1900... [2023-02-27 12:45:13,975][00394] Num frames 2000... [2023-02-27 12:45:14,150][00394] Num frames 2100... [2023-02-27 12:45:14,317][00394] Num frames 2200... [2023-02-27 12:45:14,480][00394] Num frames 2300... [2023-02-27 12:45:14,646][00394] Num frames 2400... [2023-02-27 12:45:14,725][00394] Avg episode rewards: #0: 31.555, true rewards: #0: 12.055 [2023-02-27 12:45:14,728][00394] Avg episode reward: 31.555, avg true_objective: 12.055 [2023-02-27 12:45:14,885][00394] Num frames 2500... 
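The "RunningMeanStd input shape" lines above are the observation normalizer, shape (3, 72, 128), and the returns normalizer, shape (1,), being instantiated again for this run. The underlying idea is a running mean/variance kept up to date batch by batch; a baselines-style sketch (the in-place ModuleDict variant in the log differs in packaging, not in math):

```python
import numpy as np


class RunningMeanStd:
    """Sketch: running mean/var via the parallel-variance (Chan et al.) update."""

    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps

    def update(self, batch):
        # batch has shape (N, *shape); fold its moments into the running ones.
        b_mean = batch.mean(axis=0)
        b_var = batch.var(axis=0)
        b_count = batch.shape[0]
        delta = b_mean - self.mean
        tot = self.count + b_count
        self.mean = self.mean + delta * b_count / tot
        m2 = self.var * self.count + b_var * b_count + delta**2 * self.count * b_count / tot
        self.var = m2 / tot
        self.count = tot

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)
```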
[2023-02-27 12:45:15,061][00394] Num frames 2600... [2023-02-27 12:45:15,223][00394] Num frames 2700... [2023-02-27 12:45:15,384][00394] Num frames 2800... [2023-02-27 12:45:15,547][00394] Num frames 2900... [2023-02-27 12:45:15,713][00394] Avg episode rewards: #0: 24.517, true rewards: #0: 9.850 [2023-02-27 12:45:15,716][00394] Avg episode reward: 24.517, avg true_objective: 9.850 [2023-02-27 12:45:15,795][00394] Num frames 3000... [2023-02-27 12:45:15,963][00394] Num frames 3100... [2023-02-27 12:45:16,134][00394] Num frames 3200... [2023-02-27 12:45:16,314][00394] Num frames 3300... [2023-02-27 12:45:16,486][00394] Num frames 3400... [2023-02-27 12:45:16,659][00394] Num frames 3500... [2023-02-27 12:45:16,838][00394] Num frames 3600... [2023-02-27 12:45:17,009][00394] Num frames 3700... [2023-02-27 12:45:17,186][00394] Num frames 3800... [2023-02-27 12:45:17,364][00394] Num frames 3900... [2023-02-27 12:45:17,501][00394] Num frames 4000... [2023-02-27 12:45:17,623][00394] Num frames 4100... [2023-02-27 12:45:17,758][00394] Num frames 4200... [2023-02-27 12:45:17,824][00394] Avg episode rewards: #0: 25.768, true rewards: #0: 10.517 [2023-02-27 12:45:17,827][00394] Avg episode reward: 25.768, avg true_objective: 10.517 [2023-02-27 12:45:17,935][00394] Num frames 4300... [2023-02-27 12:45:18,055][00394] Num frames 4400... [2023-02-27 12:45:18,167][00394] Num frames 4500... [2023-02-27 12:45:18,291][00394] Num frames 4600... [2023-02-27 12:45:18,414][00394] Num frames 4700... [2023-02-27 12:45:18,496][00394] Avg episode rewards: #0: 22.238, true rewards: #0: 9.438 [2023-02-27 12:45:18,497][00394] Avg episode reward: 22.238, avg true_objective: 9.438 [2023-02-27 12:45:18,603][00394] Num frames 4800... [2023-02-27 12:45:18,748][00394] Num frames 4900... [2023-02-27 12:45:18,871][00394] Num frames 5000... [2023-02-27 12:45:18,995][00394] Num frames 5100... [2023-02-27 12:45:19,117][00394] Num frames 5200... [2023-02-27 12:45:19,235][00394] Num frames 5300... [2023-02-27 12:45:19,362][00394] Num frames 5400... [2023-02-27 12:45:19,447][00394] Avg episode rewards: #0: 20.872, true rewards: #0: 9.038 [2023-02-27 12:45:19,449][00394] Avg episode reward: 20.872, avg true_objective: 9.038 [2023-02-27 12:45:19,549][00394] Num frames 5500... [2023-02-27 12:45:19,672][00394] Num frames 5600... [2023-02-27 12:45:19,796][00394] Num frames 5700... [2023-02-27 12:45:19,914][00394] Num frames 5800... [2023-02-27 12:45:20,040][00394] Num frames 5900... [2023-02-27 12:45:20,163][00394] Num frames 6000... [2023-02-27 12:45:20,283][00394] Num frames 6100... [2023-02-27 12:45:20,406][00394] Num frames 6200... [2023-02-27 12:45:20,528][00394] Num frames 6300... [2023-02-27 12:45:20,660][00394] Num frames 6400... [2023-02-27 12:45:20,781][00394] Num frames 6500... [2023-02-27 12:45:20,902][00394] Num frames 6600... [2023-02-27 12:45:21,021][00394] Num frames 6700... [2023-02-27 12:45:21,081][00394] Avg episode rewards: #0: 22.433, true rewards: #0: 9.576 [2023-02-27 12:45:21,083][00394] Avg episode reward: 22.433, avg true_objective: 9.576 [2023-02-27 12:45:21,201][00394] Num frames 6800... [2023-02-27 12:45:21,332][00394] Num frames 6900... [2023-02-27 12:45:21,449][00394] Num frames 7000... [2023-02-27 12:45:21,568][00394] Num frames 7100... [2023-02-27 12:45:21,685][00394] Num frames 7200... [2023-02-27 12:45:21,810][00394] Num frames 7300... [2023-02-27 12:45:21,928][00394] Num frames 7400... [2023-02-27 12:45:22,053][00394] Num frames 7500... 
[2023-02-27 12:45:22,116][00394] Avg episode rewards: #0: 22.254, true rewards: #0: 9.379 [2023-02-27 12:45:22,119][00394] Avg episode reward: 22.254, avg true_objective: 9.379 [2023-02-27 12:45:22,244][00394] Num frames 7600... [2023-02-27 12:45:22,366][00394] Num frames 7700... [2023-02-27 12:45:22,483][00394] Num frames 7800... [2023-02-27 12:45:22,607][00394] Num frames 7900... [2023-02-27 12:45:22,735][00394] Num frames 8000... [2023-02-27 12:45:22,860][00394] Avg episode rewards: #0: 20.952, true rewards: #0: 8.952 [2023-02-27 12:45:22,862][00394] Avg episode reward: 20.952, avg true_objective: 8.952 [2023-02-27 12:45:22,913][00394] Num frames 8100... [2023-02-27 12:45:23,033][00394] Num frames 8200... [2023-02-27 12:45:23,163][00394] Num frames 8300... [2023-02-27 12:45:23,288][00394] Num frames 8400... [2023-02-27 12:45:23,405][00394] Num frames 8500... [2023-02-27 12:45:23,523][00394] Num frames 8600... [2023-02-27 12:45:23,702][00394] Avg episode rewards: #0: 20.097, true rewards: #0: 8.697 [2023-02-27 12:45:23,704][00394] Avg episode reward: 20.097, avg true_objective: 8.697 [2023-02-27 12:45:23,711][00394] Num frames 8700... [2023-02-27 12:46:17,986][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
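The session ends with the final replay written to /content/train_dir/default_experiment/replay.mp4. In the Colab notebook this log comes from, the video can be previewed inline with a standard snippet (assuming an IPython/Colab environment):

```python
from base64 import b64encode

from IPython.display import HTML

# Read the rendered replay and embed it as a base64 data URL.
mp4 = open("/content/train_dir/default_experiment/replay.mp4", "rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f'<video width=640 controls><source src="{data_url}" type="video/mp4"></video>')
```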