[2024-12-30 19:56:32,381][00338] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-30 19:56:32,384][00338] Rollout worker 0 uses device cpu
[2024-12-30 19:56:32,385][00338] Rollout worker 1 uses device cpu
[2024-12-30 19:56:32,387][00338] Rollout worker 2 uses device cpu
[2024-12-30 19:56:32,389][00338] Rollout worker 3 uses device cpu
[2024-12-30 19:56:32,390][00338] Rollout worker 4 uses device cpu
[2024-12-30 19:56:32,391][00338] Rollout worker 5 uses device cpu
[2024-12-30 19:56:32,392][00338] Rollout worker 6 uses device cpu
[2024-12-30 19:56:32,394][00338] Rollout worker 7 uses device cpu
[2024-12-30 19:56:32,561][00338] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 19:56:32,563][00338] InferenceWorker_p0-w0: min num requests: 2
[2024-12-30 19:56:32,596][00338] Starting all processes...
[2024-12-30 19:56:32,598][00338] Starting process learner_proc0
[2024-12-30 19:56:32,642][00338] Starting all processes...
[2024-12-30 19:56:32,650][00338] Starting process inference_proc0-0
[2024-12-30 19:56:32,651][00338] Starting process rollout_proc0
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc1
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc2
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc3
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc4
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc5
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc6
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc7
[2024-12-30 19:56:50,740][02943] Worker 6 uses CPU cores [0]
[2024-12-30 19:56:50,796][02942] Worker 4 uses CPU cores [0]
[2024-12-30 19:56:50,880][02939] Worker 1 uses CPU cores [1]
[2024-12-30 19:56:50,883][02938] Worker 0 uses CPU cores [0]
[2024-12-30 19:56:50,899][02924] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 19:56:50,899][02924] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-30 19:56:50,908][02941] Worker 3 uses CPU cores [1]
[2024-12-30 19:56:50,946][02924] Num visible devices: 1
[2024-12-30 19:56:50,952][02944] Worker 5 uses CPU cores [1]
[2024-12-30 19:56:50,961][02924] Starting seed is not provided
[2024-12-30 19:56:50,962][02924] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 19:56:50,962][02924] Initializing actor-critic model on device cuda:0
[2024-12-30 19:56:50,963][02924] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 19:56:50,966][02924] RunningMeanStd input shape: (1,)
[2024-12-30 19:56:50,985][02924] ConvEncoder: input_channels=3
[2024-12-30 19:56:51,008][02940] Worker 2 uses CPU cores [0]
[2024-12-30 19:56:51,016][02945] Worker 7 uses CPU cores [1]
[2024-12-30 19:56:51,086][02937] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 19:56:51,086][02937] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-30 19:56:51,102][02937] Num visible devices: 1
[2024-12-30 19:56:51,250][02924] Conv encoder output size: 512
[2024-12-30 19:56:51,250][02924] Policy head output size: 512
[2024-12-30 19:56:51,302][02924] Created Actor Critic model with architecture:
[2024-12-30 19:56:51,302][02924] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-30 19:56:51,687][02924] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-12-30 19:56:52,560][00338] Heartbeat connected on Batcher_0
[2024-12-30 19:56:52,563][00338] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-30 19:56:52,574][00338] Heartbeat connected on RolloutWorker_w1
[2024-12-30 19:56:52,575][00338] Heartbeat connected on RolloutWorker_w0
[2024-12-30 19:56:52,579][00338] Heartbeat connected on RolloutWorker_w2
[2024-12-30 19:56:52,581][00338] Heartbeat connected on RolloutWorker_w3
[2024-12-30 19:56:52,585][00338] Heartbeat connected on RolloutWorker_w4
[2024-12-30 19:56:52,593][00338] Heartbeat connected on RolloutWorker_w5
[2024-12-30 19:56:52,594][00338] Heartbeat connected on RolloutWorker_w6
[2024-12-30 19:56:52,597][00338] Heartbeat connected on RolloutWorker_w7
[2024-12-30 19:56:55,612][02924] No checkpoints found
[2024-12-30 19:56:55,612][02924] Did not load from checkpoint, starting from scratch!
[2024-12-30 19:56:55,613][02924] Initialized policy 0 weights for model version 0
[2024-12-30 19:56:55,626][02924] LearnerWorker_p0 finished initialization!
[2024-12-30 19:56:55,626][02924] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 19:56:55,629][00338] Heartbeat connected on LearnerWorker_p0
[2024-12-30 19:56:55,852][02937] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 19:56:55,856][02937] RunningMeanStd input shape: (1,)
[2024-12-30 19:56:55,876][02937] ConvEncoder: input_channels=3
[2024-12-30 19:56:56,039][02937] Conv encoder output size: 512
[2024-12-30 19:56:56,040][02937] Policy head output size: 512
[2024-12-30 19:56:56,120][00338] Inference worker 0-0 is ready!
[2024-12-30 19:56:56,124][00338] All inference workers are ready! Signal rollout workers to start!
[2024-12-30 19:56:56,356][02942] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,357][02938] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,359][02940] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,364][02943] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,425][02945] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,430][02944] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,427][02939] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,430][02941] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:57,000][00338] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 19:56:57,806][02940] Decorrelating experience for 0 frames...
[2024-12-30 19:56:57,806][02944] Decorrelating experience for 0 frames...
[2024-12-30 19:56:57,810][02943] Decorrelating experience for 0 frames...
[2024-12-30 19:56:57,808][02945] Decorrelating experience for 0 frames...
[2024-12-30 19:56:58,540][02944] Decorrelating experience for 32 frames...
[2024-12-30 19:56:58,637][02939] Decorrelating experience for 0 frames...
[2024-12-30 19:56:59,265][02940] Decorrelating experience for 32 frames...
[2024-12-30 19:56:59,270][02943] Decorrelating experience for 32 frames...
[2024-12-30 19:56:59,298][02938] Decorrelating experience for 0 frames...
[2024-12-30 19:56:59,314][02942] Decorrelating experience for 0 frames...
[2024-12-30 19:56:59,890][02939] Decorrelating experience for 32 frames...
[2024-12-30 19:56:59,932][02941] Decorrelating experience for 0 frames...
[2024-12-30 19:57:00,145][02944] Decorrelating experience for 64 frames...
[2024-12-30 19:57:00,280][02938] Decorrelating experience for 32 frames...
[2024-12-30 19:57:00,522][02940] Decorrelating experience for 64 frames...
[2024-12-30 19:57:00,791][02945] Decorrelating experience for 32 frames...
[2024-12-30 19:57:01,151][02943] Decorrelating experience for 64 frames...
[2024-12-30 19:57:01,465][02941] Decorrelating experience for 32 frames...
[2024-12-30 19:57:01,893][02944] Decorrelating experience for 96 frames...
[2024-12-30 19:57:01,994][00338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 19:57:02,114][02938] Decorrelating experience for 64 frames...
[2024-12-30 19:57:02,122][02940] Decorrelating experience for 96 frames...
[2024-12-30 19:57:02,304][02939] Decorrelating experience for 64 frames...
[2024-12-30 19:57:02,423][02942] Decorrelating experience for 32 frames...
[2024-12-30 19:57:03,598][02943] Decorrelating experience for 96 frames...
[2024-12-30 19:57:03,693][02945] Decorrelating experience for 64 frames...
[2024-12-30 19:57:04,163][02941] Decorrelating experience for 64 frames...
[2024-12-30 19:57:04,255][02938] Decorrelating experience for 96 frames...
[2024-12-30 19:57:04,952][02945] Decorrelating experience for 96 frames...
[2024-12-30 19:57:05,596][02942] Decorrelating experience for 64 frames...
[2024-12-30 19:57:05,670][02941] Decorrelating experience for 96 frames...
[2024-12-30 19:57:06,996][00338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 49.6. Samples: 496. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 19:57:06,998][00338] Avg episode reward: [(0, '2.065')]
[2024-12-30 19:57:10,949][02924] Signal inference workers to stop experience collection...
[2024-12-30 19:57:11,001][02937] InferenceWorker_p0-w0: stopping experience collection
[2024-12-30 19:57:11,205][02939] Decorrelating experience for 96 frames...
[2024-12-30 19:57:11,787][02942] Decorrelating experience for 96 frames...
[2024-12-30 19:57:11,994][00338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 159.8. Samples: 2396. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 19:57:11,997][00338] Avg episode reward: [(0, '2.759')]
[2024-12-30 19:57:15,485][02924] Signal inference workers to resume experience collection...
[2024-12-30 19:57:15,486][02937] InferenceWorker_p0-w0: resuming experience collection
[2024-12-30 19:57:16,994][00338] Fps is (10 sec: 1229.0, 60 sec: 614.6, 300 sec: 614.6). Total num frames: 12288. Throughput: 0: 119.8. Samples: 2396. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-12-30 19:57:17,000][00338] Avg episode reward: [(0, '2.821')]
[2024-12-30 19:57:21,994][00338] Fps is (10 sec: 3276.8, 60 sec: 1311.0, 300 sec: 1311.0). Total num frames: 32768. Throughput: 0: 304.5. Samples: 7610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:57:21,996][00338] Avg episode reward: [(0, '3.809')]
[2024-12-30 19:57:22,794][02937] Updated weights for policy 0, policy_version 10 (0.0035)
[2024-12-30 19:57:26,997][00338] Fps is (10 sec: 4095.0, 60 sec: 1775.1, 300 sec: 1775.1). Total num frames: 53248. Throughput: 0: 460.9. Samples: 13826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:57:26,999][00338] Avg episode reward: [(0, '4.409')]
[2024-12-30 19:57:31,994][00338] Fps is (10 sec: 3686.4, 60 sec: 1989.8, 300 sec: 1989.8). Total num frames: 69632. Throughput: 0: 457.0. Samples: 15994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:57:31,999][00338] Avg episode reward: [(0, '4.518')]
[2024-12-30 19:57:33,914][02937] Updated weights for policy 0, policy_version 20 (0.0018)
[2024-12-30 19:57:36,994][00338] Fps is (10 sec: 4096.9, 60 sec: 2355.5, 300 sec: 2355.5). Total num frames: 94208. Throughput: 0: 566.6. Samples: 22662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:57:36,997][00338] Avg episode reward: [(0, '4.469')]
[2024-12-30 19:57:41,994][00338] Fps is (10 sec: 4505.6, 60 sec: 2548.9, 300 sec: 2548.9). Total num frames: 114688. Throughput: 0: 661.3. Samples: 29756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:57:42,003][00338] Avg episode reward: [(0, '4.295')]
[2024-12-30 19:57:42,098][02924] Saving new best policy, reward=4.295!
[2024-12-30 19:57:43,494][02937] Updated weights for policy 0, policy_version 30 (0.0013)
[2024-12-30 19:57:46,996][00338] Fps is (10 sec: 3686.0, 60 sec: 2621.6, 300 sec: 2621.6). Total num frames: 131072. Throughput: 0: 708.3. Samples: 31876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:57:46,998][00338] Avg episode reward: [(0, '4.366')]
[2024-12-30 19:57:47,014][02924] Saving new best policy, reward=4.366!
[2024-12-30 19:57:51,994][00338] Fps is (10 sec: 4096.0, 60 sec: 2830.2, 300 sec: 2830.2). Total num frames: 155648. Throughput: 0: 824.3. Samples: 37590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:57:51,999][00338] Avg episode reward: [(0, '4.384')]
[2024-12-30 19:57:52,002][02924] Saving new best policy, reward=4.384!
[2024-12-30 19:57:53,655][02937] Updated weights for policy 0, policy_version 40 (0.0018)
[2024-12-30 19:57:56,994][00338] Fps is (10 sec: 4915.9, 60 sec: 3004.0, 300 sec: 3004.0). Total num frames: 180224. Throughput: 0: 942.0. Samples: 44786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:57:56,998][00338] Avg episode reward: [(0, '4.362')]
[2024-12-30 19:58:01,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3025.0). Total num frames: 196608. Throughput: 0: 1009.2. Samples: 47810. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:58:02,000][00338] Avg episode reward: [(0, '4.379')]
[2024-12-30 19:58:04,722][02937] Updated weights for policy 0, policy_version 50 (0.0032)
[2024-12-30 19:58:06,995][00338] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3043.0). Total num frames: 212992. Throughput: 0: 997.4. Samples: 52492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:58:06,996][00338] Avg episode reward: [(0, '4.299')]
[2024-12-30 19:58:11,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3167.8). Total num frames: 237568. Throughput: 0: 1019.2. Samples: 59688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:58:12,001][00338] Avg episode reward: [(0, '4.459')]
[2024-12-30 19:58:12,006][02924] Saving new best policy, reward=4.459!
[2024-12-30 19:58:13,203][02937] Updated weights for policy 0, policy_version 60 (0.0031)
[2024-12-30 19:58:16,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3225.8). Total num frames: 258048. Throughput: 0: 1050.8. Samples: 63282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:58:16,996][00338] Avg episode reward: [(0, '4.540')]
[2024-12-30 19:58:17,010][02924] Saving new best policy, reward=4.540!
[2024-12-30 19:58:21,994][00338] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3228.8). Total num frames: 274432. Throughput: 0: 1005.2. Samples: 67898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:58:21,997][00338] Avg episode reward: [(0, '4.342')]
[2024-12-30 19:58:24,360][02937] Updated weights for policy 0, policy_version 70 (0.0030)
[2024-12-30 19:58:26,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 3322.5). Total num frames: 299008. Throughput: 0: 995.9. Samples: 74570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:58:26,997][00338] Avg episode reward: [(0, '4.116')]
[2024-12-30 19:58:27,002][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2024-12-30 19:58:31,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 3406.3). Total num frames: 323584. Throughput: 0: 1030.5. Samples: 78246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:58:31,998][00338] Avg episode reward: [(0, '4.390')]
[2024-12-30 19:58:33,127][02937] Updated weights for policy 0, policy_version 80 (0.0022)
[2024-12-30 19:58:36,995][00338] Fps is (10 sec: 3686.2, 60 sec: 4027.7, 300 sec: 3358.9). Total num frames: 335872. Throughput: 0: 1025.2. Samples: 83726. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-30 19:58:37,001][00338] Avg episode reward: [(0, '4.442')]
[2024-12-30 19:58:41,994][00338] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3394.0). Total num frames: 356352. Throughput: 0: 991.1. Samples: 89384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-30 19:58:42,000][00338] Avg episode reward: [(0, '4.380')]
[2024-12-30 19:58:43,877][02937] Updated weights for policy 0, policy_version 90 (0.0034)
[2024-12-30 19:58:46,994][00338] Fps is (10 sec: 4505.8, 60 sec: 4164.4, 300 sec: 3463.2). Total num frames: 380928. Throughput: 0: 1002.4. Samples: 92916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 19:58:46,996][00338] Avg episode reward: [(0, '4.438')]
[2024-12-30 19:58:51,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3455.0). Total num frames: 397312. Throughput: 0: 1045.8. Samples: 99554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:58:52,001][00338] Avg episode reward: [(0, '4.431')]
[2024-12-30 19:58:55,299][02937] Updated weights for policy 0, policy_version 100 (0.0042)
[2024-12-30 19:58:56,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3447.6). Total num frames: 413696. Throughput: 0: 982.9. Samples: 103918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:58:56,997][00338] Avg episode reward: [(0, '4.544')]
[2024-12-30 19:58:57,004][02924] Saving new best policy, reward=4.544!
[2024-12-30 19:59:01,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3506.3). Total num frames: 438272. Throughput: 0: 982.0. Samples: 107472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:59:02,000][00338] Avg episode reward: [(0, '4.671')]
[2024-12-30 19:59:02,004][02924] Saving new best policy, reward=4.671!
[2024-12-30 19:59:04,195][02937] Updated weights for policy 0, policy_version 110 (0.0014)
[2024-12-30 19:59:07,001][00338] Fps is (10 sec: 4502.7, 60 sec: 4095.6, 300 sec: 3528.8). Total num frames: 458752. Throughput: 0: 1032.4. Samples: 114362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:59:07,005][00338] Avg episode reward: [(0, '4.383')]
[2024-12-30 19:59:12,000][00338] Fps is (10 sec: 3684.2, 60 sec: 3959.1, 300 sec: 3519.5). Total num frames: 475136. Throughput: 0: 986.7. Samples: 118978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:59:12,003][00338] Avg episode reward: [(0, '4.294')]
[2024-12-30 19:59:15,460][02937] Updated weights for policy 0, policy_version 120 (0.0037)
[2024-12-30 19:59:16,995][00338] Fps is (10 sec: 3688.6, 60 sec: 3959.4, 300 sec: 3540.2). Total num frames: 495616. Throughput: 0: 970.0. Samples: 121898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:59:16,997][00338] Avg episode reward: [(0, '4.443')]
[2024-12-30 19:59:21,994][00338] Fps is (10 sec: 4508.3, 60 sec: 4096.0, 300 sec: 3587.7). Total num frames: 520192. Throughput: 0: 1009.4. Samples: 129150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:59:21,999][00338] Avg episode reward: [(0, '4.674')]
[2024-12-30 19:59:22,001][02924] Saving new best policy, reward=4.674!
[2024-12-30 19:59:24,720][02937] Updated weights for policy 0, policy_version 130 (0.0032)
[2024-12-30 19:59:26,994][00338] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3577.3). Total num frames: 536576. Throughput: 0: 1009.2. Samples: 134798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:59:26,997][00338] Avg episode reward: [(0, '4.609')]
[2024-12-30 19:59:31,996][00338] Fps is (10 sec: 3685.7, 60 sec: 3891.1, 300 sec: 3594.0). Total num frames: 557056. Throughput: 0: 980.1. Samples: 137022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:59:32,002][00338] Avg episode reward: [(0, '4.451')]
[2024-12-30 19:59:34,902][02937] Updated weights for policy 0, policy_version 140 (0.0035)
[2024-12-30 19:59:36,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3635.3). Total num frames: 581632. Throughput: 0: 992.0. Samples: 144192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:59:37,001][00338] Avg episode reward: [(0, '4.458')]
[2024-12-30 19:59:41,994][00338] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3624.5). Total num frames: 598016. Throughput: 0: 1037.1. Samples: 150586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:59:41,998][00338] Avg episode reward: [(0, '4.380')]
[2024-12-30 19:59:46,068][02937] Updated weights for policy 0, policy_version 150 (0.0043)
[2024-12-30 19:59:46,996][00338] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 3638.3). Total num frames: 618496. Throughput: 0: 1007.2. Samples: 152798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 19:59:46,999][00338] Avg episode reward: [(0, '4.460')]
[2024-12-30 19:59:51,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3674.8). Total num frames: 643072. Throughput: 0: 996.4. Samples: 159192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:59:51,997][00338] Avg episode reward: [(0, '4.586')]
[2024-12-30 19:59:54,505][02937] Updated weights for policy 0, policy_version 160 (0.0021)
[2024-12-30 19:59:56,996][00338] Fps is (10 sec: 4505.6, 60 sec: 4164.2, 300 sec: 3686.5). Total num frames: 663552. Throughput: 0: 1056.2. Samples: 166500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:59:57,001][00338] Avg episode reward: [(0, '4.649')]
[2024-12-30 20:00:01,994][00338] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3675.4). Total num frames: 679936. Throughput: 0: 1038.9. Samples: 168646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:00:01,997][00338] Avg episode reward: [(0, '4.582')]
[2024-12-30 20:00:05,900][02937] Updated weights for policy 0, policy_version 170 (0.0025)
[2024-12-30 20:00:06,995][00338] Fps is (10 sec: 3686.8, 60 sec: 4028.1, 300 sec: 3686.5). Total num frames: 700416. Throughput: 0: 992.3. Samples: 173802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:00:07,000][00338] Avg episode reward: [(0, '4.337')]
[2024-12-30 20:00:11,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4164.7, 300 sec: 3718.0). Total num frames: 724992. Throughput: 0: 1026.3. Samples: 180980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:00:11,997][00338] Avg episode reward: [(0, '4.365')]
[2024-12-30 20:00:14,915][02937] Updated weights for policy 0, policy_version 180 (0.0021)
[2024-12-30 20:00:16,994][00338] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3707.0). Total num frames: 741376. Throughput: 0: 1048.8. Samples: 184214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:00:16,999][00338] Avg episode reward: [(0, '4.686')]
[2024-12-30 20:00:17,009][02924] Saving new best policy, reward=4.686!
[2024-12-30 20:00:21,994][00338] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3716.5). Total num frames: 761856. Throughput: 0: 988.8. Samples: 188686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:00:22,001][00338] Avg episode reward: [(0, '4.703')]
[2024-12-30 20:00:22,004][02924] Saving new best policy, reward=4.703!
[2024-12-30 20:00:25,486][02937] Updated weights for policy 0, policy_version 190 (0.0033)
[2024-12-30 20:00:26,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3725.5). Total num frames: 782336. Throughput: 0: 1008.4. Samples: 195962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 20:00:27,001][00338] Avg episode reward: [(0, '4.813')]
[2024-12-30 20:00:27,014][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000191_782336.pth...
[2024-12-30 20:00:27,167][02924] Saving new best policy, reward=4.813!
[2024-12-30 20:00:31,995][00338] Fps is (10 sec: 4095.8, 60 sec: 4096.1, 300 sec: 3734.1). Total num frames: 802816. Throughput: 0: 1034.9. Samples: 199366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:00:31,997][00338] Avg episode reward: [(0, '4.806')]
[2024-12-30 20:00:36,730][02937] Updated weights for policy 0, policy_version 200 (0.0027)
[2024-12-30 20:00:36,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3723.7). Total num frames: 819200. Throughput: 0: 1000.0. Samples: 204194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:00:37,001][00338] Avg episode reward: [(0, '4.730')]
[2024-12-30 20:00:41,994][00338] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 3750.2). Total num frames: 843776. Throughput: 0: 978.9. Samples: 210548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:00:41,999][00338] Avg episode reward: [(0, '4.432')]
[2024-12-30 20:00:45,320][02937] Updated weights for policy 0, policy_version 210 (0.0026)
[2024-12-30 20:00:46,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3757.7). Total num frames: 864256. Throughput: 0: 1014.0. Samples: 214278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:00:47,001][00338] Avg episode reward: [(0, '4.501')]
[2024-12-30 20:00:51,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3747.5). Total num frames: 880640. Throughput: 0: 1021.8. Samples: 219782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:00:51,997][00338] Avg episode reward: [(0, '4.499')]
[2024-12-30 20:00:56,580][02937] Updated weights for policy 0, policy_version 220 (0.0015)
[2024-12-30 20:00:56,994][00338] Fps is (10 sec: 3686.3, 60 sec: 3959.6, 300 sec: 3754.7). Total num frames: 901120. Throughput: 0: 986.9. Samples: 225390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:00:56,999][00338] Avg episode reward: [(0, '4.562')]
[2024-12-30 20:01:01,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3778.4). Total num frames: 925696. Throughput: 0: 993.1. Samples: 228902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:01:01,999][00338] Avg episode reward: [(0, '4.806')]
[2024-12-30 20:01:06,200][02937] Updated weights for policy 0, policy_version 230 (0.0024)
[2024-12-30 20:01:06,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3768.4). Total num frames: 942080. Throughput: 0: 1037.1. Samples: 235354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:01:06,998][00338] Avg episode reward: [(0, '4.764')]
[2024-12-30 20:01:11,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3758.8). Total num frames: 958464. Throughput: 0: 972.6. Samples: 239728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:01:12,001][00338] Avg episode reward: [(0, '4.859')]
[2024-12-30 20:01:12,005][02924] Saving new best policy, reward=4.859!
[2024-12-30 20:01:16,683][02937] Updated weights for policy 0, policy_version 240 (0.0020)
[2024-12-30 20:01:16,994][00338] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3781.0). Total num frames: 983040. Throughput: 0: 975.2. Samples: 243250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:01:16,997][00338] Avg episode reward: [(0, '4.698')]
[2024-12-30 20:01:21,999][00338] Fps is (10 sec: 4503.6, 60 sec: 4027.4, 300 sec: 3786.9). Total num frames: 1003520. Throughput: 0: 1029.1. Samples: 250506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:01:22,004][00338] Avg episode reward: [(0, '4.792')]
[2024-12-30 20:01:26,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3777.5). Total num frames: 1019904. Throughput: 0: 991.9. Samples: 255182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:01:27,004][00338] Avg episode reward: [(0, '4.838')]
[2024-12-30 20:01:28,014][02937] Updated weights for policy 0, policy_version 250 (0.0016)
[2024-12-30 20:01:31,994][00338] Fps is (10 sec: 3688.0, 60 sec: 3959.5, 300 sec: 3783.3). Total num frames: 1040384. Throughput: 0: 970.7. Samples: 257960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:01:31,996][00338] Avg episode reward: [(0, '4.748')]
[2024-12-30 20:01:36,543][02937] Updated weights for policy 0, policy_version 260 (0.0024)
[2024-12-30 20:01:36,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3803.5). Total num frames: 1064960. Throughput: 0: 1006.9. Samples: 265094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:01:36,997][00338] Avg episode reward: [(0, '4.880')]
[2024-12-30 20:01:37,009][02924] Saving new best policy, reward=4.880!
[2024-12-30 20:01:41,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3794.3). Total num frames: 1081344. Throughput: 0: 1000.7. Samples: 270422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:01:41,998][00338] Avg episode reward: [(0, '4.857')]
[2024-12-30 20:01:46,994][00338] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3785.3). Total num frames: 1097728. Throughput: 0: 969.8. Samples: 272544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:01:47,001][00338] Avg episode reward: [(0, '4.671')]
[2024-12-30 20:01:48,248][02937] Updated weights for policy 0, policy_version 270 (0.0017)
[2024-12-30 20:01:51,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3804.5). Total num frames: 1122304. Throughput: 0: 977.9. Samples: 279358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:01:51,997][00338] Avg episode reward: [(0, '4.866')]
[2024-12-30 20:01:56,994][00338] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 1142784. Throughput: 0: 1028.5. Samples: 286012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:01:56,997][00338] Avg episode reward: [(0, '4.958')]
[2024-12-30 20:01:57,006][02924] Saving new best policy, reward=4.958!
[2024-12-30 20:01:57,832][02937] Updated weights for policy 0, policy_version 280 (0.0037)
[2024-12-30 20:02:01,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 1155072. Throughput: 0: 997.0. Samples: 288116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:02:02,004][00338] Avg episode reward: [(0, '5.116')]
[2024-12-30 20:02:02,028][02924] Saving new best policy, reward=5.116!
[2024-12-30 20:02:06,994][00338] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1179648. Throughput: 0: 966.2. Samples: 293980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:07,000][00338] Avg episode reward: [(0, '5.230')]
[2024-12-30 20:02:07,008][02924] Saving new best policy, reward=5.230!
[2024-12-30 20:02:08,177][02937] Updated weights for policy 0, policy_version 290 (0.0030)
[2024-12-30 20:02:11,996][00338] Fps is (10 sec: 4914.1, 60 sec: 4095.8, 300 sec: 4040.4). Total num frames: 1204224. Throughput: 0: 1017.2. Samples: 300958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:11,999][00338] Avg episode reward: [(0, '5.130')]
[2024-12-30 20:02:16,994][00338] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 1216512. Throughput: 0: 1010.8. Samples: 303444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:16,999][00338] Avg episode reward: [(0, '5.145')]
[2024-12-30 20:02:19,703][02937] Updated weights for policy 0, policy_version 300 (0.0027)
[2024-12-30 20:02:21,996][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 4012.7). Total num frames: 1236992. Throughput: 0: 961.6. Samples: 308368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:21,999][00338] Avg episode reward: [(0, '5.304')]
[2024-12-30 20:02:22,007][02924] Saving new best policy, reward=5.304!
[2024-12-30 20:02:26,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1261568. Throughput: 0: 1002.7. Samples: 315542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:26,997][00338] Avg episode reward: [(0, '5.349')]
[2024-12-30 20:02:27,007][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth...
[2024-12-30 20:02:27,118][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth
[2024-12-30 20:02:27,145][02924] Saving new best policy, reward=5.349!
[2024-12-30 20:02:28,324][02937] Updated weights for policy 0, policy_version 310 (0.0027)
[2024-12-30 20:02:31,995][00338] Fps is (10 sec: 4096.7, 60 sec: 3959.4, 300 sec: 4012.7). Total num frames: 1277952. Throughput: 0: 1025.2. Samples: 318680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:02:31,997][00338] Avg episode reward: [(0, '5.290')]
[2024-12-30 20:02:36,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3998.8). Total num frames: 1294336. Throughput: 0: 970.5. Samples: 323032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:02:37,005][00338] Avg episode reward: [(0, '5.480')]
[2024-12-30 20:02:37,021][02924] Saving new best policy, reward=5.480!
[2024-12-30 20:02:39,799][02937] Updated weights for policy 0, policy_version 320 (0.0035)
[2024-12-30 20:02:41,994][00338] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1318912. Throughput: 0: 969.7. Samples: 329650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:02:41,996][00338] Avg episode reward: [(0, '5.449')]
[2024-12-30 20:02:46,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1339392. Throughput: 0: 1002.8. Samples: 333242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:47,000][00338] Avg episode reward: [(0, '5.286')]
[2024-12-30 20:02:50,302][02937] Updated weights for policy 0, policy_version 330 (0.0014)
[2024-12-30 20:02:51,995][00338] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3984.9). Total num frames: 1355776. Throughput: 0: 985.6. Samples: 338332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:02:52,003][00338] Avg episode reward: [(0, '5.247')]
[2024-12-30 20:02:56,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 1376256. Throughput: 0: 963.2. Samples: 344302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:56,997][00338] Avg episode reward: [(0, '5.163')]
[2024-12-30 20:02:59,869][02937] Updated weights for policy 0, policy_version 340 (0.0020)
[2024-12-30 20:03:01,994][00338] Fps is (10 sec: 4506.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1400832. Throughput: 0: 987.6. Samples: 347886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:03:01,997][00338] Avg episode reward: [(0, '5.257')]
[2024-12-30 20:03:06,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1417216. Throughput: 0: 1010.8. Samples: 353850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:03:06,996][00338] Avg episode reward: [(0, '5.457')]
[2024-12-30 20:03:11,159][02937] Updated weights for policy 0, policy_version 350 (0.0023)
[2024-12-30 20:03:11,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3998.8). Total num frames: 1437696. Throughput: 0: 962.0. Samples: 358834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:03:11,996][00338] Avg episode reward: [(0, '5.737')]
[2024-12-30 20:03:12,004][02924] Saving new best policy, reward=5.737!
[2024-12-30 20:03:16,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1458176. Throughput: 0: 971.4. Samples: 362394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:03:16,998][00338] Avg episode reward: [(0, '5.619')]
[2024-12-30 20:03:19,730][02937] Updated weights for policy 0, policy_version 360 (0.0015)
[2024-12-30 20:03:21,998][00338] Fps is (10 sec: 4094.6, 60 sec: 4027.6, 300 sec: 3998.8). Total num frames: 1478656. Throughput: 0: 1028.1. Samples: 369300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:03:22,000][00338] Avg episode reward: [(0, '5.545')]
[2024-12-30 20:03:26,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1495040. Throughput: 0: 979.3. Samples: 373718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:03:27,001][00338] Avg episode reward: [(0, '5.747')]
[2024-12-30 20:03:27,010][02924] Saving new best policy, reward=5.747!
[2024-12-30 20:03:31,160][02937] Updated weights for policy 0, policy_version 370 (0.0026)
[2024-12-30 20:03:31,994][00338] Fps is (10 sec: 3687.7, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1515520. Throughput: 0: 975.1. Samples: 377120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:03:32,002][00338] Avg episode reward: [(0, '6.008')]
[2024-12-30 20:03:32,020][02924] Saving new best policy, reward=6.008!
[2024-12-30 20:03:36,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1540096. Throughput: 0: 1019.9. Samples: 384226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:03:36,997][00338] Avg episode reward: [(0, '5.952')]
[2024-12-30 20:03:41,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1552384. Throughput: 0: 994.4. Samples: 389050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 20:03:42,005][00338] Avg episode reward: [(0, '5.880')]
[2024-12-30 20:03:42,102][02937] Updated weights for policy 0, policy_version 380 (0.0022)
[2024-12-30 20:03:46,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1576960. Throughput: 0: 970.5. Samples: 391558. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:03:46,997][00338] Avg episode reward: [(0, '6.338')]
[2024-12-30 20:03:47,007][02924] Saving new best policy, reward=6.338!
[2024-12-30 20:03:51,236][02937] Updated weights for policy 0, policy_version 390 (0.0024)
[2024-12-30 20:03:51,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4012.7). Total num frames: 1597440. Throughput: 0: 996.6. Samples: 398696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:03:51,998][00338] Avg episode reward: [(0, '6.392')]
[2024-12-30 20:03:52,002][02924] Saving new best policy, reward=6.392!
[2024-12-30 20:03:56,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1617920. Throughput: 0: 1015.8. Samples: 404546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:03:56,997][00338] Avg episode reward: [(0, '6.642')]
[2024-12-30 20:03:57,011][02924] Saving new best policy, reward=6.642!
[2024-12-30 20:04:01,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3985.0). Total num frames: 1634304. Throughput: 0: 983.4. Samples: 406648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:04:01,996][00338] Avg episode reward: [(0, '7.200')]
[2024-12-30 20:04:02,004][02924] Saving new best policy, reward=7.200!
[2024-12-30 20:04:02,639][02937] Updated weights for policy 0, policy_version 400 (0.0018)
[2024-12-30 20:04:06,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.8). Total num frames: 1658880. Throughput: 0: 978.4. Samples: 413324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:04:06,999][00338] Avg episode reward: [(0, '7.688')]
[2024-12-30 20:04:07,008][02924] Saving new best policy, reward=7.688!
[2024-12-30 20:04:11,503][02937] Updated weights for policy 0, policy_version 410 (0.0017)
[2024-12-30 20:04:11,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1679360. Throughput: 0: 1030.6. Samples: 420096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:04:11,996][00338] Avg episode reward: [(0, '7.982')]
[2024-12-30 20:04:12,002][02924] Saving new best policy, reward=7.982!
[2024-12-30 20:04:17,001][00338] Fps is (10 sec: 3274.7, 60 sec: 3890.8, 300 sec: 3970.9). Total num frames: 1691648. Throughput: 0: 998.4. Samples: 422054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:04:17,003][00338] Avg episode reward: [(0, '8.522')]
[2024-12-30 20:04:17,020][02924] Saving new best policy, reward=8.522!
[2024-12-30 20:04:21,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3998.8). Total num frames: 1716224. Throughput: 0: 965.6. Samples: 427678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:04:21,999][00338] Avg episode reward: [(0, '9.253')]
[2024-12-30 20:04:22,002][02924] Saving new best policy, reward=9.253!
[2024-12-30 20:04:22,797][02937] Updated weights for policy 0, policy_version 420 (0.0027)
[2024-12-30 20:04:26,994][00338] Fps is (10 sec: 4508.5, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1736704. Throughput: 0: 1015.6. Samples: 434750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 20:04:26,997][00338] Avg episode reward: [(0, '8.960')]
[2024-12-30 20:04:27,012][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth...
[2024-12-30 20:04:27,144][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000191_782336.pth
[2024-12-30 20:04:31,997][00338] Fps is (10 sec: 3685.5, 60 sec: 3959.3, 300 sec: 3971.0). Total num frames: 1753088. Throughput: 0: 1018.9. Samples: 437412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 20:04:31,999][00338] Avg episode reward: [(0, '8.909')]
[2024-12-30 20:04:34,001][02937] Updated weights for policy 0, policy_version 430 (0.0016)
[2024-12-30 20:04:36,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 1773568. Throughput: 0: 964.6. Samples: 442104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:04:36,997][00338] Avg episode reward: [(0, '9.415')]
[2024-12-30 20:04:37,005][02924] Saving new best policy, reward=9.415!
[2024-12-30 20:04:41,994][00338] Fps is (10 sec: 4506.7, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1798144. Throughput: 0: 992.6. Samples: 449214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:04:41,997][00338] Avg episode reward: [(0, '10.134')]
[2024-12-30 20:04:42,001][02924] Saving new best policy, reward=10.134!
[2024-12-30 20:04:42,824][02937] Updated weights for policy 0, policy_version 440 (0.0017)
[2024-12-30 20:04:46,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1814528. Throughput: 0: 1023.1. Samples: 452686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:04:47,000][00338] Avg episode reward: [(0, '10.608')]
[2024-12-30 20:04:47,008][02924] Saving new best policy, reward=10.608!
[2024-12-30 20:04:51,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1830912. Throughput: 0: 972.9. Samples: 457104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:04:51,997][00338] Avg episode reward: [(0, '10.380')]
[2024-12-30 20:04:54,308][02937] Updated weights for policy 0, policy_version 450 (0.0014)
[2024-12-30 20:04:56,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1855488. Throughput: 0: 968.1. Samples: 463662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:04:57,001][00338] Avg episode reward: [(0, '11.428')]
[2024-12-30 20:04:57,011][02924] Saving new best policy, reward=11.428!
[2024-12-30 20:05:01,995][00338] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1875968. Throughput: 0: 1004.5. Samples: 467252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:05:01,997][00338] Avg episode reward: [(0, '11.505')]
[2024-12-30 20:05:01,999][02924] Saving new best policy, reward=11.505!
[2024-12-30 20:05:03,896][02937] Updated weights for policy 0, policy_version 460 (0.0018)
[2024-12-30 20:05:06,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1892352. Throughput: 0: 996.4. Samples: 472518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:05:07,001][00338] Avg episode reward: [(0, '12.124')]
[2024-12-30 20:05:07,016][02924] Saving new best policy, reward=12.124!
[2024-12-30 20:05:11,994][00338] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1912832. Throughput: 0: 965.6. Samples: 478202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 20:05:12,001][00338] Avg episode reward: [(0, '12.660')]
[2024-12-30 20:05:12,005][02924] Saving new best policy, reward=12.660!
[2024-12-30 20:05:14,296][02937] Updated weights for policy 0, policy_version 470 (0.0036)
[2024-12-30 20:05:16,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.4, 300 sec: 3984.9). Total num frames: 1937408. Throughput: 0: 982.1. Samples: 481604. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:05:17,001][00338] Avg episode reward: [(0, '10.789')]
[2024-12-30 20:05:21,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1953792. Throughput: 0: 1017.4. Samples: 487886. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-30 20:05:21,996][00338] Avg episode reward: [(0, '11.828')]
[2024-12-30 20:05:25,580][02937] Updated weights for policy 0, policy_version 480 (0.0029)
[2024-12-30 20:05:26,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1970176. Throughput: 0: 966.0. Samples: 492682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:05:26,997][00338] Avg episode reward: [(0, '11.629')]
[2024-12-30 20:05:31,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3984.9). Total num frames: 1994752. Throughput: 0: 968.5. Samples: 496270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:05:32,001][00338] Avg episode reward: [(0, '13.302')]
[2024-12-30 20:05:32,005][02924] Saving new best policy, reward=13.302!
[2024-12-30 20:05:34,144][02937] Updated weights for policy 0, policy_version 490 (0.0021)
[2024-12-30 20:05:36,998][00338] Fps is (10 sec: 4504.1, 60 sec: 4027.5, 300 sec: 3971.0). Total num frames: 2015232. Throughput: 0: 1031.3. Samples: 503514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:05:37,004][00338] Avg episode reward: [(0, '14.132')]
[2024-12-30 20:05:37,024][02924] Saving new best policy, reward=14.132!
[2024-12-30 20:05:41,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2031616. Throughput: 0: 981.9. Samples: 507848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:05:42,001][00338] Avg episode reward: [(0, '14.532')]
[2024-12-30 20:05:42,002][02924] Saving new best policy, reward=14.532!
[2024-12-30 20:05:45,620][02937] Updated weights for policy 0, policy_version 500 (0.0016)
[2024-12-30 20:05:46,994][00338] Fps is (10 sec: 3687.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2052096. Throughput: 0: 968.1. Samples: 510814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:05:46,999][00338] Avg episode reward: [(0, '15.120')]
[2024-12-30 20:05:47,008][02924] Saving new best policy, reward=15.120!
[2024-12-30 20:05:51,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 2076672. Throughput: 0: 1010.8. Samples: 518006. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:05:51,997][00338] Avg episode reward: [(0, '16.101')]
[2024-12-30 20:05:52,005][02924] Saving new best policy, reward=16.101!
[2024-12-30 20:05:55,774][02937] Updated weights for policy 0, policy_version 510 (0.0024)
[2024-12-30 20:05:56,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2088960. Throughput: 0: 997.8. Samples: 523104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:05:56,999][00338] Avg episode reward: [(0, '16.200')]
[2024-12-30 20:05:57,014][02924] Saving new best policy, reward=16.200!
[2024-12-30 20:06:01,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2109440. Throughput: 0: 970.5. Samples: 525276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:06:02,001][00338] Avg episode reward: [(0, '16.456')]
[2024-12-30 20:06:02,005][02924] Saving new best policy, reward=16.456!
[2024-12-30 20:06:05,619][02937] Updated weights for policy 0, policy_version 520 (0.0016)
[2024-12-30 20:06:06,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2134016. Throughput: 0: 991.6. Samples: 532508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:06:06,997][00338] Avg episode reward: [(0, '16.886')]
[2024-12-30 20:06:07,003][02924] Saving new best policy, reward=16.886!
[2024-12-30 20:06:11,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2154496. Throughput: 0: 1021.0. Samples: 538628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:06:11,998][00338] Avg episode reward: [(0, '17.220')]
[2024-12-30 20:06:12,004][02924] Saving new best policy, reward=17.220!
[2024-12-30 20:06:16,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 2166784. Throughput: 0: 987.9. Samples: 540724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:06:16,997][00338] Avg episode reward: [(0, '16.939')]
[2024-12-30 20:06:17,005][02937] Updated weights for policy 0, policy_version 530 (0.0023)
[2024-12-30 20:06:21,995][00338] Fps is (10 sec: 3686.2, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 2191360. Throughput: 0: 967.4. Samples: 547044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:06:21,997][00338] Avg episode reward: [(0, '16.967')]
[2024-12-30 20:06:25,523][02937] Updated weights for policy 0, policy_version 540 (0.0017)
[2024-12-30 20:06:26,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 2215936. Throughput: 0: 1033.2. Samples: 554340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:06:26,999][00338] Avg episode reward: [(0, '16.302')]
[2024-12-30 20:06:27,011][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000541_2215936.pth...
[2024-12-30 20:06:27,172][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth
[2024-12-30 20:06:31,997][00338] Fps is (10 sec: 3685.7, 60 sec: 3891.0, 300 sec: 3943.2). Total num frames: 2228224. Throughput: 0: 1013.8. Samples: 556436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:06:32,001][00338] Avg episode reward: [(0, '15.203')]
[2024-12-30 20:06:36,671][02937] Updated weights for policy 0, policy_version 550 (0.0030)
[2024-12-30 20:06:36,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3971.0). Total num frames: 2252800. Throughput: 0: 971.6. Samples: 561728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:06:36,997][00338] Avg episode reward: [(0, '14.382')]
[2024-12-30 20:06:41,994][00338] Fps is (10 sec: 4916.4, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2277376. Throughput: 0: 1022.2. Samples: 569104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:06:42,001][00338] Avg episode reward: [(0, '15.511')]
[2024-12-30 20:06:46,810][02937] Updated weights for policy 0, policy_version 560 (0.0020)
[2024-12-30 20:06:46,995][00338] Fps is (10 sec: 4095.8, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2293760. Throughput: 0: 1040.2. Samples: 572086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:06:46,998][00338] Avg episode reward: [(0, '17.318')]
[2024-12-30 20:06:47,008][02924] Saving new best policy, reward=17.318!
[2024-12-30 20:06:51,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2310144. Throughput: 0: 976.2. Samples: 576438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:06:51,999][00338] Avg episode reward: [(0, '16.867')]
[2024-12-30 20:06:56,627][02937] Updated weights for policy 0, policy_version 570 (0.0017)
[2024-12-30 20:06:56,994][00338] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2334720. Throughput: 0: 999.2. Samples: 583590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:06:57,001][00338] Avg episode reward: [(0, '17.922')]
[2024-12-30 20:06:57,011][02924] Saving new best policy, reward=17.922!
[2024-12-30 20:07:01,996][00338] Fps is (10 sec: 4504.6, 60 sec: 4095.8, 300 sec: 3984.9). Total num frames: 2355200. Throughput: 0: 1032.3. Samples: 587180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:07:01,999][00338] Avg episode reward: [(0, '18.309')]
[2024-12-30 20:07:02,003][02924] Saving new best policy, reward=18.309!
[2024-12-30 20:07:06,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2367488. Throughput: 0: 996.1. Samples: 591868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:07:06,997][00338] Avg episode reward: [(0, '18.389')]
[2024-12-30 20:07:07,046][02924] Saving new best policy, reward=18.389!
[2024-12-30 20:07:07,978][02937] Updated weights for policy 0, policy_version 580 (0.0021)
[2024-12-30 20:07:11,994][00338] Fps is (10 sec: 3687.2, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2392064. Throughput: 0: 974.8. Samples: 598208. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:07:12,002][00338] Avg episode reward: [(0, '17.932')]
[2024-12-30 20:07:16,716][02937] Updated weights for policy 0, policy_version 590 (0.0019)
[2024-12-30 20:07:16,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 2416640. Throughput: 0: 1005.3. Samples: 601672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 20:07:17,001][00338] Avg episode reward: [(0, '18.134')]
[2024-12-30 20:07:21,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2428928. Throughput: 0: 1016.3. Samples: 607462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:07:21,997][00338] Avg episode reward: [(0, '18.246')]
[2024-12-30 20:07:26,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2449408. Throughput: 0: 972.1. Samples: 612850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:07:27,002][00338] Avg episode reward: [(0, '17.406')]
[2024-12-30 20:07:27,854][02937] Updated weights for policy 0, policy_version 600 (0.0014)
[2024-12-30 20:07:31,994][00338] Fps is (10 sec: 4505.7, 60 sec: 4096.2, 300 sec: 3998.8). Total num frames: 2473984. Throughput: 0: 985.1. Samples: 616416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:07:31,996][00338] Avg episode reward: [(0, '18.745')]
[2024-12-30 20:07:32,007][02924] Saving new best policy, reward=18.745!
[2024-12-30 20:07:36,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2494464. Throughput: 0: 1036.3. Samples: 623070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:07:37,001][00338] Avg episode reward: [(0, '17.921')]
[2024-12-30 20:07:38,266][02937] Updated weights for policy 0, policy_version 610 (0.0018)
[2024-12-30 20:07:41,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2510848. Throughput: 0: 975.3. Samples: 627478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:07:41,997][00338] Avg episode reward: [(0, '17.921')]
[2024-12-30 20:07:46,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2531328. Throughput: 0: 974.2. Samples: 631018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:07:46,999][00338] Avg episode reward: [(0, '20.292')]
[2024-12-30 20:07:47,008][02924] Saving new best policy, reward=20.292!
[2024-12-30 20:07:47,989][02937] Updated weights for policy 0, policy_version 620 (0.0041)
[2024-12-30 20:07:52,000][00338] Fps is (10 sec: 4503.1, 60 sec: 4095.6, 300 sec: 3998.7). Total num frames: 2555904. Throughput: 0: 1027.8. Samples: 638126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:07:52,002][00338] Avg episode reward: [(0, '18.989')]
[2024-12-30 20:07:56,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2568192. Throughput: 0: 989.0. Samples: 642714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:07:57,007][00338] Avg episode reward: [(0, '17.805')]
[2024-12-30 20:07:59,125][02937] Updated weights for policy 0, policy_version 630 (0.0022)
[2024-12-30 20:08:01,994][00338] Fps is (10 sec: 3688.4, 60 sec: 3959.6, 300 sec: 3984.9). Total num frames: 2592768. Throughput: 0: 976.4. Samples: 645610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:08:01,997][00338] Avg episode reward: [(0, '19.331')]
[2024-12-30 20:08:06,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 2617344. Throughput: 0: 1008.0. Samples: 652822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:08:06,997][00338] Avg episode reward: [(0, '18.677')]
[2024-12-30 20:08:07,736][02937] Updated weights for policy 0, policy_version 640 (0.0030)
[2024-12-30 20:08:11,995][00338] Fps is (10 sec: 3686.1, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 2629632. Throughput: 0: 1010.8. Samples: 658336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:08:12,002][00338] Avg episode reward: [(0, '18.678')]
[2024-12-30 20:08:16,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.1). Total num frames: 2650112. Throughput: 0: 978.7. Samples: 660456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:08:17,001][00338] Avg episode reward: [(0, '19.203')]
[2024-12-30 20:08:19,133][02937] Updated weights for policy 0, policy_version 650 (0.0027)
[2024-12-30 20:08:21,994][00338] Fps is (10 sec: 4505.9, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2674688. Throughput: 0: 984.6. Samples: 667378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:08:22,001][00338] Avg episode reward: [(0, '19.413')]
[2024-12-30 20:08:26,995][00338] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2695168. Throughput: 0: 1031.7. Samples: 673904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:08:26,997][00338] Avg episode reward: [(0, '20.996')]
[2024-12-30 20:08:27,011][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000658_2695168.pth...
[2024-12-30 20:08:27,178][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth
[2024-12-30 20:08:27,195][02924] Saving new best policy, reward=20.996!
[2024-12-30 20:08:29,924][02937] Updated weights for policy 0, policy_version 660 (0.0015)
[2024-12-30 20:08:31,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2707456. Throughput: 0: 999.0. Samples: 675974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:08:31,999][00338] Avg episode reward: [(0, '20.689')]
[2024-12-30 20:08:36,994][00338] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2732032. Throughput: 0: 973.6. Samples: 681934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:08:37,000][00338] Avg episode reward: [(0, '21.298')]
[2024-12-30 20:08:37,009][02924] Saving new best policy, reward=21.298!
[2024-12-30 20:08:39,203][02937] Updated weights for policy 0, policy_version 670 (0.0028)
[2024-12-30 20:08:41,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2756608. Throughput: 0: 1030.3. Samples: 689078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:08:41,997][00338] Avg episode reward: [(0, '23.269')]
[2024-12-30 20:08:42,002][02924] Saving new best policy, reward=23.269!
[2024-12-30 20:08:46,998][00338] Fps is (10 sec: 3685.1, 60 sec: 3959.2, 300 sec: 3971.0). Total num frames: 2768896. Throughput: 0: 1019.4. Samples: 691486. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 20:08:47,000][00338] Avg episode reward: [(0, '22.685')]
[2024-12-30 20:08:50,625][02937] Updated weights for policy 0, policy_version 680 (0.0017)
[2024-12-30 20:08:51,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.6, 300 sec: 3971.0). Total num frames: 2789376. Throughput: 0: 968.8. Samples: 696420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:08:52,001][00338] Avg episode reward: [(0, '20.595')]
[2024-12-30 20:08:56,994][00338] Fps is (10 sec: 4507.1, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2813952. Throughput: 0: 1005.6. Samples: 703588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 20:08:56,999][00338] Avg episode reward: [(0, '22.092')]
[2024-12-30 20:08:59,380][02937] Updated weights for policy 0, policy_version 690 (0.0018)
[2024-12-30 20:09:01,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2830336. Throughput: 0: 1035.9. Samples: 707070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:09:02,000][00338] Avg episode reward: [(0, '21.348')]
[2024-12-30 20:09:06,995][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3957.1). Total num frames: 2846720. Throughput: 0: 977.8. Samples: 711378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 20:09:06,997][00338] Avg episode reward: [(0, '21.252')]
[2024-12-30 20:09:10,362][02937] Updated weights for policy 0, policy_version 700 (0.0021)
[2024-12-30 20:09:11,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3998.9). Total num frames: 2871296. Throughput: 0: 987.4. Samples: 718338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:09:11,996][00338] Avg episode reward: [(0, '23.068')]
[2024-12-30 20:09:16,994][00338] Fps is (10 sec: 4915.4, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2895872. Throughput: 0: 1019.7. Samples: 721860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:09:17,002][00338] Avg episode reward: [(0, '23.341')]
[2024-12-30 20:09:17,015][02924] Saving new best policy, reward=23.341!
[2024-12-30 20:09:21,210][02937] Updated weights for policy 0, policy_version 710 (0.0019)
[2024-12-30 20:09:21,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2908160. Throughput: 0: 995.5. Samples: 726732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:09:22,003][00338] Avg episode reward: [(0, '23.111')]
[2024-12-30 20:09:26,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2932736. Throughput: 0: 971.6. Samples: 732802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:09:27,001][00338] Avg episode reward: [(0, '22.742')]
[2024-12-30 20:09:30,330][02937] Updated weights for policy 0, policy_version 720 (0.0015)
[2024-12-30 20:09:31,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2953216. Throughput: 0: 998.7. Samples: 736422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:09:31,996][00338] Avg episode reward: [(0, '23.523')]
[2024-12-30 20:09:32,044][02924] Saving new best policy, reward=23.523!
[2024-12-30 20:09:36,995][00338] Fps is (10 sec: 3686.1, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 2969600. Throughput: 0: 1020.5. Samples: 742344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:09:36,998][00338] Avg episode reward: [(0, '22.349')]
[2024-12-30 20:09:41,680][02937] Updated weights for policy 0, policy_version 730 (0.0016)
[2024-12-30 20:09:41,994][00338] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 2990080. Throughput: 0: 972.5. Samples: 747352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:09:41,999][00338] Avg episode reward: [(0, '22.614')]
[2024-12-30 20:09:46,994][00338] Fps is (10 sec: 4096.3, 60 sec: 4028.0, 300 sec: 3998.8). Total num frames: 3010560. Throughput: 0: 974.6. Samples: 750926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:09:46,997][00338] Avg episode reward: [(0, '22.465')]
[2024-12-30 20:09:51,116][02937] Updated weights for policy 0, policy_version 740 (0.0013)
[2024-12-30 20:09:52,001][00338] Fps is (10 sec: 4093.2, 60 sec: 4027.3, 300 sec: 3984.8). Total num frames: 3031040. Throughput: 0: 1027.5. Samples: 757624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:09:52,004][00338] Avg episode reward: [(0, '24.615')]
[2024-12-30 20:09:52,006][02924] Saving new best policy, reward=24.615!
[2024-12-30 20:09:56,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3047424. Throughput: 0: 969.0. Samples: 761942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:09:56,998][00338] Avg episode reward: [(0, '24.300')]
[2024-12-30 20:10:01,882][02937] Updated weights for policy 0, policy_version 750 (0.0027)
[2024-12-30 20:10:01,994][00338] Fps is (10 sec: 4098.8, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3072000. Throughput: 0: 963.7. Samples: 765226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:01,998][00338] Avg episode reward: [(0, '25.283')]
[2024-12-30 20:10:02,001][02924] Saving new best policy, reward=25.283!
[2024-12-30 20:10:06,999][00338] Fps is (10 sec: 4503.6, 60 sec: 4095.7, 300 sec: 3998.7). Total num frames: 3092480. Throughput: 0: 1014.2. Samples: 772376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:07,003][00338] Avg episode reward: [(0, '24.958')]
[2024-12-30 20:10:11,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3108864. Throughput: 0: 989.6. Samples: 777334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:10:11,997][00338] Avg episode reward: [(0, '25.480')]
[2024-12-30 20:10:12,001][02924] Saving new best policy, reward=25.480!
[2024-12-30 20:10:13,253][02937] Updated weights for policy 0, policy_version 760 (0.0022)
[2024-12-30 20:10:16,994][00338] Fps is (10 sec: 3688.0, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 3129344. Throughput: 0: 962.3. Samples: 779724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:17,000][00338] Avg episode reward: [(0, '24.827')]
[2024-12-30 20:10:21,938][02937] Updated weights for policy 0, policy_version 770 (0.0020)
[2024-12-30 20:10:21,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 3153920. Throughput: 0: 989.7. Samples: 786878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:21,999][00338] Avg episode reward: [(0, '22.796')]
[2024-12-30 20:10:26,996][00338] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3984.9). Total num frames: 3170304. Throughput: 0: 1011.4. Samples: 792868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:27,005][00338] Avg episode reward: [(0, '22.265')]
[2024-12-30 20:10:27,013][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000774_3170304.pth...
[2024-12-30 20:10:27,167][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000541_2215936.pth
[2024-12-30 20:10:31,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.1). Total num frames: 3186688. Throughput: 0: 980.0. Samples: 795026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:10:31,999][00338] Avg episode reward: [(0, '21.560')]
[2024-12-30 20:10:33,191][02937] Updated weights for policy 0, policy_version 780 (0.0039)
[2024-12-30 20:10:36,997][00338] Fps is (10 sec: 4095.6, 60 sec: 4027.6, 300 sec: 3998.8). Total num frames: 3211264. Throughput: 0: 979.2. Samples: 801684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:10:37,004][00338] Avg episode reward: [(0, '20.763')]
[2024-12-30 20:10:41,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3231744. Throughput: 0: 1039.7. Samples: 808730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:42,001][00338] Avg episode reward: [(0, '21.018')]
[2024-12-30 20:10:42,184][02937] Updated weights for policy 0, policy_version 790 (0.0013)
[2024-12-30 20:10:46,994][00338] Fps is (10 sec: 3687.3, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3248128. Throughput: 0: 1014.0. Samples: 810854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:47,001][00338] Avg episode reward: [(0, '22.977')]
[2024-12-30 20:10:51,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.9, 300 sec: 3998.8). Total num frames: 3268608. Throughput: 0: 978.8. Samples: 816418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:10:51,997][00338] Avg episode reward: [(0, '22.164')]
[2024-12-30 20:10:53,032][02937] Updated weights for policy 0, policy_version 800 (0.0013)
[2024-12-30 20:10:56,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 3293184. Throughput: 0: 1023.3. Samples: 823384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:10:57,003][00338] Avg episode reward: [(0, '22.531')]
[2024-12-30 20:11:01,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3309568. Throughput: 0: 1032.1. Samples: 826168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:11:01,999][00338] Avg episode reward: [(0, '22.995')]
[2024-12-30 20:11:04,581][02937] Updated weights for policy 0, policy_version 810 (0.0024)
[2024-12-30 20:11:06,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.5, 300 sec: 3971.0). Total num frames: 3325952. Throughput: 0: 972.4. Samples: 830638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:11:07,002][00338] Avg episode reward: [(0, '22.243')]
[2024-12-30 20:11:11,995][00338] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3350528. Throughput: 0: 996.3. Samples: 837702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:11:12,002][00338] Avg episode reward: [(0, '21.920')]
[2024-12-30 20:11:13,216][02937] Updated weights for policy 0, policy_version 820 (0.0014)
[2024-12-30 20:11:16,998][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3371008. Throughput: 0: 1027.6. Samples: 841266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:11:17,005][00338] Avg episode reward: [(0, '22.273')]
[2024-12-30 20:11:21,994][00338] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 3383296. Throughput: 0: 978.1. Samples: 845694. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:11:21,996][00338] Avg episode reward: [(0, '22.489')]
[2024-12-30 20:11:24,626][02937] Updated weights for policy 0, policy_version 830 (0.0036)
[2024-12-30 20:11:26,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3998.8). Total num frames: 3407872. Throughput: 0: 966.4. Samples: 852216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:11:27,003][00338] Avg episode reward: [(0, '22.932')]
[2024-12-30 20:11:31,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 3432448. Throughput: 0: 995.7. Samples: 855660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:11:32,002][00338] Avg episode reward: [(0, '24.577')]
[2024-12-30 20:11:34,413][02937] Updated weights for policy 0, policy_version 840 (0.0019)
[2024-12-30 20:11:36,997][00338] Fps is (10 sec: 3685.5, 60 sec: 3891.2, 300 sec: 3957.1). Total num frames: 3444736. Throughput: 0: 995.6. Samples: 861224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:11:36,999][00338] Avg episode reward: [(0, '24.389')]
[2024-12-30 20:11:41,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3465216. Throughput: 0: 957.8. Samples: 866486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:11:41,997][00338] Avg episode reward: [(0, '25.368')]
[2024-12-30 20:11:45,022][02937] Updated weights for policy 0, policy_version 850 (0.0031)
[2024-12-30 20:11:46,994][00338] Fps is (10 sec: 4506.7, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3489792. Throughput: 0: 972.2. Samples: 869916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:11:46,997][00338] Avg episode reward: [(0, '25.328')]
[2024-12-30 20:11:51,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3506176. Throughput: 0: 1009.8. Samples: 876080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:11:51,999][00338] Avg episode reward: [(0, '24.922')]
[2024-12-30 20:11:56,841][02937] Updated weights for policy 0, policy_version 860 (0.0026)
[2024-12-30 20:11:56,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 3522560. Throughput: 0: 950.8. Samples: 880488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:11:56,997][00338] Avg episode reward: [(0, '24.741')]
[2024-12-30 20:12:01,995][00338] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 3543040. Throughput: 0: 948.5. Samples: 883948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:02,001][00338] Avg episode reward: [(0, '23.888')]
[2024-12-30 20:12:05,482][02937] Updated weights for policy 0, policy_version 870 (0.0020)
[2024-12-30 20:12:06,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3567616. Throughput: 0: 1007.9. Samples: 891048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:06,996][00338] Avg episode reward: [(0, '24.077')]
[2024-12-30 20:12:11,994][00338] Fps is (10 sec: 3686.5, 60 sec: 3823.0, 300 sec: 3943.3). Total num frames: 3579904. Throughput: 0: 962.8. Samples: 895542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:12:12,000][00338] Avg episode reward: [(0, '24.367')]
[2024-12-30 20:12:16,855][02937] Updated weights for policy 0, policy_version 880 (0.0017)
[2024-12-30 20:12:16,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 3604480. Throughput: 0: 954.3. Samples: 898602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:17,001][00338] Avg episode reward: [(0, '24.123')]
[2024-12-30 20:12:21,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3624960. Throughput: 0: 989.0. Samples: 905728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:21,996][00338] Avg episode reward: [(0, '23.791')]
[2024-12-30 20:12:26,999][00338] Fps is (10 sec: 3684.8, 60 sec: 3890.9, 300 sec: 3957.1). Total num frames: 3641344. Throughput: 0: 989.9. Samples: 911036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:12:27,001][00338] Avg episode reward: [(0, '24.124')]
[2024-12-30 20:12:27,014][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000889_3641344.pth...
[2024-12-30 20:12:27,224][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000658_2695168.pth
[2024-12-30 20:12:27,341][02937] Updated weights for policy 0, policy_version 890 (0.0057)
[2024-12-30 20:12:31,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 3661824. Throughput: 0: 962.4. Samples: 913224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:12:32,002][00338] Avg episode reward: [(0, '24.495')]
[2024-12-30 20:12:36,784][02937] Updated weights for policy 0, policy_version 900 (0.0032)
[2024-12-30 20:12:36,994][00338] Fps is (10 sec: 4507.6, 60 sec: 4027.9, 300 sec: 3984.9). Total num frames: 3686400. Throughput: 0: 981.6. Samples: 920250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:36,997][00338] Avg episode reward: [(0, '24.720')]
[2024-12-30 20:12:41,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3702784. Throughput: 0: 1025.9. Samples: 926654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:41,997][00338] Avg episode reward: [(0, '26.380')]
[2024-12-30 20:12:42,056][02924] Saving new best policy, reward=26.380!
[2024-12-30 20:12:46,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 3719168. Throughput: 0: 995.7. Samples: 928754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:46,999][00338] Avg episode reward: [(0, '26.367')]
[2024-12-30 20:12:48,161][02937] Updated weights for policy 0, policy_version 910 (0.0028)
[2024-12-30 20:12:51,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3743744. Throughput: 0: 972.2. Samples: 934796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:12:51,999][00338] Avg episode reward: [(0, '27.231')]
[2024-12-30 20:12:52,002][02924] Saving new best policy, reward=27.231!
[2024-12-30 20:12:56,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3764224. Throughput: 0: 1031.8. Samples: 941974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:57,001][00338] Avg episode reward: [(0, '28.804')]
[2024-12-30 20:12:57,012][02924] Saving new best policy, reward=28.804!
[2024-12-30 20:12:57,338][02937] Updated weights for policy 0, policy_version 920 (0.0017)
[2024-12-30 20:13:01,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3780608. Throughput: 0: 1008.8. Samples: 943998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:13:01,996][00338] Avg episode reward: [(0, '28.385')]
[2024-12-30 20:13:06,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3801088. Throughput: 0: 971.0. Samples: 949424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:13:06,996][00338] Avg episode reward: [(0, '28.096')]
[2024-12-30 20:13:08,158][02937] Updated weights for policy 0, policy_version 930 (0.0017)
[2024-12-30 20:13:11,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 3825664. Throughput: 0: 1011.9. Samples: 956566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:13:12,001][00338] Avg episode reward: [(0, '26.923')]
[2024-12-30 20:13:16,996][00338] Fps is (10 sec: 4095.3, 60 sec: 3959.3, 300 sec: 3957.1). Total num frames: 3842048. Throughput: 0: 1028.8. Samples: 959520. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:13:17,002][00338] Avg episode reward: [(0, '26.471')]
[2024-12-30 20:13:19,174][02937] Updated weights for policy 0, policy_version 940 (0.0021)
[2024-12-30 20:13:22,001][00338] Fps is (10 sec: 3274.7, 60 sec: 3890.8, 300 sec: 3943.2). Total num frames: 3858432. Throughput: 0: 968.8. Samples: 963854. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:13:22,006][00338] Avg episode reward: [(0, '25.451')]
[2024-12-30 20:13:26,994][00338] Fps is (10 sec: 4096.7, 60 sec: 4028.0, 300 sec: 3984.9). Total num frames: 3883008. Throughput: 0: 982.4. Samples: 970860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:13:26,997][00338] Avg episode reward: [(0, '24.357')]
[2024-12-30 20:13:28,320][02937] Updated weights for policy 0, policy_version 950 (0.0017)
[2024-12-30 20:13:31,994][00338] Fps is (10 sec: 4508.5, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3903488. Throughput: 0: 1014.9. Samples: 974426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:13:31,997][00338] Avg episode reward: [(0, '23.261')]
[2024-12-30 20:13:36,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3919872. Throughput: 0: 984.4. Samples: 979094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:13:36,996][00338] Avg episode reward: [(0, '23.994')]
[2024-12-30 20:13:39,413][02937] Updated weights for policy 0, policy_version 960 (0.0029)
[2024-12-30 20:13:41,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 3940352. Throughput: 0: 967.6. Samples: 985516. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:13:42,005][00338] Avg episode reward: [(0, '22.942')]
[2024-12-30 20:13:46,996][00338] Fps is (10 sec: 4504.9, 60 sec: 4095.9, 300 sec: 3984.9). Total num frames: 3964928. Throughput: 0: 1002.0. Samples: 989090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-30 20:13:47,003][00338] Avg episode reward: [(0, '22.463')]
[2024-12-30 20:13:48,826][02937] Updated weights for policy 0, policy_version 970 (0.0039)
[2024-12-30 20:13:51,995][00338] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 3981312. Throughput: 0: 1004.5. Samples: 994628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:13:51,998][00338] Avg episode reward: [(0, '22.844')]
[2024-12-30 20:13:56,994][00338] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4001792. Throughput: 0: 965.9. Samples: 1000032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:13:56,997][00338] Avg episode reward: [(0, '22.690')]
[2024-12-30 20:13:57,802][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-30 20:13:57,807][00338] Component Batcher_0 stopped!
[2024-12-30 20:13:57,803][02924] Stopping Batcher_0...
[2024-12-30 20:13:57,816][02924] Loop batcher_evt_loop terminating...
[2024-12-30 20:13:57,880][02937] Weights refcount: 2 0
[2024-12-30 20:13:57,885][00338] Component InferenceWorker_p0-w0 stopped!
[2024-12-30 20:13:57,893][02937] Stopping InferenceWorker_p0-w0...
[2024-12-30 20:13:57,894][02937] Loop inference_proc0-0_evt_loop terminating...
[2024-12-30 20:13:57,962][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000774_3170304.pth
[2024-12-30 20:13:57,984][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-30 20:13:58,149][02943] Stopping RolloutWorker_w6...
[2024-12-30 20:13:58,148][00338] Component RolloutWorker_w6 stopped!
[2024-12-30 20:13:58,150][02943] Loop rollout_proc6_evt_loop terminating...
[2024-12-30 20:13:58,197][02940] Stopping RolloutWorker_w2...
[2024-12-30 20:13:58,197][00338] Component RolloutWorker_w2 stopped!
[2024-12-30 20:13:58,205][00338] Component RolloutWorker_w0 stopped!
[2024-12-30 20:13:58,205][02938] Stopping RolloutWorker_w0...
[2024-12-30 20:13:58,223][02938] Loop rollout_proc0_evt_loop terminating...
[2024-12-30 20:13:58,226][02924] Stopping LearnerWorker_p0...
[2024-12-30 20:13:58,200][02940] Loop rollout_proc2_evt_loop terminating...
[2024-12-30 20:13:58,229][02924] Loop learner_proc0_evt_loop terminating...
[2024-12-30 20:13:58,226][00338] Component LearnerWorker_p0 stopped!
[2024-12-30 20:13:58,246][02942] Stopping RolloutWorker_w4...
[2024-12-30 20:13:58,247][02942] Loop rollout_proc4_evt_loop terminating...
[2024-12-30 20:13:58,246][00338] Component RolloutWorker_w4 stopped!
[2024-12-30 20:13:58,287][00338] Component RolloutWorker_w7 stopped!
[2024-12-30 20:13:58,292][02945] Stopping RolloutWorker_w7...
[2024-12-30 20:13:58,295][02945] Loop rollout_proc7_evt_loop terminating...
[2024-12-30 20:13:58,313][00338] Component RolloutWorker_w5 stopped!
[2024-12-30 20:13:58,319][02944] Stopping RolloutWorker_w5...
[2024-12-30 20:13:58,323][00338] Component RolloutWorker_w1 stopped!
[2024-12-30 20:13:58,327][02939] Stopping RolloutWorker_w1...
[2024-12-30 20:13:58,320][02944] Loop rollout_proc5_evt_loop terminating...
[2024-12-30 20:13:58,328][02939] Loop rollout_proc1_evt_loop terminating...
[2024-12-30 20:13:58,345][00338] Component RolloutWorker_w3 stopped!
[2024-12-30 20:13:58,351][00338] Waiting for process learner_proc0 to stop...
[2024-12-30 20:13:58,357][02941] Stopping RolloutWorker_w3...
[2024-12-30 20:13:58,358][02941] Loop rollout_proc3_evt_loop terminating...
[2024-12-30 20:13:59,931][00338] Waiting for process inference_proc0-0 to join...
[2024-12-30 20:13:59,938][00338] Waiting for process rollout_proc0 to join...
[2024-12-30 20:14:01,834][00338] Waiting for process rollout_proc1 to join...
[2024-12-30 20:14:01,843][00338] Waiting for process rollout_proc2 to join...
[2024-12-30 20:14:01,846][00338] Waiting for process rollout_proc3 to join...
[2024-12-30 20:14:01,849][00338] Waiting for process rollout_proc4 to join...
[2024-12-30 20:14:01,852][00338] Waiting for process rollout_proc5 to join...
[2024-12-30 20:14:01,856][00338] Waiting for process rollout_proc6 to join...
[2024-12-30 20:14:01,861][00338] Waiting for process rollout_proc7 to join...
[2024-12-30 20:14:01,865][00338] Batcher 0 profile tree view:
batching: 26.6346, releasing_batches: 0.0307
[2024-12-30 20:14:01,866][00338] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 408.7284
update_model: 8.2431
  weight_update: 0.0032
one_step: 0.0058
  handle_policy_step: 559.6663
    deserialize: 14.1682, stack: 3.0056, obs_to_device_normalize: 120.3126, forward: 280.5517, send_messages: 27.2702
    prepare_outputs: 85.7343
      to_cpu: 52.2131
[2024-12-30 20:14:01,869][00338] Learner 0 profile tree view:
misc: 0.0060, prepare_batch: 15.4795
train: 74.7115
  epoch_init: 0.0062, minibatch_init: 0.0061, losses_postprocess: 0.6784, kl_divergence: 0.7140, after_optimizer: 33.3833
  calculate_losses: 27.4171
    losses_init: 0.0037, forward_head: 1.4711, bptt_initial: 18.7289, tail: 1.0654, advantages_returns: 0.2509, losses: 3.7698
    bptt: 1.8192
      bptt_forward_core: 1.7231
  update: 11.8961
    clip: 0.8663
[2024-12-30 20:14:01,871][00338] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3129, enqueue_policy_requests: 96.5137, env_step: 798.1487, overhead: 11.9352, complete_rollouts: 7.0398
save_policy_outputs: 20.1138
  split_output_tensors: 7.8855
[2024-12-30 20:14:01,873][00338] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3270, enqueue_policy_requests: 96.3087, env_step: 797.6427, overhead: 12.5089, complete_rollouts: 6.8090
save_policy_outputs: 19.2943
  split_output_tensors: 7.8342
[2024-12-30 20:14:01,874][00338] Loop Runner_EvtLoop terminating...
[2024-12-30 20:14:01,877][00338] Runner profile tree view:
main_loop: 1049.2810
[2024-12-30 20:14:01,881][00338] Collected {0: 4005888}, FPS: 3817.7
[2024-12-30 20:14:02,287][00338] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-30 20:14:02,289][00338] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-30 20:14:02,292][00338] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-30 20:14:02,294][00338] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-30 20:14:02,296][00338] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-30 20:14:02,298][00338] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-30 20:14:02,300][00338] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-30 20:14:02,302][00338] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-30 20:14:02,303][00338] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-30 20:14:02,304][00338] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-30 20:14:02,305][00338] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-30 20:14:02,305][00338] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-30 20:14:02,306][00338] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-30 20:14:02,307][00338] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-30 20:14:02,308][00338] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-30 20:14:02,340][00338] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 20:14:02,344][00338] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 20:14:02,346][00338] RunningMeanStd input shape: (1,)
[2024-12-30 20:14:02,362][00338] ConvEncoder: input_channels=3
[2024-12-30 20:14:02,462][00338] Conv encoder output size: 512
[2024-12-30 20:14:02,464][00338] Policy head output size: 512
[2024-12-30 20:14:02,728][00338] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-30 20:14:03,546][00338] Num frames 100...
[2024-12-30 20:14:03,666][00338] Num frames 200...
[2024-12-30 20:14:03,841][00338] Num frames 300...
[2024-12-30 20:14:04,014][00338] Num frames 400...
[2024-12-30 20:14:04,185][00338] Num frames 500...
[2024-12-30 20:14:04,360][00338] Num frames 600...
[2024-12-30 20:14:04,525][00338] Num frames 700...
[2024-12-30 20:14:04,692][00338] Num frames 800...
[2024-12-30 20:14:04,865][00338] Num frames 900...
[2024-12-30 20:14:05,029][00338] Num frames 1000...
[2024-12-30 20:14:05,209][00338] Num frames 1100...
[2024-12-30 20:14:05,382][00338] Num frames 1200...
[2024-12-30 20:14:05,558][00338] Num frames 1300...
[2024-12-30 20:14:05,733][00338] Num frames 1400...
[2024-12-30 20:14:05,863][00338] Avg episode rewards: #0: 36.400, true rewards: #0: 14.400
[2024-12-30 20:14:05,864][00338] Avg episode reward: 36.400, avg true_objective: 14.400
[2024-12-30 20:14:05,968][00338] Num frames 1500...
[2024-12-30 20:14:06,150][00338] Num frames 1600...
[2024-12-30 20:14:06,304][00338] Num frames 1700...
[2024-12-30 20:14:06,422][00338] Num frames 1800...
[2024-12-30 20:14:06,541][00338] Num frames 1900...
[2024-12-30 20:14:06,661][00338] Num frames 2000...
[2024-12-30 20:14:06,780][00338] Num frames 2100...
[2024-12-30 20:14:06,911][00338] Num frames 2200...
[2024-12-30 20:14:07,034][00338] Num frames 2300...
[2024-12-30 20:14:07,155][00338] Num frames 2400...
[2024-12-30 20:14:07,288][00338] Num frames 2500...
[2024-12-30 20:14:07,411][00338] Num frames 2600...
[2024-12-30 20:14:07,532][00338] Num frames 2700...
[2024-12-30 20:14:07,653][00338] Num frames 2800...
[2024-12-30 20:14:07,772][00338] Num frames 2900...
[2024-12-30 20:14:07,904][00338] Num frames 3000...
[2024-12-30 20:14:08,028][00338] Num frames 3100...
[2024-12-30 20:14:08,167][00338] Num frames 3200...
[2024-12-30 20:14:08,301][00338] Num frames 3300...
[2024-12-30 20:14:08,422][00338] Num frames 3400...
[2024-12-30 20:14:08,507][00338] Avg episode rewards: #0: 46.619, true rewards: #0: 17.120
[2024-12-30 20:14:08,509][00338] Avg episode reward: 46.619, avg true_objective: 17.120
[2024-12-30 20:14:08,604][00338] Num frames 3500...
[2024-12-30 20:14:08,725][00338] Num frames 3600...
[2024-12-30 20:14:08,856][00338] Num frames 3700...
[2024-12-30 20:14:08,980][00338] Num frames 3800...
[2024-12-30 20:14:09,105][00338] Num frames 3900...
[2024-12-30 20:14:09,224][00338] Num frames 4000...
[2024-12-30 20:14:09,356][00338] Num frames 4100...
[2024-12-30 20:14:09,480][00338] Num frames 4200...
[2024-12-30 20:14:09,601][00338] Num frames 4300...
[2024-12-30 20:14:09,720][00338] Num frames 4400...
[2024-12-30 20:14:09,851][00338] Num frames 4500...
[2024-12-30 20:14:09,974][00338] Num frames 4600...
[2024-12-30 20:14:10,098][00338] Num frames 4700...
[2024-12-30 20:14:10,224][00338] Num frames 4800...
[2024-12-30 20:14:10,277][00338] Avg episode rewards: #0: 42.000, true rewards: #0: 16.000
[2024-12-30 20:14:10,280][00338] Avg episode reward: 42.000, avg true_objective: 16.000
[2024-12-30 20:14:10,408][00338] Num frames 4900...
[2024-12-30 20:14:10,531][00338] Num frames 5000...
[2024-12-30 20:14:10,651][00338] Num frames 5100...
[2024-12-30 20:14:10,772][00338] Num frames 5200...
[2024-12-30 20:14:10,899][00338] Num frames 5300...
[2024-12-30 20:14:11,027][00338] Num frames 5400...
[2024-12-30 20:14:11,153][00338] Num frames 5500...
[2024-12-30 20:14:11,267][00338] Avg episode rewards: #0: 35.862, true rewards: #0: 13.862
[2024-12-30 20:14:11,268][00338] Avg episode reward: 35.862, avg true_objective: 13.862
[2024-12-30 20:14:11,339][00338] Num frames 5600...
[2024-12-30 20:14:11,463][00338] Num frames 5700...
[2024-12-30 20:14:11,586][00338] Num frames 5800...
[2024-12-30 20:14:11,706][00338] Num frames 5900...
[2024-12-30 20:14:11,827][00338] Num frames 6000...
[2024-12-30 20:14:11,954][00338] Num frames 6100...
[2024-12-30 20:14:12,078][00338] Num frames 6200...
[2024-12-30 20:14:12,200][00338] Num frames 6300...
[2024-12-30 20:14:12,318][00338] Num frames 6400...
[2024-12-30 20:14:12,447][00338] Num frames 6500...
[2024-12-30 20:14:12,628][00338] Avg episode rewards: #0: 33.598, true rewards: #0: 13.198
[2024-12-30 20:14:12,630][00338] Avg episode reward: 33.598, avg true_objective: 13.198
[2024-12-30 20:14:12,634][00338] Num frames 6600...
[2024-12-30 20:14:12,754][00338] Num frames 6700...
[2024-12-30 20:14:12,884][00338] Num frames 6800...
[2024-12-30 20:14:13,003][00338] Num frames 6900...
[2024-12-30 20:14:13,126][00338] Num frames 7000...
[2024-12-30 20:14:13,247][00338] Num frames 7100...
[2024-12-30 20:14:13,370][00338] Num frames 7200...
[2024-12-30 20:14:13,497][00338] Num frames 7300...
[2024-12-30 20:14:13,620][00338] Num frames 7400...
[2024-12-30 20:14:13,738][00338] Num frames 7500...
[2024-12-30 20:14:13,908][00338] Avg episode rewards: #0: 31.818, true rewards: #0: 12.652
[2024-12-30 20:14:13,910][00338] Avg episode reward: 31.818, avg true_objective: 12.652
[2024-12-30 20:14:13,924][00338] Num frames 7600...
[2024-12-30 20:14:14,048][00338] Num frames 7700...
[2024-12-30 20:14:14,172][00338] Num frames 7800...
[2024-12-30 20:14:14,293][00338] Num frames 7900...
[2024-12-30 20:14:14,411][00338] Num frames 8000...
[2024-12-30 20:14:14,543][00338] Num frames 8100...
[2024-12-30 20:14:14,716][00338] Avg episode rewards: #0: 28.998, true rewards: #0: 11.713
[2024-12-30 20:14:14,718][00338] Avg episode reward: 28.998, avg true_objective: 11.713
[2024-12-30 20:14:14,724][00338] Num frames 8200...
[2024-12-30 20:14:14,849][00338] Num frames 8300...
[2024-12-30 20:14:14,976][00338] Num frames 8400...
[2024-12-30 20:14:15,100][00338] Num frames 8500...
[2024-12-30 20:14:15,220][00338] Num frames 8600...
[2024-12-30 20:14:15,340][00338] Num frames 8700...
[2024-12-30 20:14:15,472][00338] Num frames 8800...
[2024-12-30 20:14:15,590][00338] Num frames 8900...
[2024-12-30 20:14:15,688][00338] Avg episode rewards: #0: 27.044, true rewards: #0: 11.169
[2024-12-30 20:14:15,690][00338] Avg episode reward: 27.044, avg true_objective: 11.169
[2024-12-30 20:14:15,770][00338] Num frames 9000...
[2024-12-30 20:14:15,901][00338] Num frames 9100...
[2024-12-30 20:14:16,024][00338] Num frames 9200...
[2024-12-30 20:14:16,146][00338] Num frames 9300...
[2024-12-30 20:14:16,286][00338] Num frames 9400...
[2024-12-30 20:14:16,461][00338] Num frames 9500...
[2024-12-30 20:14:16,644][00338] Num frames 9600...
[2024-12-30 20:14:16,811][00338] Num frames 9700...
[2024-12-30 20:14:16,978][00338] Num frames 9800...
[2024-12-30 20:14:17,150][00338] Num frames 9900...
[2024-12-30 20:14:17,236][00338] Avg episode rewards: #0: 26.240, true rewards: #0: 11.018
[2024-12-30 20:14:17,238][00338] Avg episode reward: 26.240, avg true_objective: 11.018
[2024-12-30 20:14:17,374][00338] Num frames 10000...
[2024-12-30 20:14:17,574][00338] Num frames 10100...
[2024-12-30 20:14:17,741][00338] Num frames 10200...
[2024-12-30 20:14:17,924][00338] Num frames 10300...
[2024-12-30 20:14:18,090][00338] Num frames 10400...
[2024-12-30 20:14:18,276][00338] Num frames 10500...
[2024-12-30 20:14:18,459][00338] Num frames 10600...
[2024-12-30 20:14:18,634][00338] Num frames 10700...
[2024-12-30 20:14:18,804][00338] Num frames 10800...
[2024-12-30 20:14:18,979][00338] Num frames 10900...
[2024-12-30 20:14:19,045][00338] Avg episode rewards: #0: 26.108, true rewards: #0: 10.908
[2024-12-30 20:14:19,048][00338] Avg episode reward: 26.108, avg true_objective: 10.908
[2024-12-30 20:15:18,940][00338] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-30 20:15:19,581][00338] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-30 20:15:19,583][00338] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-30 20:15:19,584][00338] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-30 20:15:19,586][00338] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-30 20:15:19,587][00338] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-30 20:15:19,589][00338] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-30 20:15:19,590][00338] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-12-30 20:15:19,591][00338] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-30 20:15:19,592][00338] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-12-30 20:15:19,593][00338] Adding new argument 'hf_repository'='qbbian/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-12-30 20:15:19,595][00338] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-30 20:15:19,596][00338] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-30 20:15:19,597][00338] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-30 20:15:19,597][00338] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-30 20:15:19,598][00338] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-30 20:15:19,636][00338] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 20:15:19,638][00338] RunningMeanStd input shape: (1,)
[2024-12-30 20:15:19,654][00338] ConvEncoder: input_channels=3
[2024-12-30 20:15:19,716][00338] Conv encoder output size: 512
[2024-12-30 20:15:19,720][00338] Policy head output size: 512
[2024-12-30 20:15:19,748][00338] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-30 20:15:20,363][00338] Num frames 100...
[2024-12-30 20:15:20,521][00338] Num frames 200...
[2024-12-30 20:15:20,675][00338] Num frames 300...
[2024-12-30 20:15:20,852][00338] Num frames 400...
[2024-12-30 20:15:21,062][00338] Num frames 500...
[2024-12-30 20:15:21,240][00338] Num frames 600...
[2024-12-30 20:15:21,434][00338] Num frames 700...
[2024-12-30 20:15:21,599][00338] Num frames 800...
[2024-12-30 20:15:21,804][00338] Avg episode rewards: #0: 21.730, true rewards: #0: 8.730
[2024-12-30 20:15:21,806][00338] Avg episode reward: 21.730, avg true_objective: 8.730
[2024-12-30 20:15:21,871][00338] Num frames 900...
[2024-12-30 20:15:22,052][00338] Num frames 1000...
[2024-12-30 20:15:22,212][00338] Num frames 1100...
[2024-12-30 20:15:22,370][00338] Num frames 1200...
[2024-12-30 20:15:22,552][00338] Num frames 1300...
[2024-12-30 20:15:22,706][00338] Num frames 1400...
[2024-12-30 20:15:22,885][00338] Num frames 1500...
[2024-12-30 20:15:23,054][00338] Num frames 1600...
[2024-12-30 20:15:23,216][00338] Num frames 1700...
[2024-12-30 20:15:23,388][00338] Num frames 1800...
[2024-12-30 20:15:23,569][00338] Avg episode rewards: #0: 20.325, true rewards: #0: 9.325
[2024-12-30 20:15:23,571][00338] Avg episode reward: 20.325, avg true_objective: 9.325
[2024-12-30 20:15:23,639][00338] Num frames 1900...
[2024-12-30 20:15:23,838][00338] Num frames 2000...
[2024-12-30 20:15:24,033][00338] Num frames 2100...
[2024-12-30 20:15:24,221][00338] Num frames 2200...
[2024-12-30 20:15:24,427][00338] Num frames 2300...
[2024-12-30 20:15:24,642][00338] Num frames 2400...
[2024-12-30 20:15:24,874][00338] Num frames 2500...
[2024-12-30 20:15:25,093][00338] Num frames 2600...
[2024-12-30 20:15:25,294][00338] Num frames 2700...
[2024-12-30 20:15:25,474][00338] Num frames 2800...
[2024-12-30 20:15:25,683][00338] Num frames 2900...
[2024-12-30 20:15:25,884][00338] Num frames 3000...
[2024-12-30 20:15:26,099][00338] Num frames 3100...
[2024-12-30 20:15:26,191][00338] Avg episode rewards: #0: 23.377, true rewards: #0: 10.377
[2024-12-30 20:15:26,193][00338] Avg episode reward: 23.377, avg true_objective: 10.377
[2024-12-30 20:15:26,351][00338] Num frames 3200...
[2024-12-30 20:15:26,538][00338] Num frames 3300...
[2024-12-30 20:15:26,710][00338] Num frames 3400...
[2024-12-30 20:15:26,895][00338] Num frames 3500...
[2024-12-30 20:15:27,123][00338] Num frames 3600...
[2024-12-30 20:15:27,312][00338] Num frames 3700...
[2024-12-30 20:15:27,483][00338] Num frames 3800...
[2024-12-30 20:15:27,652][00338] Num frames 3900...
[2024-12-30 20:15:27,819][00338] Num frames 4000...
[2024-12-30 20:15:27,959][00338] Num frames 4100...
[2024-12-30 20:15:28,080][00338] Num frames 4200...
[2024-12-30 20:15:28,202][00338] Num frames 4300...
[2024-12-30 20:15:28,329][00338] Num frames 4400...
[2024-12-30 20:15:28,450][00338] Num frames 4500...
[2024-12-30 20:15:28,572][00338] Num frames 4600...
[2024-12-30 20:15:28,691][00338] Num frames 4700...
[2024-12-30 20:15:28,812][00338] Num frames 4800...
[2024-12-30 20:15:28,953][00338] Num frames 4900...
[2024-12-30 20:15:29,078][00338] Num frames 5000...
[2024-12-30 20:15:29,198][00338] Num frames 5100...
[2024-12-30 20:15:29,326][00338] Num frames 5200...
[2024-12-30 20:15:29,400][00338] Avg episode rewards: #0: 31.782, true rewards: #0: 13.032
[2024-12-30 20:15:29,402][00338] Avg episode reward: 31.782, avg true_objective: 13.032
[2024-12-30 20:15:29,508][00338] Num frames 5300...
[2024-12-30 20:15:29,637][00338] Num frames 5400...
[2024-12-30 20:15:29,754][00338] Num frames 5500...
[2024-12-30 20:15:29,879][00338] Num frames 5600...
[2024-12-30 20:15:29,997][00338] Num frames 5700...
[2024-12-30 20:15:30,117][00338] Num frames 5800...
[2024-12-30 20:15:30,239][00338] Num frames 5900...
[2024-12-30 20:15:30,358][00338] Avg episode rewards: #0: 28.096, true rewards: #0: 11.896
[2024-12-30 20:15:30,360][00338] Avg episode reward: 28.096, avg true_objective: 11.896
[2024-12-30 20:15:30,423][00338] Num frames 6000...
[2024-12-30 20:15:30,542][00338] Num frames 6100...
[2024-12-30 20:15:30,662][00338] Num frames 6200...
[2024-12-30 20:15:30,780][00338] Num frames 6300...
[2024-12-30 20:15:30,907][00338] Num frames 6400...
[2024-12-30 20:15:31,026][00338] Num frames 6500...
[2024-12-30 20:15:31,148][00338] Num frames 6600...
[2024-12-30 20:15:31,270][00338] Num frames 6700...
[2024-12-30 20:15:31,397][00338] Num frames 6800...
[2024-12-30 20:15:31,517][00338] Num frames 6900...
[2024-12-30 20:15:31,636][00338] Num frames 7000...
[2024-12-30 20:15:31,732][00338] Avg episode rewards: #0: 27.560, true rewards: #0: 11.727
[2024-12-30 20:15:31,734][00338] Avg episode reward: 27.560, avg true_objective: 11.727
[2024-12-30 20:15:31,810][00338] Num frames 7100...
[2024-12-30 20:15:31,933][00338] Num frames 7200...
[2024-12-30 20:15:32,053][00338] Num frames 7300...
[2024-12-30 20:15:32,171][00338] Num frames 7400...
[2024-12-30 20:15:32,290][00338] Num frames 7500...
[2024-12-30 20:15:32,416][00338] Num frames 7600...
[2024-12-30 20:15:32,535][00338] Num frames 7700...
[2024-12-30 20:15:32,661][00338] Num frames 7800...
[2024-12-30 20:15:32,782][00338] Num frames 7900...
[2024-12-30 20:15:32,910][00338] Num frames 8000...
[2024-12-30 20:15:33,027][00338] Num frames 8100...
[2024-12-30 20:15:33,147][00338] Num frames 8200...
[2024-12-30 20:15:33,269][00338] Num frames 8300...
[2024-12-30 20:15:33,398][00338] Num frames 8400...
[2024-12-30 20:15:33,521][00338] Num frames 8500...
[2024-12-30 20:15:33,641][00338] Num frames 8600...
[2024-12-30 20:15:33,759][00338] Num frames 8700...
[2024-12-30 20:15:33,885][00338] Num frames 8800...
[2024-12-30 20:15:34,009][00338] Num frames 8900...
[2024-12-30 20:15:34,074][00338] Avg episode rewards: #0: 30.581, true rewards: #0: 12.724
[2024-12-30 20:15:34,075][00338] Avg episode reward: 30.581, avg true_objective: 12.724
[2024-12-30 20:15:34,189][00338] Num frames 9000...
[2024-12-30 20:15:34,307][00338] Num frames 9100...
[2024-12-30 20:15:34,437][00338] Num frames 9200...
[2024-12-30 20:15:34,600][00338] Avg episode rewards: #0: 27.614, true rewards: #0: 11.614
[2024-12-30 20:15:34,602][00338] Avg episode reward: 27.614, avg true_objective: 11.614
[2024-12-30 20:15:34,616][00338] Num frames 9300...
[2024-12-30 20:15:34,732][00338] Num frames 9400...
[2024-12-30 20:15:34,854][00338] Num frames 9500...
[2024-12-30 20:15:34,973][00338] Num frames 9600...
[2024-12-30 20:15:35,097][00338] Num frames 9700...
[2024-12-30 20:15:35,218][00338] Num frames 9800...
[2024-12-30 20:15:35,361][00338] Num frames 9900...
[2024-12-30 20:15:35,546][00338] Num frames 10000...
[2024-12-30 20:15:35,712][00338] Num frames 10100...
[2024-12-30 20:15:35,880][00338] Num frames 10200...
[2024-12-30 20:15:35,982][00338] Avg episode rewards: #0: 26.918, true rewards: #0: 11.362
[2024-12-30 20:15:35,987][00338] Avg episode reward: 26.918, avg true_objective: 11.362
[2024-12-30 20:15:36,112][00338] Num frames 10300...
[2024-12-30 20:15:36,273][00338] Num frames 10400...
[2024-12-30 20:15:36,435][00338] Num frames 10500...
[2024-12-30 20:15:36,626][00338] Num frames 10600...
[2024-12-30 20:15:36,795][00338] Num frames 10700...
[2024-12-30 20:15:36,970][00338] Num frames 10800...
[2024-12-30 20:15:37,143][00338] Num frames 10900...
[2024-12-30 20:15:37,313][00338] Num frames 11000...
[2024-12-30 20:15:37,490][00338] Num frames 11100...
[2024-12-30 20:15:37,678][00338] Num frames 11200...
[2024-12-30 20:15:37,803][00338] Num frames 11300...
[2024-12-30 20:15:37,930][00338] Num frames 11400...
[2024-12-30 20:15:38,049][00338] Num frames 11500...
[2024-12-30 20:15:38,170][00338] Num frames 11600...
[2024-12-30 20:15:38,229][00338] Avg episode rewards: #0: 27.202, true rewards: #0: 11.602
[2024-12-30 20:15:38,230][00338] Avg episode reward: 27.202, avg true_objective: 11.602
[2024-12-30 20:16:42,066][00338] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-30 20:49:40,252][00338] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-30 20:49:40,253][00338] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-30 20:49:40,255][00338] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-30 20:49:40,257][00338] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-30 20:49:40,258][00338] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-30 20:49:40,260][00338] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-30 20:49:40,261][00338] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-12-30 20:49:40,262][00338] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-30 20:49:40,263][00338] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-12-30 20:49:40,264][00338] Adding new argument 'hf_repository'='qbbian/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-12-30 20:49:40,265][00338] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-30 20:49:40,265][00338] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-30 20:49:40,266][00338] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-30 20:49:40,267][00338] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-30 20:49:40,268][00338] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-30 20:49:40,302][00338] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 20:49:40,304][00338] RunningMeanStd input shape: (1,)
[2024-12-30 20:49:40,317][00338] ConvEncoder: input_channels=3
[2024-12-30 20:49:40,357][00338] Conv encoder output size: 512
[2024-12-30 20:49:40,358][00338] Policy head output size: 512
[2024-12-30 20:49:40,377][00338] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-30 20:49:40,789][00338] Num frames 100...
[2024-12-30 20:49:40,929][00338] Num frames 200...
[2024-12-30 20:49:41,048][00338] Num frames 300...
[2024-12-30 20:49:41,165][00338] Num frames 400...
[2024-12-30 20:49:41,285][00338] Num frames 500...
[2024-12-30 20:49:41,406][00338] Num frames 600...
[2024-12-30 20:49:41,538][00338] Num frames 700...
[2024-12-30 20:49:41,656][00338] Num frames 800...
[2024-12-30 20:49:41,777][00338] Num frames 900...
[2024-12-30 20:49:41,906][00338] Num frames 1000...
[2024-12-30 20:49:42,031][00338] Num frames 1100...
[2024-12-30 20:49:42,156][00338] Num frames 1200...
[2024-12-30 20:49:42,277][00338] Num frames 1300...
[2024-12-30 20:49:42,400][00338] Num frames 1400...
[2024-12-30 20:49:42,533][00338] Num frames 1500...
[2024-12-30 20:49:42,655][00338] Num frames 1600...
[2024-12-30 20:49:42,771][00338] Avg episode rewards: #0: 41.490, true rewards: #0: 16.490
[2024-12-30 20:49:42,773][00338] Avg episode reward: 41.490, avg true_objective: 16.490
[2024-12-30 20:49:42,837][00338] Num frames 1700...
[2024-12-30 20:49:42,965][00338] Num frames 1800...
[2024-12-30 20:49:43,083][00338] Num frames 1900...
[2024-12-30 20:49:43,204][00338] Num frames 2000...
[2024-12-30 20:49:43,324][00338] Num frames 2100...
[2024-12-30 20:49:43,446][00338] Num frames 2200...
[2024-12-30 20:49:43,575][00338] Num frames 2300...
[2024-12-30 20:49:43,697][00338] Num frames 2400...
[2024-12-30 20:49:43,819][00338] Num frames 2500...
[2024-12-30 20:49:43,946][00338] Num frames 2600...
[2024-12-30 20:49:44,067][00338] Num frames 2700...
[2024-12-30 20:49:44,190][00338] Num frames 2800...
[2024-12-30 20:49:44,312][00338] Num frames 2900...
[2024-12-30 20:49:44,435][00338] Num frames 3000...
[2024-12-30 20:49:44,560][00338] Num frames 3100...
[2024-12-30 20:49:44,686][00338] Num frames 3200...
[2024-12-30 20:49:44,808][00338] Num frames 3300...
[2024-12-30 20:49:44,941][00338] Num frames 3400...
[2024-12-30 20:49:45,065][00338] Num frames 3500...
[2024-12-30 20:49:45,185][00338] Num frames 3600...
[2024-12-30 20:49:45,311][00338] Num frames 3700...
[2024-12-30 20:49:45,429][00338] Avg episode rewards: #0: 48.244, true rewards: #0: 18.745
[2024-12-30 20:49:45,432][00338] Avg episode reward: 48.244, avg true_objective: 18.745
[2024-12-30 20:49:45,520][00338] Num frames 3800...
[2024-12-30 20:49:45,689][00338] Num frames 3900...
[2024-12-30 20:49:45,852][00338] Num frames 4000...
[2024-12-30 20:49:46,018][00338] Num frames 4100...
[2024-12-30 20:49:46,181][00338] Num frames 4200...
[2024-12-30 20:49:46,356][00338] Num frames 4300...
[2024-12-30 20:49:46,526][00338] Num frames 4400...
[2024-12-30 20:49:46,695][00338] Num frames 4500...
[2024-12-30 20:49:46,870][00338] Num frames 4600...
[2024-12-30 20:49:47,042][00338] Num frames 4700...
[2024-12-30 20:49:47,178][00338] Avg episode rewards: #0: 39.470, true rewards: #0: 15.803
[2024-12-30 20:49:47,179][00338] Avg episode reward: 39.470, avg true_objective: 15.803
[2024-12-30 20:49:47,276][00338] Num frames 4800...
[2024-12-30 20:49:47,453][00338] Num frames 4900...
[2024-12-30 20:49:47,628][00338] Num frames 5000...
[2024-12-30 20:49:47,816][00338] Num frames 5100...
[2024-12-30 20:49:47,987][00338] Num frames 5200...
[2024-12-30 20:49:48,121][00338] Num frames 5300...
[2024-12-30 20:49:48,242][00338] Num frames 5400...
[2024-12-30 20:49:48,368][00338] Num frames 5500...
[2024-12-30 20:49:48,495][00338] Num frames 5600...
[2024-12-30 20:49:48,562][00338] Avg episode rewards: #0: 34.262, true rewards: #0: 14.012
[2024-12-30 20:49:48,563][00338] Avg episode reward: 34.262, avg true_objective: 14.012
[2024-12-30 20:49:48,673][00338] Num frames 5700...
[2024-12-30 20:49:48,801][00338] Num frames 5800...
[2024-12-30 20:49:48,932][00338] Num frames 5900...
[2024-12-30 20:49:49,061][00338] Num frames 6000...
[2024-12-30 20:49:49,187][00338] Num frames 6100...
[2024-12-30 20:49:49,311][00338] Num frames 6200...
[2024-12-30 20:49:49,447][00338] Num frames 6300...
[2024-12-30 20:49:49,567][00338] Num frames 6400...
[2024-12-30 20:49:49,693][00338] Num frames 6500...
[2024-12-30 20:49:49,827][00338] Num frames 6600...
[2024-12-30 20:49:49,953][00338] Num frames 6700...
[2024-12-30 20:49:50,079][00338] Num frames 6800...
[2024-12-30 20:49:50,216][00338] Num frames 6900...
[2024-12-30 20:49:50,337][00338] Num frames 7000...
[2024-12-30 20:49:50,460][00338] Num frames 7100...
[2024-12-30 20:49:50,582][00338] Num frames 7200...
[2024-12-30 20:49:50,704][00338] Num frames 7300...
[2024-12-30 20:49:50,834][00338] Num frames 7400...
[2024-12-30 20:49:50,946][00338] Avg episode rewards: #0: 37.682, true rewards: #0: 14.882
[2024-12-30 20:49:50,948][00338] Avg episode reward: 37.682, avg true_objective: 14.882
[2024-12-30 20:49:51,022][00338] Num frames 7500...
[2024-12-30 20:49:51,143][00338] Num frames 7600...
[2024-12-30 20:49:51,266][00338] Num frames 7700...
[2024-12-30 20:49:51,385][00338] Num frames 7800...
[2024-12-30 20:49:51,510][00338] Num frames 7900...
[2024-12-30 20:49:51,630][00338] Num frames 8000...
[2024-12-30 20:49:51,763][00338] Num frames 8100...
[2024-12-30 20:49:51,900][00338] Num frames 8200...
[2024-12-30 20:49:52,027][00338] Num frames 8300...
[2024-12-30 20:49:52,154][00338] Num frames 8400...
[2024-12-30 20:49:52,278][00338] Num frames 8500...
[2024-12-30 20:49:52,371][00338] Avg episode rewards: #0: 35.548, true rewards: #0: 14.215
[2024-12-30 20:49:52,372][00338] Avg episode reward: 35.548, avg true_objective: 14.215
[2024-12-30 20:49:52,461][00338] Num frames 8600...
[2024-12-30 20:49:52,581][00338] Num frames 8700...
[2024-12-30 20:49:52,701][00338] Num frames 8800...
[2024-12-30 20:49:52,835][00338] Num frames 8900...
[2024-12-30 20:49:52,960][00338] Num frames 9000...
[2024-12-30 20:49:53,079][00338] Num frames 9100...
[2024-12-30 20:49:53,206][00338] Num frames 9200...
[2024-12-30 20:49:53,323][00338] Num frames 9300...
[2024-12-30 20:49:53,446][00338] Num frames 9400...
[2024-12-30 20:49:53,522][00338] Avg episode rewards: #0: 34.024, true rewards: #0: 13.453
[2024-12-30 20:49:53,524][00338] Avg episode reward: 34.024, avg true_objective: 13.453
[2024-12-30 20:49:53,622][00338] Num frames 9500...
[2024-12-30 20:49:53,742][00338] Num frames 9600...
[2024-12-30 20:49:53,878][00338] Num frames 9700...
[2024-12-30 20:49:53,995][00338] Num frames 9800...
[2024-12-30 20:49:54,117][00338] Num frames 9900...
[2024-12-30 20:49:54,237][00338] Num frames 10000...
[2024-12-30 20:49:54,361][00338] Num frames 10100...
[2024-12-30 20:49:54,456][00338] Avg episode rewards: #0: 31.536, true rewards: #0: 12.661
[2024-12-30 20:49:54,457][00338] Avg episode reward: 31.536, avg true_objective: 12.661
[2024-12-30 20:49:54,564][00338] Num frames 10200...
[2024-12-30 20:49:54,698][00338] Num frames 10300...
[2024-12-30 20:49:54,817][00338] Num frames 10400...
[2024-12-30 20:49:54,956][00338] Num frames 10500...
[2024-12-30 20:49:55,074][00338] Num frames 10600...
[2024-12-30 20:49:55,194][00338] Num frames 10700...
[2024-12-30 20:49:55,317][00338] Num frames 10800...
[2024-12-30 20:49:55,376][00338] Avg episode rewards: #0: 29.556, true rewards: #0: 12.001
[2024-12-30 20:49:55,378][00338] Avg episode reward: 29.556, avg true_objective: 12.001
[2024-12-30 20:49:55,496][00338] Num frames 10900...
[2024-12-30 20:49:55,617][00338] Num frames 11000...
[2024-12-30 20:49:55,735][00338] Num frames 11100...
[2024-12-30 20:49:55,865][00338] Num frames 11200...
[2024-12-30 20:49:55,998][00338] Num frames 11300...
[2024-12-30 20:49:56,119][00338] Num frames 11400...
[2024-12-30 20:49:56,240][00338] Num frames 11500...
[2024-12-30 20:49:56,362][00338] Num frames 11600...
[2024-12-30 20:49:56,491][00338] Num frames 11700...
[2024-12-30 20:49:56,610][00338] Num frames 11800...
[2024-12-30 20:49:56,729][00338] Num frames 11900...
[2024-12-30 20:49:56,856][00338] Num frames 12000...
[2024-12-30 20:49:56,984][00338] Num frames 12100...
[2024-12-30 20:49:57,106][00338] Num frames 12200...
[2024-12-30 20:49:57,174][00338] Avg episode rewards: #0: 30.109, true rewards: #0: 12.209
[2024-12-30 20:49:57,176][00338] Avg episode reward: 30.109, avg true_objective: 12.209
[2024-12-30 20:51:06,640][00338] Replay video saved to /content/train_dir/default_experiment/replay.mp4!