[2024-12-13 07:56:51,248][62028] Saving configuration to ./train_dir_humamoid/Ant/config.json... [2024-12-13 07:56:51,249][62028] Rollout worker 0 uses device cpu [2024-12-13 07:56:51,250][62028] Rollout worker 1 uses device cpu [2024-12-13 07:56:51,250][62028] Rollout worker 2 uses device cpu [2024-12-13 07:56:51,250][62028] Rollout worker 3 uses device cpu [2024-12-13 07:56:51,250][62028] Rollout worker 4 uses device cpu [2024-12-13 07:56:51,250][62028] Rollout worker 5 uses device cpu [2024-12-13 07:56:51,250][62028] Rollout worker 6 uses device cpu [2024-12-13 07:56:51,251][62028] Rollout worker 7 uses device cpu [2024-12-13 07:56:51,251][62028] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1 [2024-12-13 07:58:28,162][62436] Saving configuration to ./train_dir_humamoid/Ant/config.json... [2024-12-13 07:58:28,163][62436] Rollout worker 0 uses device cpu [2024-12-13 07:58:28,163][62436] Rollout worker 1 uses device cpu [2024-12-13 07:58:28,164][62436] Rollout worker 2 uses device cpu [2024-12-13 07:58:28,164][62436] Rollout worker 3 uses device cpu [2024-12-13 07:58:28,164][62436] Rollout worker 4 uses device cpu [2024-12-13 07:58:28,164][62436] Rollout worker 5 uses device cpu [2024-12-13 07:58:28,164][62436] Rollout worker 6 uses device cpu [2024-12-13 07:58:28,164][62436] Rollout worker 7 uses device cpu [2024-12-13 07:58:28,165][62436] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1 [2024-12-13 07:58:28,219][62436] InferenceWorker_p0-w0: min num requests: 2 [2024-12-13 07:58:28,277][62436] Starting all processes... [2024-12-13 07:58:28,277][62436] Starting process learner_proc0 [2024-12-13 07:58:28,288][62436] Starting all processes... 
[2024-12-13 07:58:28,299][62436] Starting process inference_proc0-0 [2024-12-13 07:58:28,300][62436] Starting process rollout_proc0 [2024-12-13 07:58:28,300][62436] Starting process rollout_proc1 [2024-12-13 07:58:28,300][62436] Starting process rollout_proc2 [2024-12-13 07:58:28,300][62436] Starting process rollout_proc3 [2024-12-13 07:58:28,300][62436] Starting process rollout_proc4 [2024-12-13 07:58:28,300][62436] Starting process rollout_proc5 [2024-12-13 07:58:28,300][62436] Starting process rollout_proc6 [2024-12-13 07:58:28,300][62436] Starting process rollout_proc7 [2024-12-13 07:58:52,823][62488] Worker 1 uses CPU cores [1] [2024-12-13 07:58:52,938][62436] Heartbeat connected on RolloutWorker_w1 [2024-12-13 07:58:53,039][62473] Starting seed is not provided [2024-12-13 07:58:53,040][62473] Initializing actor-critic model on device cpu [2024-12-13 07:58:53,040][62473] RunningMeanStd input shape: (376,) [2024-12-13 07:58:53,041][62473] RunningMeanStd input shape: (1,) [2024-12-13 07:58:53,054][62436] Heartbeat connected on Batcher_0 [2024-12-13 07:58:53,363][62486] Worker 2 uses CPU cores [0] [2024-12-13 07:58:53,414][62491] Worker 7 uses CPU cores [1] [2024-12-13 07:58:53,500][62436] Heartbeat connected on RolloutWorker_w2 [2024-12-13 07:58:53,565][62494] Worker 4 uses CPU cores [0] [2024-12-13 07:58:53,576][62493] Worker 6 uses CPU cores [0] [2024-12-13 07:58:53,622][62436] Heartbeat connected on RolloutWorker_w7 [2024-12-13 07:58:53,714][62436] Heartbeat connected on RolloutWorker_w4 [2024-12-13 07:58:53,718][62436] Heartbeat connected on RolloutWorker_w6 [2024-12-13 07:58:53,802][62489] Worker 5 uses CPU cores [1] [2024-12-13 07:58:53,852][62436] Heartbeat connected on InferenceWorker_p0-w0 [2024-12-13 07:58:53,857][62473] Created Actor Critic model with architecture: [2024-12-13 07:58:53,863][62473] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): 
RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): MlpEncoder( (mlp_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=Tanh) (2): RecursiveScriptModule(original_name=Linear) (3): RecursiveScriptModule(original_name=Tanh) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=64, out_features=1, bias=True) (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev( (distribution_linear): Linear(in_features=64, out_features=17, bias=True) ) ) [2024-12-13 07:58:53,872][62436] Heartbeat connected on RolloutWorker_w5 [2024-12-13 07:58:53,880][62487] Worker 0 uses CPU cores [0] [2024-12-13 07:58:53,903][62490] Worker 3 uses CPU cores [1] [2024-12-13 07:58:53,927][62436] Heartbeat connected on RolloutWorker_w0 [2024-12-13 07:58:53,934][62436] Heartbeat connected on RolloutWorker_w3 [2024-12-13 07:58:54,661][62473] Using optimizer [2024-12-13 07:58:56,313][62473] No checkpoints found [2024-12-13 07:58:56,313][62473] Did not load from checkpoint, starting from scratch! [2024-12-13 07:58:56,314][62473] Initialized policy 0 weights for model version 0 [2024-12-13 07:58:56,319][62492] RunningMeanStd input shape: (376,) [2024-12-13 07:58:56,323][62473] LearnerWorker_p0 finished initialization! [2024-12-13 07:58:56,326][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000000_0.pth... [2024-12-13 07:58:56,320][62492] RunningMeanStd input shape: (1,) [2024-12-13 07:58:56,335][62436] Heartbeat connected on LearnerWorker_p0 [2024-12-13 07:58:56,334][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000000_0.pth... [2024-12-13 07:58:56,519][62436] Inference worker 0-0 is ready! [2024-12-13 07:58:56,519][62436] All inference workers are ready! 
Signal rollout workers to start! [2024-12-13 07:58:57,770][62491] Decorrelating experience for 0 frames... [2024-12-13 07:58:57,771][62488] Decorrelating experience for 0 frames... [2024-12-13 07:58:57,768][62489] Decorrelating experience for 0 frames... [2024-12-13 07:58:57,769][62490] Decorrelating experience for 0 frames... [2024-12-13 07:58:57,777][62491] Decorrelating experience for 64 frames... [2024-12-13 07:58:57,775][62488] Decorrelating experience for 64 frames... [2024-12-13 07:58:57,774][62489] Decorrelating experience for 64 frames... [2024-12-13 07:58:57,775][62490] Decorrelating experience for 64 frames... [2024-12-13 07:58:58,104][62493] Decorrelating experience for 0 frames... [2024-12-13 07:58:58,103][62494] Decorrelating experience for 0 frames... [2024-12-13 07:58:58,106][62486] Decorrelating experience for 0 frames... [2024-12-13 07:58:58,111][62493] Decorrelating experience for 64 frames... [2024-12-13 07:58:58,112][62494] Decorrelating experience for 64 frames... [2024-12-13 07:58:58,113][62486] Decorrelating experience for 64 frames... [2024-12-13 07:58:58,121][62487] Decorrelating experience for 0 frames... [2024-12-13 07:58:58,126][62487] Decorrelating experience for 64 frames... [2024-12-13 07:58:58,228][62488] Decorrelating experience for 128 frames... [2024-12-13 07:58:58,239][62489] Decorrelating experience for 128 frames... [2024-12-13 07:58:58,253][62490] Decorrelating experience for 128 frames... [2024-12-13 07:58:58,264][62491] Decorrelating experience for 128 frames... [2024-12-13 07:58:58,428][62494] Decorrelating experience for 128 frames... [2024-12-13 07:58:58,456][62493] Decorrelating experience for 128 frames... [2024-12-13 07:58:58,469][62487] Decorrelating experience for 128 frames... [2024-12-13 07:58:58,498][62486] Decorrelating experience for 128 frames... [2024-12-13 07:58:58,865][62488] Decorrelating experience for 192 frames... [2024-12-13 07:58:58,884][62489] Decorrelating experience for 192 frames... 
[2024-12-13 07:58:58,905][62490] Decorrelating experience for 192 frames... [2024-12-13 07:58:58,904][62491] Decorrelating experience for 192 frames... [2024-12-13 07:58:59,054][62493] Decorrelating experience for 192 frames... [2024-12-13 07:58:59,058][62494] Decorrelating experience for 192 frames... [2024-12-13 07:58:59,076][62436] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-13 07:58:59,081][62487] Decorrelating experience for 192 frames... [2024-12-13 07:58:59,099][62486] Decorrelating experience for 192 frames... [2024-12-13 07:58:59,944][62488] Decorrelating experience for 256 frames... [2024-12-13 07:58:59,996][62489] Decorrelating experience for 256 frames... [2024-12-13 07:59:00,030][62490] Decorrelating experience for 256 frames... [2024-12-13 07:59:00,088][62491] Decorrelating experience for 256 frames... [2024-12-13 07:59:00,167][62493] Decorrelating experience for 256 frames... [2024-12-13 07:59:00,203][62494] Decorrelating experience for 256 frames... [2024-12-13 07:59:00,249][62486] Decorrelating experience for 256 frames... [2024-12-13 07:59:00,274][62487] Decorrelating experience for 256 frames... [2024-12-13 07:59:01,191][62488] Decorrelating experience for 320 frames... [2024-12-13 07:59:01,274][62489] Decorrelating experience for 320 frames... [2024-12-13 07:59:01,304][62490] Decorrelating experience for 320 frames... [2024-12-13 07:59:01,356][62491] Decorrelating experience for 320 frames... [2024-12-13 07:59:01,379][62493] Decorrelating experience for 320 frames... [2024-12-13 07:59:01,432][62494] Decorrelating experience for 320 frames... [2024-12-13 07:59:01,516][62486] Decorrelating experience for 320 frames... [2024-12-13 07:59:01,519][62487] Decorrelating experience for 320 frames... [2024-12-13 07:59:02,736][62488] Decorrelating experience for 384 frames... 
[2024-12-13 07:59:02,831][62489] Decorrelating experience for 384 frames... [2024-12-13 07:59:02,883][62493] Decorrelating experience for 384 frames... [2024-12-13 07:59:02,946][62490] Decorrelating experience for 384 frames... [2024-12-13 07:59:02,949][62494] Decorrelating experience for 384 frames... [2024-12-13 07:59:02,967][62491] Decorrelating experience for 384 frames... [2024-12-13 07:59:03,029][62486] Decorrelating experience for 384 frames... [2024-12-13 07:59:03,083][62487] Decorrelating experience for 384 frames... [2024-12-13 07:59:04,076][62436] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-13 07:59:04,607][62488] Decorrelating experience for 448 frames... [2024-12-13 07:59:04,673][62489] Decorrelating experience for 448 frames... [2024-12-13 07:59:04,687][62493] Decorrelating experience for 448 frames... [2024-12-13 07:59:04,799][62490] Decorrelating experience for 448 frames... [2024-12-13 07:59:04,820][62494] Decorrelating experience for 448 frames... [2024-12-13 07:59:04,877][62487] Decorrelating experience for 448 frames... [2024-12-13 07:59:04,887][62486] Decorrelating experience for 448 frames... [2024-12-13 07:59:04,898][62491] Decorrelating experience for 448 frames... [2024-12-13 07:59:09,076][62436] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 147.6. Samples: 1476. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-13 07:59:09,076][62436] Avg episode reward: [(0, '48.698')] [2024-12-13 07:59:09,079][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000000_0.pth... [2024-12-13 07:59:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 241.6. Samples: 3624. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 07:59:14,076][62436] Avg episode reward: [(0, '78.431')] [2024-12-13 07:59:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 8192. Throughput: 0: 440.0. Samples: 8800. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 07:59:19,076][62436] Avg episode reward: [(0, '80.129')] [2024-12-13 07:59:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 491.5, 300 sec: 491.5). Total num frames: 12288. Throughput: 0: 578.9. Samples: 14472. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 07:59:24,076][62436] Avg episode reward: [(0, '84.499')] [2024-12-13 07:59:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000024_12288.pth... [2024-12-13 07:59:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 16384. Throughput: 0: 548.4. Samples: 16452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 07:59:29,078][62436] Avg episode reward: [(0, '98.961')] [2024-12-13 07:59:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 585.1, 300 sec: 585.1). Total num frames: 20480. Throughput: 0: 617.5. Samples: 21612. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 07:59:34,076][62436] Avg episode reward: [(0, '117.412')] [2024-12-13 07:59:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 24576. Throughput: 0: 693.6. Samples: 27744. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 07:59:39,076][62436] Avg episode reward: [(0, '134.550')] [2024-12-13 07:59:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000048_24576.pth... [2024-12-13 07:59:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000000_0.pth [2024-12-13 07:59:44,079][62436] Fps is (10 sec: 818.9, 60 sec: 637.1, 300 sec: 637.1). Total num frames: 28672. Throughput: 0: 667.2. Samples: 30024. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 07:59:44,079][62436] Avg episode reward: [(0, '149.286')] [2024-12-13 07:59:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 655.4, 300 sec: 655.4). Total num frames: 32768. Throughput: 0: 772.4. Samples: 34760. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 07:59:49,076][62436] Avg episode reward: [(0, '166.461')] [2024-12-13 07:59:53,995][62492] Updated weights for policy 0, policy_version 80 (0.0016) [2024-12-13 07:59:54,076][62436] Fps is (10 sec: 1229.2, 60 sec: 744.7, 300 sec: 744.7). Total num frames: 40960. Throughput: 0: 872.0. Samples: 40716. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 07:59:54,078][62436] Avg episode reward: [(0, '183.448')] [2024-12-13 07:59:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000080_40960.pth... [2024-12-13 07:59:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000024_12288.pth [2024-12-13 07:59:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 682.7). Total num frames: 40960. Throughput: 0: 880.2. Samples: 43232. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 07:59:59,076][62436] Avg episode reward: [(0, '200.092')] [2024-12-13 08:00:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 693.2). Total num frames: 45056. Throughput: 0: 861.0. Samples: 47544. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:00:04,076][62436] Avg episode reward: [(0, '200.740')] [2024-12-13 08:00:09,079][62436] Fps is (10 sec: 1228.3, 60 sec: 887.4, 300 sec: 760.6). Total num frames: 53248. Throughput: 0: 861.6. Samples: 53248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:00:09,081][62436] Avg episode reward: [(0, '211.352')] [2024-12-13 08:00:09,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000104_53248.pth... 
[2024-12-13 08:00:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000048_24576.pth [2024-12-13 08:00:14,077][62436] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 764.6). Total num frames: 57344. Throughput: 0: 879.0. Samples: 56008. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:00:14,077][62436] Avg episode reward: [(0, '222.654')] [2024-12-13 08:00:19,076][62436] Fps is (10 sec: 409.7, 60 sec: 819.2, 300 sec: 716.8). Total num frames: 57344. Throughput: 0: 854.7. Samples: 60072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:00:19,077][62436] Avg episode reward: [(0, '227.584')] [2024-12-13 08:00:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 771.0). Total num frames: 65536. Throughput: 0: 844.7. Samples: 65756. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:00:24,076][62436] Avg episode reward: [(0, '233.098')] [2024-12-13 08:00:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000128_65536.pth... [2024-12-13 08:00:24,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000080_40960.pth [2024-12-13 08:00:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 773.7). Total num frames: 69632. Throughput: 0: 859.8. Samples: 68712. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:00:29,077][62436] Avg episode reward: [(0, '244.154')] [2024-12-13 08:00:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 776.1). Total num frames: 73728. Throughput: 0: 840.9. Samples: 72600. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:00:34,076][62436] Avg episode reward: [(0, '253.430')] [2024-12-13 08:00:39,083][62436] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 778.2). Total num frames: 77824. Throughput: 0: 829.9. Samples: 78068. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:00:39,084][62436] Avg episode reward: [(0, '261.398')] [2024-12-13 08:00:39,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000152_77824.pth... [2024-12-13 08:00:39,106][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000104_53248.pth [2024-12-13 08:00:44,077][62436] Fps is (10 sec: 409.5, 60 sec: 819.2, 300 sec: 741.2). Total num frames: 77824. Throughput: 0: 818.5. Samples: 80064. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:00:44,078][62436] Avg episode reward: [(0, '263.794')] [2024-12-13 08:00:45,083][62492] Updated weights for policy 0, policy_version 160 (0.0013) [2024-12-13 08:00:49,076][62436] Fps is (10 sec: 409.9, 60 sec: 819.2, 300 sec: 744.7). Total num frames: 81920. Throughput: 0: 801.9. Samples: 83628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:00:49,076][62436] Avg episode reward: [(0, '262.755')] [2024-12-13 08:00:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 748.0). Total num frames: 86016. Throughput: 0: 800.7. Samples: 89276. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:00:54,076][62436] Avg episode reward: [(0, '265.178')] [2024-12-13 08:00:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000168_86016.pth... [2024-12-13 08:00:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000128_65536.pth [2024-12-13 08:00:59,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 785.1). Total num frames: 94208. Throughput: 0: 802.9. Samples: 92136. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:00:59,076][62436] Avg episode reward: [(0, '264.041')] [2024-12-13 08:01:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 753.7). Total num frames: 94208. Throughput: 0: 815.3. Samples: 96760. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:01:04,076][62436] Avg episode reward: [(0, '256.522')] [2024-12-13 08:01:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 756.2). Total num frames: 98304. Throughput: 0: 804.5. Samples: 101960. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:01:09,077][62436] Avg episode reward: [(0, '265.559')] [2024-12-13 08:01:09,190][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000200_102400.pth... [2024-12-13 08:01:09,196][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000152_77824.pth [2024-12-13 08:01:14,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 788.9). Total num frames: 106496. Throughput: 0: 802.4. Samples: 104820. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:01:14,076][62436] Avg episode reward: [(0, '281.797')] [2024-12-13 08:01:14,080][62473] Saving new best policy, reward=281.797! [2024-12-13 08:01:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 760.7). Total num frames: 106496. Throughput: 0: 826.4. Samples: 109788. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:01:19,076][62436] Avg episode reward: [(0, '286.552')] [2024-12-13 08:01:19,153][62473] Saving new best policy, reward=286.552! [2024-12-13 08:01:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.0). Total num frames: 114688. Throughput: 0: 811.3. Samples: 114572. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:01:24,076][62436] Avg episode reward: [(0, '282.329')] [2024-12-13 08:01:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000224_114688.pth... [2024-12-13 08:01:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000168_86016.pth [2024-12-13 08:01:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 791.9). Total num frames: 118784. Throughput: 0: 830.5. Samples: 117436. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:01:29,076][62436] Avg episode reward: [(0, '284.962')] [2024-12-13 08:01:33,309][62492] Updated weights for policy 0, policy_version 240 (0.0011) [2024-12-13 08:01:34,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 792.8). Total num frames: 122880. Throughput: 0: 866.5. Samples: 122624. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:01:34,080][62436] Avg episode reward: [(0, '286.965')] [2024-12-13 08:01:34,081][62473] Saving new best policy, reward=286.965! [2024-12-13 08:01:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 793.6). Total num frames: 126976. Throughput: 0: 839.6. Samples: 127056. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:01:39,076][62436] Avg episode reward: [(0, '297.008')] [2024-12-13 08:01:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000248_126976.pth... [2024-12-13 08:01:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000200_102400.pth [2024-12-13 08:01:39,093][62473] Saving new best policy, reward=297.008! [2024-12-13 08:01:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 794.4). Total num frames: 131072. Throughput: 0: 842.6. Samples: 130052. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:01:44,076][62436] Avg episode reward: [(0, '297.550')] [2024-12-13 08:01:44,078][62473] Saving new best policy, reward=297.550! [2024-12-13 08:01:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 795.1). Total num frames: 135168. Throughput: 0: 854.4. Samples: 135208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:01:49,076][62436] Avg episode reward: [(0, '294.700')] [2024-12-13 08:01:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 795.8). Total num frames: 139264. Throughput: 0: 836.3. Samples: 139592. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:01:54,076][62436] Avg episode reward: [(0, '289.768')] [2024-12-13 08:01:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000272_139264.pth... [2024-12-13 08:01:54,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000224_114688.pth [2024-12-13 08:01:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 796.4). Total num frames: 143360. Throughput: 0: 840.6. Samples: 142648. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:01:59,076][62436] Avg episode reward: [(0, '293.767')] [2024-12-13 08:02:04,076][62436] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 797.1). Total num frames: 147456. Throughput: 0: 848.2. Samples: 147956. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:02:04,077][62436] Avg episode reward: [(0, '307.697')] [2024-12-13 08:02:04,078][62473] Saving new best policy, reward=307.697! [2024-12-13 08:02:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 797.6). Total num frames: 151552. Throughput: 0: 833.8. Samples: 152092. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:02:09,076][62436] Avg episode reward: [(0, '305.859')] [2024-12-13 08:02:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000296_151552.pth... [2024-12-13 08:02:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000248_126976.pth [2024-12-13 08:02:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 798.2). Total num frames: 155648. Throughput: 0: 837.1. Samples: 155104. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:02:14,076][62436] Avg episode reward: [(0, '302.795')] [2024-12-13 08:02:19,079][62436] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 798.7). Total num frames: 159744. Throughput: 0: 848.6. Samples: 160812. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:02:19,080][62436] Avg episode reward: [(0, '299.367')] [2024-12-13 08:02:22,520][62492] Updated weights for policy 0, policy_version 320 (0.0018) [2024-12-13 08:02:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 799.2). Total num frames: 163840. Throughput: 0: 836.8. Samples: 164712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:02:24,079][62436] Avg episode reward: [(0, '308.764')] [2024-12-13 08:02:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000320_163840.pth... [2024-12-13 08:02:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000272_139264.pth [2024-12-13 08:02:24,091][62473] Saving new best policy, reward=308.764! [2024-12-13 08:02:29,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 799.7). Total num frames: 167936. Throughput: 0: 837.0. Samples: 167716. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:02:29,076][62436] Avg episode reward: [(0, '314.584')] [2024-12-13 08:02:29,077][62473] Saving new best policy, reward=314.584! [2024-12-13 08:02:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 800.1). Total num frames: 172032. Throughput: 0: 848.2. Samples: 173376. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:02:34,076][62436] Avg episode reward: [(0, '327.782')] [2024-12-13 08:02:34,077][62473] Saving new best policy, reward=327.782! [2024-12-13 08:02:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 800.6). Total num frames: 176128. Throughput: 0: 839.0. Samples: 177348. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:02:39,076][62436] Avg episode reward: [(0, '324.088')] [2024-12-13 08:02:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000344_176128.pth... 
[2024-12-13 08:02:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000296_151552.pth [2024-12-13 08:02:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 801.0). Total num frames: 180224. Throughput: 0: 834.9. Samples: 180220. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:02:44,076][62436] Avg episode reward: [(0, '320.480')] [2024-12-13 08:02:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 801.4). Total num frames: 184320. Throughput: 0: 844.1. Samples: 185940. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:02:49,076][62436] Avg episode reward: [(0, '322.534')] [2024-12-13 08:02:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 801.8). Total num frames: 188416. Throughput: 0: 846.5. Samples: 190184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:02:54,076][62436] Avg episode reward: [(0, '321.835')] [2024-12-13 08:02:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000368_188416.pth... [2024-12-13 08:02:54,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000320_163840.pth [2024-12-13 08:02:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 802.1). Total num frames: 192512. Throughput: 0: 836.8. Samples: 192760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:02:59,076][62436] Avg episode reward: [(0, '320.724')] [2024-12-13 08:03:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 802.5). Total num frames: 196608. Throughput: 0: 841.4. Samples: 198672. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:03:04,076][62436] Avg episode reward: [(0, '328.519')] [2024-12-13 08:03:04,078][62473] Saving new best policy, reward=328.519! [2024-12-13 08:03:09,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 802.8). Total num frames: 200704. Throughput: 0: 856.1. Samples: 203236. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:03:09,077][62436] Avg episode reward: [(0, '336.565')] [2024-12-13 08:03:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000392_200704.pth... [2024-12-13 08:03:09,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000344_176128.pth [2024-12-13 08:03:09,098][62473] Saving new best policy, reward=336.565! [2024-12-13 08:03:11,016][62492] Updated weights for policy 0, policy_version 400 (0.0013) [2024-12-13 08:03:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 803.1). Total num frames: 204800. Throughput: 0: 836.9. Samples: 205376. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:03:14,076][62436] Avg episode reward: [(0, '338.801')] [2024-12-13 08:03:14,077][62473] Saving new best policy, reward=338.801! [2024-12-13 08:03:19,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 803.4). Total num frames: 208896. Throughput: 0: 840.4. Samples: 211192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:03:19,076][62436] Avg episode reward: [(0, '336.978')] [2024-12-13 08:03:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 803.7). Total num frames: 212992. Throughput: 0: 859.7. Samples: 216036. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:03:24,076][62436] Avg episode reward: [(0, '347.149')] [2024-12-13 08:03:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000416_212992.pth... [2024-12-13 08:03:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000368_188416.pth [2024-12-13 08:03:24,097][62473] Saving new best policy, reward=347.149! [2024-12-13 08:03:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 804.0). Total num frames: 217088. Throughput: 0: 838.2. Samples: 217940. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:03:29,076][62436] Avg episode reward: [(0, '362.356')] [2024-12-13 08:03:29,077][62473] Saving new best policy, reward=362.356! [2024-12-13 08:03:34,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 225280. Throughput: 0: 845.2. Samples: 223976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:03:34,077][62436] Avg episode reward: [(0, '359.266')] [2024-12-13 08:03:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 804.6). Total num frames: 225280. Throughput: 0: 866.8. Samples: 229188. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:03:39,076][62436] Avg episode reward: [(0, '350.379')] [2024-12-13 08:03:39,187][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000448_229376.pth... [2024-12-13 08:03:39,194][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000392_200704.pth [2024-12-13 08:03:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 804.8). Total num frames: 229376. Throughput: 0: 849.4. Samples: 230984. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:03:44,076][62436] Avg episode reward: [(0, '350.810')] [2024-12-13 08:03:49,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 237568. Throughput: 0: 844.4. Samples: 236672. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:03:49,076][62436] Avg episode reward: [(0, '359.979')] [2024-12-13 08:03:54,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 241664. Throughput: 0: 862.8. Samples: 242060. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:03:54,076][62436] Avg episode reward: [(0, '368.332')] [2024-12-13 08:03:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000472_241664.pth... 
[2024-12-13 08:03:54,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000416_212992.pth [2024-12-13 08:03:54,097][62473] Saving new best policy, reward=368.332! [2024-12-13 08:03:58,968][62492] Updated weights for policy 0, policy_version 480 (0.0021) [2024-12-13 08:03:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 245760. Throughput: 0: 859.1. Samples: 244036. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:03:59,076][62436] Avg episode reward: [(0, '360.075')] [2024-12-13 08:04:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 249856. Throughput: 0: 848.7. Samples: 249384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:04:04,076][62436] Avg episode reward: [(0, '343.114')] [2024-12-13 08:04:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 253952. Throughput: 0: 864.3. Samples: 254928. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:04:09,081][62436] Avg episode reward: [(0, '339.270')] [2024-12-13 08:04:09,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000496_253952.pth... [2024-12-13 08:04:09,119][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000448_229376.pth [2024-12-13 08:04:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 258048. Throughput: 0: 865.2. Samples: 256872. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:04:14,076][62436] Avg episode reward: [(0, '351.564')] [2024-12-13 08:04:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 262144. Throughput: 0: 843.1. Samples: 261916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:04:19,076][62436] Avg episode reward: [(0, '361.417')] [2024-12-13 08:04:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 266240. Throughput: 0: 855.7. 
Samples: 267696. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:04:24,076][62436] Avg episode reward: [(0, '363.622')] [2024-12-13 08:04:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000520_266240.pth... [2024-12-13 08:04:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000472_241664.pth [2024-12-13 08:04:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 270336. Throughput: 0: 862.3. Samples: 269788. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:04:29,078][62436] Avg episode reward: [(0, '360.414')] [2024-12-13 08:04:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 274432. Throughput: 0: 839.7. Samples: 274460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:04:34,076][62436] Avg episode reward: [(0, '353.342')] [2024-12-13 08:04:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 278528. Throughput: 0: 850.3. Samples: 280324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:04:39,076][62436] Avg episode reward: [(0, '372.455')] [2024-12-13 08:04:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000544_278528.pth... [2024-12-13 08:04:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000496_253952.pth [2024-12-13 08:04:39,089][62473] Saving new best policy, reward=372.455! [2024-12-13 08:04:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 282624. Throughput: 0: 857.2. Samples: 282612. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:04:44,076][62436] Avg episode reward: [(0, '382.469')] [2024-12-13 08:04:44,078][62473] Saving new best policy, reward=382.469! [2024-12-13 08:04:47,298][62492] Updated weights for policy 0, policy_version 560 (0.0014) [2024-12-13 08:04:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). 
Total num frames: 286720. Throughput: 0: 835.9. Samples: 287000. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:04:49,076][62436] Avg episode reward: [(0, '380.975')] [2024-12-13 08:04:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 290816. Throughput: 0: 836.6. Samples: 292576. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:04:54,076][62436] Avg episode reward: [(0, '385.852')] [2024-12-13 08:04:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000568_290816.pth... [2024-12-13 08:04:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000520_266240.pth [2024-12-13 08:04:54,092][62473] Saving new best policy, reward=385.852! [2024-12-13 08:04:59,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 294912. Throughput: 0: 847.6. Samples: 295016. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:04:59,078][62436] Avg episode reward: [(0, '395.136')] [2024-12-13 08:04:59,079][62473] Saving new best policy, reward=395.136! [2024-12-13 08:05:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 299008. Throughput: 0: 833.7. Samples: 299432. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:05:04,076][62436] Avg episode reward: [(0, '378.904')] [2024-12-13 08:05:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 303104. Throughput: 0: 832.7. Samples: 305168. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:05:09,076][62436] Avg episode reward: [(0, '371.422')] [2024-12-13 08:05:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000592_303104.pth... [2024-12-13 08:05:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000544_278528.pth [2024-12-13 08:05:14,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 307200. Throughput: 0: 828.6. 
Samples: 307072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:05:14,077][62436] Avg episode reward: [(0, '374.058')] [2024-12-13 08:05:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 311296. Throughput: 0: 797.8. Samples: 310360. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:05:19,076][62436] Avg episode reward: [(0, '369.726')] [2024-12-13 08:05:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 315392. Throughput: 0: 788.5. Samples: 315808. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:05:24,076][62436] Avg episode reward: [(0, '381.848')] [2024-12-13 08:05:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000616_315392.pth... [2024-12-13 08:05:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000568_290816.pth [2024-12-13 08:05:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 319488. Throughput: 0: 805.7. Samples: 318872. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:05:29,079][62436] Avg episode reward: [(0, '378.689')] [2024-12-13 08:05:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 323584. Throughput: 0: 808.6. Samples: 323388. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:05:34,076][62436] Avg episode reward: [(0, '375.528')] [2024-12-13 08:05:37,803][62492] Updated weights for policy 0, policy_version 640 (0.0012) [2024-12-13 08:05:39,077][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 327680. Throughput: 0: 796.3. Samples: 328412. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:05:39,078][62436] Avg episode reward: [(0, '389.781')] [2024-12-13 08:05:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000640_327680.pth... 
[2024-12-13 08:05:39,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000592_303104.pth [2024-12-13 08:05:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 331776. Throughput: 0: 808.2. Samples: 331384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:05:44,076][62436] Avg episode reward: [(0, '402.014')] [2024-12-13 08:05:44,077][62473] Saving new best policy, reward=402.014! [2024-12-13 08:05:49,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 335872. Throughput: 0: 814.2. Samples: 336072. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:05:49,076][62436] Avg episode reward: [(0, '395.741')] [2024-12-13 08:05:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 339968. Throughput: 0: 798.0. Samples: 341076. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:05:54,076][62436] Avg episode reward: [(0, '370.661')] [2024-12-13 08:05:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000664_339968.pth... [2024-12-13 08:05:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000616_315392.pth [2024-12-13 08:05:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 344064. Throughput: 0: 822.1. Samples: 344064. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:05:59,076][62436] Avg episode reward: [(0, '380.247')] [2024-12-13 08:06:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 348160. Throughput: 0: 860.9. Samples: 349100. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:06:04,077][62436] Avg episode reward: [(0, '395.859')] [2024-12-13 08:06:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 352256. Throughput: 0: 841.8. Samples: 353688. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:06:09,076][62436] Avg episode reward: [(0, '407.438')] [2024-12-13 08:06:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000688_352256.pth... [2024-12-13 08:06:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000640_327680.pth [2024-12-13 08:06:09,095][62473] Saving new best policy, reward=407.438! [2024-12-13 08:06:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 356352. Throughput: 0: 837.2. Samples: 356544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:06:14,076][62436] Avg episode reward: [(0, '394.502')] [2024-12-13 08:06:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 360448. Throughput: 0: 856.5. Samples: 361932. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:06:19,077][62436] Avg episode reward: [(0, '389.319')] [2024-12-13 08:06:24,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 364544. Throughput: 0: 843.0. Samples: 366348. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:06:24,077][62436] Avg episode reward: [(0, '390.282')] [2024-12-13 08:06:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000712_364544.pth... [2024-12-13 08:06:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000664_339968.pth [2024-12-13 08:06:26,057][62492] Updated weights for policy 0, policy_version 720 (0.0012) [2024-12-13 08:06:29,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 368640. Throughput: 0: 839.6. Samples: 369164. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:06:29,076][62436] Avg episode reward: [(0, '409.273')] [2024-12-13 08:06:29,077][62473] Saving new best policy, reward=409.273! [2024-12-13 08:06:34,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 372736. 
Throughput: 0: 862.6. Samples: 374888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:06:34,081][62436] Avg episode reward: [(0, '397.900')] [2024-12-13 08:06:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 376832. Throughput: 0: 843.0. Samples: 379012. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:06:39,078][62436] Avg episode reward: [(0, '397.700')] [2024-12-13 08:06:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000736_376832.pth... [2024-12-13 08:06:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000688_352256.pth [2024-12-13 08:06:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 380928. Throughput: 0: 841.0. Samples: 381908. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:06:44,076][62436] Avg episode reward: [(0, '397.219')] [2024-12-13 08:06:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 385024. Throughput: 0: 860.8. Samples: 387836. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:06:49,076][62436] Avg episode reward: [(0, '413.256')] [2024-12-13 08:06:49,077][62473] Saving new best policy, reward=413.256! [2024-12-13 08:06:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 389120. Throughput: 0: 846.9. Samples: 391800. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:06:54,076][62436] Avg episode reward: [(0, '418.787')] [2024-12-13 08:06:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000760_389120.pth... [2024-12-13 08:06:54,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000712_364544.pth [2024-12-13 08:06:54,088][62473] Saving new best policy, reward=418.787! [2024-12-13 08:06:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 393216. Throughput: 0: 845.9. Samples: 394608. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:06:59,076][62436] Avg episode reward: [(0, '422.439')] [2024-12-13 08:06:59,077][62473] Saving new best policy, reward=422.439! [2024-12-13 08:07:04,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 401408. Throughput: 0: 857.3. Samples: 400508. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:07:04,076][62436] Avg episode reward: [(0, '417.268')] [2024-12-13 08:07:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 401408. Throughput: 0: 853.7. Samples: 404764. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:07:09,077][62436] Avg episode reward: [(0, '402.788')] [2024-12-13 08:07:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000784_401408.pth... [2024-12-13 08:07:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000736_376832.pth [2024-12-13 08:07:14,067][62492] Updated weights for policy 0, policy_version 800 (0.0016) [2024-12-13 08:07:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 409600. Throughput: 0: 846.0. Samples: 407232. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:07:14,076][62436] Avg episode reward: [(0, '408.492')] [2024-12-13 08:07:19,080][62436] Fps is (10 sec: 1228.2, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 413696. Throughput: 0: 851.3. Samples: 413200. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:07:19,081][62436] Avg episode reward: [(0, '418.810')] [2024-12-13 08:07:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 417792. Throughput: 0: 858.8. Samples: 417660. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:07:24,076][62436] Avg episode reward: [(0, '420.535')] [2024-12-13 08:07:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000816_417792.pth... 
[2024-12-13 08:07:24,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000760_389120.pth [2024-12-13 08:07:29,076][62436] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 421888. Throughput: 0: 845.6. Samples: 419960. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:07:29,076][62436] Avg episode reward: [(0, '422.795')] [2024-12-13 08:07:29,077][62473] Saving new best policy, reward=422.795! [2024-12-13 08:07:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 425984. Throughput: 0: 844.3. Samples: 425828. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:07:34,076][62436] Avg episode reward: [(0, '444.420')] [2024-12-13 08:07:34,077][62473] Saving new best policy, reward=444.420! [2024-12-13 08:07:39,076][62436] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 430080. Throughput: 0: 859.1. Samples: 430460. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:07:39,077][62436] Avg episode reward: [(0, '446.308')] [2024-12-13 08:07:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000840_430080.pth... [2024-12-13 08:07:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000784_401408.pth [2024-12-13 08:07:39,092][62473] Saving new best policy, reward=446.308! [2024-12-13 08:07:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 434176. Throughput: 0: 847.1. Samples: 432728. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:07:44,076][62436] Avg episode reward: [(0, '451.094')] [2024-12-13 08:07:44,077][62473] Saving new best policy, reward=451.094! [2024-12-13 08:07:49,077][62436] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 438272. Throughput: 0: 841.9. Samples: 438396. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:07:49,078][62436] Avg episode reward: [(0, '450.578')] [2024-12-13 08:07:54,078][62436] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 442368. Throughput: 0: 858.2. Samples: 443384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:07:54,078][62436] Avg episode reward: [(0, '442.798')] [2024-12-13 08:07:54,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000864_442368.pth... [2024-12-13 08:07:54,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000816_417792.pth [2024-12-13 08:07:59,076][62436] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 446464. Throughput: 0: 848.8. Samples: 445428. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:07:59,076][62436] Avg episode reward: [(0, '450.115')] [2024-12-13 08:08:01,989][62492] Updated weights for policy 0, policy_version 880 (0.0011) [2024-12-13 08:08:04,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 450560. Throughput: 0: 843.5. Samples: 451152. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:08:04,076][62436] Avg episode reward: [(0, '439.271')] [2024-12-13 08:08:09,080][62436] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 454656. Throughput: 0: 863.9. Samples: 456540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:08:09,081][62436] Avg episode reward: [(0, '446.338')] [2024-12-13 08:08:09,092][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000888_454656.pth... [2024-12-13 08:08:09,107][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000840_430080.pth [2024-12-13 08:08:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 458752. Throughput: 0: 855.8. Samples: 458472. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:08:14,076][62436] Avg episode reward: [(0, '452.307')] [2024-12-13 08:08:14,077][62473] Saving new best policy, reward=452.307! [2024-12-13 08:08:19,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.3, 300 sec: 847.0). Total num frames: 462848. Throughput: 0: 847.2. Samples: 463952. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:08:19,076][62436] Avg episode reward: [(0, '444.615')] [2024-12-13 08:08:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 466944. Throughput: 0: 871.1. Samples: 469660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:08:24,076][62436] Avg episode reward: [(0, '441.929')] [2024-12-13 08:08:24,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000912_466944.pth... [2024-12-13 08:08:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000864_442368.pth [2024-12-13 08:08:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 471040. Throughput: 0: 859.5. Samples: 471404. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:08:29,076][62436] Avg episode reward: [(0, '458.828')] [2024-12-13 08:08:29,077][62473] Saving new best policy, reward=458.828! [2024-12-13 08:08:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 475136. Throughput: 0: 850.9. Samples: 476684. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:08:34,076][62436] Avg episode reward: [(0, '470.064')] [2024-12-13 08:08:34,077][62473] Saving new best policy, reward=470.064! [2024-12-13 08:08:39,085][62436] Fps is (10 sec: 818.5, 60 sec: 819.1, 300 sec: 846.9). Total num frames: 479232. Throughput: 0: 866.1. Samples: 482364. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:08:39,087][62436] Avg episode reward: [(0, '452.988')] [2024-12-13 08:08:39,101][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000936_479232.pth... [2024-12-13 08:08:39,110][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000888_454656.pth [2024-12-13 08:08:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 483328. Throughput: 0: 860.6. Samples: 484156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:08:44,076][62436] Avg episode reward: [(0, '444.883')] [2024-12-13 08:08:49,076][62436] Fps is (10 sec: 819.9, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 487424. Throughput: 0: 840.3. Samples: 488964. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:08:49,076][62436] Avg episode reward: [(0, '450.191')] [2024-12-13 08:08:50,302][62492] Updated weights for policy 0, policy_version 960 (0.0017) [2024-12-13 08:08:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 491520. Throughput: 0: 850.8. Samples: 494824. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:08:54,076][62436] Avg episode reward: [(0, '461.873')] [2024-12-13 08:08:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000960_491520.pth... [2024-12-13 08:08:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000912_466944.pth [2024-12-13 08:08:59,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 833.1). Total num frames: 495616. Throughput: 0: 852.9. Samples: 496856. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:08:59,080][62436] Avg episode reward: [(0, '490.627')] [2024-12-13 08:08:59,081][62473] Saving new best policy, reward=490.627! [2024-12-13 08:09:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 499712. Throughput: 0: 832.4. Samples: 501412. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:09:04,076][62436] Avg episode reward: [(0, '471.001')] [2024-12-13 08:09:09,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 503808. Throughput: 0: 835.8. Samples: 507272. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:09:09,077][62436] Avg episode reward: [(0, '477.743')] [2024-12-13 08:09:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000984_503808.pth... [2024-12-13 08:09:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000936_479232.pth [2024-12-13 08:09:14,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 507904. Throughput: 0: 848.9. Samples: 509608. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:09:14,079][62436] Avg episode reward: [(0, '475.728')] [2024-12-13 08:09:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 512000. Throughput: 0: 827.8. Samples: 513936. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:09:19,076][62436] Avg episode reward: [(0, '492.617')] [2024-12-13 08:09:19,077][62473] Saving new best policy, reward=492.617! [2024-12-13 08:09:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 516096. Throughput: 0: 831.2. Samples: 519760. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:09:24,076][62436] Avg episode reward: [(0, '485.880')] [2024-12-13 08:09:24,185][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001016_520192.pth... [2024-12-13 08:09:24,192][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000960_491520.pth [2024-12-13 08:09:29,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 520192. Throughput: 0: 850.7. Samples: 522440. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:09:29,078][62436] Avg episode reward: [(0, '469.285')] [2024-12-13 08:09:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 524288. Throughput: 0: 834.4. Samples: 526512. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:09:34,076][62436] Avg episode reward: [(0, '472.747')] [2024-12-13 08:09:38,812][62492] Updated weights for policy 0, policy_version 1040 (0.0011) [2024-12-13 08:09:39,076][62436] Fps is (10 sec: 1228.9, 60 sec: 887.6, 300 sec: 847.0). Total num frames: 532480. Throughput: 0: 835.2. Samples: 532408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:09:39,076][62436] Avg episode reward: [(0, '478.748')] [2024-12-13 08:09:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001040_532480.pth... [2024-12-13 08:09:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000000984_503808.pth [2024-12-13 08:09:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 532480. Throughput: 0: 851.6. Samples: 535176. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:09:44,076][62436] Avg episode reward: [(0, '488.454')] [2024-12-13 08:09:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 536576. Throughput: 0: 817.7. Samples: 538208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:09:49,076][62436] Avg episode reward: [(0, '494.401')] [2024-12-13 08:09:49,077][62473] Saving new best policy, reward=494.401! [2024-12-13 08:09:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 540672. Throughput: 0: 786.4. Samples: 542660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:09:54,076][62436] Avg episode reward: [(0, '494.018')] [2024-12-13 08:09:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001056_540672.pth... 
[2024-12-13 08:09:54,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001016_520192.pth [2024-12-13 08:09:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 544768. Throughput: 0: 796.4. Samples: 545444. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:09:59,076][62436] Avg episode reward: [(0, '502.205')] [2024-12-13 08:09:59,078][62473] Saving new best policy, reward=502.205! [2024-12-13 08:10:04,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 548864. Throughput: 0: 815.3. Samples: 550628. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:10:04,078][62436] Avg episode reward: [(0, '508.495')] [2024-12-13 08:10:04,080][62473] Saving new best policy, reward=508.495! [2024-12-13 08:10:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 552960. Throughput: 0: 784.8. Samples: 555076. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:10:09,076][62436] Avg episode reward: [(0, '518.738')] [2024-12-13 08:10:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001080_552960.pth... [2024-12-13 08:10:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001040_532480.pth [2024-12-13 08:10:09,098][62473] Saving new best policy, reward=518.738! [2024-12-13 08:10:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 557056. Throughput: 0: 785.8. Samples: 557800. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:10:14,076][62436] Avg episode reward: [(0, '521.590')] [2024-12-13 08:10:14,077][62473] Saving new best policy, reward=521.590! [2024-12-13 08:10:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 561152. Throughput: 0: 817.5. Samples: 563300. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:10:19,076][62436] Avg episode reward: [(0, '522.178')] [2024-12-13 08:10:19,077][62473] Saving new best policy, reward=522.178! [2024-12-13 08:10:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 565248. Throughput: 0: 781.6. Samples: 567580. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:10:24,076][62436] Avg episode reward: [(0, '510.413')] [2024-12-13 08:10:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001104_565248.pth... [2024-12-13 08:10:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001056_540672.pth [2024-12-13 08:10:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 569344. Throughput: 0: 783.8. Samples: 570448. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:10:29,076][62436] Avg episode reward: [(0, '496.864')] [2024-12-13 08:10:29,688][62492] Updated weights for policy 0, policy_version 1120 (0.0027) [2024-12-13 08:10:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 573440. Throughput: 0: 845.3. Samples: 576248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:10:34,076][62436] Avg episode reward: [(0, '493.570')] [2024-12-13 08:10:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 577536. Throughput: 0: 837.6. Samples: 580352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:10:39,076][62436] Avg episode reward: [(0, '506.340')] [2024-12-13 08:10:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001128_577536.pth... [2024-12-13 08:10:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001080_552960.pth [2024-12-13 08:10:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 581632. Throughput: 0: 838.7. Samples: 583184. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:10:44,076][62436] Avg episode reward: [(0, '515.222')] [2024-12-13 08:10:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 585728. Throughput: 0: 852.5. Samples: 588988. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:10:49,076][62436] Avg episode reward: [(0, '525.132')] [2024-12-13 08:10:49,199][62473] Saving new best policy, reward=525.132! [2024-12-13 08:10:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 589824. Throughput: 0: 838.5. Samples: 592808. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:10:54,077][62436] Avg episode reward: [(0, '525.275')] [2024-12-13 08:10:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001152_589824.pth... [2024-12-13 08:10:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001104_565248.pth [2024-12-13 08:10:54,092][62473] Saving new best policy, reward=525.275! [2024-12-13 08:10:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 593920. Throughput: 0: 840.7. Samples: 595632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:10:59,076][62436] Avg episode reward: [(0, '508.513')] [2024-12-13 08:11:04,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 602112. Throughput: 0: 849.2. Samples: 601512. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:11:04,076][62436] Avg episode reward: [(0, '506.966')] [2024-12-13 08:11:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 602112. Throughput: 0: 846.2. Samples: 605660. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:11:09,076][62436] Avg episode reward: [(0, '514.939')] [2024-12-13 08:11:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001176_602112.pth... 
[2024-12-13 08:11:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001128_577536.pth [2024-12-13 08:11:14,078][62436] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 610304. Throughput: 0: 838.9. Samples: 608200. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:11:14,079][62436] Avg episode reward: [(0, '531.127')] [2024-12-13 08:11:14,080][62473] Saving new best policy, reward=531.127! [2024-12-13 08:11:17,989][62492] Updated weights for policy 0, policy_version 1200 (0.0011) [2024-12-13 08:11:19,079][62436] Fps is (10 sec: 1228.4, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 614400. Throughput: 0: 842.7. Samples: 614172. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:11:19,080][62436] Avg episode reward: [(0, '517.147')] [2024-12-13 08:11:24,078][62436] Fps is (10 sec: 819.2, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 618496. Throughput: 0: 846.1. Samples: 618428. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:11:24,079][62436] Avg episode reward: [(0, '514.861')] [2024-12-13 08:11:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001208_618496.pth... [2024-12-13 08:11:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001152_589824.pth [2024-12-13 08:11:29,075][62436] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 622592. Throughput: 0: 837.0. Samples: 620848. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:11:29,076][62436] Avg episode reward: [(0, '504.798')] [2024-12-13 08:11:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 626688. Throughput: 0: 837.8. Samples: 626688. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:11:34,076][62436] Avg episode reward: [(0, '520.468')] [2024-12-13 08:11:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 630784. Throughput: 0: 853.6. 
Samples: 631220. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:11:39,076][62436] Avg episode reward: [(0, '542.529')] [2024-12-13 08:11:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001232_630784.pth... [2024-12-13 08:11:39,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001176_602112.pth [2024-12-13 08:11:39,101][62473] Saving new best policy, reward=542.529! [2024-12-13 08:11:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 634880. Throughput: 0: 844.9. Samples: 633652. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:11:44,076][62436] Avg episode reward: [(0, '518.277')] [2024-12-13 08:11:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 638976. Throughput: 0: 842.5. Samples: 639424. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:11:49,076][62436] Avg episode reward: [(0, '532.661')] [2024-12-13 08:11:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 643072. Throughput: 0: 857.0. Samples: 644224. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:11:54,076][62436] Avg episode reward: [(0, '526.868')] [2024-12-13 08:11:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001256_643072.pth... [2024-12-13 08:11:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001208_618496.pth [2024-12-13 08:11:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 647168. Throughput: 0: 848.9. Samples: 646400. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:11:59,076][62436] Avg episode reward: [(0, '544.267')] [2024-12-13 08:11:59,077][62473] Saving new best policy, reward=544.267! [2024-12-13 08:12:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 651264. Throughput: 0: 841.0. Samples: 652016. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:12:04,076][62436] Avg episode reward: [(0, '534.244')] [2024-12-13 08:12:06,057][62492] Updated weights for policy 0, policy_version 1280 (0.0012) [2024-12-13 08:12:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 655360. Throughput: 0: 859.4. Samples: 657100. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:12:09,076][62436] Avg episode reward: [(0, '526.269')] [2024-12-13 08:12:09,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001280_655360.pth... [2024-12-13 08:12:09,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001232_630784.pth [2024-12-13 08:12:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 659456. Throughput: 0: 848.7. Samples: 659040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:12:14,076][62436] Avg episode reward: [(0, '531.597')] [2024-12-13 08:12:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 663552. Throughput: 0: 844.6. Samples: 664696. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:12:19,076][62436] Avg episode reward: [(0, '553.520')] [2024-12-13 08:12:19,077][62473] Saving new best policy, reward=553.520! [2024-12-13 08:12:24,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 667648. Throughput: 0: 864.4. Samples: 670120. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:12:24,081][62436] Avg episode reward: [(0, '553.059')] [2024-12-13 08:12:24,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001304_667648.pth... [2024-12-13 08:12:24,102][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001256_643072.pth [2024-12-13 08:12:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 671744. Throughput: 0: 850.9. Samples: 671944. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:12:29,076][62436] Avg episode reward: [(0, '534.689')] [2024-12-13 08:12:34,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 675840. Throughput: 0: 844.7. Samples: 677436. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:12:34,076][62436] Avg episode reward: [(0, '552.100')] [2024-12-13 08:12:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 679936. Throughput: 0: 866.4. Samples: 683212. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:12:39,076][62436] Avg episode reward: [(0, '552.394')] [2024-12-13 08:12:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001328_679936.pth... [2024-12-13 08:12:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001280_655360.pth [2024-12-13 08:12:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 684032. Throughput: 0: 858.8. Samples: 685044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:12:44,076][62436] Avg episode reward: [(0, '563.623')] [2024-12-13 08:12:44,077][62473] Saving new best policy, reward=563.623! [2024-12-13 08:12:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 688128. Throughput: 0: 849.3. Samples: 690236. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:12:49,076][62436] Avg episode reward: [(0, '558.650')] [2024-12-13 08:12:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 692224. Throughput: 0: 866.7. Samples: 696100. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:12:54,076][62436] Avg episode reward: [(0, '562.456')] [2024-12-13 08:12:54,087][62492] Updated weights for policy 0, policy_version 1360 (0.0013) [2024-12-13 08:12:54,096][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001360_696320.pth... 
[2024-12-13 08:12:54,111][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001304_667648.pth [2024-12-13 08:12:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 696320. Throughput: 0: 865.9. Samples: 698004. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:12:59,076][62436] Avg episode reward: [(0, '546.664')] [2024-12-13 08:13:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 700416. Throughput: 0: 846.7. Samples: 702796. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:13:04,076][62436] Avg episode reward: [(0, '558.913')] [2024-12-13 08:13:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 708608. Throughput: 0: 855.4. Samples: 708608. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:13:09,076][62436] Avg episode reward: [(0, '553.053')] [2024-12-13 08:13:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001384_708608.pth... [2024-12-13 08:13:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001328_679936.pth [2024-12-13 08:13:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 708608. Throughput: 0: 866.8. Samples: 710952. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:13:14,077][62436] Avg episode reward: [(0, '547.575')] [2024-12-13 08:13:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 712704. Throughput: 0: 845.1. Samples: 715464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:13:19,076][62436] Avg episode reward: [(0, '537.330')] [2024-12-13 08:13:24,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 720896. Throughput: 0: 845.1. Samples: 721240. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:13:24,076][62436] Avg episode reward: [(0, '574.121')] [2024-12-13 08:13:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001408_720896.pth... [2024-12-13 08:13:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001360_696320.pth [2024-12-13 08:13:24,091][62473] Saving new best policy, reward=574.121! [2024-12-13 08:13:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 724992. Throughput: 0: 864.5. Samples: 723948. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:13:29,082][62436] Avg episode reward: [(0, '590.985')] [2024-12-13 08:13:29,083][62473] Saving new best policy, reward=590.985! [2024-12-13 08:13:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 729088. Throughput: 0: 845.3. Samples: 728276. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:13:34,076][62436] Avg episode reward: [(0, '604.988')] [2024-12-13 08:13:34,077][62473] Saving new best policy, reward=604.988! [2024-12-13 08:13:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 733184. Throughput: 0: 844.4. Samples: 734100. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:13:39,076][62436] Avg episode reward: [(0, '586.160')] [2024-12-13 08:13:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001432_733184.pth... [2024-12-13 08:13:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001384_708608.pth [2024-12-13 08:13:42,498][62492] Updated weights for policy 0, policy_version 1440 (0.0019) [2024-12-13 08:13:44,076][62436] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 737280. Throughput: 0: 864.7. Samples: 736916. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:13:44,077][62436] Avg episode reward: [(0, '568.572')] [2024-12-13 08:13:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 741376. Throughput: 0: 850.0. Samples: 741048. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:13:49,076][62436] Avg episode reward: [(0, '556.998')] [2024-12-13 08:13:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 745472. Throughput: 0: 847.5. Samples: 746744. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:13:54,076][62436] Avg episode reward: [(0, '552.896')] [2024-12-13 08:13:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001456_745472.pth... [2024-12-13 08:13:54,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001408_720896.pth [2024-12-13 08:13:59,076][62436] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 749568. Throughput: 0: 858.1. Samples: 749568. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:13:59,077][62436] Avg episode reward: [(0, '558.835')] [2024-12-13 08:14:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 753664. Throughput: 0: 846.0. Samples: 753536. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:14:04,076][62436] Avg episode reward: [(0, '553.527')] [2024-12-13 08:14:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 757760. Throughput: 0: 843.6. Samples: 759200. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:14:09,076][62436] Avg episode reward: [(0, '563.625')] [2024-12-13 08:14:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001480_757760.pth... 
[2024-12-13 08:14:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001432_733184.pth [2024-12-13 08:14:14,080][62436] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 761856. Throughput: 0: 843.5. Samples: 761908. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:14:14,081][62436] Avg episode reward: [(0, '573.293')] [2024-12-13 08:14:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 761856. Throughput: 0: 822.8. Samples: 765300. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:14:19,076][62436] Avg episode reward: [(0, '572.871')] [2024-12-13 08:14:24,076][62436] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 765952. Throughput: 0: 786.7. Samples: 769500. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:14:24,077][62436] Avg episode reward: [(0, '558.862')] [2024-12-13 08:14:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001496_765952.pth... [2024-12-13 08:14:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001456_745472.pth [2024-12-13 08:14:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 774144. Throughput: 0: 787.7. Samples: 772364. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:14:29,076][62436] Avg episode reward: [(0, '575.613')] [2024-12-13 08:14:33,963][62492] Updated weights for policy 0, policy_version 1520 (0.0016) [2024-12-13 08:14:34,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 778240. Throughput: 0: 812.2. Samples: 777596. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:14:34,076][62436] Avg episode reward: [(0, '569.054')] [2024-12-13 08:14:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 833.1). Total num frames: 778240. Throughput: 0: 783.4. Samples: 781996. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:14:39,076][62436] Avg episode reward: [(0, '589.716')] [2024-12-13 08:14:39,106][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001528_782336.pth... [2024-12-13 08:14:39,117][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001480_757760.pth [2024-12-13 08:14:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 786432. Throughput: 0: 781.9. Samples: 784752. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:14:44,076][62436] Avg episode reward: [(0, '590.477')] [2024-12-13 08:14:49,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 790528. Throughput: 0: 814.9. Samples: 790208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:14:49,076][62436] Avg episode reward: [(0, '585.191')] [2024-12-13 08:14:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 794624. Throughput: 0: 782.9. Samples: 794432. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:14:54,076][62436] Avg episode reward: [(0, '587.158')] [2024-12-13 08:14:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001552_794624.pth... [2024-12-13 08:14:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001496_765952.pth [2024-12-13 08:14:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 798720. Throughput: 0: 787.5. Samples: 797344. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:14:59,076][62436] Avg episode reward: [(0, '578.110')] [2024-12-13 08:15:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 802816. Throughput: 0: 836.4. Samples: 802936. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:15:04,076][62436] Avg episode reward: [(0, '601.835')] [2024-12-13 08:15:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 806912. Throughput: 0: 831.4. Samples: 806912. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:15:09,076][62436] Avg episode reward: [(0, '607.673')] [2024-12-13 08:15:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001576_806912.pth... [2024-12-13 08:15:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001528_782336.pth [2024-12-13 08:15:09,095][62473] Saving new best policy, reward=607.673! [2024-12-13 08:15:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 847.0). Total num frames: 811008. Throughput: 0: 832.1. Samples: 809808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:15:14,076][62436] Avg episode reward: [(0, '628.631')] [2024-12-13 08:15:14,078][62473] Saving new best policy, reward=628.631! [2024-12-13 08:15:19,078][62436] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 847.0). Total num frames: 815104. Throughput: 0: 840.3. Samples: 815412. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:15:19,079][62436] Avg episode reward: [(0, '632.703')] [2024-12-13 08:15:19,080][62473] Saving new best policy, reward=632.703! [2024-12-13 08:15:23,262][62492] Updated weights for policy 0, policy_version 1600 (0.0011) [2024-12-13 08:15:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 819200. Throughput: 0: 832.2. Samples: 819444. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:15:24,076][62436] Avg episode reward: [(0, '640.254')] [2024-12-13 08:15:24,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001600_819200.pth... 
[2024-12-13 08:15:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001552_794624.pth [2024-12-13 08:15:24,097][62473] Saving new best policy, reward=640.254! [2024-12-13 08:15:29,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 823296. Throughput: 0: 835.0. Samples: 822328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:15:29,076][62436] Avg episode reward: [(0, '638.105')] [2024-12-13 08:15:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 827392. Throughput: 0: 842.3. Samples: 828112. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:15:34,076][62436] Avg episode reward: [(0, '633.328')] [2024-12-13 08:15:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.0). Total num frames: 831488. Throughput: 0: 845.2. Samples: 832464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:15:39,076][62436] Avg episode reward: [(0, '640.160')] [2024-12-13 08:15:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001624_831488.pth... [2024-12-13 08:15:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001576_806912.pth [2024-12-13 08:15:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 835584. Throughput: 0: 840.0. Samples: 835144. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:15:44,076][62436] Avg episode reward: [(0, '623.292')] [2024-12-13 08:15:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 839680. Throughput: 0: 841.6. Samples: 840808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:15:49,076][62436] Avg episode reward: [(0, '640.058')] [2024-12-13 08:15:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 843776. Throughput: 0: 851.9. Samples: 845248. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:15:54,076][62436] Avg episode reward: [(0, '639.721')] [2024-12-13 08:15:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001648_843776.pth... [2024-12-13 08:15:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001600_819200.pth [2024-12-13 08:15:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 847872. Throughput: 0: 844.3. Samples: 847800. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:15:59,076][62436] Avg episode reward: [(0, '661.132')] [2024-12-13 08:15:59,077][62473] Saving new best policy, reward=661.132! [2024-12-13 08:16:04,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 847.0). Total num frames: 851968. Throughput: 0: 846.9. Samples: 853520. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:16:04,078][62436] Avg episode reward: [(0, '685.326')] [2024-12-13 08:16:04,079][62473] Saving new best policy, reward=685.326! [2024-12-13 08:16:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 856064. Throughput: 0: 860.2. Samples: 858152. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:16:09,083][62436] Avg episode reward: [(0, '667.932')] [2024-12-13 08:16:09,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001672_856064.pth... [2024-12-13 08:16:09,103][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001624_831488.pth [2024-12-13 08:16:11,526][62492] Updated weights for policy 0, policy_version 1680 (0.0013) [2024-12-13 08:16:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 860160. Throughput: 0: 841.3. Samples: 860188. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:16:14,076][62436] Avg episode reward: [(0, '665.646')] [2024-12-13 08:16:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 864256. 
Throughput: 0: 842.5. Samples: 866024. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:16:19,078][62436] Avg episode reward: [(0, '655.849')] [2024-12-13 08:16:24,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 833.1). Total num frames: 868352. Throughput: 0: 855.6. Samples: 870972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:16:24,081][62436] Avg episode reward: [(0, '671.382')] [2024-12-13 08:16:24,092][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001696_868352.pth... [2024-12-13 08:16:24,104][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001648_843776.pth [2024-12-13 08:16:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 872448. Throughput: 0: 837.2. Samples: 872816. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:16:29,076][62436] Avg episode reward: [(0, '681.156')] [2024-12-13 08:16:34,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 876544. Throughput: 0: 842.0. Samples: 878696. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:16:34,076][62436] Avg episode reward: [(0, '690.025')] [2024-12-13 08:16:34,077][62473] Saving new best policy, reward=690.025! [2024-12-13 08:16:39,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 880640. Throughput: 0: 857.8. Samples: 883852. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:16:39,079][62436] Avg episode reward: [(0, '707.266')] [2024-12-13 08:16:39,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001720_880640.pth... [2024-12-13 08:16:39,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001672_856064.pth [2024-12-13 08:16:39,098][62473] Saving new best policy, reward=707.266! [2024-12-13 08:16:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 884736. Throughput: 0: 839.7. Samples: 885588. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:16:44,076][62436] Avg episode reward: [(0, '670.589')] [2024-12-13 08:16:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 888832. Throughput: 0: 839.5. Samples: 891296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:16:49,076][62436] Avg episode reward: [(0, '653.539')] [2024-12-13 08:16:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 892928. Throughput: 0: 833.0. Samples: 895636. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:16:54,076][62436] Avg episode reward: [(0, '666.820')] [2024-12-13 08:16:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001744_892928.pth... [2024-12-13 08:16:54,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001696_868352.pth [2024-12-13 08:16:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 897024. Throughput: 0: 827.5. Samples: 897424. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:16:59,076][62436] Avg episode reward: [(0, '662.101')] [2024-12-13 08:17:01,129][62492] Updated weights for policy 0, policy_version 1760 (0.0015) [2024-12-13 08:17:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 901120. Throughput: 0: 812.6. Samples: 902592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:17:04,076][62436] Avg episode reward: [(0, '659.371')] [2024-12-13 08:17:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 905216. Throughput: 0: 831.9. Samples: 908404. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:17:09,076][62436] Avg episode reward: [(0, '677.338')] [2024-12-13 08:17:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001768_905216.pth... 
[2024-12-13 08:17:09,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001720_880640.pth [2024-12-13 08:17:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 909312. Throughput: 0: 826.8. Samples: 910020. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:17:14,076][62436] Avg episode reward: [(0, '669.707')] [2024-12-13 08:17:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 913408. Throughput: 0: 807.2. Samples: 915020. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:17:19,076][62436] Avg episode reward: [(0, '647.576')] [2024-12-13 08:17:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 917504. Throughput: 0: 821.6. Samples: 920824. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:17:24,077][62436] Avg episode reward: [(0, '634.901')] [2024-12-13 08:17:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001792_917504.pth... [2024-12-13 08:17:24,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001744_892928.pth [2024-12-13 08:17:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 921600. Throughput: 0: 827.5. Samples: 922824. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:17:29,076][62436] Avg episode reward: [(0, '650.713')] [2024-12-13 08:17:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 925696. Throughput: 0: 802.4. Samples: 927404. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:17:34,076][62436] Avg episode reward: [(0, '639.979')] [2024-12-13 08:17:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 929792. Throughput: 0: 835.1. Samples: 933216. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:17:39,076][62436] Avg episode reward: [(0, '648.187')] [2024-12-13 08:17:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001816_929792.pth... [2024-12-13 08:17:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001768_905216.pth [2024-12-13 08:17:44,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 933888. Throughput: 0: 841.9. Samples: 935312. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:17:44,078][62436] Avg episode reward: [(0, '653.232')] [2024-12-13 08:17:49,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 937984. Throughput: 0: 820.3. Samples: 939508. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:17:49,079][62436] Avg episode reward: [(0, '693.188')] [2024-12-13 08:17:50,458][62492] Updated weights for policy 0, policy_version 1840 (0.0011) [2024-12-13 08:17:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 942080. Throughput: 0: 815.7. Samples: 945112. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:17:54,076][62436] Avg episode reward: [(0, '694.642')] [2024-12-13 08:17:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001840_942080.pth... [2024-12-13 08:17:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001792_917504.pth [2024-12-13 08:17:59,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 946176. Throughput: 0: 836.6. Samples: 947668. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:17:59,076][62436] Avg episode reward: [(0, '671.541')] [2024-12-13 08:18:04,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 950272. Throughput: 0: 814.3. Samples: 951664. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:18:04,076][62436] Avg episode reward: [(0, '667.103')] [2024-12-13 08:18:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 954368. Throughput: 0: 811.4. Samples: 957336. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:18:09,076][62436] Avg episode reward: [(0, '670.017')] [2024-12-13 08:18:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001864_954368.pth... [2024-12-13 08:18:09,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001816_929792.pth [2024-12-13 08:18:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 958464. Throughput: 0: 825.5. Samples: 959972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:18:14,076][62436] Avg episode reward: [(0, '711.746')] [2024-12-13 08:18:14,078][62473] Saving new best policy, reward=711.746! [2024-12-13 08:18:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 962560. Throughput: 0: 807.6. Samples: 963744. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:18:19,076][62436] Avg episode reward: [(0, '727.804')] [2024-12-13 08:18:19,077][62473] Saving new best policy, reward=727.804! [2024-12-13 08:18:24,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 966656. Throughput: 0: 804.0. Samples: 969396. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:18:24,078][62436] Avg episode reward: [(0, '744.466')] [2024-12-13 08:18:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001888_966656.pth... [2024-12-13 08:18:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001840_942080.pth [2024-12-13 08:18:24,100][62473] Saving new best policy, reward=744.466! [2024-12-13 08:18:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 970752. 
Throughput: 0: 819.1. Samples: 972168. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:18:29,076][62436] Avg episode reward: [(0, '732.673')] [2024-12-13 08:18:34,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 974848. Throughput: 0: 814.0. Samples: 976136. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:18:34,076][62436] Avg episode reward: [(0, '702.527')] [2024-12-13 08:18:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 978944. Throughput: 0: 816.3. Samples: 981844. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:18:39,076][62436] Avg episode reward: [(0, '685.219')] [2024-12-13 08:18:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001912_978944.pth... [2024-12-13 08:18:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001864_954368.pth [2024-12-13 08:18:40,217][62492] Updated weights for policy 0, policy_version 1920 (0.0023) [2024-12-13 08:18:44,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 983040. Throughput: 0: 806.4. Samples: 983956. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:18:44,076][62436] Avg episode reward: [(0, '671.988')] [2024-12-13 08:18:49,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 987136. Throughput: 0: 788.2. Samples: 987136. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:18:49,080][62436] Avg episode reward: [(0, '674.556')] [2024-12-13 08:18:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 991232. Throughput: 0: 774.1. Samples: 992172. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:18:54,076][62436] Avg episode reward: [(0, '668.175')] [2024-12-13 08:18:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001936_991232.pth... 
[2024-12-13 08:18:54,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001888_966656.pth [2024-12-13 08:18:59,075][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 995328. Throughput: 0: 781.9. Samples: 995156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:18:59,076][62436] Avg episode reward: [(0, '705.247')] [2024-12-13 08:19:04,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 999424. Throughput: 0: 804.3. Samples: 999940. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:19:04,080][62436] Avg episode reward: [(0, '705.417')] [2024-12-13 08:19:09,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1003520. Throughput: 0: 783.0. Samples: 1004628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:19:09,076][62436] Avg episode reward: [(0, '698.333')] [2024-12-13 08:19:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001960_1003520.pth... [2024-12-13 08:19:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001912_978944.pth [2024-12-13 08:19:14,075][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1007616. Throughput: 0: 786.7. Samples: 1007568. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:19:14,076][62436] Avg episode reward: [(0, '654.377')] [2024-12-13 08:19:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1011712. Throughput: 0: 809.7. Samples: 1012572. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:19:19,076][62436] Avg episode reward: [(0, '646.555')] [2024-12-13 08:19:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1015808. Throughput: 0: 780.7. Samples: 1016976. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:19:24,079][62436] Avg episode reward: [(0, '655.083')] [2024-12-13 08:19:24,092][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001984_1015808.pth... [2024-12-13 08:19:24,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001936_991232.pth [2024-12-13 08:19:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1019904. Throughput: 0: 798.8. Samples: 1019900. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:19:29,076][62436] Avg episode reward: [(0, '666.345')] [2024-12-13 08:19:31,051][62492] Updated weights for policy 0, policy_version 2000 (0.0028) [2024-12-13 08:19:34,075][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1024000. Throughput: 0: 846.4. Samples: 1025220. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:19:34,076][62436] Avg episode reward: [(0, '684.266')] [2024-12-13 08:19:39,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1028096. Throughput: 0: 826.5. Samples: 1029364. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:19:39,076][62436] Avg episode reward: [(0, '692.232')] [2024-12-13 08:19:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002008_1028096.pth... [2024-12-13 08:19:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001960_1003520.pth [2024-12-13 08:19:44,077][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1032192. Throughput: 0: 823.0. Samples: 1032192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:19:44,078][62436] Avg episode reward: [(0, '692.512')] [2024-12-13 08:19:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 1036288. Throughput: 0: 843.0. Samples: 1037872. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:19:49,076][62436] Avg episode reward: [(0, '701.981')] [2024-12-13 08:19:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1040384. Throughput: 0: 827.1. Samples: 1041848. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:19:54,076][62436] Avg episode reward: [(0, '705.034')] [2024-12-13 08:19:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002032_1040384.pth... [2024-12-13 08:19:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000001984_1015808.pth [2024-12-13 08:19:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1044480. Throughput: 0: 824.4. Samples: 1044668. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:19:59,076][62436] Avg episode reward: [(0, '713.848')] [2024-12-13 08:20:04,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 1048576. Throughput: 0: 842.2. Samples: 1050472. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:20:04,076][62436] Avg episode reward: [(0, '716.624')] [2024-12-13 08:20:09,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1052672. Throughput: 0: 831.9. Samples: 1054408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:20:09,076][62436] Avg episode reward: [(0, '728.660')] [2024-12-13 08:20:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002056_1052672.pth... [2024-12-13 08:20:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002008_1028096.pth [2024-12-13 08:20:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1056768. Throughput: 0: 824.5. Samples: 1057004. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:20:14,076][62436] Avg episode reward: [(0, '730.864')] [2024-12-13 08:20:19,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1060864. Throughput: 0: 833.9. Samples: 1062748. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:20:19,078][62436] Avg episode reward: [(0, '711.415')] [2024-12-13 08:20:20,618][62492] Updated weights for policy 0, policy_version 2080 (0.0012) [2024-12-13 08:20:24,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1064960. Throughput: 0: 833.8. Samples: 1066884. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:20:24,076][62436] Avg episode reward: [(0, '689.163')] [2024-12-13 08:20:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002080_1064960.pth... [2024-12-13 08:20:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002032_1040384.pth [2024-12-13 08:20:29,075][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1069056. Throughput: 0: 824.0. Samples: 1069272. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:20:29,076][62436] Avg episode reward: [(0, '708.548')] [2024-12-13 08:20:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1073152. Throughput: 0: 826.4. Samples: 1075060. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:20:34,076][62436] Avg episode reward: [(0, '721.250')] [2024-12-13 08:20:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1077248. Throughput: 0: 836.6. Samples: 1079496. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:20:39,076][62436] Avg episode reward: [(0, '722.571')] [2024-12-13 08:20:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002104_1077248.pth... 
[2024-12-13 08:20:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002056_1052672.pth [2024-12-13 08:20:44,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1081344. Throughput: 0: 821.3. Samples: 1081628. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:20:44,076][62436] Avg episode reward: [(0, '761.606')] [2024-12-13 08:20:44,077][62473] Saving new best policy, reward=761.606! [2024-12-13 08:20:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1085440. Throughput: 0: 820.7. Samples: 1087404. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:20:49,076][62436] Avg episode reward: [(0, '788.356')] [2024-12-13 08:20:49,081][62473] Saving new best policy, reward=788.356! [2024-12-13 08:20:54,077][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1089536. Throughput: 0: 837.7. Samples: 1092108. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:20:54,078][62436] Avg episode reward: [(0, '776.635')] [2024-12-13 08:20:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002128_1089536.pth... [2024-12-13 08:20:54,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002080_1064960.pth [2024-12-13 08:20:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1093632. Throughput: 0: 821.9. Samples: 1093988. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:20:59,076][62436] Avg episode reward: [(0, '758.337')] [2024-12-13 08:21:04,075][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1097728. Throughput: 0: 827.8. Samples: 1099996. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:21:04,077][62436] Avg episode reward: [(0, '742.279')] [2024-12-13 08:21:09,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1101824. Throughput: 0: 847.7. Samples: 1105032. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:21:09,078][62436] Avg episode reward: [(0, '693.611')] [2024-12-13 08:21:09,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002152_1101824.pth... [2024-12-13 08:21:09,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002104_1077248.pth [2024-12-13 08:21:10,300][62492] Updated weights for policy 0, policy_version 2160 (0.0011) [2024-12-13 08:21:14,085][62436] Fps is (10 sec: 818.4, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 1105920. Throughput: 0: 833.2. Samples: 1106772. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:21:14,086][62436] Avg episode reward: [(0, '709.175')] [2024-12-13 08:21:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1110016. Throughput: 0: 831.8. Samples: 1112492. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:21:19,076][62436] Avg episode reward: [(0, '725.676')] [2024-12-13 08:21:24,075][62436] Fps is (10 sec: 820.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1114112. Throughput: 0: 850.1. Samples: 1117752. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:21:24,076][62436] Avg episode reward: [(0, '711.998')] [2024-12-13 08:21:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002176_1114112.pth... [2024-12-13 08:21:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002128_1089536.pth [2024-12-13 08:21:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1118208. Throughput: 0: 842.2. Samples: 1119528. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:21:29,076][62436] Avg episode reward: [(0, '734.007')] [2024-12-13 08:21:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1122304. Throughput: 0: 834.8. Samples: 1124968. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:21:34,076][62436] Avg episode reward: [(0, '748.744')] [2024-12-13 08:21:39,077][62436] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 1130496. Throughput: 0: 850.9. Samples: 1130396. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:21:39,077][62436] Avg episode reward: [(0, '754.879')] [2024-12-13 08:21:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002208_1130496.pth... [2024-12-13 08:21:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002152_1101824.pth [2024-12-13 08:21:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1130496. Throughput: 0: 849.2. Samples: 1132200. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:21:44,083][62436] Avg episode reward: [(0, '763.065')] [2024-12-13 08:21:49,076][62436] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1138688. Throughput: 0: 833.3. Samples: 1137496. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:21:49,076][62436] Avg episode reward: [(0, '752.190')] [2024-12-13 08:21:54,081][62436] Fps is (10 sec: 1228.2, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 1142784. Throughput: 0: 843.6. Samples: 1143000. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:21:54,081][62436] Avg episode reward: [(0, '758.459')] [2024-12-13 08:21:54,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002232_1142784.pth... [2024-12-13 08:21:54,105][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002176_1114112.pth [2024-12-13 08:21:59,078][62436] Fps is (10 sec: 409.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1142784. Throughput: 0: 849.4. Samples: 1144988. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:21:59,079][62436] Avg episode reward: [(0, '750.274')] [2024-12-13 08:21:59,536][62492] Updated weights for policy 0, policy_version 2240 (0.0013) [2024-12-13 08:22:04,079][62436] Fps is (10 sec: 819.3, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 1150976. Throughput: 0: 831.5. Samples: 1149912. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:22:04,080][62436] Avg episode reward: [(0, '756.313')] [2024-12-13 08:22:09,076][62436] Fps is (10 sec: 1229.1, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1155072. Throughput: 0: 842.7. Samples: 1155672. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:22:09,076][62436] Avg episode reward: [(0, '789.615')] [2024-12-13 08:22:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002256_1155072.pth... [2024-12-13 08:22:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002208_1130496.pth [2024-12-13 08:22:09,098][62473] Saving new best policy, reward=789.615! [2024-12-13 08:22:14,076][62436] Fps is (10 sec: 819.5, 60 sec: 887.6, 300 sec: 833.1). Total num frames: 1159168. Throughput: 0: 851.8. Samples: 1157860. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:22:14,077][62436] Avg episode reward: [(0, '816.944')] [2024-12-13 08:22:14,078][62473] Saving new best policy, reward=816.944! [2024-12-13 08:22:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1163264. Throughput: 0: 832.5. Samples: 1162432. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:22:19,076][62436] Avg episode reward: [(0, '845.503')] [2024-12-13 08:22:19,077][62473] Saving new best policy, reward=845.503! [2024-12-13 08:22:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1167360. Throughput: 0: 837.0. Samples: 1168060. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:22:24,076][62436] Avg episode reward: [(0, '805.114')] [2024-12-13 08:22:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002280_1167360.pth... [2024-12-13 08:22:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002232_1142784.pth [2024-12-13 08:22:29,079][62436] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 1171456. Throughput: 0: 851.5. Samples: 1170520. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:22:29,080][62436] Avg episode reward: [(0, '795.828')] [2024-12-13 08:22:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1175552. Throughput: 0: 831.0. Samples: 1174892. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:22:34,076][62436] Avg episode reward: [(0, '787.813')] [2024-12-13 08:22:39,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1179648. Throughput: 0: 836.9. Samples: 1180656. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:22:39,076][62436] Avg episode reward: [(0, '781.472')] [2024-12-13 08:22:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002304_1179648.pth... [2024-12-13 08:22:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002256_1155072.pth [2024-12-13 08:22:44,078][62436] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 1183744. Throughput: 0: 854.0. Samples: 1183420. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:22:44,079][62436] Avg episode reward: [(0, '815.907')] [2024-12-13 08:22:48,086][62492] Updated weights for policy 0, policy_version 2320 (0.0021) [2024-12-13 08:22:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1187840. Throughput: 0: 834.8. Samples: 1187476. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:22:49,076][62436] Avg episode reward: [(0, '855.364')] [2024-12-13 08:22:49,077][62473] Saving new best policy, reward=855.364! [2024-12-13 08:22:54,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 1191936. Throughput: 0: 834.1. Samples: 1193208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:22:54,076][62436] Avg episode reward: [(0, '871.799')] [2024-12-13 08:22:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002328_1191936.pth... [2024-12-13 08:22:54,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002280_1167360.pth [2024-12-13 08:22:54,094][62473] Saving new best policy, reward=871.799! [2024-12-13 08:22:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1196032. Throughput: 0: 848.5. Samples: 1196040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:22:59,077][62436] Avg episode reward: [(0, '886.071')] [2024-12-13 08:22:59,079][62473] Saving new best policy, reward=886.071! [2024-12-13 08:23:04,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1200128. Throughput: 0: 833.9. Samples: 1199960. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:23:04,077][62436] Avg episode reward: [(0, '903.484')] [2024-12-13 08:23:04,078][62473] Saving new best policy, reward=903.484! [2024-12-13 08:23:09,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1204224. Throughput: 0: 829.2. Samples: 1205376. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:23:09,079][62436] Avg episode reward: [(0, '888.368')] [2024-12-13 08:23:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002352_1204224.pth... 
[2024-12-13 08:23:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002304_1179648.pth [2024-12-13 08:23:14,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1208320. Throughput: 0: 819.0. Samples: 1207372. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:23:14,076][62436] Avg episode reward: [(0, '911.015')] [2024-12-13 08:23:14,077][62473] Saving new best policy, reward=911.015! [2024-12-13 08:23:19,075][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1208320. Throughput: 0: 800.6. Samples: 1210920. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:23:19,076][62436] Avg episode reward: [(0, '927.303')] [2024-12-13 08:23:19,077][62473] Saving new best policy, reward=927.303! [2024-12-13 08:23:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1212416. Throughput: 0: 788.9. Samples: 1216156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:23:24,076][62436] Avg episode reward: [(0, '921.967')] [2024-12-13 08:23:24,113][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002376_1216512.pth... [2024-12-13 08:23:24,119][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002328_1191936.pth [2024-12-13 08:23:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 1220608. Throughput: 0: 790.3. Samples: 1218980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:23:29,076][62436] Avg episode reward: [(0, '912.483')] [2024-12-13 08:23:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1220608. Throughput: 0: 800.4. Samples: 1223496. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:23:34,076][62436] Avg episode reward: [(0, '932.480')] [2024-12-13 08:23:34,077][62473] Saving new best policy, reward=932.480! 
[2024-12-13 08:23:39,079][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1224704. Throughput: 0: 783.1. Samples: 1228452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:23:39,079][62436] Avg episode reward: [(0, '927.544')] [2024-12-13 08:23:39,080][62492] Updated weights for policy 0, policy_version 2400 (0.0011) [2024-12-13 08:23:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002400_1228800.pth... [2024-12-13 08:23:39,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002352_1204224.pth [2024-12-13 08:23:44,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1232896. Throughput: 0: 782.6. Samples: 1231256. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:23:44,076][62436] Avg episode reward: [(0, '901.384')] [2024-12-13 08:23:49,076][62436] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1232896. Throughput: 0: 806.9. Samples: 1236272. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:23:49,076][62436] Avg episode reward: [(0, '893.820')] [2024-12-13 08:23:54,081][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 833.1). Total num frames: 1241088. Throughput: 0: 788.3. Samples: 1240852. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:23:54,082][62436] Avg episode reward: [(0, '888.234')] [2024-12-13 08:23:54,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002424_1241088.pth... [2024-12-13 08:23:54,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002376_1216512.pth [2024-12-13 08:23:59,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1245184. Throughput: 0: 807.1. Samples: 1243692. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:23:59,076][62436] Avg episode reward: [(0, '833.617')] [2024-12-13 08:24:04,075][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 833.1). 
Total num frames: 1249280. Throughput: 0: 842.5. Samples: 1248832. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:24:04,076][62436] Avg episode reward: [(0, '825.951')] [2024-12-13 08:24:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1253376. Throughput: 0: 826.7. Samples: 1253356. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:24:09,076][62436] Avg episode reward: [(0, '809.667')] [2024-12-13 08:24:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002448_1253376.pth... [2024-12-13 08:24:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002400_1228800.pth [2024-12-13 08:24:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1257472. Throughput: 0: 825.9. Samples: 1256144. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:24:14,076][62436] Avg episode reward: [(0, '855.307')] [2024-12-13 08:24:19,077][62436] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 1261568. Throughput: 0: 841.9. Samples: 1261384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:24:19,077][62436] Avg episode reward: [(0, '888.851')] [2024-12-13 08:24:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1265664. Throughput: 0: 824.1. Samples: 1265532. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:24:24,076][62436] Avg episode reward: [(0, '910.859')] [2024-12-13 08:24:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002472_1265664.pth... [2024-12-13 08:24:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002424_1241088.pth [2024-12-13 08:24:28,257][62492] Updated weights for policy 0, policy_version 2480 (0.0014) [2024-12-13 08:24:29,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1269760. Throughput: 0: 822.8. Samples: 1268284. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:24:29,076][62436] Avg episode reward: [(0, '941.569')] [2024-12-13 08:24:29,077][62473] Saving new best policy, reward=941.569! [2024-12-13 08:24:34,078][62436] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 1273856. Throughput: 0: 832.2. Samples: 1273724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:24:34,078][62436] Avg episode reward: [(0, '948.350')] [2024-12-13 08:24:34,079][62473] Saving new best policy, reward=948.350! [2024-12-13 08:24:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1273856. Throughput: 0: 817.3. Samples: 1277624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:24:39,076][62436] Avg episode reward: [(0, '933.723')] [2024-12-13 08:24:39,135][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002496_1277952.pth... [2024-12-13 08:24:39,142][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002448_1253376.pth [2024-12-13 08:24:44,075][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1282048. Throughput: 0: 815.1. Samples: 1280372. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:24:44,076][62436] Avg episode reward: [(0, '938.866')] [2024-12-13 08:24:49,078][62436] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 1286144. Throughput: 0: 824.8. Samples: 1285948. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:24:49,078][62436] Avg episode reward: [(0, '1002.866')] [2024-12-13 08:24:49,085][62473] Saving new best policy, reward=1002.866! [2024-12-13 08:24:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 819.2). Total num frames: 1286144. Throughput: 0: 808.6. Samples: 1289744. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:24:54,077][62436] Avg episode reward: [(0, '1021.485')] [2024-12-13 08:24:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002512_1286144.pth... [2024-12-13 08:24:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002472_1265664.pth [2024-12-13 08:24:54,091][62473] Saving new best policy, reward=1021.485! [2024-12-13 08:24:59,075][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1294336. Throughput: 0: 805.7. Samples: 1292400. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:24:59,076][62436] Avg episode reward: [(0, '1055.959')] [2024-12-13 08:24:59,077][62473] Saving new best policy, reward=1055.959! [2024-12-13 08:25:04,075][62436] Fps is (10 sec: 1228.9, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1298432. Throughput: 0: 814.6. Samples: 1298040. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:25:04,076][62436] Avg episode reward: [(0, '1053.247')] [2024-12-13 08:25:09,077][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1298432. Throughput: 0: 814.2. Samples: 1302172. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:25:09,078][62436] Avg episode reward: [(0, '1038.846')] [2024-12-13 08:25:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002536_1298432.pth... [2024-12-13 08:25:09,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002496_1277952.pth [2024-12-13 08:25:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1306624. Throughput: 0: 806.7. Samples: 1304584. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:25:14,076][62436] Avg episode reward: [(0, '986.202')] [2024-12-13 08:25:18,355][62492] Updated weights for policy 0, policy_version 2560 (0.0012) [2024-12-13 08:25:19,075][62436] Fps is (10 sec: 1229.0, 60 sec: 819.2, 300 sec: 833.1). 
Total num frames: 1310720. Throughput: 0: 810.6. Samples: 1310200. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:25:19,076][62436] Avg episode reward: [(0, '982.853')] [2024-12-13 08:25:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1310720. Throughput: 0: 821.6. Samples: 1314596. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:25:24,076][62436] Avg episode reward: [(0, '1003.816')] [2024-12-13 08:25:24,241][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002568_1314816.pth... [2024-12-13 08:25:24,246][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002512_1286144.pth [2024-12-13 08:25:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1318912. Throughput: 0: 807.6. Samples: 1316712. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:25:29,076][62436] Avg episode reward: [(0, '1005.390')] [2024-12-13 08:25:34,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1323008. Throughput: 0: 812.2. Samples: 1322496. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:25:34,076][62436] Avg episode reward: [(0, '1026.114')] [2024-12-13 08:25:39,079][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1327104. Throughput: 0: 830.2. Samples: 1327104. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:25:39,079][62436] Avg episode reward: [(0, '1076.072')] [2024-12-13 08:25:39,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002592_1327104.pth... [2024-12-13 08:25:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002536_1298432.pth [2024-12-13 08:25:39,099][62473] Saving new best policy, reward=1076.072! [2024-12-13 08:25:44,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1331200. Throughput: 0: 816.4. Samples: 1329140. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:25:44,078][62436] Avg episode reward: [(0, '1119.888')] [2024-12-13 08:25:44,079][62473] Saving new best policy, reward=1119.888! [2024-12-13 08:25:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1335296. Throughput: 0: 817.0. Samples: 1334804. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:25:49,076][62436] Avg episode reward: [(0, '1110.117')] [2024-12-13 08:25:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1339392. Throughput: 0: 828.9. Samples: 1339472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:25:54,076][62436] Avg episode reward: [(0, '1137.746')] [2024-12-13 08:25:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002616_1339392.pth... [2024-12-13 08:25:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002568_1314816.pth [2024-12-13 08:25:54,092][62473] Saving new best policy, reward=1137.746! [2024-12-13 08:25:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1343488. Throughput: 0: 818.0. Samples: 1341392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:25:59,076][62436] Avg episode reward: [(0, '1152.417')] [2024-12-13 08:25:59,077][62473] Saving new best policy, reward=1152.417! [2024-12-13 08:26:04,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1347584. Throughput: 0: 819.5. Samples: 1347076. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:26:04,076][62436] Avg episode reward: [(0, '1131.771')] [2024-12-13 08:26:08,214][62492] Updated weights for policy 0, policy_version 2640 (0.0017) [2024-12-13 08:26:09,076][62436] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 1351680. Throughput: 0: 830.1. Samples: 1351952. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:26:09,077][62436] Avg episode reward: [(0, '1120.002')] [2024-12-13 08:26:09,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002640_1351680.pth... [2024-12-13 08:26:09,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002592_1327104.pth [2024-12-13 08:26:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1355776. Throughput: 0: 825.2. Samples: 1353848. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:26:14,076][62436] Avg episode reward: [(0, '1094.540')] [2024-12-13 08:26:19,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1359872. Throughput: 0: 818.8. Samples: 1359344. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:26:19,076][62436] Avg episode reward: [(0, '1142.708')] [2024-12-13 08:26:24,077][62436] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 1363968. Throughput: 0: 832.5. Samples: 1364568. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:26:24,078][62436] Avg episode reward: [(0, '1144.865')] [2024-12-13 08:26:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002664_1363968.pth... [2024-12-13 08:26:24,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002616_1339392.pth [2024-12-13 08:26:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1363968. Throughput: 0: 829.6. Samples: 1366472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:26:29,076][62436] Avg episode reward: [(0, '1146.749')] [2024-12-13 08:26:34,075][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1372160. Throughput: 0: 814.6. Samples: 1371460. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:26:34,076][62436] Avg episode reward: [(0, '1172.742')] [2024-12-13 08:26:34,077][62473] Saving new best policy, reward=1172.742! [2024-12-13 08:26:39,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1376256. Throughput: 0: 837.2. Samples: 1377144. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:26:39,077][62436] Avg episode reward: [(0, '1179.203')] [2024-12-13 08:26:39,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002688_1376256.pth... [2024-12-13 08:26:39,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002640_1351680.pth [2024-12-13 08:26:39,104][62473] Saving new best policy, reward=1179.203! [2024-12-13 08:26:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1380352. Throughput: 0: 837.7. Samples: 1379088. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:26:44,076][62436] Avg episode reward: [(0, '1156.503')] [2024-12-13 08:26:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1384448. Throughput: 0: 818.0. Samples: 1383888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:26:49,076][62436] Avg episode reward: [(0, '1086.769')] [2024-12-13 08:26:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1388544. Throughput: 0: 833.7. Samples: 1389468. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:26:54,076][62436] Avg episode reward: [(0, '1066.063')] [2024-12-13 08:26:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002712_1388544.pth... [2024-12-13 08:26:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002664_1363968.pth [2024-12-13 08:26:58,196][62492] Updated weights for policy 0, policy_version 2720 (0.0011) [2024-12-13 08:26:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). 
Total num frames: 1392640. Throughput: 0: 842.6. Samples: 1391764. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:26:59,076][62436] Avg episode reward: [(0, '1079.580')] [2024-12-13 08:27:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1396736. Throughput: 0: 819.6. Samples: 1396224. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:27:04,076][62436] Avg episode reward: [(0, '1040.528')] [2024-12-13 08:27:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1400832. Throughput: 0: 828.6. Samples: 1401852. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:27:09,076][62436] Avg episode reward: [(0, '1027.192')] [2024-12-13 08:27:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002736_1400832.pth... [2024-12-13 08:27:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002688_1376256.pth [2024-12-13 08:27:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1404928. Throughput: 0: 845.3. Samples: 1404512. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:27:14,076][62436] Avg episode reward: [(0, '1009.109')] [2024-12-13 08:27:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1409024. Throughput: 0: 827.3. Samples: 1408688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:27:19,076][62436] Avg episode reward: [(0, '954.952')] [2024-12-13 08:27:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1413120. Throughput: 0: 826.4. Samples: 1414332. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:27:24,076][62436] Avg episode reward: [(0, '963.205')] [2024-12-13 08:27:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002760_1413120.pth... 
[2024-12-13 08:27:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002712_1388544.pth [2024-12-13 08:27:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 1417216. Throughput: 0: 845.9. Samples: 1417156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:27:29,079][62436] Avg episode reward: [(0, '939.105')] [2024-12-13 08:27:34,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1421312. Throughput: 0: 827.8. Samples: 1421140. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:27:34,077][62436] Avg episode reward: [(0, '907.928')] [2024-12-13 08:27:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1425408. Throughput: 0: 819.1. Samples: 1426328. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:27:39,076][62436] Avg episode reward: [(0, '920.261')] [2024-12-13 08:27:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002784_1425408.pth... [2024-12-13 08:27:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002736_1400832.pth [2024-12-13 08:27:44,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1429504. Throughput: 0: 812.3. Samples: 1428316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:27:44,076][62436] Avg episode reward: [(0, '904.141')] [2024-12-13 08:27:49,077][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1429504. Throughput: 0: 791.0. Samples: 1431820. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:27:49,078][62436] Avg episode reward: [(0, '873.595')] [2024-12-13 08:27:49,880][62492] Updated weights for policy 0, policy_version 2800 (0.0014) [2024-12-13 08:27:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1433600. Throughput: 0: 787.3. Samples: 1437280. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:27:54,076][62436] Avg episode reward: [(0, '878.711')] [2024-12-13 08:27:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002800_1433600.pth... [2024-12-13 08:27:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002760_1413120.pth [2024-12-13 08:27:59,075][62436] Fps is (10 sec: 1229.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1441792. Throughput: 0: 790.1. Samples: 1440068. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:27:59,076][62436] Avg episode reward: [(0, '869.815')] [2024-12-13 08:28:04,081][62436] Fps is (10 sec: 818.7, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1441792. Throughput: 0: 796.0. Samples: 1444512. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:28:04,082][62436] Avg episode reward: [(0, '850.538')] [2024-12-13 08:28:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1445888. Throughput: 0: 784.7. Samples: 1449644. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:28:09,076][62436] Avg episode reward: [(0, '873.829')] [2024-12-13 08:28:09,119][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002832_1449984.pth... [2024-12-13 08:28:09,124][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002784_1425408.pth [2024-12-13 08:28:14,076][62436] Fps is (10 sec: 1229.5, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1454080. Throughput: 0: 783.1. Samples: 1452392. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:28:14,076][62436] Avg episode reward: [(0, '867.175')] [2024-12-13 08:28:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 1454080. Throughput: 0: 800.0. Samples: 1457140. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:28:19,076][62436] Avg episode reward: [(0, '890.982')] [2024-12-13 08:28:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 1458176. Throughput: 0: 790.4. Samples: 1461896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:28:24,076][62436] Avg episode reward: [(0, '904.679')] [2024-12-13 08:28:24,154][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002856_1462272.pth... [2024-12-13 08:28:24,161][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002800_1433600.pth [2024-12-13 08:28:29,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1466368. Throughput: 0: 808.2. Samples: 1464684. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:28:29,076][62436] Avg episode reward: [(0, '906.788')] [2024-12-13 08:28:34,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1470464. Throughput: 0: 843.9. Samples: 1469796. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:28:34,076][62436] Avg episode reward: [(0, '906.430')] [2024-12-13 08:28:38,983][62492] Updated weights for policy 0, policy_version 2880 (0.0011) [2024-12-13 08:28:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1474560. Throughput: 0: 823.3. Samples: 1474328. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:28:39,076][62436] Avg episode reward: [(0, '923.280')] [2024-12-13 08:28:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002880_1474560.pth... [2024-12-13 08:28:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002832_1449984.pth [2024-12-13 08:28:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1478656. Throughput: 0: 822.6. Samples: 1477084. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:28:44,076][62436] Avg episode reward: [(0, '950.427')] [2024-12-13 08:28:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1482752. Throughput: 0: 841.0. Samples: 1482352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:28:49,077][62436] Avg episode reward: [(0, '966.248')] [2024-12-13 08:28:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1486848. Throughput: 0: 821.2. Samples: 1486600. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:28:54,076][62436] Avg episode reward: [(0, '971.030')] [2024-12-13 08:28:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002904_1486848.pth... [2024-12-13 08:28:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002856_1462272.pth [2024-12-13 08:28:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1490944. Throughput: 0: 823.2. Samples: 1489436. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:28:59,076][62436] Avg episode reward: [(0, '949.359')] [2024-12-13 08:29:04,076][62436] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1495040. Throughput: 0: 839.1. Samples: 1494900. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:29:04,077][62436] Avg episode reward: [(0, '981.428')] [2024-12-13 08:29:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1499136. Throughput: 0: 822.7. Samples: 1498916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:29:09,076][62436] Avg episode reward: [(0, '980.083')] [2024-12-13 08:29:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002928_1499136.pth... 
[2024-12-13 08:29:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002880_1474560.pth [2024-12-13 08:29:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1503232. Throughput: 0: 822.3. Samples: 1501688. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:29:14,076][62436] Avg episode reward: [(0, '981.521')] [2024-12-13 08:29:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1507328. Throughput: 0: 834.0. Samples: 1507324. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:29:19,076][62436] Avg episode reward: [(0, '990.285')] [2024-12-13 08:29:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 1507328. Throughput: 0: 819.2. Samples: 1511192. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:29:24,076][62436] Avg episode reward: [(0, '979.794')] [2024-12-13 08:29:24,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002952_1511424.pth... [2024-12-13 08:29:24,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002904_1486848.pth [2024-12-13 08:29:28,447][62492] Updated weights for policy 0, policy_version 2960 (0.0013) [2024-12-13 08:29:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1515520. Throughput: 0: 817.7. Samples: 1513880. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:29:29,076][62436] Avg episode reward: [(0, '976.782')] [2024-12-13 08:29:34,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1519616. Throughput: 0: 828.1. Samples: 1519616. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:29:34,076][62436] Avg episode reward: [(0, '987.982')] [2024-12-13 08:29:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1523712. Throughput: 0: 824.7. Samples: 1523712. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:29:39,076][62436] Avg episode reward: [(0, '959.585')] [2024-12-13 08:29:39,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002976_1523712.pth... [2024-12-13 08:29:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002928_1499136.pth [2024-12-13 08:29:44,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1527808. Throughput: 0: 821.4. Samples: 1526400. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:29:44,076][62436] Avg episode reward: [(0, '939.510')] [2024-12-13 08:29:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1531904. Throughput: 0: 823.7. Samples: 1531964. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:29:49,076][62436] Avg episode reward: [(0, '940.745')] [2024-12-13 08:29:54,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1536000. Throughput: 0: 831.9. Samples: 1536356. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:29:54,080][62436] Avg episode reward: [(0, '984.586')] [2024-12-13 08:29:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003000_1536000.pth... [2024-12-13 08:29:54,102][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002952_1511424.pth [2024-12-13 08:29:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1540096. Throughput: 0: 820.9. Samples: 1538628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:29:59,076][62436] Avg episode reward: [(0, '1052.918')] [2024-12-13 08:30:04,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1544192. Throughput: 0: 819.3. Samples: 1544192. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:30:04,076][62436] Avg episode reward: [(0, '1096.945')] [2024-12-13 08:30:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1548288. Throughput: 0: 835.6. Samples: 1548792. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:30:09,076][62436] Avg episode reward: [(0, '1138.099')] [2024-12-13 08:30:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003024_1548288.pth... [2024-12-13 08:30:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000002976_1523712.pth [2024-12-13 08:30:14,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1552384. Throughput: 0: 821.9. Samples: 1550864. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:30:14,077][62436] Avg episode reward: [(0, '1202.302')] [2024-12-13 08:30:14,078][62473] Saving new best policy, reward=1202.302! [2024-12-13 08:30:17,676][62492] Updated weights for policy 0, policy_version 3040 (0.0011) [2024-12-13 08:30:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1556480. Throughput: 0: 819.2. Samples: 1556480. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:30:19,077][62436] Avg episode reward: [(0, '1176.778')] [2024-12-13 08:30:24,075][62436] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 1560576. Throughput: 0: 834.8. Samples: 1561276. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:30:24,076][62436] Avg episode reward: [(0, '1110.835')] [2024-12-13 08:30:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003048_1560576.pth... [2024-12-13 08:30:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003000_1536000.pth [2024-12-13 08:30:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1564672. Throughput: 0: 818.0. Samples: 1563208. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:30:29,076][62436] Avg episode reward: [(0, '1094.194')] [2024-12-13 08:30:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1568768. Throughput: 0: 819.2. Samples: 1568828. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:30:34,076][62436] Avg episode reward: [(0, '1087.779')] [2024-12-13 08:30:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1572864. Throughput: 0: 836.9. Samples: 1574012. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:30:39,076][62436] Avg episode reward: [(0, '1092.622')] [2024-12-13 08:30:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003072_1572864.pth... [2024-12-13 08:30:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003024_1548288.pth [2024-12-13 08:30:44,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1576960. Throughput: 0: 829.7. Samples: 1575964. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:30:44,076][62436] Avg episode reward: [(0, '1095.331')] [2024-12-13 08:30:49,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1581056. Throughput: 0: 822.7. Samples: 1581216. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:30:49,078][62436] Avg episode reward: [(0, '1163.135')] [2024-12-13 08:30:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1585152. Throughput: 0: 841.5. Samples: 1586660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:30:54,080][62436] Avg episode reward: [(0, '1168.109')] [2024-12-13 08:30:54,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003096_1585152.pth... 
[2024-12-13 08:30:54,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003048_1560576.pth [2024-12-13 08:30:59,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1589248. Throughput: 0: 838.9. Samples: 1588612. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:30:59,076][62436] Avg episode reward: [(0, '1123.696')] [2024-12-13 08:31:04,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1593344. Throughput: 0: 826.5. Samples: 1593672. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:31:04,076][62436] Avg episode reward: [(0, '1131.436')] [2024-12-13 08:31:06,629][62492] Updated weights for policy 0, policy_version 3120 (0.0013) [2024-12-13 08:31:09,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 1597440. Throughput: 0: 845.2. Samples: 1599312. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:31:09,080][62436] Avg episode reward: [(0, '1101.778')] [2024-12-13 08:31:09,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003120_1597440.pth... [2024-12-13 08:31:09,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003072_1572864.pth [2024-12-13 08:31:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1601536. Throughput: 0: 846.1. Samples: 1601284. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:31:14,076][62436] Avg episode reward: [(0, '1093.653')] [2024-12-13 08:31:19,075][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1605632. Throughput: 0: 827.5. Samples: 1606064. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:31:19,076][62436] Avg episode reward: [(0, '1083.147')] [2024-12-13 08:31:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1609728. Throughput: 0: 841.3. Samples: 1611872. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:31:24,076][62436] Avg episode reward: [(0, '1068.604')] [2024-12-13 08:31:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003144_1609728.pth... [2024-12-13 08:31:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003096_1585152.pth [2024-12-13 08:31:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1613824. Throughput: 0: 843.7. Samples: 1613932. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:31:29,076][62436] Avg episode reward: [(0, '1083.527')] [2024-12-13 08:31:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1617920. Throughput: 0: 828.0. Samples: 1618472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:31:34,076][62436] Avg episode reward: [(0, '1111.568')] [2024-12-13 08:31:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1622016. Throughput: 0: 837.4. Samples: 1624344. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:31:39,076][62436] Avg episode reward: [(0, '1086.940')] [2024-12-13 08:31:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003168_1622016.pth... [2024-12-13 08:31:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003120_1597440.pth [2024-12-13 08:31:44,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1626112. Throughput: 0: 845.1. Samples: 1626640. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:31:44,076][62436] Avg episode reward: [(0, '1055.858')] [2024-12-13 08:31:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1630208. Throughput: 0: 829.6. Samples: 1631004. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:31:49,076][62436] Avg episode reward: [(0, '1060.383')] [2024-12-13 08:31:54,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1634304. Throughput: 0: 833.9. Samples: 1636836. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:31:54,078][62436] Avg episode reward: [(0, '1092.648')] [2024-12-13 08:31:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003192_1634304.pth... [2024-12-13 08:31:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003144_1609728.pth [2024-12-13 08:31:55,384][62492] Updated weights for policy 0, policy_version 3200 (0.0014) [2024-12-13 08:31:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1638400. Throughput: 0: 845.2. Samples: 1639320. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:31:59,076][62436] Avg episode reward: [(0, '1079.793')] [2024-12-13 08:32:04,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1642496. Throughput: 0: 829.2. Samples: 1643376. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:32:04,076][62436] Avg episode reward: [(0, '1083.052')] [2024-12-13 08:32:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1646592. Throughput: 0: 819.4. Samples: 1648744. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:32:09,076][62436] Avg episode reward: [(0, '1080.994')] [2024-12-13 08:32:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003216_1646592.pth... [2024-12-13 08:32:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003168_1622016.pth [2024-12-13 08:32:14,077][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1650688. Throughput: 0: 815.3. Samples: 1650620. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:32:14,079][62436] Avg episode reward: [(0, '1091.377')] [2024-12-13 08:32:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1654784. Throughput: 0: 791.3. Samples: 1654080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:32:19,076][62436] Avg episode reward: [(0, '1027.460')] [2024-12-13 08:32:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1658880. Throughput: 0: 780.9. Samples: 1659484. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:32:24,077][62436] Avg episode reward: [(0, '1032.454')] [2024-12-13 08:32:24,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003240_1658880.pth... [2024-12-13 08:32:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003192_1634304.pth [2024-12-13 08:32:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1662976. Throughput: 0: 795.5. Samples: 1662436. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:32:29,076][62436] Avg episode reward: [(0, '1027.534')] [2024-12-13 08:32:34,078][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1667072. Throughput: 0: 796.6. Samples: 1666852. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:32:34,078][62436] Avg episode reward: [(0, '1044.413')] [2024-12-13 08:32:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1671168. Throughput: 0: 779.2. Samples: 1671896. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:32:39,076][62436] Avg episode reward: [(0, '1084.134')] [2024-12-13 08:32:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003264_1671168.pth... 
[2024-12-13 08:32:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003216_1646592.pth [2024-12-13 08:32:44,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1675264. Throughput: 0: 790.4. Samples: 1674888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:32:44,076][62436] Avg episode reward: [(0, '1123.891')] [2024-12-13 08:32:47,819][62492] Updated weights for policy 0, policy_version 3280 (0.0012) [2024-12-13 08:32:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1679360. Throughput: 0: 799.3. Samples: 1679344. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:32:49,076][62436] Avg episode reward: [(0, '1138.586')] [2024-12-13 08:32:54,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1683456. Throughput: 0: 787.5. Samples: 1684180. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:32:54,077][62436] Avg episode reward: [(0, '1187.776')] [2024-12-13 08:32:54,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003288_1683456.pth... [2024-12-13 08:32:54,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003240_1658880.pth [2024-12-13 08:32:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1687552. Throughput: 0: 807.9. Samples: 1686976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:32:59,076][62436] Avg episode reward: [(0, '1210.715')] [2024-12-13 08:32:59,077][62473] Saving new best policy, reward=1210.715! [2024-12-13 08:33:04,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1691648. Throughput: 0: 833.8. Samples: 1691600. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:33:04,079][62436] Avg episode reward: [(0, '1206.229')] [2024-12-13 08:33:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1695744. 
Throughput: 0: 819.7. Samples: 1696368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:33:09,076][62436] Avg episode reward: [(0, '1181.217')] [2024-12-13 08:33:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003312_1695744.pth... [2024-12-13 08:33:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003264_1671168.pth [2024-12-13 08:33:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1699840. Throughput: 0: 819.8. Samples: 1699328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:33:14,076][62436] Avg episode reward: [(0, '1175.917')] [2024-12-13 08:33:19,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 833.1). Total num frames: 1703936. Throughput: 0: 825.8. Samples: 1704016. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:33:19,081][62436] Avg episode reward: [(0, '1187.670')] [2024-12-13 08:33:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1708032. Throughput: 0: 817.8. Samples: 1708696. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:33:24,078][62436] Avg episode reward: [(0, '1221.084')] [2024-12-13 08:33:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003336_1708032.pth... [2024-12-13 08:33:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003288_1683456.pth [2024-12-13 08:33:24,094][62473] Saving new best policy, reward=1221.084! [2024-12-13 08:33:29,075][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1712128. Throughput: 0: 817.3. Samples: 1711668. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:33:29,076][62436] Avg episode reward: [(0, '1208.545')] [2024-12-13 08:33:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1716224. Throughput: 0: 828.4. Samples: 1716620. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:33:34,076][62436] Avg episode reward: [(0, '1176.275')] [2024-12-13 08:33:37,835][62492] Updated weights for policy 0, policy_version 3360 (0.0011) [2024-12-13 08:33:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1720320. Throughput: 0: 819.7. Samples: 1721064. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:33:39,076][62436] Avg episode reward: [(0, '1210.898')] [2024-12-13 08:33:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003360_1720320.pth... [2024-12-13 08:33:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003312_1695744.pth [2024-12-13 08:33:44,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1724416. Throughput: 0: 824.0. Samples: 1724056. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:33:44,076][62436] Avg episode reward: [(0, '1195.852')] [2024-12-13 08:33:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1728512. Throughput: 0: 835.6. Samples: 1729200. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:33:49,076][62436] Avg episode reward: [(0, '1194.692')] [2024-12-13 08:33:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1732608. Throughput: 0: 821.2. Samples: 1733324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:33:54,076][62436] Avg episode reward: [(0, '1204.770')] [2024-12-13 08:33:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003384_1732608.pth... [2024-12-13 08:33:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003336_1708032.pth [2024-12-13 08:33:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1736704. Throughput: 0: 822.5. Samples: 1736340. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:33:59,076][62436] Avg episode reward: [(0, '1213.666')] [2024-12-13 08:34:04,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1740800. Throughput: 0: 842.8. Samples: 1741940. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:34:04,076][62436] Avg episode reward: [(0, '1214.399')] [2024-12-13 08:34:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1744896. Throughput: 0: 826.0. Samples: 1745868. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:34:09,076][62436] Avg episode reward: [(0, '1232.618')] [2024-12-13 08:34:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003408_1744896.pth... [2024-12-13 08:34:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003360_1720320.pth [2024-12-13 08:34:09,090][62473] Saving new best policy, reward=1232.618! [2024-12-13 08:34:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1748992. Throughput: 0: 825.0. Samples: 1748792. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:34:14,076][62436] Avg episode reward: [(0, '1143.072')] [2024-12-13 08:34:19,081][62436] Fps is (10 sec: 818.7, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 1753088. Throughput: 0: 840.8. Samples: 1754460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:34:19,082][62436] Avg episode reward: [(0, '1120.563')] [2024-12-13 08:34:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1757184. Throughput: 0: 829.9. Samples: 1758408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:34:24,077][62436] Avg episode reward: [(0, '1074.800')] [2024-12-13 08:34:24,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003432_1757184.pth... 
[2024-12-13 08:34:24,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003384_1732608.pth [2024-12-13 08:34:26,761][62492] Updated weights for policy 0, policy_version 3440 (0.0012) [2024-12-13 08:34:29,076][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1761280. Throughput: 0: 826.0. Samples: 1761228. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:34:29,076][62436] Avg episode reward: [(0, '1085.266')] [2024-12-13 08:34:34,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1765376. Throughput: 0: 839.0. Samples: 1766956. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:34:34,078][62436] Avg episode reward: [(0, '1032.979')] [2024-12-13 08:34:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1769472. Throughput: 0: 841.5. Samples: 1771192. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:34:39,076][62436] Avg episode reward: [(0, '1059.717')] [2024-12-13 08:34:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003456_1769472.pth... [2024-12-13 08:34:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003408_1744896.pth [2024-12-13 08:34:44,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1773568. Throughput: 0: 828.8. Samples: 1773636. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:34:44,076][62436] Avg episode reward: [(0, '1093.263')] [2024-12-13 08:34:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1777664. Throughput: 0: 833.0. Samples: 1779424. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:34:49,076][62436] Avg episode reward: [(0, '1125.343')] [2024-12-13 08:34:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1781760. Throughput: 0: 844.3. Samples: 1783860. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:34:54,076][62436] Avg episode reward: [(0, '1118.318')] [2024-12-13 08:34:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003480_1781760.pth... [2024-12-13 08:34:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003432_1757184.pth [2024-12-13 08:34:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1785856. Throughput: 0: 826.7. Samples: 1785992. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:34:59,076][62436] Avg episode reward: [(0, '1134.846')] [2024-12-13 08:35:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1789952. Throughput: 0: 830.2. Samples: 1791812. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:35:04,076][62436] Avg episode reward: [(0, '1171.258')] [2024-12-13 08:35:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1794048. Throughput: 0: 846.5. Samples: 1796500. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:35:09,076][62436] Avg episode reward: [(0, '1194.015')] [2024-12-13 08:35:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003504_1794048.pth... [2024-12-13 08:35:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003456_1769472.pth [2024-12-13 08:35:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1798144. Throughput: 0: 824.9. Samples: 1798348. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:35:14,076][62436] Avg episode reward: [(0, '1241.973')] [2024-12-13 08:35:14,077][62473] Saving new best policy, reward=1241.973! [2024-12-13 08:35:15,685][62492] Updated weights for policy 0, policy_version 3520 (0.0013) [2024-12-13 08:35:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 1802240. Throughput: 0: 825.6. Samples: 1804108. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:35:19,076][62436] Avg episode reward: [(0, '1259.695')] [2024-12-13 08:35:19,077][62473] Saving new best policy, reward=1259.695! [2024-12-13 08:35:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1806336. Throughput: 0: 842.0. Samples: 1809080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:35:24,076][62436] Avg episode reward: [(0, '1302.911')] [2024-12-13 08:35:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003528_1806336.pth... [2024-12-13 08:35:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003480_1781760.pth [2024-12-13 08:35:24,095][62473] Saving new best policy, reward=1302.911! [2024-12-13 08:35:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1810432. Throughput: 0: 826.7. Samples: 1810836. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:35:29,076][62436] Avg episode reward: [(0, '1329.907')] [2024-12-13 08:35:29,078][62473] Saving new best policy, reward=1329.907! [2024-12-13 08:35:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1814528. Throughput: 0: 823.2. Samples: 1816468. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:35:34,076][62436] Avg episode reward: [(0, '1360.939')] [2024-12-13 08:35:34,077][62473] Saving new best policy, reward=1360.939! [2024-12-13 08:35:39,080][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1818624. Throughput: 0: 841.0. Samples: 1821708. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:35:39,081][62436] Avg episode reward: [(0, '1378.536')] [2024-12-13 08:35:39,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003552_1818624.pth... 
[2024-12-13 08:35:39,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003504_1794048.pth [2024-12-13 08:35:39,105][62473] Saving new best policy, reward=1378.536! [2024-12-13 08:35:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1822720. Throughput: 0: 832.2. Samples: 1823440. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:35:44,076][62436] Avg episode reward: [(0, '1350.114')] [2024-12-13 08:35:49,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1826816. Throughput: 0: 821.0. Samples: 1828756. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:35:49,077][62436] Avg episode reward: [(0, '1354.244')] [2024-12-13 08:35:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1830912. Throughput: 0: 837.6. Samples: 1834192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:35:54,076][62436] Avg episode reward: [(0, '1360.540')] [2024-12-13 08:35:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003576_1830912.pth... [2024-12-13 08:35:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003528_1806336.pth [2024-12-13 08:35:59,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1835008. Throughput: 0: 836.0. Samples: 1835968. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:35:59,076][62436] Avg episode reward: [(0, '1387.389')] [2024-12-13 08:35:59,077][62473] Saving new best policy, reward=1387.389! [2024-12-13 08:36:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1839104. Throughput: 0: 821.6. Samples: 1841080. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:36:04,076][62436] Avg episode reward: [(0, '1383.484')] [2024-12-13 08:36:04,936][62492] Updated weights for policy 0, policy_version 3600 (0.0013) [2024-12-13 08:36:09,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1843200. Throughput: 0: 839.1. Samples: 1846844. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:36:09,079][62436] Avg episode reward: [(0, '1376.045')] [2024-12-13 08:36:09,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003600_1843200.pth... [2024-12-13 08:36:09,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003552_1818624.pth [2024-12-13 08:36:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1847296. Throughput: 0: 839.7. Samples: 1848624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:36:14,077][62436] Avg episode reward: [(0, '1381.370')] [2024-12-13 08:36:19,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1851392. Throughput: 0: 820.4. Samples: 1853384. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:36:19,076][62436] Avg episode reward: [(0, '1355.567')] [2024-12-13 08:36:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1855488. Throughput: 0: 829.1. Samples: 1859016. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:36:24,076][62436] Avg episode reward: [(0, '1416.103')] [2024-12-13 08:36:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003624_1855488.pth... [2024-12-13 08:36:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003576_1830912.pth [2024-12-13 08:36:24,093][62473] Saving new best policy, reward=1416.103! [2024-12-13 08:36:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1859584. Throughput: 0: 837.8. Samples: 1861140. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:36:29,076][62436] Avg episode reward: [(0, '1393.802')] [2024-12-13 08:36:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1863680. Throughput: 0: 816.6. Samples: 1865504. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:36:34,076][62436] Avg episode reward: [(0, '1334.245')] [2024-12-13 08:36:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1867776. Throughput: 0: 824.4. Samples: 1871288. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:36:39,076][62436] Avg episode reward: [(0, '1338.313')] [2024-12-13 08:36:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003648_1867776.pth... [2024-12-13 08:36:39,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003600_1843200.pth [2024-12-13 08:36:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1871872. Throughput: 0: 824.8. Samples: 1873084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:36:44,076][62436] Avg episode reward: [(0, '1321.844')] [2024-12-13 08:36:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1875968. Throughput: 0: 775.3. Samples: 1875968. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:36:49,076][62436] Avg episode reward: [(0, '1314.531')] [2024-12-13 08:36:54,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1880064. Throughput: 0: 766.6. Samples: 1881340. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:36:54,077][62436] Avg episode reward: [(0, '1287.032')] [2024-12-13 08:36:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003672_1880064.pth... 
[2024-12-13 08:36:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003624_1855488.pth [2024-12-13 08:36:56,555][62492] Updated weights for policy 0, policy_version 3680 (0.0011) [2024-12-13 08:36:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1884160. Throughput: 0: 791.1. Samples: 1884224. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:36:59,076][62436] Avg episode reward: [(0, '1250.712')] [2024-12-13 08:37:04,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1888256. Throughput: 0: 781.0. Samples: 1888528. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:37:04,076][62436] Avg episode reward: [(0, '1240.358')] [2024-12-13 08:37:09,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1892352. Throughput: 0: 772.9. Samples: 1893796. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:37:09,078][62436] Avg episode reward: [(0, '1162.272')] [2024-12-13 08:37:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003696_1892352.pth... [2024-12-13 08:37:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003648_1867776.pth [2024-12-13 08:37:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1896448. Throughput: 0: 786.8. Samples: 1896548. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:37:14,076][62436] Avg episode reward: [(0, '1137.719')] [2024-12-13 08:37:19,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1900544. Throughput: 0: 794.8. Samples: 1901268. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:37:19,079][62436] Avg episode reward: [(0, '1119.971')] [2024-12-13 08:37:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1904640. Throughput: 0: 775.6. Samples: 1906192. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:37:24,076][62436] Avg episode reward: [(0, '1100.348')] [2024-12-13 08:37:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003720_1904640.pth... [2024-12-13 08:37:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003672_1880064.pth [2024-12-13 08:37:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1908736. Throughput: 0: 797.9. Samples: 1908988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:37:29,076][62436] Avg episode reward: [(0, '1112.233')] [2024-12-13 08:37:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1912832. Throughput: 0: 845.7. Samples: 1914024. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:37:34,076][62436] Avg episode reward: [(0, '1109.943')] [2024-12-13 08:37:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1916928. Throughput: 0: 829.3. Samples: 1918660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:37:39,076][62436] Avg episode reward: [(0, '1100.494')] [2024-12-13 08:37:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003744_1916928.pth... [2024-12-13 08:37:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003696_1892352.pth [2024-12-13 08:37:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1921024. Throughput: 0: 828.2. Samples: 1921492. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:37:44,076][62436] Avg episode reward: [(0, '1089.199')] [2024-12-13 08:37:45,464][62492] Updated weights for policy 0, policy_version 3760 (0.0011) [2024-12-13 08:37:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1925120. Throughput: 0: 848.4. Samples: 1926704. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:37:49,076][62436] Avg episode reward: [(0, '1080.743')] [2024-12-13 08:37:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1929216. Throughput: 0: 828.2. Samples: 1931064. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:37:54,076][62436] Avg episode reward: [(0, '1108.400')] [2024-12-13 08:37:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003768_1929216.pth... [2024-12-13 08:37:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003720_1904640.pth [2024-12-13 08:37:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1933312. Throughput: 0: 830.9. Samples: 1933940. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:37:59,076][62436] Avg episode reward: [(0, '1152.590')] [2024-12-13 08:38:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1937408. Throughput: 0: 850.4. Samples: 1939536. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:38:04,076][62436] Avg episode reward: [(0, '1143.762')] [2024-12-13 08:38:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1941504. Throughput: 0: 831.2. Samples: 1943596. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:38:09,076][62436] Avg episode reward: [(0, '1207.046')] [2024-12-13 08:38:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003792_1941504.pth... [2024-12-13 08:38:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003744_1916928.pth [2024-12-13 08:38:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1945600. Throughput: 0: 831.8. Samples: 1946420. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:38:14,076][62436] Avg episode reward: [(0, '1213.355')] [2024-12-13 08:38:19,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1949696. Throughput: 0: 848.5. Samples: 1952208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:38:19,077][62436] Avg episode reward: [(0, '1289.407')] [2024-12-13 08:38:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1953792. Throughput: 0: 829.7. Samples: 1955996. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:38:24,076][62436] Avg episode reward: [(0, '1298.801')] [2024-12-13 08:38:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003816_1953792.pth... [2024-12-13 08:38:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003768_1929216.pth [2024-12-13 08:38:29,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1957888. Throughput: 0: 829.6. Samples: 1958824. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:38:29,076][62436] Avg episode reward: [(0, '1252.706')] [2024-12-13 08:38:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1961984. Throughput: 0: 843.6. Samples: 1964664. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:38:34,076][62436] Avg episode reward: [(0, '1286.070')] [2024-12-13 08:38:34,727][62492] Updated weights for policy 0, policy_version 3840 (0.0013) [2024-12-13 08:38:39,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1966080. Throughput: 0: 835.1. Samples: 1968644. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:38:39,077][62436] Avg episode reward: [(0, '1290.517')] [2024-12-13 08:38:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003840_1966080.pth... 
[2024-12-13 08:38:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003792_1941504.pth [2024-12-13 08:38:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1970176. Throughput: 0: 828.5. Samples: 1971224. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:38:44,076][62436] Avg episode reward: [(0, '1321.052')] [2024-12-13 08:38:49,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1974272. Throughput: 0: 834.6. Samples: 1977096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:38:49,078][62436] Avg episode reward: [(0, '1304.247')] [2024-12-13 08:38:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1978368. Throughput: 0: 837.3. Samples: 1981276. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:38:54,076][62436] Avg episode reward: [(0, '1306.024')] [2024-12-13 08:38:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003864_1978368.pth... [2024-12-13 08:38:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003816_1953792.pth [2024-12-13 08:38:59,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1982464. Throughput: 0: 825.2. Samples: 1983552. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:38:59,076][62436] Avg episode reward: [(0, '1274.218')] [2024-12-13 08:39:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1986560. Throughput: 0: 825.1. Samples: 1989336. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:39:04,077][62436] Avg episode reward: [(0, '1302.152')] [2024-12-13 08:39:09,081][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 1990656. Throughput: 0: 845.5. Samples: 1994048. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:39:09,084][62436] Avg episode reward: [(0, '1330.026')] [2024-12-13 08:39:09,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003888_1990656.pth... [2024-12-13 08:39:09,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003840_1966080.pth [2024-12-13 08:39:14,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 1994752. Throughput: 0: 825.7. Samples: 1995980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:39:14,078][62436] Avg episode reward: [(0, '1344.245')] [2024-12-13 08:39:19,076][62436] Fps is (10 sec: 1229.5, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2002944. Throughput: 0: 823.9. Samples: 2001740. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:39:19,076][62436] Avg episode reward: [(0, '1356.765')] [2024-12-13 08:39:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2002944. Throughput: 0: 843.5. Samples: 2006600. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:39:24,076][62436] Avg episode reward: [(0, '1436.992')] [2024-12-13 08:39:24,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003912_2002944.pth... [2024-12-13 08:39:24,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003864_1978368.pth [2024-12-13 08:39:24,096][62473] Saving new best policy, reward=1436.992! [2024-12-13 08:39:24,645][62492] Updated weights for policy 0, policy_version 3920 (0.0010) [2024-12-13 08:39:29,077][62436] Fps is (10 sec: 409.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2007040. Throughput: 0: 824.4. Samples: 2008324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:39:29,078][62436] Avg episode reward: [(0, '1456.693')] [2024-12-13 08:39:29,079][62473] Saving new best policy, reward=1456.693! [2024-12-13 08:39:34,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). 
Total num frames: 2015232. Throughput: 0: 820.1. Samples: 2014000. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:39:34,077][62436] Avg episode reward: [(0, '1457.231')] [2024-12-13 08:39:34,078][62473] Saving new best policy, reward=1457.231! [2024-12-13 08:39:39,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2015232. Throughput: 0: 839.7. Samples: 2019064. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:39:39,076][62436] Avg episode reward: [(0, '1418.322')] [2024-12-13 08:39:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003936_2015232.pth... [2024-12-13 08:39:39,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003888_1990656.pth [2024-12-13 08:39:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2019328. Throughput: 0: 827.6. Samples: 2020796. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:39:44,076][62436] Avg episode reward: [(0, '1340.400')] [2024-12-13 08:39:49,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2027520. Throughput: 0: 826.0. Samples: 2026504. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:39:49,076][62436] Avg episode reward: [(0, '1321.476')] [2024-12-13 08:39:54,077][62436] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 2031616. Throughput: 0: 833.2. Samples: 2031540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:39:54,079][62436] Avg episode reward: [(0, '1308.457')] [2024-12-13 08:39:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003968_2031616.pth... [2024-12-13 08:39:54,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003912_2002944.pth [2024-12-13 08:39:59,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2031616. Throughput: 0: 828.8. Samples: 2033276. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:39:59,076][62436] Avg episode reward: [(0, '1253.106')] [2024-12-13 08:40:04,075][62436] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2039808. Throughput: 0: 820.4. Samples: 2038660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:40:04,076][62436] Avg episode reward: [(0, '1258.440')] [2024-12-13 08:40:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.6, 300 sec: 833.1). Total num frames: 2043904. Throughput: 0: 828.7. Samples: 2043892. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:40:09,076][62436] Avg episode reward: [(0, '1232.608')] [2024-12-13 08:40:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003992_2043904.pth... [2024-12-13 08:40:09,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003936_2015232.pth [2024-12-13 08:40:14,075][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2043904. Throughput: 0: 830.0. Samples: 2045672. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:40:14,076][62436] Avg episode reward: [(0, '1306.840')] [2024-12-13 08:40:14,712][62492] Updated weights for policy 0, policy_version 4000 (0.0010) [2024-12-13 08:40:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2052096. Throughput: 0: 818.5. Samples: 2050832. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:40:19,076][62436] Avg episode reward: [(0, '1267.544')] [2024-12-13 08:40:24,080][62436] Fps is (10 sec: 1228.2, 60 sec: 887.4, 300 sec: 833.1). Total num frames: 2056192. Throughput: 0: 825.0. Samples: 2056192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:40:24,081][62436] Avg episode reward: [(0, '1279.154')] [2024-12-13 08:40:24,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004016_2056192.pth... 
[2024-12-13 08:40:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003968_2031616.pth [2024-12-13 08:40:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2056192. Throughput: 0: 828.8. Samples: 2058092. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:40:29,076][62436] Avg episode reward: [(0, '1274.860')] [2024-12-13 08:40:34,076][62436] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2060288. Throughput: 0: 808.9. Samples: 2062904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:40:34,076][62436] Avg episode reward: [(0, '1281.445')] [2024-12-13 08:40:39,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 833.1). Total num frames: 2068480. Throughput: 0: 822.8. Samples: 2068564. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:40:39,076][62436] Avg episode reward: [(0, '1270.616')] [2024-12-13 08:40:39,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004040_2068480.pth... [2024-12-13 08:40:39,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000003992_2043904.pth [2024-12-13 08:40:44,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2068480. Throughput: 0: 833.5. Samples: 2070784. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:40:44,078][62436] Avg episode reward: [(0, '1198.389')] [2024-12-13 08:40:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2072576. Throughput: 0: 815.3. Samples: 2075348. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:40:49,076][62436] Avg episode reward: [(0, '1190.918')] [2024-12-13 08:40:54,076][62436] Fps is (10 sec: 1229.0, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2080768. Throughput: 0: 821.7. Samples: 2080868. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:40:54,076][62436] Avg episode reward: [(0, '1192.855')] [2024-12-13 08:40:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004064_2080768.pth... [2024-12-13 08:40:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004016_2056192.pth [2024-12-13 08:40:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2080768. Throughput: 0: 836.4. Samples: 2083308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:40:59,077][62436] Avg episode reward: [(0, '1186.389')] [2024-12-13 08:41:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2084864. Throughput: 0: 814.7. Samples: 2087492. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:41:04,077][62436] Avg episode reward: [(0, '1172.120')] [2024-12-13 08:41:04,290][62492] Updated weights for policy 0, policy_version 4080 (0.0017) [2024-12-13 08:41:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 2093056. Throughput: 0: 819.3. Samples: 2093056. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:41:09,076][62436] Avg episode reward: [(0, '1194.543')] [2024-12-13 08:41:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004088_2093056.pth... [2024-12-13 08:41:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004040_2068480.pth [2024-12-13 08:41:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2093056. Throughput: 0: 838.1. Samples: 2095808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:41:14,076][62436] Avg episode reward: [(0, '1228.576')] [2024-12-13 08:41:19,078][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2097152. Throughput: 0: 800.1. Samples: 2098912. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:41:19,079][62436] Avg episode reward: [(0, '1220.580')] [2024-12-13 08:41:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 819.2). Total num frames: 2101248. Throughput: 0: 772.4. Samples: 2103324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:41:24,076][62436] Avg episode reward: [(0, '1164.799')] [2024-12-13 08:41:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004104_2101248.pth... [2024-12-13 08:41:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004064_2080768.pth [2024-12-13 08:41:29,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2105344. Throughput: 0: 786.0. Samples: 2106152. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:41:29,076][62436] Avg episode reward: [(0, '1145.217')] [2024-12-13 08:41:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2109440. Throughput: 0: 789.3. Samples: 2110868. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:41:34,076][62436] Avg episode reward: [(0, '1109.744')] [2024-12-13 08:41:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2113536. Throughput: 0: 777.3. Samples: 2115848. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:41:39,076][62436] Avg episode reward: [(0, '1101.172')] [2024-12-13 08:41:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004128_2113536.pth... [2024-12-13 08:41:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004088_2093056.pth [2024-12-13 08:41:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2117632. Throughput: 0: 784.5. Samples: 2118612. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:41:44,076][62436] Avg episode reward: [(0, '1070.062')] [2024-12-13 08:41:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2121728. Throughput: 0: 799.9. Samples: 2123488. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:41:49,076][62436] Avg episode reward: [(0, '1090.564')] [2024-12-13 08:41:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2125824. Throughput: 0: 775.6. Samples: 2127956. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:41:54,076][62436] Avg episode reward: [(0, '1145.872')] [2024-12-13 08:41:54,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004152_2125824.pth... [2024-12-13 08:41:54,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004104_2101248.pth [2024-12-13 08:41:55,773][62492] Updated weights for policy 0, policy_version 4160 (0.0011) [2024-12-13 08:41:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2129920. Throughput: 0: 775.3. Samples: 2130696. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:41:59,076][62436] Avg episode reward: [(0, '1133.548')] [2024-12-13 08:42:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2134016. Throughput: 0: 820.1. Samples: 2135816. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:42:04,076][62436] Avg episode reward: [(0, '1128.646')] [2024-12-13 08:42:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 2138112. Throughput: 0: 816.3. Samples: 2140056. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:42:09,076][62436] Avg episode reward: [(0, '1134.791')] [2024-12-13 08:42:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004176_2138112.pth... 
[2024-12-13 08:42:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004128_2113536.pth [2024-12-13 08:42:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2142208. Throughput: 0: 814.5. Samples: 2142804. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:42:14,076][62436] Avg episode reward: [(0, '1139.822')] [2024-12-13 08:42:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2146304. Throughput: 0: 832.8. Samples: 2148344. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:42:19,076][62436] Avg episode reward: [(0, '1123.565')] [2024-12-13 08:42:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2150400. Throughput: 0: 809.3. Samples: 2152268. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:42:24,076][62436] Avg episode reward: [(0, '1096.052')] [2024-12-13 08:42:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004200_2150400.pth... [2024-12-13 08:42:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004152_2125824.pth [2024-12-13 08:42:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2154496. Throughput: 0: 807.8. Samples: 2154964. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:42:29,076][62436] Avg episode reward: [(0, '1150.297')] [2024-12-13 08:42:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2158592. Throughput: 0: 825.7. Samples: 2160644. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:42:34,076][62436] Avg episode reward: [(0, '1140.827')] [2024-12-13 08:42:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2162688. Throughput: 0: 810.4. Samples: 2164424. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:42:39,076][62436] Avg episode reward: [(0, '1128.069')] [2024-12-13 08:42:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004224_2162688.pth... [2024-12-13 08:42:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004176_2138112.pth [2024-12-13 08:42:44,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2166784. Throughput: 0: 810.8. Samples: 2167184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:42:44,076][62436] Avg episode reward: [(0, '1128.542')] [2024-12-13 08:42:45,440][62492] Updated weights for policy 0, policy_version 4240 (0.0015) [2024-12-13 08:42:49,081][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 2170880. Throughput: 0: 825.7. Samples: 2172976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:42:49,082][62436] Avg episode reward: [(0, '1141.589')] [2024-12-13 08:42:54,080][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 2174976. Throughput: 0: 818.1. Samples: 2176872. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:42:54,080][62436] Avg episode reward: [(0, '1165.786')] [2024-12-13 08:42:54,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004248_2174976.pth... [2024-12-13 08:42:54,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004200_2150400.pth [2024-12-13 08:42:59,075][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2179072. Throughput: 0: 814.6. Samples: 2179460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:42:59,076][62436] Avg episode reward: [(0, '1161.058')] [2024-12-13 08:43:04,077][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2183168. Throughput: 0: 818.9. Samples: 2185196. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:43:04,077][62436] Avg episode reward: [(0, '1202.103')] [2024-12-13 08:43:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2187264. Throughput: 0: 824.0. Samples: 2189348. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:43:09,076][62436] Avg episode reward: [(0, '1191.533')] [2024-12-13 08:43:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004272_2187264.pth... [2024-12-13 08:43:09,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004224_2162688.pth [2024-12-13 08:43:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2191360. Throughput: 0: 819.7. Samples: 2191852. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:43:14,076][62436] Avg episode reward: [(0, '1231.994')] [2024-12-13 08:43:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2195456. Throughput: 0: 821.3. Samples: 2197604. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:43:19,076][62436] Avg episode reward: [(0, '1268.666')] [2024-12-13 08:43:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2199552. Throughput: 0: 836.0. Samples: 2202044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:43:24,076][62436] Avg episode reward: [(0, '1228.740')] [2024-12-13 08:43:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004296_2199552.pth... [2024-12-13 08:43:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004248_2174976.pth [2024-12-13 08:43:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2203648. Throughput: 0: 822.8. Samples: 2204208. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:43:29,076][62436] Avg episode reward: [(0, '1245.293')] [2024-12-13 08:43:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2207744. Throughput: 0: 821.1. Samples: 2209920. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:43:34,076][62436] Avg episode reward: [(0, '1227.165')] [2024-12-13 08:43:34,617][62492] Updated weights for policy 0, policy_version 4320 (0.0010) [2024-12-13 08:43:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2211840. Throughput: 0: 841.0. Samples: 2214712. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:43:39,077][62436] Avg episode reward: [(0, '1267.161')] [2024-12-13 08:43:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004320_2211840.pth... [2024-12-13 08:43:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004272_2187264.pth [2024-12-13 08:43:44,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2215936. Throughput: 0: 823.6. Samples: 2216524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:43:44,078][62436] Avg episode reward: [(0, '1244.908')] [2024-12-13 08:43:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 2220032. Throughput: 0: 823.0. Samples: 2222232. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:43:49,078][62436] Avg episode reward: [(0, '1263.495')] [2024-12-13 08:43:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 2224128. Throughput: 0: 840.6. Samples: 2227176. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:43:54,076][62436] Avg episode reward: [(0, '1254.530')] [2024-12-13 08:43:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004344_2224128.pth... 
[2024-12-13 08:43:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004296_2199552.pth [2024-12-13 08:43:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2228224. Throughput: 0: 820.9. Samples: 2228792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:43:59,076][62436] Avg episode reward: [(0, '1299.604')] [2024-12-13 08:44:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2232320. Throughput: 0: 817.9. Samples: 2234408. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:44:04,076][62436] Avg episode reward: [(0, '1363.387')] [2024-12-13 08:44:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2236416. Throughput: 0: 832.0. Samples: 2239484. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:44:09,076][62436] Avg episode reward: [(0, '1335.190')] [2024-12-13 08:44:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004368_2236416.pth... [2024-12-13 08:44:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004320_2211840.pth [2024-12-13 08:44:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2240512. Throughput: 0: 819.6. Samples: 2241092. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:44:14,076][62436] Avg episode reward: [(0, '1378.118')] [2024-12-13 08:44:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2244608. Throughput: 0: 816.0. Samples: 2246640. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:44:19,076][62436] Avg episode reward: [(0, '1370.630')] [2024-12-13 08:44:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2248704. Throughput: 0: 824.3. Samples: 2251804. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:44:24,077][62436] Avg episode reward: [(0, '1325.620')] [2024-12-13 08:44:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004392_2248704.pth... [2024-12-13 08:44:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004344_2224128.pth [2024-12-13 08:44:25,422][62492] Updated weights for policy 0, policy_version 4400 (0.0017) [2024-12-13 08:44:29,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 2252800. Throughput: 0: 823.2. Samples: 2253568. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:44:29,081][62436] Avg episode reward: [(0, '1339.626')] [2024-12-13 08:44:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2256896. Throughput: 0: 811.1. Samples: 2258732. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:44:34,076][62436] Avg episode reward: [(0, '1374.768')] [2024-12-13 08:44:39,081][62436] Fps is (10 sec: 819.0, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 2260992. Throughput: 0: 825.1. Samples: 2264308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:44:39,082][62436] Avg episode reward: [(0, '1359.452')] [2024-12-13 08:44:39,093][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004416_2260992.pth... [2024-12-13 08:44:39,105][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004368_2236416.pth [2024-12-13 08:44:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2265088. Throughput: 0: 827.7. Samples: 2266040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:44:44,076][62436] Avg episode reward: [(0, '1396.084')] [2024-12-13 08:44:49,075][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2269184. Throughput: 0: 815.1. Samples: 2271088. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:44:49,076][62436] Avg episode reward: [(0, '1407.829')] [2024-12-13 08:44:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2273280. Throughput: 0: 827.6. Samples: 2276724. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:44:54,076][62436] Avg episode reward: [(0, '1406.802')] [2024-12-13 08:44:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004440_2273280.pth... [2024-12-13 08:44:54,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004392_2248704.pth [2024-12-13 08:44:59,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 2277376. Throughput: 0: 833.4. Samples: 2278600. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:44:59,081][62436] Avg episode reward: [(0, '1433.395')] [2024-12-13 08:45:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2281472. Throughput: 0: 812.4. Samples: 2283200. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:45:04,076][62436] Avg episode reward: [(0, '1480.004')] [2024-12-13 08:45:04,080][62473] Saving new best policy, reward=1480.004! [2024-12-13 08:45:09,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2285568. Throughput: 0: 823.3. Samples: 2288852. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:45:09,076][62436] Avg episode reward: [(0, '1461.883')] [2024-12-13 08:45:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004464_2285568.pth... [2024-12-13 08:45:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004416_2260992.pth [2024-12-13 08:45:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2289664. Throughput: 0: 832.2. Samples: 2291016. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:45:14,076][62436] Avg episode reward: [(0, '1469.713')] [2024-12-13 08:45:16,096][62492] Updated weights for policy 0, policy_version 4480 (0.0010) [2024-12-13 08:45:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2293760. Throughput: 0: 810.8. Samples: 2295216. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:45:19,076][62436] Avg episode reward: [(0, '1505.917')] [2024-12-13 08:45:19,077][62473] Saving new best policy, reward=1505.917! [2024-12-13 08:45:24,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2297856. Throughput: 0: 811.2. Samples: 2300808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:45:24,077][62436] Avg episode reward: [(0, '1515.567')] [2024-12-13 08:45:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004488_2297856.pth... [2024-12-13 08:45:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004440_2273280.pth [2024-12-13 08:45:24,097][62473] Saving new best policy, reward=1515.567! [2024-12-13 08:45:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 2301952. Throughput: 0: 827.6. Samples: 2303280. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:45:29,076][62436] Avg episode reward: [(0, '1549.838')] [2024-12-13 08:45:29,077][62473] Saving new best policy, reward=1549.838! [2024-12-13 08:45:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2306048. Throughput: 0: 801.3. Samples: 2307148. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:45:34,077][62436] Avg episode reward: [(0, '1568.445')] [2024-12-13 08:45:34,078][62473] Saving new best policy, reward=1568.445! [2024-12-13 08:45:39,083][62436] Fps is (10 sec: 818.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2310144. Throughput: 0: 801.2. Samples: 2312784. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:45:39,083][62436] Avg episode reward: [(0, '1617.753')] [2024-12-13 08:45:39,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004512_2310144.pth... [2024-12-13 08:45:39,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004464_2285568.pth [2024-12-13 08:45:39,096][62473] Saving new best policy, reward=1617.753! [2024-12-13 08:45:44,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2314240. Throughput: 0: 817.9. Samples: 2315400. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:45:44,076][62436] Avg episode reward: [(0, '1619.941')] [2024-12-13 08:45:44,080][62473] Saving new best policy, reward=1619.941! [2024-12-13 08:45:49,076][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2318336. Throughput: 0: 798.3. Samples: 2319124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:45:49,077][62436] Avg episode reward: [(0, '1580.314')] [2024-12-13 08:45:54,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2322432. Throughput: 0: 758.9. Samples: 2323004. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:45:54,077][62436] Avg episode reward: [(0, '1624.028')] [2024-12-13 08:45:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004536_2322432.pth... [2024-12-13 08:45:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004488_2297856.pth [2024-12-13 08:45:54,091][62473] Saving new best policy, reward=1624.028! [2024-12-13 08:45:59,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 2326528. Throughput: 0: 767.7. Samples: 2325564. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:45:59,076][62436] Avg episode reward: [(0, '1628.170')] [2024-12-13 08:45:59,077][62473] Saving new best policy, reward=1628.170! 
[2024-12-13 08:46:04,075][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2326528. Throughput: 0: 755.2. Samples: 2329200. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:46:04,076][62436] Avg episode reward: [(0, '1660.315')] [2024-12-13 08:46:04,077][62473] Saving new best policy, reward=1660.315! [2024-12-13 08:46:08,483][62492] Updated weights for policy 0, policy_version 4560 (0.0010) [2024-12-13 08:46:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2334720. Throughput: 0: 756.0. Samples: 2334828. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:46:09,076][62436] Avg episode reward: [(0, '1691.138')] [2024-12-13 08:46:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004560_2334720.pth... [2024-12-13 08:46:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004512_2310144.pth [2024-12-13 08:46:09,090][62473] Saving new best policy, reward=1691.138! [2024-12-13 08:46:14,077][62436] Fps is (10 sec: 1228.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2338816. Throughput: 0: 763.9. Samples: 2337656. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:46:14,078][62436] Avg episode reward: [(0, '1726.264')] [2024-12-13 08:46:14,079][62473] Saving new best policy, reward=1726.264! [2024-12-13 08:46:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2338816. Throughput: 0: 769.8. Samples: 2341788. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:46:19,076][62436] Avg episode reward: [(0, '1722.902')] [2024-12-13 08:46:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2347008. Throughput: 0: 764.9. Samples: 2347200. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:46:24,077][62436] Avg episode reward: [(0, '1698.605')] [2024-12-13 08:46:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004584_2347008.pth... [2024-12-13 08:46:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004536_2322432.pth [2024-12-13 08:46:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2351104. Throughput: 0: 770.9. Samples: 2350092. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:46:29,076][62436] Avg episode reward: [(0, '1696.194')] [2024-12-13 08:46:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2351104. Throughput: 0: 781.8. Samples: 2354304. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:46:34,076][62436] Avg episode reward: [(0, '1635.852')] [2024-12-13 08:46:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 2359296. Throughput: 0: 806.5. Samples: 2359296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:46:39,076][62436] Avg episode reward: [(0, '1602.369')] [2024-12-13 08:46:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004608_2359296.pth... [2024-12-13 08:46:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004560_2334720.pth [2024-12-13 08:46:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2363392. Throughput: 0: 814.7. Samples: 2362224. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:46:44,076][62436] Avg episode reward: [(0, '1613.712')] [2024-12-13 08:46:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2363392. Throughput: 0: 835.1. Samples: 2366780. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:46:49,076][62436] Avg episode reward: [(0, '1577.644')] [2024-12-13 08:46:54,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2371584. Throughput: 0: 814.5. Samples: 2371484. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:46:54,078][62436] Avg episode reward: [(0, '1524.697')] [2024-12-13 08:46:54,101][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004632_2371584.pth... [2024-12-13 08:46:54,114][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004584_2347008.pth [2024-12-13 08:46:58,325][62492] Updated weights for policy 0, policy_version 4640 (0.0010) [2024-12-13 08:46:59,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2375680. Throughput: 0: 811.1. Samples: 2374152. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:46:59,076][62436] Avg episode reward: [(0, '1521.928')] [2024-12-13 08:47:04,076][62436] Fps is (10 sec: 409.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2375680. Throughput: 0: 827.9. Samples: 2379044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:47:04,076][62436] Avg episode reward: [(0, '1516.273')] [2024-12-13 08:47:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2383872. Throughput: 0: 808.4. Samples: 2383576. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:47:09,080][62436] Avg episode reward: [(0, '1438.336')] [2024-12-13 08:47:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004656_2383872.pth... [2024-12-13 08:47:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004608_2359296.pth [2024-12-13 08:47:14,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2387968. Throughput: 0: 805.1. Samples: 2386320. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:47:14,076][62436] Avg episode reward: [(0, '1401.125')] [2024-12-13 08:47:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 2392064. Throughput: 0: 825.8. Samples: 2391464. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:47:19,076][62436] Avg episode reward: [(0, '1427.867')] [2024-12-13 08:47:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2392064. Throughput: 0: 811.0. Samples: 2395792. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:47:24,076][62436] Avg episode reward: [(0, '1397.951')] [2024-12-13 08:47:24,119][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004680_2396160.pth... [2024-12-13 08:47:24,133][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004632_2371584.pth [2024-12-13 08:47:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2400256. Throughput: 0: 807.1. Samples: 2398544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:47:29,079][62436] Avg episode reward: [(0, '1381.008')] [2024-12-13 08:47:34,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 2404352. Throughput: 0: 823.3. Samples: 2403828. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:47:34,076][62436] Avg episode reward: [(0, '1353.512')] [2024-12-13 08:47:39,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2404352. Throughput: 0: 810.1. Samples: 2407936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:47:39,076][62436] Avg episode reward: [(0, '1411.201')] [2024-12-13 08:47:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004696_2404352.pth... 
[2024-12-13 08:47:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004656_2383872.pth [2024-12-13 08:47:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2412544. Throughput: 0: 810.8. Samples: 2410636. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:47:44,076][62436] Avg episode reward: [(0, '1450.493')] [2024-12-13 08:47:48,764][62492] Updated weights for policy 0, policy_version 4720 (0.0016) [2024-12-13 08:47:49,075][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 2416640. Throughput: 0: 823.0. Samples: 2416080. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:47:49,083][62436] Avg episode reward: [(0, '1418.320')] [2024-12-13 08:47:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 2416640. Throughput: 0: 807.8. Samples: 2419928. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:47:54,076][62436] Avg episode reward: [(0, '1344.764')] [2024-12-13 08:47:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004720_2416640.pth... [2024-12-13 08:47:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004680_2396160.pth [2024-12-13 08:47:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2424832. Throughput: 0: 806.5. Samples: 2422612. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:47:59,076][62436] Avg episode reward: [(0, '1347.283')] [2024-12-13 08:48:04,075][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 2428928. Throughput: 0: 818.6. Samples: 2428300. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:48:04,076][62436] Avg episode reward: [(0, '1418.393')] [2024-12-13 08:48:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2428928. Throughput: 0: 803.9. Samples: 2431968. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:48:09,076][62436] Avg episode reward: [(0, '1468.523')] [2024-12-13 08:48:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004744_2428928.pth... [2024-12-13 08:48:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004696_2404352.pth [2024-12-13 08:48:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2433024. Throughput: 0: 801.6. Samples: 2434612. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:48:14,076][62436] Avg episode reward: [(0, '1407.612')] [2024-12-13 08:48:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2437120. Throughput: 0: 805.2. Samples: 2440060. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:48:19,076][62436] Avg episode reward: [(0, '1390.464')] [2024-12-13 08:48:24,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2441216. Throughput: 0: 800.0. Samples: 2443936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:48:24,077][62436] Avg episode reward: [(0, '1364.678')] [2024-12-13 08:48:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004768_2441216.pth... [2024-12-13 08:48:24,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004720_2416640.pth [2024-12-13 08:48:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2445312. Throughput: 0: 792.8. Samples: 2446316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:48:29,079][62436] Avg episode reward: [(0, '1374.301')] [2024-12-13 08:48:34,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2449408. Throughput: 0: 792.8. Samples: 2451756. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:48:34,076][62436] Avg episode reward: [(0, '1379.831')] [2024-12-13 08:48:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2453504. Throughput: 0: 799.2. Samples: 2455892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:48:39,076][62436] Avg episode reward: [(0, '1373.138')] [2024-12-13 08:48:39,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004792_2453504.pth... [2024-12-13 08:48:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004744_2428928.pth [2024-12-13 08:48:41,114][62492] Updated weights for policy 0, policy_version 4800 (0.0010) [2024-12-13 08:48:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2457600. Throughput: 0: 787.5. Samples: 2458048. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:48:44,076][62436] Avg episode reward: [(0, '1353.954')] [2024-12-13 08:48:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2461696. Throughput: 0: 784.0. Samples: 2463580. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:48:49,076][62436] Avg episode reward: [(0, '1372.342')] [2024-12-13 08:48:54,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2465792. Throughput: 0: 798.9. Samples: 2467920. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:48:54,078][62436] Avg episode reward: [(0, '1362.673')] [2024-12-13 08:48:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004816_2465792.pth... [2024-12-13 08:48:54,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004768_2441216.pth [2024-12-13 08:48:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2469888. Throughput: 0: 775.9. Samples: 2469528. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:48:59,076][62436] Avg episode reward: [(0, '1353.521')] [2024-12-13 08:49:04,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2473984. Throughput: 0: 775.9. Samples: 2474976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:49:04,076][62436] Avg episode reward: [(0, '1345.792')] [2024-12-13 08:49:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2478080. Throughput: 0: 803.0. Samples: 2480072. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:49:09,076][62436] Avg episode reward: [(0, '1354.459')] [2024-12-13 08:49:09,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004840_2478080.pth... [2024-12-13 08:49:09,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004792_2453504.pth [2024-12-13 08:49:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2482176. Throughput: 0: 791.7. Samples: 2481940. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:49:14,076][62436] Avg episode reward: [(0, '1347.975')] [2024-12-13 08:49:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2486272. Throughput: 0: 785.7. Samples: 2487112. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:49:19,076][62436] Avg episode reward: [(0, '1327.443')] [2024-12-13 08:49:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2490368. Throughput: 0: 810.7. Samples: 2492372. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:49:24,076][62436] Avg episode reward: [(0, '1317.433')] [2024-12-13 08:49:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004864_2490368.pth... 
[2024-12-13 08:49:24,104][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004816_2465792.pth [2024-12-13 08:49:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2494464. Throughput: 0: 803.6. Samples: 2494208. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:49:29,076][62436] Avg episode reward: [(0, '1306.876')] [2024-12-13 08:49:32,044][62492] Updated weights for policy 0, policy_version 4880 (0.0011) [2024-12-13 08:49:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2498560. Throughput: 0: 789.2. Samples: 2499092. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:49:34,076][62436] Avg episode reward: [(0, '1357.413')] [2024-12-13 08:49:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2502656. Throughput: 0: 816.5. Samples: 2504660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:49:39,076][62436] Avg episode reward: [(0, '1357.640')] [2024-12-13 08:49:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004888_2502656.pth... [2024-12-13 08:49:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004840_2478080.pth [2024-12-13 08:49:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2506752. Throughput: 0: 822.4. Samples: 2506536. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:49:44,076][62436] Avg episode reward: [(0, '1293.312')] [2024-12-13 08:49:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2510848. Throughput: 0: 803.1. Samples: 2511116. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:49:49,076][62436] Avg episode reward: [(0, '1340.655')] [2024-12-13 08:49:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2514944. Throughput: 0: 817.5. Samples: 2516860. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:49:54,077][62436] Avg episode reward: [(0, '1350.660')] [2024-12-13 08:49:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004912_2514944.pth... [2024-12-13 08:49:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004864_2490368.pth [2024-12-13 08:49:59,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 2519040. Throughput: 0: 820.5. Samples: 2518864. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:49:59,080][62436] Avg episode reward: [(0, '1357.724')] [2024-12-13 08:50:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2523136. Throughput: 0: 804.4. Samples: 2523308. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:50:04,076][62436] Avg episode reward: [(0, '1323.555')] [2024-12-13 08:50:09,077][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2527232. Throughput: 0: 814.3. Samples: 2529016. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:50:09,078][62436] Avg episode reward: [(0, '1379.839')] [2024-12-13 08:50:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004936_2527232.pth... [2024-12-13 08:50:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004888_2502656.pth [2024-12-13 08:50:14,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2531328. Throughput: 0: 823.1. Samples: 2531248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:50:14,077][62436] Avg episode reward: [(0, '1453.606')] [2024-12-13 08:50:19,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2535424. Throughput: 0: 809.2. Samples: 2535508. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:50:19,076][62436] Avg episode reward: [(0, '1505.632')] [2024-12-13 08:50:22,398][62492] Updated weights for policy 0, policy_version 4960 (0.0011) [2024-12-13 08:50:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2539520. Throughput: 0: 791.3. Samples: 2540268. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:50:24,077][62436] Avg episode reward: [(0, '1519.723')] [2024-12-13 08:50:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004960_2539520.pth... [2024-12-13 08:50:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004912_2514944.pth [2024-12-13 08:50:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2539520. Throughput: 0: 787.8. Samples: 2541988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:50:29,076][62436] Avg episode reward: [(0, '1570.001')] [2024-12-13 08:50:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2543616. Throughput: 0: 767.5. Samples: 2545652. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:50:34,076][62436] Avg episode reward: [(0, '1566.832')] [2024-12-13 08:50:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2547712. Throughput: 0: 765.9. Samples: 2551324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:50:39,076][62436] Avg episode reward: [(0, '1571.490')] [2024-12-13 08:50:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004976_2547712.pth... [2024-12-13 08:50:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004936_2527232.pth [2024-12-13 08:50:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2551808. Throughput: 0: 781.8. Samples: 2554040. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:50:44,076][62436] Avg episode reward: [(0, '1607.546')] [2024-12-13 08:50:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2555904. Throughput: 0: 766.3. Samples: 2557792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:50:49,076][62436] Avg episode reward: [(0, '1590.406')] [2024-12-13 08:50:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2560000. Throughput: 0: 761.7. Samples: 2563292. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:50:54,076][62436] Avg episode reward: [(0, '1598.274')] [2024-12-13 08:50:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005000_2560000.pth... [2024-12-13 08:50:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004960_2539520.pth [2024-12-13 08:50:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 2564096. Throughput: 0: 770.1. Samples: 2565900. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:50:59,076][62436] Avg episode reward: [(0, '1558.274')] [2024-12-13 08:51:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2568192. Throughput: 0: 767.5. Samples: 2570044. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:51:04,076][62436] Avg episode reward: [(0, '1576.309')] [2024-12-13 08:51:09,083][62436] Fps is (10 sec: 818.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2572288. Throughput: 0: 776.9. Samples: 2575236. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:51:09,084][62436] Avg episode reward: [(0, '1550.859')] [2024-12-13 08:51:09,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005024_2572288.pth... 
[2024-12-13 08:51:09,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000004976_2547712.pth [2024-12-13 08:51:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2576384. Throughput: 0: 799.5. Samples: 2577964. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:51:14,076][62436] Avg episode reward: [(0, '1510.437')] [2024-12-13 08:51:14,675][62492] Updated weights for policy 0, policy_version 5040 (0.0012) [2024-12-13 08:51:19,080][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2580480. Throughput: 0: 815.9. Samples: 2582372. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:51:19,081][62436] Avg episode reward: [(0, '1540.036')] [2024-12-13 08:51:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2584576. Throughput: 0: 800.5. Samples: 2587348. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:51:24,076][62436] Avg episode reward: [(0, '1514.686')] [2024-12-13 08:51:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005048_2584576.pth... [2024-12-13 08:51:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005000_2560000.pth [2024-12-13 08:51:29,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2588672. Throughput: 0: 802.0. Samples: 2590128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:51:29,076][62436] Avg episode reward: [(0, '1565.270')] [2024-12-13 08:51:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2592768. Throughput: 0: 823.2. Samples: 2594836. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:51:34,076][62436] Avg episode reward: [(0, '1571.612')] [2024-12-13 08:51:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2596864. Throughput: 0: 803.9. Samples: 2599468. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:51:39,076][62436] Avg episode reward: [(0, '1581.690')] [2024-12-13 08:51:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005072_2596864.pth... [2024-12-13 08:51:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005024_2572288.pth [2024-12-13 08:51:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2600960. Throughput: 0: 804.9. Samples: 2602120. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:51:44,076][62436] Avg episode reward: [(0, '1581.283')] [2024-12-13 08:51:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2605056. Throughput: 0: 824.4. Samples: 2607144. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:51:49,076][62436] Avg episode reward: [(0, '1622.340')] [2024-12-13 08:51:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2609152. Throughput: 0: 806.4. Samples: 2611520. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:51:54,076][62436] Avg episode reward: [(0, '1645.347')] [2024-12-13 08:51:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005096_2609152.pth... [2024-12-13 08:51:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005048_2584576.pth [2024-12-13 08:51:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2613248. Throughput: 0: 803.3. Samples: 2614112. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:51:59,076][62436] Avg episode reward: [(0, '1715.935')] [2024-12-13 08:52:04,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2617344. Throughput: 0: 820.9. Samples: 2619308. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:52:04,078][62436] Avg episode reward: [(0, '1709.309')] [2024-12-13 08:52:06,463][62492] Updated weights for policy 0, policy_version 5120 (0.0013) [2024-12-13 08:52:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 2621440. Throughput: 0: 801.3. Samples: 2623408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:52:09,076][62436] Avg episode reward: [(0, '1732.470')] [2024-12-13 08:52:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005120_2621440.pth... [2024-12-13 08:52:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005072_2596864.pth [2024-12-13 08:52:09,091][62473] Saving new best policy, reward=1732.470! [2024-12-13 08:52:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2625536. Throughput: 0: 799.3. Samples: 2626096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:52:14,078][62436] Avg episode reward: [(0, '1733.078')] [2024-12-13 08:52:14,079][62473] Saving new best policy, reward=1733.078! [2024-12-13 08:52:19,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2629632. Throughput: 0: 815.8. Samples: 2631548. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:52:19,079][62436] Avg episode reward: [(0, '1736.373')] [2024-12-13 08:52:19,080][62473] Saving new best policy, reward=1736.373! [2024-12-13 08:52:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2633728. Throughput: 0: 800.4. Samples: 2635488. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:52:24,076][62436] Avg episode reward: [(0, '1770.000')] [2024-12-13 08:52:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005144_2633728.pth... 
[2024-12-13 08:52:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005096_2609152.pth [2024-12-13 08:52:24,093][62473] Saving new best policy, reward=1770.000! [2024-12-13 08:52:29,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2637824. Throughput: 0: 797.8. Samples: 2638020. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:52:29,076][62436] Avg episode reward: [(0, '1793.514')] [2024-12-13 08:52:29,077][62473] Saving new best policy, reward=1793.514! [2024-12-13 08:52:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2641920. Throughput: 0: 810.2. Samples: 2643604. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:52:34,076][62436] Avg episode reward: [(0, '1820.804')] [2024-12-13 08:52:34,077][62473] Saving new best policy, reward=1820.804! [2024-12-13 08:52:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2646016. Throughput: 0: 797.5. Samples: 2647408. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:52:39,076][62436] Avg episode reward: [(0, '1829.768')] [2024-12-13 08:52:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005168_2646016.pth... [2024-12-13 08:52:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005120_2621440.pth [2024-12-13 08:52:39,094][62473] Saving new best policy, reward=1829.768! [2024-12-13 08:52:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2650112. Throughput: 0: 801.7. Samples: 2650188. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:52:44,076][62436] Avg episode reward: [(0, '1810.322')] [2024-12-13 08:52:49,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2654208. Throughput: 0: 812.7. Samples: 2655880. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:52:49,078][62436] Avg episode reward: [(0, '1848.495')] [2024-12-13 08:52:49,085][62473] Saving new best policy, reward=1848.495! [2024-12-13 08:52:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2658304. Throughput: 0: 804.9. Samples: 2659628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:52:54,076][62436] Avg episode reward: [(0, '1837.491')] [2024-12-13 08:52:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005192_2658304.pth... [2024-12-13 08:52:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005144_2633728.pth [2024-12-13 08:52:56,671][62492] Updated weights for policy 0, policy_version 5200 (0.0018) [2024-12-13 08:52:59,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 2662400. Throughput: 0: 806.8. Samples: 2662400. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:52:59,076][62436] Avg episode reward: [(0, '1843.498')] [2024-12-13 08:53:04,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2666496. Throughput: 0: 808.8. Samples: 2667944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:53:04,079][62436] Avg episode reward: [(0, '1868.841')] [2024-12-13 08:53:04,080][62473] Saving new best policy, reward=1868.841! [2024-12-13 08:53:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2670592. Throughput: 0: 809.9. Samples: 2671932. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:53:09,081][62436] Avg episode reward: [(0, '1797.733')] [2024-12-13 08:53:09,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005216_2670592.pth... [2024-12-13 08:53:09,108][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005168_2646016.pth [2024-12-13 08:53:14,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). 
Total num frames: 2674688. Throughput: 0: 811.8. Samples: 2674552. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:53:14,079][62436] Avg episode reward: [(0, '1848.057')] [2024-12-13 08:53:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2678784. Throughput: 0: 810.8. Samples: 2680088. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:53:19,076][62436] Avg episode reward: [(0, '1816.540')] [2024-12-13 08:53:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2682880. Throughput: 0: 819.7. Samples: 2684296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:53:24,076][62436] Avg episode reward: [(0, '1821.146')] [2024-12-13 08:53:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005240_2682880.pth... [2024-12-13 08:53:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005192_2658304.pth [2024-12-13 08:53:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2686976. Throughput: 0: 812.6. Samples: 2686756. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:53:29,076][62436] Avg episode reward: [(0, '1787.240')] [2024-12-13 08:53:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2691072. Throughput: 0: 809.1. Samples: 2692288. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:53:34,076][62436] Avg episode reward: [(0, '1805.166')] [2024-12-13 08:53:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2695168. Throughput: 0: 824.1. Samples: 2696712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:53:39,080][62436] Avg episode reward: [(0, '1832.570')] [2024-12-13 08:53:39,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005264_2695168.pth... 
[2024-12-13 08:53:39,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005216_2670592.pth [2024-12-13 08:53:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2699264. Throughput: 0: 814.1. Samples: 2699036. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:53:44,076][62436] Avg episode reward: [(0, '1967.815')] [2024-12-13 08:53:44,077][62473] Saving new best policy, reward=1967.815! [2024-12-13 08:53:46,403][62492] Updated weights for policy 0, policy_version 5280 (0.0010) [2024-12-13 08:53:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2703360. Throughput: 0: 813.2. Samples: 2704536. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:53:49,076][62436] Avg episode reward: [(0, '2021.197')] [2024-12-13 08:53:49,077][62473] Saving new best policy, reward=2021.197! [2024-12-13 08:53:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2707456. Throughput: 0: 830.2. Samples: 2709292. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:53:54,076][62436] Avg episode reward: [(0, '2060.232')] [2024-12-13 08:53:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005288_2707456.pth... [2024-12-13 08:53:54,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005240_2682880.pth [2024-12-13 08:53:54,107][62473] Saving new best policy, reward=2060.232! [2024-12-13 08:53:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2711552. Throughput: 0: 815.7. Samples: 2711260. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:53:59,076][62436] Avg episode reward: [(0, '1999.706')] [2024-12-13 08:54:04,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2715648. Throughput: 0: 813.2. Samples: 2716680. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:54:04,076][62436] Avg episode reward: [(0, '2016.088')] [2024-12-13 08:54:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2719744. Throughput: 0: 830.6. Samples: 2721672. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:54:09,076][62436] Avg episode reward: [(0, '1984.417')] [2024-12-13 08:54:09,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005312_2719744.pth... [2024-12-13 08:54:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005264_2695168.pth [2024-12-13 08:54:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2723840. Throughput: 0: 818.1. Samples: 2723572. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:54:14,076][62436] Avg episode reward: [(0, '1931.279')] [2024-12-13 08:54:19,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2727936. Throughput: 0: 814.8. Samples: 2728956. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:54:19,077][62436] Avg episode reward: [(0, '1827.947')] [2024-12-13 08:54:24,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2732032. Throughput: 0: 834.8. Samples: 2734276. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:54:24,076][62436] Avg episode reward: [(0, '1695.523')] [2024-12-13 08:54:24,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005336_2732032.pth... [2024-12-13 08:54:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005288_2707456.pth [2024-12-13 08:54:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2736128. Throughput: 0: 822.5. Samples: 2736048. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:54:29,077][62436] Avg episode reward: [(0, '1571.529')] [2024-12-13 08:54:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2740224. Throughput: 0: 811.6. Samples: 2741056. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:54:34,076][62436] Avg episode reward: [(0, '1557.584')] [2024-12-13 08:54:36,134][62492] Updated weights for policy 0, policy_version 5360 (0.0010) [2024-12-13 08:54:39,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2744320. Throughput: 0: 828.0. Samples: 2746552. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:54:39,076][62436] Avg episode reward: [(0, '1548.001')] [2024-12-13 08:54:39,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005360_2744320.pth... [2024-12-13 08:54:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005312_2719744.pth [2024-12-13 08:54:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2748416. Throughput: 0: 825.7. Samples: 2748416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:54:44,076][62436] Avg episode reward: [(0, '1495.731')] [2024-12-13 08:54:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2752512. Throughput: 0: 808.9. Samples: 2753080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:54:49,076][62436] Avg episode reward: [(0, '1454.471')] [2024-12-13 08:54:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2756608. Throughput: 0: 822.2. Samples: 2758672. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:54:54,077][62436] Avg episode reward: [(0, '1465.609')] [2024-12-13 08:54:54,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005384_2756608.pth... 
[2024-12-13 08:54:54,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005336_2732032.pth [2024-12-13 08:54:59,077][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2760704. Throughput: 0: 819.3. Samples: 2760440. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:54:59,079][62436] Avg episode reward: [(0, '1458.160')] [2024-12-13 08:55:04,075][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2760704. Throughput: 0: 762.5. Samples: 2763268. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:55:04,076][62436] Avg episode reward: [(0, '1465.171')] [2024-12-13 08:55:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2768896. Throughput: 0: 764.4. Samples: 2768672. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:55:09,076][62436] Avg episode reward: [(0, '1523.625')] [2024-12-13 08:55:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005408_2768896.pth... [2024-12-13 08:55:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005360_2744320.pth [2024-12-13 08:55:14,076][62436] Fps is (10 sec: 1228.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2772992. Throughput: 0: 786.4. Samples: 2771436. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:55:14,076][62436] Avg episode reward: [(0, '1531.546')] [2024-12-13 08:55:19,075][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 2772992. Throughput: 0: 760.3. Samples: 2775268. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:55:19,076][62436] Avg episode reward: [(0, '1544.382')] [2024-12-13 08:55:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2777088. Throughput: 0: 762.7. Samples: 2780872. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:55:24,076][62436] Avg episode reward: [(0, '1516.158')] [2024-12-13 08:55:24,107][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005432_2781184.pth... [2024-12-13 08:55:24,113][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005384_2756608.pth [2024-12-13 08:55:28,840][62492] Updated weights for policy 0, policy_version 5440 (0.0011) [2024-12-13 08:55:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2785280. Throughput: 0: 781.2. Samples: 2783568. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:55:29,076][62436] Avg episode reward: [(0, '1557.391')] [2024-12-13 08:55:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2785280. Throughput: 0: 768.5. Samples: 2787664. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:55:34,076][62436] Avg episode reward: [(0, '1579.990')] [2024-12-13 08:55:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2789376. Throughput: 0: 759.4. Samples: 2792844. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:55:39,076][62436] Avg episode reward: [(0, '1647.461')] [2024-12-13 08:55:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005448_2789376.pth... [2024-12-13 08:55:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005408_2768896.pth [2024-12-13 08:55:44,079][62436] Fps is (10 sec: 1228.3, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 2797568. Throughput: 0: 780.4. Samples: 2795560. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:55:44,080][62436] Avg episode reward: [(0, '1743.301')] [2024-12-13 08:55:49,079][62436] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2797568. Throughput: 0: 813.7. Samples: 2799888. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 08:55:49,080][62436] Avg episode reward: [(0, '1773.813')] [2024-12-13 08:55:54,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2801664. Throughput: 0: 802.4. Samples: 2804780. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:55:54,076][62436] Avg episode reward: [(0, '1756.834')] [2024-12-13 08:55:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005472_2801664.pth... [2024-12-13 08:55:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005432_2781184.pth [2024-12-13 08:55:59,076][62436] Fps is (10 sec: 819.5, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 2805760. Throughput: 0: 799.7. Samples: 2807424. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:55:59,076][62436] Avg episode reward: [(0, '1749.932')] [2024-12-13 08:56:04,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2809856. Throughput: 0: 815.8. Samples: 2811980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:56:04,076][62436] Avg episode reward: [(0, '1790.767')] [2024-12-13 08:56:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2813952. Throughput: 0: 788.4. Samples: 2816352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:56:09,078][62436] Avg episode reward: [(0, '1786.719')] [2024-12-13 08:56:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005496_2813952.pth... [2024-12-13 08:56:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005448_2789376.pth [2024-12-13 08:56:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2818048. Throughput: 0: 788.8. Samples: 2819064. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:56:14,076][62436] Avg episode reward: [(0, '1759.683')] [2024-12-13 08:56:19,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2822144. Throughput: 0: 811.7. Samples: 2824192. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:56:19,077][62436] Avg episode reward: [(0, '1764.394')] [2024-12-13 08:56:21,136][62492] Updated weights for policy 0, policy_version 5520 (0.0011) [2024-12-13 08:56:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2826240. Throughput: 0: 790.1. Samples: 2828400. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:56:24,076][62436] Avg episode reward: [(0, '1733.104')] [2024-12-13 08:56:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005520_2826240.pth... [2024-12-13 08:56:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005472_2801664.pth [2024-12-13 08:56:29,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2830336. Throughput: 0: 790.6. Samples: 2831132. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:56:29,076][62436] Avg episode reward: [(0, '1665.187')] [2024-12-13 08:56:34,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 2834432. Throughput: 0: 811.8. Samples: 2836420. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:56:34,081][62436] Avg episode reward: [(0, '1677.330')] [2024-12-13 08:56:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2838528. Throughput: 0: 791.8. Samples: 2840412. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:56:39,076][62436] Avg episode reward: [(0, '1701.348')] [2024-12-13 08:56:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005544_2838528.pth... 
[2024-12-13 08:56:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005496_2813952.pth [2024-12-13 08:56:44,076][62436] Fps is (10 sec: 819.6, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 2842624. Throughput: 0: 794.0. Samples: 2843156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:56:44,076][62436] Avg episode reward: [(0, '1752.969')] [2024-12-13 08:56:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2846720. Throughput: 0: 820.7. Samples: 2848912. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 08:56:49,076][62436] Avg episode reward: [(0, '1744.626')] [2024-12-13 08:56:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2850816. Throughput: 0: 806.9. Samples: 2852664. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 08:56:54,076][62436] Avg episode reward: [(0, '1785.928')] [2024-12-13 08:56:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005568_2850816.pth... [2024-12-13 08:56:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005520_2826240.pth [2024-12-13 08:56:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2854912. Throughput: 0: 801.6. Samples: 2855136. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:56:59,076][62436] Avg episode reward: [(0, '1793.154')] [2024-12-13 08:57:04,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2859008. Throughput: 0: 814.1. Samples: 2860824. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 08:57:04,077][62436] Avg episode reward: [(0, '1846.036')] [2024-12-13 08:57:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2863104. Throughput: 0: 806.4. Samples: 2864688. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:57:09,076][62436] Avg episode reward: [(0, '1887.284')] [2024-12-13 08:57:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005592_2863104.pth... [2024-12-13 08:57:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005544_2838528.pth [2024-12-13 08:57:11,409][62492] Updated weights for policy 0, policy_version 5600 (0.0010) [2024-12-13 08:57:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2867200. Throughput: 0: 805.3. Samples: 2867372. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:57:14,076][62436] Avg episode reward: [(0, '1798.926')] [2024-12-13 08:57:19,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2871296. Throughput: 0: 816.3. Samples: 2873152. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:57:19,081][62436] Avg episode reward: [(0, '1845.758')] [2024-12-13 08:57:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2875392. Throughput: 0: 820.4. Samples: 2877332. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:57:24,076][62436] Avg episode reward: [(0, '1873.054')] [2024-12-13 08:57:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005616_2875392.pth... [2024-12-13 08:57:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005568_2850816.pth [2024-12-13 08:57:29,077][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2879488. Throughput: 0: 811.2. Samples: 2879660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:57:29,078][62436] Avg episode reward: [(0, '1793.236')] [2024-12-13 08:57:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 2883584. Throughput: 0: 807.6. Samples: 2885252. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:57:34,076][62436] Avg episode reward: [(0, '1802.643')] [2024-12-13 08:57:39,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2887680. Throughput: 0: 819.6. Samples: 2889544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:57:39,076][62436] Avg episode reward: [(0, '1802.290')] [2024-12-13 08:57:39,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005640_2887680.pth... [2024-12-13 08:57:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005592_2863104.pth [2024-12-13 08:57:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2891776. Throughput: 0: 814.2. Samples: 2891776. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:57:44,076][62436] Avg episode reward: [(0, '1783.640')] [2024-12-13 08:57:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2895872. Throughput: 0: 813.3. Samples: 2897420. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:57:49,076][62436] Avg episode reward: [(0, '1767.589')] [2024-12-13 08:57:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2899968. Throughput: 0: 828.5. Samples: 2901972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:57:54,076][62436] Avg episode reward: [(0, '1733.678')] [2024-12-13 08:57:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005664_2899968.pth... [2024-12-13 08:57:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005616_2875392.pth [2024-12-13 08:57:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2904064. Throughput: 0: 812.8. Samples: 2903948. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:57:59,076][62436] Avg episode reward: [(0, '1758.930')] [2024-12-13 08:58:01,485][62492] Updated weights for policy 0, policy_version 5680 (0.0011) [2024-12-13 08:58:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2908160. Throughput: 0: 801.1. Samples: 2909196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:58:04,076][62436] Avg episode reward: [(0, '1663.348')] [2024-12-13 08:58:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2912256. Throughput: 0: 817.4. Samples: 2914116. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 08:58:09,076][62436] Avg episode reward: [(0, '1704.590')] [2024-12-13 08:58:09,093][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005688_2912256.pth... [2024-12-13 08:58:09,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005640_2887680.pth [2024-12-13 08:58:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2916352. Throughput: 0: 806.7. Samples: 2915960. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:58:14,076][62436] Avg episode reward: [(0, '1729.203')] [2024-12-13 08:58:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 2920448. Throughput: 0: 795.8. Samples: 2921064. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:58:19,076][62436] Avg episode reward: [(0, '1726.905')] [2024-12-13 08:58:24,077][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2924544. Throughput: 0: 815.3. Samples: 2926236. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:58:24,078][62436] Avg episode reward: [(0, '1744.835')] [2024-12-13 08:58:24,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005712_2924544.pth... 
[2024-12-13 08:58:24,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005664_2899968.pth [2024-12-13 08:58:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2928640. Throughput: 0: 807.7. Samples: 2928124. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:58:29,076][62436] Avg episode reward: [(0, '1679.610')] [2024-12-13 08:58:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2932736. Throughput: 0: 792.3. Samples: 2933072. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:58:34,076][62436] Avg episode reward: [(0, '1644.134')] [2024-12-13 08:58:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2936832. Throughput: 0: 815.9. Samples: 2938688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:58:39,076][62436] Avg episode reward: [(0, '1611.540')] [2024-12-13 08:58:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005736_2936832.pth... [2024-12-13 08:58:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005688_2912256.pth [2024-12-13 08:58:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2940928. Throughput: 0: 814.3. Samples: 2940592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:58:44,076][62436] Avg episode reward: [(0, '1575.404')] [2024-12-13 08:58:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2945024. Throughput: 0: 802.0. Samples: 2945288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:58:49,076][62436] Avg episode reward: [(0, '1631.124')] [2024-12-13 08:58:51,704][62492] Updated weights for policy 0, policy_version 5760 (0.0012) [2024-12-13 08:58:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2949120. Throughput: 0: 819.5. Samples: 2950992. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:58:54,076][62436] Avg episode reward: [(0, '1621.364')] [2024-12-13 08:58:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005760_2949120.pth... [2024-12-13 08:58:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005712_2924544.pth [2024-12-13 08:58:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2953216. Throughput: 0: 822.8. Samples: 2952988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:58:59,076][62436] Avg episode reward: [(0, '1632.808')] [2024-12-13 08:59:04,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2957312. Throughput: 0: 808.8. Samples: 2957460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:59:04,076][62436] Avg episode reward: [(0, '1612.096')] [2024-12-13 08:59:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2961408. Throughput: 0: 820.7. Samples: 2963164. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:59:09,076][62436] Avg episode reward: [(0, '1569.459')] [2024-12-13 08:59:09,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005784_2961408.pth... [2024-12-13 08:59:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005736_2936832.pth [2024-12-13 08:59:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2965504. Throughput: 0: 826.9. Samples: 2965336. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 08:59:14,083][62436] Avg episode reward: [(0, '1610.011')] [2024-12-13 08:59:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2969600. Throughput: 0: 814.3. Samples: 2969716. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:59:19,076][62436] Avg episode reward: [(0, '1663.950')] [2024-12-13 08:59:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2973696. Throughput: 0: 818.8. Samples: 2975532. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 08:59:24,076][62436] Avg episode reward: [(0, '1710.476')] [2024-12-13 08:59:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005808_2973696.pth... [2024-12-13 08:59:24,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005760_2949120.pth [2024-12-13 08:59:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2977792. Throughput: 0: 826.6. Samples: 2977792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:59:29,079][62436] Avg episode reward: [(0, '1731.235')] [2024-12-13 08:59:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2981888. Throughput: 0: 813.1. Samples: 2981876. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:59:34,076][62436] Avg episode reward: [(0, '1648.427')] [2024-12-13 08:59:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2985984. Throughput: 0: 775.4. Samples: 2985884. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:59:39,076][62436] Avg episode reward: [(0, '1634.973')] [2024-12-13 08:59:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005832_2985984.pth... [2024-12-13 08:59:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005784_2961408.pth [2024-12-13 08:59:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2985984. Throughput: 0: 787.2. Samples: 2988412. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:59:44,076][62436] Avg episode reward: [(0, '1635.107')] [2024-12-13 08:59:44,552][62492] Updated weights for policy 0, policy_version 5840 (0.0010) [2024-12-13 08:59:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 2990080. Throughput: 0: 777.3. Samples: 2992440. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:59:49,076][62436] Avg episode reward: [(0, '1683.599')] [2024-12-13 08:59:54,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 2998272. Throughput: 0: 778.2. Samples: 2998184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:59:54,076][62436] Avg episode reward: [(0, '1770.543')] [2024-12-13 08:59:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005856_2998272.pth... [2024-12-13 08:59:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005808_2973696.pth [2024-12-13 08:59:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 2998272. Throughput: 0: 790.1. Samples: 3000892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 08:59:59,076][62436] Avg episode reward: [(0, '1732.409')] [2024-12-13 09:00:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3002368. Throughput: 0: 774.8. Samples: 3004584. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:00:04,076][62436] Avg episode reward: [(0, '1744.297')] [2024-12-13 09:00:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3006464. Throughput: 0: 771.3. Samples: 3010240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:00:09,076][62436] Avg episode reward: [(0, '1666.275')] [2024-12-13 09:00:09,116][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005880_3010560.pth... 
[2024-12-13 09:00:09,123][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005832_2985984.pth [2024-12-13 09:00:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3010560. Throughput: 0: 782.5. Samples: 3013004. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:00:14,076][62436] Avg episode reward: [(0, '1602.163')] [2024-12-13 09:00:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3014656. Throughput: 0: 779.0. Samples: 3016932. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:00:19,076][62436] Avg episode reward: [(0, '1655.172')] [2024-12-13 09:00:24,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3022848. Throughput: 0: 814.7. Samples: 3022544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:00:24,076][62436] Avg episode reward: [(0, '1627.677')] [2024-12-13 09:00:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005904_3022848.pth... [2024-12-13 09:00:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005856_2998272.pth [2024-12-13 09:00:29,078][62436] Fps is (10 sec: 1228.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3026944. Throughput: 0: 821.0. Samples: 3025360. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:00:29,080][62436] Avg episode reward: [(0, '1654.501')] [2024-12-13 09:00:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3026944. Throughput: 0: 825.6. Samples: 3029592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:00:34,076][62436] Avg episode reward: [(0, '1623.231')] [2024-12-13 09:00:34,683][62492] Updated weights for policy 0, policy_version 5920 (0.0010) [2024-12-13 09:00:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3035136. Throughput: 0: 816.5. Samples: 3034928. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:00:39,076][62436] Avg episode reward: [(0, '1679.230')] [2024-12-13 09:00:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005928_3035136.pth... [2024-12-13 09:00:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005880_3010560.pth [2024-12-13 09:00:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 3039232. Throughput: 0: 818.8. Samples: 3037740. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:00:44,078][62436] Avg episode reward: [(0, '1653.609')] [2024-12-13 09:00:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3039232. Throughput: 0: 840.4. Samples: 3042400. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:00:49,076][62436] Avg episode reward: [(0, '1719.421')] [2024-12-13 09:00:54,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3047424. Throughput: 0: 823.9. Samples: 3047316. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:00:54,077][62436] Avg episode reward: [(0, '1641.274')] [2024-12-13 09:00:54,093][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005952_3047424.pth... [2024-12-13 09:00:54,103][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005904_3022848.pth [2024-12-13 09:00:59,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 3051520. Throughput: 0: 824.5. Samples: 3050108. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:00:59,076][62436] Avg episode reward: [(0, '1655.974')] [2024-12-13 09:01:04,075][62436] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 3055616. Throughput: 0: 842.6. Samples: 3054848. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:01:04,076][62436] Avg episode reward: [(0, '1646.438')] [2024-12-13 09:01:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 3059712. Throughput: 0: 825.2. Samples: 3059680. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:01:09,076][62436] Avg episode reward: [(0, '1574.552')] [2024-12-13 09:01:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005976_3059712.pth... [2024-12-13 09:01:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005928_3035136.pth [2024-12-13 09:01:14,078][62436] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 3063808. Throughput: 0: 826.6. Samples: 3062556. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:01:14,079][62436] Avg episode reward: [(0, '1467.417')] [2024-12-13 09:01:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 3067904. Throughput: 0: 848.4. Samples: 3067768. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:01:19,076][62436] Avg episode reward: [(0, '1484.446')] [2024-12-13 09:01:23,486][62492] Updated weights for policy 0, policy_version 6000 (0.0014) [2024-12-13 09:01:24,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3072000. Throughput: 0: 826.0. Samples: 3072096. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:01:24,076][62436] Avg episode reward: [(0, '1470.664')] [2024-12-13 09:01:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006000_3072000.pth... [2024-12-13 09:01:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005952_3047424.pth [2024-12-13 09:01:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3076096. Throughput: 0: 830.0. Samples: 3075088. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:01:29,076][62436] Avg episode reward: [(0, '1566.495')] [2024-12-13 09:01:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 3080192. Throughput: 0: 839.8. Samples: 3080192. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:01:34,076][62436] Avg episode reward: [(0, '1523.325')] [2024-12-13 09:01:39,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3084288. Throughput: 0: 825.8. Samples: 3084476. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:01:39,078][62436] Avg episode reward: [(0, '1536.394')] [2024-12-13 09:01:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006024_3084288.pth... [2024-12-13 09:01:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000005976_3059712.pth [2024-12-13 09:01:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3088384. Throughput: 0: 830.5. Samples: 3087480. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:01:44,076][62436] Avg episode reward: [(0, '1551.011')] [2024-12-13 09:01:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 3092480. Throughput: 0: 846.3. Samples: 3092932. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:01:49,076][62436] Avg episode reward: [(0, '1564.131')] [2024-12-13 09:01:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3096576. Throughput: 0: 826.1. Samples: 3096856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:01:54,076][62436] Avg episode reward: [(0, '1657.160')] [2024-12-13 09:01:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006048_3096576.pth... 
[2024-12-13 09:01:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006000_3072000.pth [2024-12-13 09:01:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3100672. Throughput: 0: 828.8. Samples: 3099852. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:01:59,076][62436] Avg episode reward: [(0, '1713.421')] [2024-12-13 09:02:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3104768. Throughput: 0: 837.0. Samples: 3105432. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:02:04,076][62436] Avg episode reward: [(0, '1626.146')] [2024-12-13 09:02:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3108864. Throughput: 0: 826.0. Samples: 3109264. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:02:09,076][62436] Avg episode reward: [(0, '1602.876')] [2024-12-13 09:02:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006072_3108864.pth... [2024-12-13 09:02:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006024_3084288.pth [2024-12-13 09:02:12,510][62492] Updated weights for policy 0, policy_version 6080 (0.0010) [2024-12-13 09:02:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3112960. Throughput: 0: 824.8. Samples: 3112204. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:02:14,076][62436] Avg episode reward: [(0, '1714.501')] [2024-12-13 09:02:19,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3117056. Throughput: 0: 837.7. Samples: 3117892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:02:19,078][62436] Avg episode reward: [(0, '1586.748')] [2024-12-13 09:02:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3121152. Throughput: 0: 833.2. Samples: 3121968. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:02:24,076][62436] Avg episode reward: [(0, '1651.444')] [2024-12-13 09:02:24,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006096_3121152.pth... [2024-12-13 09:02:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006048_3096576.pth [2024-12-13 09:02:29,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3125248. Throughput: 0: 827.8. Samples: 3124732. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:02:29,076][62436] Avg episode reward: [(0, '1571.554')] [2024-12-13 09:02:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3129344. Throughput: 0: 829.8. Samples: 3130272. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:02:34,076][62436] Avg episode reward: [(0, '1447.748')] [2024-12-13 09:02:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3133440. Throughput: 0: 836.9. Samples: 3134516. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:02:39,076][62436] Avg episode reward: [(0, '1461.281')] [2024-12-13 09:02:39,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006120_3133440.pth... [2024-12-13 09:02:39,110][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006072_3108864.pth [2024-12-13 09:02:44,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3137536. Throughput: 0: 826.0. Samples: 3137024. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:02:44,079][62436] Avg episode reward: [(0, '1526.760')] [2024-12-13 09:02:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3141632. Throughput: 0: 826.7. Samples: 3142632. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:02:49,076][62436] Avg episode reward: [(0, '1517.744')] [2024-12-13 09:02:54,081][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 3145728. Throughput: 0: 843.1. Samples: 3147208. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:02:54,082][62436] Avg episode reward: [(0, '1633.718')] [2024-12-13 09:02:54,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006144_3145728.pth... [2024-12-13 09:02:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006096_3121152.pth [2024-12-13 09:02:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3149824. Throughput: 0: 827.0. Samples: 3149420. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:02:59,076][62436] Avg episode reward: [(0, '1638.398')] [2024-12-13 09:03:01,536][62492] Updated weights for policy 0, policy_version 6160 (0.0013) [2024-12-13 09:03:04,076][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3153920. Throughput: 0: 823.1. Samples: 3154932. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:03:04,076][62436] Avg episode reward: [(0, '1591.927')] [2024-12-13 09:03:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3158016. Throughput: 0: 840.7. Samples: 3159800. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:03:09,076][62436] Avg episode reward: [(0, '1601.530')] [2024-12-13 09:03:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006168_3158016.pth... [2024-12-13 09:03:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006120_3133440.pth [2024-12-13 09:03:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3162112. Throughput: 0: 822.3. Samples: 3161736. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:03:14,076][62436] Avg episode reward: [(0, '1688.743')] [2024-12-13 09:03:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3166208. Throughput: 0: 821.7. Samples: 3167248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:03:19,076][62436] Avg episode reward: [(0, '1724.821')] [2024-12-13 09:03:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3170304. Throughput: 0: 841.6. Samples: 3172388. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:03:24,076][62436] Avg episode reward: [(0, '1727.633')] [2024-12-13 09:03:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006192_3170304.pth... [2024-12-13 09:03:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006144_3145728.pth [2024-12-13 09:03:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3174400. Throughput: 0: 827.1. Samples: 3174240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:03:29,076][62436] Avg episode reward: [(0, '1759.855')] [2024-12-13 09:03:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3178496. Throughput: 0: 817.4. Samples: 3179416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:03:34,076][62436] Avg episode reward: [(0, '1786.611')] [2024-12-13 09:03:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3182592. Throughput: 0: 835.5. Samples: 3184800. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:03:39,078][62436] Avg episode reward: [(0, '1768.356')] [2024-12-13 09:03:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006216_3182592.pth... 
[2024-12-13 09:03:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006168_3158016.pth [2024-12-13 09:03:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3186688. Throughput: 0: 827.3. Samples: 3186648. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:03:44,076][62436] Avg episode reward: [(0, '1791.325')] [2024-12-13 09:03:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3190784. Throughput: 0: 816.4. Samples: 3191672. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:03:49,076][62436] Avg episode reward: [(0, '1812.907')] [2024-12-13 09:03:50,988][62492] Updated weights for policy 0, policy_version 6240 (0.0011) [2024-12-13 09:03:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 3194880. Throughput: 0: 834.8. Samples: 3197368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:03:54,076][62436] Avg episode reward: [(0, '1809.676')] [2024-12-13 09:03:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006240_3194880.pth... [2024-12-13 09:03:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006192_3170304.pth [2024-12-13 09:03:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3198976. Throughput: 0: 831.2. Samples: 3199140. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:03:59,076][62436] Avg episode reward: [(0, '1850.739')] [2024-12-13 09:04:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3203072. Throughput: 0: 816.7. Samples: 3204000. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:04:04,076][62436] Avg episode reward: [(0, '1858.025')] [2024-12-13 09:04:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3207168. Throughput: 0: 821.3. Samples: 3209348. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:04:09,079][62436] Avg episode reward: [(0, '1900.811')] [2024-12-13 09:04:09,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006264_3207168.pth... [2024-12-13 09:04:09,107][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006216_3182592.pth [2024-12-13 09:04:14,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3211264. Throughput: 0: 814.7. Samples: 3210904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:04:14,077][62436] Avg episode reward: [(0, '1918.286')] [2024-12-13 09:04:19,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 3215360. Throughput: 0: 774.1. Samples: 3214256. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:04:19,081][62436] Avg episode reward: [(0, '1854.343')] [2024-12-13 09:04:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3219456. Throughput: 0: 778.6. Samples: 3219836. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:04:24,076][62436] Avg episode reward: [(0, '1813.581')] [2024-12-13 09:04:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006288_3219456.pth... [2024-12-13 09:04:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006240_3194880.pth [2024-12-13 09:04:29,083][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 3223552. Throughput: 0: 802.6. Samples: 3222772. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:04:29,084][62436] Avg episode reward: [(0, '1844.477')] [2024-12-13 09:04:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3227648. Throughput: 0: 774.9. Samples: 3226544. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:04:34,076][62436] Avg episode reward: [(0, '1871.927')] [2024-12-13 09:04:39,076][62436] Fps is (10 sec: 819.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3231744. Throughput: 0: 770.2. Samples: 3232028. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:04:39,076][62436] Avg episode reward: [(0, '1947.472')] [2024-12-13 09:04:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006312_3231744.pth... [2024-12-13 09:04:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006264_3207168.pth [2024-12-13 09:04:42,653][62492] Updated weights for policy 0, policy_version 6320 (0.0013) [2024-12-13 09:04:44,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3235840. Throughput: 0: 796.0. Samples: 3234960. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:04:44,076][62436] Avg episode reward: [(0, '2067.957')] [2024-12-13 09:04:44,080][62473] Saving new best policy, reward=2067.957! [2024-12-13 09:04:49,082][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 3239936. Throughput: 0: 779.4. Samples: 3239076. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:04:49,083][62436] Avg episode reward: [(0, '2066.875')] [2024-12-13 09:04:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3244032. Throughput: 0: 774.3. Samples: 3244192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:04:54,076][62436] Avg episode reward: [(0, '2103.056')] [2024-12-13 09:04:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006336_3244032.pth... [2024-12-13 09:04:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006288_3219456.pth [2024-12-13 09:04:54,091][62473] Saving new best policy, reward=2103.056! [2024-12-13 09:04:59,076][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 833.1). 
Total num frames: 3248128. Throughput: 0: 802.7. Samples: 3247024. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:04:59,076][62436] Avg episode reward: [(0, '2068.122')] [2024-12-13 09:05:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 3248128. Throughput: 0: 818.8. Samples: 3251096. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:05:04,076][62436] Avg episode reward: [(0, '2084.305')] [2024-12-13 09:05:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3256320. Throughput: 0: 804.2. Samples: 3256024. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:05:09,076][62436] Avg episode reward: [(0, '2068.854')] [2024-12-13 09:05:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006360_3256320.pth... [2024-12-13 09:05:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006312_3231744.pth [2024-12-13 09:05:14,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3260416. Throughput: 0: 799.6. Samples: 3258748. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:05:14,076][62436] Avg episode reward: [(0, '2073.801')] [2024-12-13 09:05:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 3260416. Throughput: 0: 823.4. Samples: 3263596. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:05:19,076][62436] Avg episode reward: [(0, '1993.403')] [2024-12-13 09:05:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3264512. Throughput: 0: 804.1. Samples: 3268212. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:05:24,077][62436] Avg episode reward: [(0, '1970.562')] [2024-12-13 09:05:24,166][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006384_3268608.pth... 
[2024-12-13 09:05:24,172][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006336_3244032.pth [2024-12-13 09:05:29,078][62436] Fps is (10 sec: 1228.5, 60 sec: 819.3, 300 sec: 833.1). Total num frames: 3272704. Throughput: 0: 800.1. Samples: 3270968. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:05:29,079][62436] Avg episode reward: [(0, '1877.150')] [2024-12-13 09:05:34,064][62492] Updated weights for policy 0, policy_version 6400 (0.0010) [2024-12-13 09:05:34,083][62436] Fps is (10 sec: 1227.9, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 3276800. Throughput: 0: 821.5. Samples: 3276044. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:05:34,083][62436] Avg episode reward: [(0, '1853.388')] [2024-12-13 09:05:39,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3276800. Throughput: 0: 802.5. Samples: 3280304. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:05:39,076][62436] Avg episode reward: [(0, '1888.779')] [2024-12-13 09:05:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006400_3276800.pth... [2024-12-13 09:05:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006360_3256320.pth [2024-12-13 09:05:44,076][62436] Fps is (10 sec: 819.8, 60 sec: 819.2, 300 sec: 833.1). Total num frames: 3284992. Throughput: 0: 800.1. Samples: 3283028. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:05:44,076][62436] Avg episode reward: [(0, '1830.357')] [2024-12-13 09:05:49,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 3289088. Throughput: 0: 830.6. Samples: 3288472. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:05:49,080][62436] Avg episode reward: [(0, '1841.343')] [2024-12-13 09:05:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3289088. Throughput: 0: 809.4. Samples: 3292448. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:05:54,076][62436] Avg episode reward: [(0, '1922.346')] [2024-12-13 09:05:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006424_3289088.pth... [2024-12-13 09:05:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006384_3268608.pth [2024-12-13 09:05:59,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3297280. Throughput: 0: 810.5. Samples: 3295224. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:05:59,078][62436] Avg episode reward: [(0, '1992.596')] [2024-12-13 09:06:04,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 3301376. Throughput: 0: 827.4. Samples: 3300828. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:06:04,079][62436] Avg episode reward: [(0, '2004.633')] [2024-12-13 09:06:09,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3301376. Throughput: 0: 808.1. Samples: 3304576. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:06:09,076][62436] Avg episode reward: [(0, '2081.104')] [2024-12-13 09:06:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006448_3301376.pth... [2024-12-13 09:06:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006400_3276800.pth [2024-12-13 09:06:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3305472. Throughput: 0: 806.7. Samples: 3307268. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:06:14,076][62436] Avg episode reward: [(0, '2081.476')] [2024-12-13 09:06:19,076][62436] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 3313664. Throughput: 0: 821.4. Samples: 3313000. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:06:19,077][62436] Avg episode reward: [(0, '2105.178')] [2024-12-13 09:06:19,078][62473] Saving new best policy, reward=2105.178! [2024-12-13 09:06:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3313664. Throughput: 0: 816.9. Samples: 3317064. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:06:24,078][62436] Avg episode reward: [(0, '2198.656')] [2024-12-13 09:06:24,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006472_3313664.pth... [2024-12-13 09:06:24,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006424_3289088.pth [2024-12-13 09:06:24,099][62473] Saving new best policy, reward=2198.656! [2024-12-13 09:06:24,766][62492] Updated weights for policy 0, policy_version 6480 (0.0014) [2024-12-13 09:06:29,075][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 3317760. Throughput: 0: 811.2. Samples: 3319532. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:06:29,076][62436] Avg episode reward: [(0, '2302.864')] [2024-12-13 09:06:29,190][62473] Saving new best policy, reward=2302.864! [2024-12-13 09:06:34,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 3325952. Throughput: 0: 814.0. Samples: 3325100. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:06:34,076][62436] Avg episode reward: [(0, '2363.802')] [2024-12-13 09:06:34,077][62473] Saving new best policy, reward=2363.802! [2024-12-13 09:06:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3325952. Throughput: 0: 821.2. Samples: 3329404. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:06:39,077][62436] Avg episode reward: [(0, '2423.535')] [2024-12-13 09:06:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006496_3325952.pth... 
[2024-12-13 09:06:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006448_3301376.pth [2024-12-13 09:06:39,092][62473] Saving new best policy, reward=2423.535! [2024-12-13 09:06:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3330048. Throughput: 0: 809.0. Samples: 3331628. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:06:44,076][62436] Avg episode reward: [(0, '2425.221')] [2024-12-13 09:06:44,077][62473] Saving new best policy, reward=2425.221! [2024-12-13 09:06:49,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3338240. Throughput: 0: 810.0. Samples: 3337280. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:06:49,076][62436] Avg episode reward: [(0, '2374.308')] [2024-12-13 09:06:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3338240. Throughput: 0: 825.5. Samples: 3341724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:06:54,078][62436] Avg episode reward: [(0, '2417.199')] [2024-12-13 09:06:54,095][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006520_3338240.pth... [2024-12-13 09:06:54,102][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006472_3313664.pth [2024-12-13 09:06:59,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 3342336. Throughput: 0: 810.6. Samples: 3343744. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:06:59,076][62436] Avg episode reward: [(0, '2372.156')] [2024-12-13 09:07:04,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3350528. Throughput: 0: 808.1. Samples: 3349364. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:07:04,076][62436] Avg episode reward: [(0, '2342.554')] [2024-12-13 09:07:09,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3350528. Throughput: 0: 822.6. 
Samples: 3354084. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:07:09,083][62436] Avg episode reward: [(0, '2343.249')] [2024-12-13 09:07:09,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006544_3350528.pth... [2024-12-13 09:07:09,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006496_3325952.pth [2024-12-13 09:07:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3354624. Throughput: 0: 806.5. Samples: 3355824. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:07:14,076][62436] Avg episode reward: [(0, '2396.470')] [2024-12-13 09:07:14,781][62492] Updated weights for policy 0, policy_version 6560 (0.0012) [2024-12-13 09:07:19,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3358720. Throughput: 0: 808.2. Samples: 3361468. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:07:19,076][62436] Avg episode reward: [(0, '2410.875')] [2024-12-13 09:07:24,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 3362816. Throughput: 0: 820.4. Samples: 3366324. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:07:24,081][62436] Avg episode reward: [(0, '2520.200')] [2024-12-13 09:07:24,093][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006568_3362816.pth... [2024-12-13 09:07:24,114][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006520_3338240.pth [2024-12-13 09:07:24,116][62473] Saving new best policy, reward=2520.200! [2024-12-13 09:07:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3366912. Throughput: 0: 808.8. Samples: 3368028. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:07:29,079][62436] Avg episode reward: [(0, '2464.304')] [2024-12-13 09:07:34,076][62436] Fps is (10 sec: 819.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3371008. Throughput: 0: 802.9. 
Samples: 3373412. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:07:34,076][62436] Avg episode reward: [(0, '2491.449')] [2024-12-13 09:07:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3375104. Throughput: 0: 819.8. Samples: 3378616. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:07:39,076][62436] Avg episode reward: [(0, '2534.997')] [2024-12-13 09:07:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006592_3375104.pth... [2024-12-13 09:07:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006544_3350528.pth [2024-12-13 09:07:39,095][62473] Saving new best policy, reward=2534.997! [2024-12-13 09:07:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3379200. Throughput: 0: 810.0. Samples: 3380192. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:07:44,076][62436] Avg episode reward: [(0, '2514.135')] [2024-12-13 09:07:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3383296. Throughput: 0: 800.0. Samples: 3385364. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:07:49,076][62436] Avg episode reward: [(0, '2501.526')] [2024-12-13 09:07:54,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 3387392. Throughput: 0: 815.1. Samples: 3390764. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:07:54,081][62436] Avg episode reward: [(0, '2515.926')] [2024-12-13 09:07:54,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006616_3387392.pth... [2024-12-13 09:07:54,104][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006568_3362816.pth [2024-12-13 09:07:59,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3391488. Throughput: 0: 814.5. Samples: 3392480. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:07:59,079][62436] Avg episode reward: [(0, '2454.819')] [2024-12-13 09:08:04,076][62436] Fps is (10 sec: 819.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3395584. Throughput: 0: 796.3. Samples: 3397300. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:08:04,076][62436] Avg episode reward: [(0, '2459.729')] [2024-12-13 09:08:05,253][62492] Updated weights for policy 0, policy_version 6640 (0.0016) [2024-12-13 09:08:09,079][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3399680. Throughput: 0: 813.0. Samples: 3402908. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:08:09,080][62436] Avg episode reward: [(0, '2558.040')] [2024-12-13 09:08:09,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006640_3399680.pth... [2024-12-13 09:08:09,104][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006592_3375104.pth [2024-12-13 09:08:09,105][62473] Saving new best policy, reward=2558.040! [2024-12-13 09:08:14,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3403776. Throughput: 0: 810.1. Samples: 3404480. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:08:14,078][62436] Avg episode reward: [(0, '2635.351')] [2024-12-13 09:08:14,079][62473] Saving new best policy, reward=2635.351! [2024-12-13 09:08:19,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3407872. Throughput: 0: 797.0. Samples: 3409276. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:08:19,076][62436] Avg episode reward: [(0, '2638.937')] [2024-12-13 09:08:19,077][62473] Saving new best policy, reward=2638.937! [2024-12-13 09:08:24,078][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3411968. Throughput: 0: 804.2. Samples: 3414808. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:08:24,079][62436] Avg episode reward: [(0, '2548.377')] [2024-12-13 09:08:24,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006664_3411968.pth... [2024-12-13 09:08:24,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006616_3387392.pth [2024-12-13 09:08:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3416064. Throughput: 0: 810.4. Samples: 3416660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:08:29,076][62436] Avg episode reward: [(0, '2586.033')] [2024-12-13 09:08:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3420160. Throughput: 0: 794.0. Samples: 3421092. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:08:34,076][62436] Avg episode reward: [(0, '2579.548')] [2024-12-13 09:08:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3424256. Throughput: 0: 797.2. Samples: 3426636. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:08:39,076][62436] Avg episode reward: [(0, '2635.144')] [2024-12-13 09:08:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006688_3424256.pth... [2024-12-13 09:08:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006640_3399680.pth [2024-12-13 09:08:44,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3428352. Throughput: 0: 802.6. Samples: 3428596. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:08:44,079][62436] Avg episode reward: [(0, '2587.478')] [2024-12-13 09:08:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3428352. Throughput: 0: 761.1. Samples: 3431548. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:08:49,076][62436] Avg episode reward: [(0, '2625.248')] [2024-12-13 09:08:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 3436544. Throughput: 0: 747.5. Samples: 3436544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:08:54,076][62436] Avg episode reward: [(0, '2693.243')] [2024-12-13 09:08:54,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006712_3436544.pth... [2024-12-13 09:08:54,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006664_3411968.pth [2024-12-13 09:08:54,101][62473] Saving new best policy, reward=2693.243! [2024-12-13 09:08:58,148][62492] Updated weights for policy 0, policy_version 6720 (0.0011) [2024-12-13 09:08:59,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3440640. Throughput: 0: 774.6. Samples: 3439336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:08:59,076][62436] Avg episode reward: [(0, '2735.983')] [2024-12-13 09:08:59,077][62473] Saving new best policy, reward=2735.983! [2024-12-13 09:09:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3440640. Throughput: 0: 762.0. Samples: 3443568. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:09:04,076][62436] Avg episode reward: [(0, '2790.010')] [2024-12-13 09:09:04,077][62473] Saving new best policy, reward=2790.010! [2024-12-13 09:09:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 3444736. Throughput: 0: 748.8. Samples: 3448500. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:09:09,077][62436] Avg episode reward: [(0, '2720.991')] [2024-12-13 09:09:09,116][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006736_3448832.pth... 
[2024-12-13 09:09:09,120][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006688_3424256.pth [2024-12-13 09:09:14,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3452928. Throughput: 0: 767.6. Samples: 3451200. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:09:14,076][62436] Avg episode reward: [(0, '2593.907')] [2024-12-13 09:09:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3452928. Throughput: 0: 774.4. Samples: 3455940. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:09:19,076][62436] Avg episode reward: [(0, '2604.431')] [2024-12-13 09:09:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 791.5). Total num frames: 3457024. Throughput: 0: 754.5. Samples: 3460588. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:09:24,076][62436] Avg episode reward: [(0, '2509.616')] [2024-12-13 09:09:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006752_3457024.pth... [2024-12-13 09:09:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006712_3436544.pth [2024-12-13 09:09:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3465216. Throughput: 0: 770.6. Samples: 3463272. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:09:29,077][62436] Avg episode reward: [(0, '2397.079')] [2024-12-13 09:09:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3465216. Throughput: 0: 813.5. Samples: 3468156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:09:34,076][62436] Avg episode reward: [(0, '2313.448')] [2024-12-13 09:09:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3469312. Throughput: 0: 798.3. Samples: 3472468. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:09:39,076][62436] Avg episode reward: [(0, '2231.537')] [2024-12-13 09:09:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006776_3469312.pth... [2024-12-13 09:09:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006736_3448832.pth [2024-12-13 09:09:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3477504. Throughput: 0: 798.0. Samples: 3475248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:09:44,076][62436] Avg episode reward: [(0, '2090.972')] [2024-12-13 09:09:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3477504. Throughput: 0: 821.2. Samples: 3480520. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:09:49,076][62436] Avg episode reward: [(0, '2071.920')] [2024-12-13 09:09:49,676][62492] Updated weights for policy 0, policy_version 6800 (0.0011) [2024-12-13 09:09:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3481600. Throughput: 0: 803.2. Samples: 3484644. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:09:54,076][62436] Avg episode reward: [(0, '2058.189')] [2024-12-13 09:09:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006800_3481600.pth... [2024-12-13 09:09:54,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006752_3457024.pth [2024-12-13 09:09:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3485696. Throughput: 0: 802.1. Samples: 3487296. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:09:59,076][62436] Avg episode reward: [(0, '2118.491')] [2024-12-13 09:10:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3489792. Throughput: 0: 816.3. Samples: 3492672. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:10:04,076][62436] Avg episode reward: [(0, '2060.937')] [2024-12-13 09:10:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3493888. Throughput: 0: 797.9. Samples: 3496492. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:10:09,076][62436] Avg episode reward: [(0, '2108.058')] [2024-12-13 09:10:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006824_3493888.pth... [2024-12-13 09:10:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006776_3469312.pth [2024-12-13 09:10:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3497984. Throughput: 0: 797.2. Samples: 3499144. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:10:14,076][62436] Avg episode reward: [(0, '2201.282')] [2024-12-13 09:10:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3502080. Throughput: 0: 816.4. Samples: 3504896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:10:19,076][62436] Avg episode reward: [(0, '2164.194')] [2024-12-13 09:10:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3506176. Throughput: 0: 804.4. Samples: 3508668. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:10:24,079][62436] Avg episode reward: [(0, '2141.091')] [2024-12-13 09:10:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006848_3506176.pth... [2024-12-13 09:10:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006800_3481600.pth [2024-12-13 09:10:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.5). Total num frames: 3510272. Throughput: 0: 802.7. Samples: 3511368. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:10:29,076][62436] Avg episode reward: [(0, '2207.858')] [2024-12-13 09:10:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3514368. Throughput: 0: 807.6. Samples: 3516864. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:10:34,076][62436] Avg episode reward: [(0, '2243.069')] [2024-12-13 09:10:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3518464. Throughput: 0: 805.1. Samples: 3520872. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:10:39,076][62436] Avg episode reward: [(0, '2269.385')] [2024-12-13 09:10:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006872_3518464.pth... [2024-12-13 09:10:39,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006824_3493888.pth [2024-12-13 09:10:40,761][62492] Updated weights for policy 0, policy_version 6880 (0.0011) [2024-12-13 09:10:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3522560. Throughput: 0: 802.0. Samples: 3523384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:10:44,076][62436] Avg episode reward: [(0, '2484.690')] [2024-12-13 09:10:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3526656. Throughput: 0: 808.5. Samples: 3529056. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:10:49,076][62436] Avg episode reward: [(0, '2503.512')] [2024-12-13 09:10:54,080][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 3530752. Throughput: 0: 817.3. Samples: 3533272. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:10:54,080][62436] Avg episode reward: [(0, '2396.397')] [2024-12-13 09:10:54,092][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006896_3530752.pth... 
[2024-12-13 09:10:54,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006848_3506176.pth [2024-12-13 09:10:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3534848. Throughput: 0: 807.2. Samples: 3535468. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:10:59,076][62436] Avg episode reward: [(0, '2390.924')] [2024-12-13 09:11:04,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3538944. Throughput: 0: 805.1. Samples: 3541124. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:11:04,076][62436] Avg episode reward: [(0, '2386.667')] [2024-12-13 09:11:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3543040. Throughput: 0: 822.8. Samples: 3545692. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:11:09,076][62436] Avg episode reward: [(0, '2473.736')] [2024-12-13 09:11:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006920_3543040.pth... [2024-12-13 09:11:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006872_3518464.pth [2024-12-13 09:11:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3547136. Throughput: 0: 804.5. Samples: 3547572. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:11:14,076][62436] Avg episode reward: [(0, '2515.717')] [2024-12-13 09:11:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3551232. Throughput: 0: 809.5. Samples: 3553292. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:11:19,076][62436] Avg episode reward: [(0, '2518.738')] [2024-12-13 09:11:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3555328. Throughput: 0: 826.8. Samples: 3558076. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:11:24,076][62436] Avg episode reward: [(0, '2598.207')] [2024-12-13 09:11:24,095][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006944_3555328.pth... [2024-12-13 09:11:24,103][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006896_3530752.pth [2024-12-13 09:11:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3559424. Throughput: 0: 808.6. Samples: 3559772. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:11:29,076][62436] Avg episode reward: [(0, '2694.261')] [2024-12-13 09:11:30,673][62492] Updated weights for policy 0, policy_version 6960 (0.0010) [2024-12-13 09:11:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3563520. Throughput: 0: 806.6. Samples: 3565352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:11:34,076][62436] Avg episode reward: [(0, '2771.636')] [2024-12-13 09:11:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3567616. Throughput: 0: 823.8. Samples: 3570340. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:11:39,076][62436] Avg episode reward: [(0, '2842.065')] [2024-12-13 09:11:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006968_3567616.pth... [2024-12-13 09:11:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006920_3543040.pth [2024-12-13 09:11:39,092][62473] Saving new best policy, reward=2842.065! [2024-12-13 09:11:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3571712. Throughput: 0: 813.2. Samples: 3572060. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:11:44,076][62436] Avg episode reward: [(0, '2644.706')] [2024-12-13 09:11:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3575808. Throughput: 0: 805.5. Samples: 3577372. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:11:49,076][62436] Avg episode reward: [(0, '2662.110')] [2024-12-13 09:11:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 3579904. Throughput: 0: 821.1. Samples: 3582640. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:11:54,076][62436] Avg episode reward: [(0, '2755.638')] [2024-12-13 09:11:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006992_3579904.pth... [2024-12-13 09:11:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006944_3555328.pth [2024-12-13 09:11:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 3584000. Throughput: 0: 818.8. Samples: 3584416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:11:59,076][62436] Avg episode reward: [(0, '2783.242')] [2024-12-13 09:12:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3588096. Throughput: 0: 801.8. Samples: 3589372. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:12:04,076][62436] Avg episode reward: [(0, '2795.456')] [2024-12-13 09:12:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3592192. Throughput: 0: 819.4. Samples: 3594948. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:12:09,076][62436] Avg episode reward: [(0, '2735.370')] [2024-12-13 09:12:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007016_3592192.pth... [2024-12-13 09:12:09,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006968_3567616.pth [2024-12-13 09:12:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3596288. Throughput: 0: 818.0. Samples: 3596580. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:12:14,077][62436] Avg episode reward: [(0, '2714.367')] [2024-12-13 09:12:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3600384. Throughput: 0: 802.2. Samples: 3601452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:12:19,076][62436] Avg episode reward: [(0, '2628.916')] [2024-12-13 09:12:20,921][62492] Updated weights for policy 0, policy_version 7040 (0.0010) [2024-12-13 09:12:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3604480. Throughput: 0: 816.2. Samples: 3607072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:12:24,078][62436] Avg episode reward: [(0, '2562.198')] [2024-12-13 09:12:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007040_3604480.pth... [2024-12-13 09:12:24,103][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000006992_3579904.pth [2024-12-13 09:12:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3608576. Throughput: 0: 819.8. Samples: 3608952. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:12:29,076][62436] Avg episode reward: [(0, '2531.521')] [2024-12-13 09:12:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3612672. Throughput: 0: 805.5. Samples: 3613620. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:12:34,076][62436] Avg episode reward: [(0, '2466.880')] [2024-12-13 09:12:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3616768. Throughput: 0: 814.6. Samples: 3619296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:12:39,076][62436] Avg episode reward: [(0, '2522.721')] [2024-12-13 09:12:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007064_3616768.pth... 
[2024-12-13 09:12:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007016_3592192.pth [2024-12-13 09:12:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3620864. Throughput: 0: 823.9. Samples: 3621492. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:12:44,078][62436] Avg episode reward: [(0, '2504.941')] [2024-12-13 09:12:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3624960. Throughput: 0: 808.6. Samples: 3625760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:12:49,076][62436] Avg episode reward: [(0, '2480.503')] [2024-12-13 09:12:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3629056. Throughput: 0: 810.8. Samples: 3631436. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:12:54,076][62436] Avg episode reward: [(0, '2380.982')] [2024-12-13 09:12:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007088_3629056.pth... [2024-12-13 09:12:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007040_3604480.pth [2024-12-13 09:12:59,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3633152. Throughput: 0: 828.0. Samples: 3633840. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:12:59,078][62436] Avg episode reward: [(0, '2343.627')] [2024-12-13 09:13:04,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3637248. Throughput: 0: 809.9. Samples: 3637896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:13:04,077][62436] Avg episode reward: [(0, '2308.300')] [2024-12-13 09:13:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3641344. Throughput: 0: 811.3. Samples: 3643580. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:13:09,076][62436] Avg episode reward: [(0, '2271.919')] [2024-12-13 09:13:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007112_3641344.pth... [2024-12-13 09:13:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007064_3616768.pth [2024-12-13 09:13:10,704][62492] Updated weights for policy 0, policy_version 7120 (0.0011) [2024-12-13 09:13:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3645440. Throughput: 0: 829.5. Samples: 3646280. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:13:14,076][62436] Avg episode reward: [(0, '2365.961')] [2024-12-13 09:13:19,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3649536. Throughput: 0: 800.0. Samples: 3649620. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:13:19,079][62436] Avg episode reward: [(0, '2326.178')] [2024-12-13 09:13:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3653632. Throughput: 0: 765.0. Samples: 3653720. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:13:24,076][62436] Avg episode reward: [(0, '2364.728')] [2024-12-13 09:13:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007136_3653632.pth... [2024-12-13 09:13:24,102][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007088_3629056.pth [2024-12-13 09:13:29,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3657728. Throughput: 0: 782.1. Samples: 3656688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:13:29,076][62436] Avg episode reward: [(0, '2306.013')] [2024-12-13 09:13:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3661824. Throughput: 0: 792.3. Samples: 3661412. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:13:34,076][62436] Avg episode reward: [(0, '2298.853')] [2024-12-13 09:13:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3665920. Throughput: 0: 767.4. Samples: 3665968. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:13:39,076][62436] Avg episode reward: [(0, '2195.585')] [2024-12-13 09:13:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007160_3665920.pth... [2024-12-13 09:13:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007112_3641344.pth [2024-12-13 09:13:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3670016. Throughput: 0: 779.1. Samples: 3668896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:13:44,077][62436] Avg episode reward: [(0, '2233.044')] [2024-12-13 09:13:49,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3674112. Throughput: 0: 800.7. Samples: 3673928. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:13:49,080][62436] Avg episode reward: [(0, '2248.424')] [2024-12-13 09:13:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3678208. Throughput: 0: 771.7. Samples: 3678308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:13:54,076][62436] Avg episode reward: [(0, '2207.099')] [2024-12-13 09:13:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007184_3678208.pth... [2024-12-13 09:13:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007136_3653632.pth [2024-12-13 09:13:59,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3682304. Throughput: 0: 776.6. Samples: 3681228. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:13:59,076][62436] Avg episode reward: [(0, '2268.473')] [2024-12-13 09:14:02,742][62492] Updated weights for policy 0, policy_version 7200 (0.0013) [2024-12-13 09:14:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3686400. Throughput: 0: 817.4. Samples: 3686400. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:14:04,076][62436] Avg episode reward: [(0, '2361.072')] [2024-12-13 09:14:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3690496. Throughput: 0: 817.2. Samples: 3690496. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:14:09,076][62436] Avg episode reward: [(0, '2384.321')] [2024-12-13 09:14:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007208_3690496.pth... [2024-12-13 09:14:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007160_3665920.pth [2024-12-13 09:14:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3694592. Throughput: 0: 812.9. Samples: 3693268. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:14:14,076][62436] Avg episode reward: [(0, '2338.264')] [2024-12-13 09:14:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3698688. Throughput: 0: 828.4. Samples: 3698688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:14:19,076][62436] Avg episode reward: [(0, '2291.635')] [2024-12-13 09:14:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3702784. Throughput: 0: 814.8. Samples: 3702632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:14:24,076][62436] Avg episode reward: [(0, '2366.922')] [2024-12-13 09:14:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007232_3702784.pth... 
[2024-12-13 09:14:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007184_3678208.pth [2024-12-13 09:14:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3706880. Throughput: 0: 810.7. Samples: 3705376. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:14:29,076][62436] Avg episode reward: [(0, '2448.460')] [2024-12-13 09:14:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3710976. Throughput: 0: 823.3. Samples: 3710976. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:14:34,076][62436] Avg episode reward: [(0, '2420.248')] [2024-12-13 09:14:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3710976. Throughput: 0: 813.0. Samples: 3714892. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:14:39,077][62436] Avg episode reward: [(0, '2433.435')] [2024-12-13 09:14:39,127][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007256_3715072.pth... [2024-12-13 09:14:39,139][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007208_3690496.pth [2024-12-13 09:14:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3719168. Throughput: 0: 805.2. Samples: 3717460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:14:44,079][62436] Avg episode reward: [(0, '2370.876')] [2024-12-13 09:14:49,078][62436] Fps is (10 sec: 1228.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3723264. Throughput: 0: 817.2. Samples: 3723176. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:14:49,079][62436] Avg episode reward: [(0, '2471.884')] [2024-12-13 09:14:53,902][62492] Updated weights for policy 0, policy_version 7280 (0.0013) [2024-12-13 09:14:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3727360. Throughput: 0: 817.9. Samples: 3727300. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:14:54,076][62436] Avg episode reward: [(0, '2609.676')] [2024-12-13 09:14:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007280_3727360.pth... [2024-12-13 09:14:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007232_3702784.pth [2024-12-13 09:14:59,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3731456. Throughput: 0: 809.4. Samples: 3729692. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:14:59,076][62436] Avg episode reward: [(0, '2513.372')] [2024-12-13 09:15:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3735552. Throughput: 0: 814.0. Samples: 3735316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:15:04,076][62436] Avg episode reward: [(0, '2558.285')] [2024-12-13 09:15:09,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3739648. Throughput: 0: 820.1. Samples: 3739540. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:15:09,085][62436] Avg episode reward: [(0, '2547.210')] [2024-12-13 09:15:09,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007304_3739648.pth... [2024-12-13 09:15:09,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007256_3715072.pth [2024-12-13 09:15:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3743744. Throughput: 0: 806.4. Samples: 3741664. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:15:14,079][62436] Avg episode reward: [(0, '2619.425')] [2024-12-13 09:15:19,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3747840. Throughput: 0: 807.3. Samples: 3747304. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:15:19,076][62436] Avg episode reward: [(0, '2693.325')] [2024-12-13 09:15:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3751936. Throughput: 0: 823.2. Samples: 3751936. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:15:24,079][62436] Avg episode reward: [(0, '2652.011')] [2024-12-13 09:15:24,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007328_3751936.pth... [2024-12-13 09:15:24,105][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007280_3727360.pth [2024-12-13 09:15:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3756032. Throughput: 0: 809.2. Samples: 3753872. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:15:29,077][62436] Avg episode reward: [(0, '2680.906')] [2024-12-13 09:15:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3760128. Throughput: 0: 807.6. Samples: 3759516. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:15:34,076][62436] Avg episode reward: [(0, '2708.616')] [2024-12-13 09:15:39,078][62436] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 3764224. Throughput: 0: 821.9. Samples: 3764288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:15:39,079][62436] Avg episode reward: [(0, '2760.777')] [2024-12-13 09:15:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007352_3764224.pth... [2024-12-13 09:15:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007304_3739648.pth [2024-12-13 09:15:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3764224. Throughput: 0: 809.6. Samples: 3766124. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:15:44,077][62436] Avg episode reward: [(0, '2786.444')] [2024-12-13 09:15:44,203][62492] Updated weights for policy 0, policy_version 7360 (0.0035) [2024-12-13 09:15:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3772416. Throughput: 0: 807.0. Samples: 3771632. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:15:49,076][62436] Avg episode reward: [(0, '2875.618')] [2024-12-13 09:15:49,077][62473] Saving new best policy, reward=2875.618! [2024-12-13 09:15:54,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3776512. Throughput: 0: 821.6. Samples: 3776512. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:15:54,076][62436] Avg episode reward: [(0, '2947.082')] [2024-12-13 09:15:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007376_3776512.pth... [2024-12-13 09:15:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007328_3751936.pth [2024-12-13 09:15:54,093][62473] Saving new best policy, reward=2947.082! [2024-12-13 09:15:59,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3776512. Throughput: 0: 815.7. Samples: 3778372. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:15:59,076][62436] Avg episode reward: [(0, '3007.378')] [2024-12-13 09:15:59,077][62473] Saving new best policy, reward=3007.378! [2024-12-13 09:16:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3784704. Throughput: 0: 809.3. Samples: 3783724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:16:04,076][62436] Avg episode reward: [(0, '3121.542')] [2024-12-13 09:16:04,077][62473] Saving new best policy, reward=3121.542! [2024-12-13 09:16:09,077][62436] Fps is (10 sec: 1228.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3788800. Throughput: 0: 820.6. Samples: 3788860. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:16:09,078][62436] Avg episode reward: [(0, '3218.139')] [2024-12-13 09:16:09,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007400_3788800.pth... [2024-12-13 09:16:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007352_3764224.pth [2024-12-13 09:16:09,095][62473] Saving new best policy, reward=3218.139! [2024-12-13 09:16:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3788800. Throughput: 0: 819.8. Samples: 3790764. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:16:14,076][62436] Avg episode reward: [(0, '3240.968')] [2024-12-13 09:16:14,077][62473] Saving new best policy, reward=3240.968! [2024-12-13 09:16:19,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3796992. Throughput: 0: 805.6. Samples: 3795768. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:16:19,076][62436] Avg episode reward: [(0, '3382.202')] [2024-12-13 09:16:19,077][62473] Saving new best policy, reward=3382.202! [2024-12-13 09:16:24,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3801088. Throughput: 0: 817.8. Samples: 3801088. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:16:24,078][62436] Avg episode reward: [(0, '3377.583')] [2024-12-13 09:16:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007424_3801088.pth... [2024-12-13 09:16:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007376_3776512.pth [2024-12-13 09:16:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3801088. Throughput: 0: 817.7. Samples: 3802920. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:16:29,076][62436] Avg episode reward: [(0, '3375.144')] [2024-12-13 09:16:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). 
Total num frames: 3805184. Throughput: 0: 798.5. Samples: 3807564. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:16:34,076][62436] Avg episode reward: [(0, '3408.956')] [2024-12-13 09:16:34,077][62473] Saving new best policy, reward=3408.956! [2024-12-13 09:16:34,564][62492] Updated weights for policy 0, policy_version 7440 (0.0011) [2024-12-13 09:16:39,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 3813376. Throughput: 0: 813.5. Samples: 3813120. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:16:39,076][62436] Avg episode reward: [(0, '3458.731')] [2024-12-13 09:16:39,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007448_3813376.pth... [2024-12-13 09:16:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007400_3788800.pth [2024-12-13 09:16:39,090][62473] Saving new best policy, reward=3458.731! [2024-12-13 09:16:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3813376. Throughput: 0: 816.8. Samples: 3815128. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:16:44,079][62436] Avg episode reward: [(0, '3532.513')] [2024-12-13 09:16:44,081][62473] Saving new best policy, reward=3532.513! [2024-12-13 09:16:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3817472. Throughput: 0: 796.0. Samples: 3819544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:16:49,076][62436] Avg episode reward: [(0, '3600.681')] [2024-12-13 09:16:49,077][62473] Saving new best policy, reward=3600.681! [2024-12-13 09:16:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3821568. Throughput: 0: 808.8. Samples: 3825256. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:16:54,076][62436] Avg episode reward: [(0, '3559.885')] [2024-12-13 09:16:54,205][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007472_3825664.pth... [2024-12-13 09:16:54,218][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007424_3801088.pth [2024-12-13 09:16:59,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3825664. Throughput: 0: 817.6. Samples: 3827556. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:16:59,079][62436] Avg episode reward: [(0, '3572.404')] [2024-12-13 09:17:04,077][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3829760. Throughput: 0: 794.9. Samples: 3831540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:17:04,078][62436] Avg episode reward: [(0, '3565.068')] [2024-12-13 09:17:09,076][62436] Fps is (10 sec: 819.4, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 3833856. Throughput: 0: 802.4. Samples: 3837196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:17:09,076][62436] Avg episode reward: [(0, '3626.987')] [2024-12-13 09:17:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007488_3833856.pth... [2024-12-13 09:17:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007448_3813376.pth [2024-12-13 09:17:09,093][62473] Saving new best policy, reward=3626.987! [2024-12-13 09:17:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3837952. Throughput: 0: 819.6. Samples: 3839800. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:17:14,076][62436] Avg episode reward: [(0, '3585.115')] [2024-12-13 09:17:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3842048. Throughput: 0: 802.5. Samples: 3843676. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:17:19,076][62436] Avg episode reward: [(0, '3497.350')] [2024-12-13 09:17:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3846144. Throughput: 0: 806.0. Samples: 3849388. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:17:24,077][62436] Avg episode reward: [(0, '3399.172')] [2024-12-13 09:17:24,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007512_3846144.pth... [2024-12-13 09:17:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007472_3825664.pth [2024-12-13 09:17:24,635][62492] Updated weights for policy 0, policy_version 7520 (0.0011) [2024-12-13 09:17:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3850240. Throughput: 0: 822.2. Samples: 3852128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:17:29,076][62436] Avg episode reward: [(0, '3367.609')] [2024-12-13 09:17:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3854336. Throughput: 0: 812.4. Samples: 3856100. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:17:34,076][62436] Avg episode reward: [(0, '3369.753')] [2024-12-13 09:17:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3858432. Throughput: 0: 809.4. Samples: 3861680. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:17:39,076][62436] Avg episode reward: [(0, '3218.044')] [2024-12-13 09:17:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007536_3858432.pth... [2024-12-13 09:17:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007488_3833856.pth [2024-12-13 09:17:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3862528. Throughput: 0: 818.5. Samples: 3864384. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:17:44,076][62436] Avg episode reward: [(0, '3204.738')] [2024-12-13 09:17:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3866624. Throughput: 0: 822.9. Samples: 3868568. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:17:49,076][62436] Avg episode reward: [(0, '3113.053')] [2024-12-13 09:17:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3870720. Throughput: 0: 785.3. Samples: 3872536. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:17:54,076][62436] Avg episode reward: [(0, '3031.818')] [2024-12-13 09:17:54,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007560_3870720.pth... [2024-12-13 09:17:54,105][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007512_3846144.pth [2024-12-13 09:17:59,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3874816. Throughput: 0: 776.2. Samples: 3874728. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:17:59,077][62436] Avg episode reward: [(0, '3000.083')] [2024-12-13 09:18:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3878912. Throughput: 0: 786.5. Samples: 3879068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:18:04,076][62436] Avg episode reward: [(0, '3030.408')] [2024-12-13 09:18:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3883008. Throughput: 0: 770.4. Samples: 3884056. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:18:09,076][62436] Avg episode reward: [(0, '3050.031')] [2024-12-13 09:18:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007584_3883008.pth... 
[2024-12-13 09:18:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007536_3858432.pth [2024-12-13 09:18:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3887104. Throughput: 0: 773.7. Samples: 3886944. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:18:14,076][62436] Avg episode reward: [(0, '2959.157')] [2024-12-13 09:18:17,438][62492] Updated weights for policy 0, policy_version 7600 (0.0018) [2024-12-13 09:18:19,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3891200. Throughput: 0: 782.5. Samples: 3891312. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:18:19,077][62436] Avg episode reward: [(0, '2991.286')] [2024-12-13 09:18:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3895296. Throughput: 0: 765.7. Samples: 3896140. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:18:24,079][62436] Avg episode reward: [(0, '2861.378')] [2024-12-13 09:18:24,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007608_3895296.pth... [2024-12-13 09:18:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007560_3870720.pth [2024-12-13 09:18:29,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3899392. Throughput: 0: 769.6. Samples: 3899016. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:18:29,076][62436] Avg episode reward: [(0, '2919.302')] [2024-12-13 09:18:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3903488. Throughput: 0: 778.0. Samples: 3903576. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:18:34,076][62436] Avg episode reward: [(0, '2838.763')] [2024-12-13 09:18:39,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3907584. Throughput: 0: 791.4. Samples: 3908148. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:18:39,077][62436] Avg episode reward: [(0, '2780.530')] [2024-12-13 09:18:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007632_3907584.pth... [2024-12-13 09:18:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007584_3883008.pth [2024-12-13 09:18:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3911680. Throughput: 0: 808.1. Samples: 3911092. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:18:44,076][62436] Avg episode reward: [(0, '2789.812')] [2024-12-13 09:18:49,080][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 3915776. Throughput: 0: 819.2. Samples: 3915936. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:18:49,081][62436] Avg episode reward: [(0, '2778.771')] [2024-12-13 09:18:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3919872. Throughput: 0: 805.2. Samples: 3920292. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:18:54,076][62436] Avg episode reward: [(0, '2738.649')] [2024-12-13 09:18:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007656_3919872.pth... [2024-12-13 09:18:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007608_3895296.pth [2024-12-13 09:18:59,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3923968. Throughput: 0: 804.3. Samples: 3923136. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:18:59,076][62436] Avg episode reward: [(0, '2617.250')] [2024-12-13 09:19:04,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3928064. Throughput: 0: 818.8. Samples: 3928160. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:19:04,078][62436] Avg episode reward: [(0, '2576.769')] [2024-12-13 09:19:08,536][62492] Updated weights for policy 0, policy_version 7680 (0.0014) [2024-12-13 09:19:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3932160. Throughput: 0: 801.8. Samples: 3932220. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:19:09,076][62436] Avg episode reward: [(0, '2526.358')] [2024-12-13 09:19:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007680_3932160.pth... [2024-12-13 09:19:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007632_3907584.pth [2024-12-13 09:19:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3936256. Throughput: 0: 801.2. Samples: 3935072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:19:14,076][62436] Avg episode reward: [(0, '2482.493')] [2024-12-13 09:19:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 3940352. Throughput: 0: 817.2. Samples: 3940352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:19:19,078][62436] Avg episode reward: [(0, '2505.836')] [2024-12-13 09:19:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 3940352. Throughput: 0: 788.4. Samples: 3943624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:19:24,081][62436] Avg episode reward: [(0, '2481.724')] [2024-12-13 09:19:24,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007696_3940352.pth... [2024-12-13 09:19:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007656_3919872.pth [2024-12-13 09:19:29,077][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3944448. Throughput: 0: 761.4. Samples: 3945356. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:19:29,079][62436] Avg episode reward: [(0, '2492.867')] [2024-12-13 09:19:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 3948544. Throughput: 0: 749.6. Samples: 3949664. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:19:34,076][62436] Avg episode reward: [(0, '2546.597')] [2024-12-13 09:19:39,077][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 3952640. Throughput: 0: 731.3. Samples: 3953204. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:19:39,079][62436] Avg episode reward: [(0, '2547.958')] [2024-12-13 09:19:39,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007720_3952640.pth... [2024-12-13 09:19:39,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007680_3932160.pth [2024-12-13 09:19:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 777.6). Total num frames: 3952640. Throughput: 0: 703.8. Samples: 3954808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:19:44,076][62436] Avg episode reward: [(0, '2608.674')] [2024-12-13 09:19:49,076][62436] Fps is (10 sec: 409.7, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 3956736. Throughput: 0: 679.0. Samples: 3958712. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:19:49,076][62436] Avg episode reward: [(0, '2611.469')] [2024-12-13 09:19:54,076][62436] Fps is (10 sec: 819.1, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 3960832. Throughput: 0: 686.3. Samples: 3963104. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:19:54,082][62436] Avg episode reward: [(0, '2584.715')] [2024-12-13 09:19:54,092][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007736_3960832.pth... 
[2024-12-13 09:19:54,102][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007696_3940352.pth [2024-12-13 09:19:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 777.5). Total num frames: 3964928. Throughput: 0: 659.2. Samples: 3964736. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:19:59,076][62436] Avg episode reward: [(0, '2630.218')] [2024-12-13 09:20:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 763.7). Total num frames: 3964928. Throughput: 0: 611.8. Samples: 3967884. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:20:04,076][62436] Avg episode reward: [(0, '2732.210')] [2024-12-13 09:20:09,080][62436] Fps is (10 sec: 409.4, 60 sec: 614.4, 300 sec: 763.7). Total num frames: 3969024. Throughput: 0: 632.7. Samples: 3972100. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:20:09,080][62436] Avg episode reward: [(0, '2906.370')] [2024-12-13 09:20:09,092][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007752_3969024.pth... [2024-12-13 09:20:09,103][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007720_3952640.pth [2024-12-13 09:20:10,133][62492] Updated weights for policy 0, policy_version 7760 (0.0019) [2024-12-13 09:20:14,081][62436] Fps is (10 sec: 818.7, 60 sec: 614.3, 300 sec: 763.6). Total num frames: 3973120. Throughput: 0: 640.8. Samples: 3974196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:20:14,082][62436] Avg episode reward: [(0, '2995.739')] [2024-12-13 09:20:19,076][62436] Fps is (10 sec: 819.5, 60 sec: 614.4, 300 sec: 763.7). Total num frames: 3977216. Throughput: 0: 615.7. Samples: 3977372. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:20:19,076][62436] Avg episode reward: [(0, '3029.714')] [2024-12-13 09:20:24,076][62436] Fps is (10 sec: 819.7, 60 sec: 682.7, 300 sec: 763.7). Total num frames: 3981312. Throughput: 0: 624.2. Samples: 3981292. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:20:24,076][62436] Avg episode reward: [(0, '3163.266')] [2024-12-13 09:20:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007776_3981312.pth... [2024-12-13 09:20:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007736_3960832.pth [2024-12-13 09:20:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 749.8). Total num frames: 3981312. Throughput: 0: 636.3. Samples: 3983440. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:20:29,076][62436] Avg episode reward: [(0, '3240.282')] [2024-12-13 09:20:34,077][62436] Fps is (10 sec: 409.5, 60 sec: 614.4, 300 sec: 749.8). Total num frames: 3985408. Throughput: 0: 636.6. Samples: 3987360. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:20:34,078][62436] Avg episode reward: [(0, '3313.759')] [2024-12-13 09:20:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 763.7). Total num frames: 3989504. Throughput: 0: 606.8. Samples: 3990408. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:20:39,076][62436] Avg episode reward: [(0, '3292.682')] [2024-12-13 09:20:39,094][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007792_3989504.pth... [2024-12-13 09:20:39,106][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007752_3969024.pth [2024-12-13 09:20:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 682.7, 300 sec: 749.8). Total num frames: 3993600. Throughput: 0: 619.4. Samples: 3992608. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:20:44,076][62436] Avg episode reward: [(0, '3328.753')] [2024-12-13 09:20:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 749.8). Total num frames: 3997696. Throughput: 0: 646.9. Samples: 3996996. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:20:49,076][62436] Avg episode reward: [(0, '3406.013')] [2024-12-13 09:20:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 749.8). Total num frames: 3997696. Throughput: 0: 625.0. Samples: 4000224. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:20:54,078][62436] Avg episode reward: [(0, '3461.549')] [2024-12-13 09:20:54,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007808_3997696.pth... [2024-12-13 09:20:54,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007776_3981312.pth [2024-12-13 09:20:59,080][62436] Fps is (10 sec: 409.4, 60 sec: 614.4, 300 sec: 735.9). Total num frames: 4001792. Throughput: 0: 613.3. Samples: 4001792. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:20:59,081][62436] Avg episode reward: [(0, '3492.371')] [2024-12-13 09:21:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 735.9). Total num frames: 4005888. Throughput: 0: 633.7. Samples: 4005888. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:21:04,076][62436] Avg episode reward: [(0, '3475.335')] [2024-12-13 09:21:09,076][62436] Fps is (10 sec: 819.6, 60 sec: 682.7, 300 sec: 749.8). Total num frames: 4009984. Throughput: 0: 636.9. Samples: 4009952. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:21:09,078][62436] Avg episode reward: [(0, '3453.183')] [2024-12-13 09:21:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007832_4009984.pth... [2024-12-13 09:21:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007792_3989504.pth [2024-12-13 09:21:14,082][62436] Fps is (10 sec: 409.3, 60 sec: 614.4, 300 sec: 722.0). Total num frames: 4009984. Throughput: 0: 621.7. Samples: 4011420. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:21:14,083][62436] Avg episode reward: [(0, '3458.655')] [2024-12-13 09:21:16,405][62492] Updated weights for policy 0, policy_version 7840 (0.0022) [2024-12-13 09:21:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 722.0). Total num frames: 4014080. Throughput: 0: 613.4. Samples: 4014960. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:21:19,076][62436] Avg episode reward: [(0, '3443.649')] [2024-12-13 09:21:24,076][62436] Fps is (10 sec: 819.7, 60 sec: 614.4, 300 sec: 735.9). Total num frames: 4018176. Throughput: 0: 642.2. Samples: 4019308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:21:24,076][62436] Avg episode reward: [(0, '3452.745')] [2024-12-13 09:21:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007848_4018176.pth... [2024-12-13 09:21:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007808_3997696.pth [2024-12-13 09:21:29,077][62436] Fps is (10 sec: 819.1, 60 sec: 682.6, 300 sec: 735.9). Total num frames: 4022272. Throughput: 0: 638.1. Samples: 4021324. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:21:29,078][62436] Avg episode reward: [(0, '3535.473')] [2024-12-13 09:21:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 708.1). Total num frames: 4022272. Throughput: 0: 608.0. Samples: 4024356. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:21:34,076][62436] Avg episode reward: [(0, '3552.347')] [2024-12-13 09:21:39,079][62436] Fps is (10 sec: 409.5, 60 sec: 614.4, 300 sec: 722.0). Total num frames: 4026368. Throughput: 0: 626.5. Samples: 4028420. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:21:39,079][62436] Avg episode reward: [(0, '3529.041')] [2024-12-13 09:21:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007864_4026368.pth... 
[2024-12-13 09:21:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007832_4009984.pth [2024-12-13 09:21:44,078][62436] Fps is (10 sec: 819.0, 60 sec: 614.4, 300 sec: 722.0). Total num frames: 4030464. Throughput: 0: 638.5. Samples: 4030524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:21:44,079][62436] Avg episode reward: [(0, '3563.690')] [2024-12-13 09:21:49,077][62436] Fps is (10 sec: 819.3, 60 sec: 614.4, 300 sec: 722.0). Total num frames: 4034560. Throughput: 0: 630.5. Samples: 4034260. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:21:49,078][62436] Avg episode reward: [(0, '3557.729')] [2024-12-13 09:21:54,076][62436] Fps is (10 sec: 409.7, 60 sec: 614.4, 300 sec: 708.1). Total num frames: 4034560. Throughput: 0: 615.4. Samples: 4037644. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:21:54,076][62436] Avg episode reward: [(0, '3524.471')] [2024-12-13 09:21:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007880_4034560.pth... [2024-12-13 09:21:54,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007848_4018176.pth [2024-12-13 09:21:59,076][62436] Fps is (10 sec: 409.7, 60 sec: 614.4, 300 sec: 708.1). Total num frames: 4038656. Throughput: 0: 630.8. Samples: 4039800. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:21:59,082][62436] Avg episode reward: [(0, '3441.007')] [2024-12-13 09:22:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 708.1). Total num frames: 4042752. Throughput: 0: 648.6. Samples: 4044148. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:22:04,076][62436] Avg episode reward: [(0, '3535.737')] [2024-12-13 09:22:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 708.1). Total num frames: 4046848. Throughput: 0: 619.6. Samples: 4047192. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:22:09,077][62436] Avg episode reward: [(0, '3549.853')] [2024-12-13 09:22:09,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007904_4046848.pth... [2024-12-13 09:22:09,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007864_4026368.pth [2024-12-13 09:22:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 614.5, 300 sec: 694.2). Total num frames: 4046848. Throughput: 0: 612.6. Samples: 4048892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:22:14,077][62436] Avg episode reward: [(0, '3570.581')] [2024-12-13 09:22:19,075][62436] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 694.2). Total num frames: 4050944. Throughput: 0: 644.4. Samples: 4053356. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:22:19,076][62436] Avg episode reward: [(0, '3554.620')] [2024-12-13 09:22:20,097][62492] Updated weights for policy 0, policy_version 7920 (0.0012) [2024-12-13 09:22:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 694.2). Total num frames: 4055040. Throughput: 0: 638.4. Samples: 4057144. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:22:24,076][62436] Avg episode reward: [(0, '3586.972')] [2024-12-13 09:22:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007920_4055040.pth... [2024-12-13 09:22:24,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007880_4034560.pth [2024-12-13 09:22:29,077][62436] Fps is (10 sec: 819.1, 60 sec: 614.4, 300 sec: 694.2). Total num frames: 4059136. Throughput: 0: 628.0. Samples: 4058784. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:22:29,077][62436] Avg episode reward: [(0, '3680.461')] [2024-12-13 09:22:29,078][62473] Saving new best policy, reward=3680.461! [2024-12-13 09:22:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 680.4). Total num frames: 4059136. Throughput: 0: 614.1. Samples: 4061896. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:22:34,077][62436] Avg episode reward: [(0, '3624.300')] [2024-12-13 09:22:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 680.4). Total num frames: 4063232. Throughput: 0: 642.0. Samples: 4066532. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:22:39,076][62436] Avg episode reward: [(0, '3589.342')] [2024-12-13 09:22:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007936_4063232.pth... [2024-12-13 09:22:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007904_4046848.pth [2024-12-13 09:22:44,080][62436] Fps is (10 sec: 818.9, 60 sec: 614.4, 300 sec: 680.3). Total num frames: 4067328. Throughput: 0: 639.6. Samples: 4068584. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:22:44,081][62436] Avg episode reward: [(0, '3562.061')] [2024-12-13 09:22:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 680.4). Total num frames: 4071424. Throughput: 0: 638.8. Samples: 4072896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:22:49,076][62436] Avg episode reward: [(0, '3604.863')] [2024-12-13 09:22:54,076][62436] Fps is (10 sec: 819.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 4075520. Throughput: 0: 694.0. Samples: 4078420. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:22:54,076][62436] Avg episode reward: [(0, '3524.924')] [2024-12-13 09:22:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007960_4075520.pth... [2024-12-13 09:22:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007920_4055040.pth [2024-12-13 09:22:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 4079616. Throughput: 0: 708.8. Samples: 4080788. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:22:59,078][62436] Avg episode reward: [(0, '3521.367')] [2024-12-13 09:23:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 4083712. Throughput: 0: 700.5. Samples: 4084880. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:23:04,076][62436] Avg episode reward: [(0, '3329.777')] [2024-12-13 09:23:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 4087808. Throughput: 0: 744.4. Samples: 4090644. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:23:09,076][62436] Avg episode reward: [(0, '3365.660')] [2024-12-13 09:23:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007984_4087808.pth... [2024-12-13 09:23:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007936_4063232.pth [2024-12-13 09:23:14,076][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 680.4). Total num frames: 4091904. Throughput: 0: 767.0. Samples: 4093300. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:23:14,079][62436] Avg episode reward: [(0, '3360.688')] [2024-12-13 09:23:16,071][62492] Updated weights for policy 0, policy_version 8000 (0.0010) [2024-12-13 09:23:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 680.4). Total num frames: 4096000. Throughput: 0: 782.9. Samples: 4097128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:23:19,076][62436] Avg episode reward: [(0, '3304.266')] [2024-12-13 09:23:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 680.4). Total num frames: 4100096. Throughput: 0: 807.2. Samples: 4102856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:23:24,076][62436] Avg episode reward: [(0, '3166.721')] [2024-12-13 09:23:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008008_4100096.pth... 
[2024-12-13 09:23:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007960_4075520.pth [2024-12-13 09:23:29,076][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 680.4). Total num frames: 4104192. Throughput: 0: 824.6. Samples: 4105688. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:23:29,077][62436] Avg episode reward: [(0, '3089.023')] [2024-12-13 09:23:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 680.4). Total num frames: 4108288. Throughput: 0: 813.3. Samples: 4109496. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:23:34,076][62436] Avg episode reward: [(0, '3075.691')] [2024-12-13 09:23:39,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 680.4). Total num frames: 4112384. Throughput: 0: 816.4. Samples: 4115160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:23:39,076][62436] Avg episode reward: [(0, '2998.897')] [2024-12-13 09:23:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008032_4112384.pth... [2024-12-13 09:23:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000007984_4087808.pth [2024-12-13 09:23:44,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 680.4). Total num frames: 4116480. Throughput: 0: 823.8. Samples: 4117864. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:23:44,080][62436] Avg episode reward: [(0, '2991.422')] [2024-12-13 09:23:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 680.4). Total num frames: 4120576. Throughput: 0: 820.9. Samples: 4121820. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:23:49,076][62436] Avg episode reward: [(0, '2986.283')] [2024-12-13 09:23:54,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 680.4). Total num frames: 4124672. Throughput: 0: 815.0. Samples: 4127320. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:23:54,076][62436] Avg episode reward: [(0, '2975.080')] [2024-12-13 09:23:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008056_4124672.pth... [2024-12-13 09:23:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008008_4100096.pth [2024-12-13 09:23:59,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 680.4). Total num frames: 4128768. Throughput: 0: 818.3. Samples: 4130124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:23:59,078][62436] Avg episode reward: [(0, '2924.958')] [2024-12-13 09:24:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 680.4). Total num frames: 4132864. Throughput: 0: 824.6. Samples: 4134236. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:24:04,076][62436] Avg episode reward: [(0, '2906.162')] [2024-12-13 09:24:05,937][62492] Updated weights for policy 0, policy_version 8080 (0.0010) [2024-12-13 09:24:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 680.4). Total num frames: 4136960. Throughput: 0: 815.6. Samples: 4139556. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:24:09,076][62436] Avg episode reward: [(0, '2934.813')] [2024-12-13 09:24:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008080_4136960.pth... [2024-12-13 09:24:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008032_4112384.pth [2024-12-13 09:24:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 680.4). Total num frames: 4141056. Throughput: 0: 812.3. Samples: 4142240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:24:14,076][62436] Avg episode reward: [(0, '2858.875')] [2024-12-13 09:24:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 694.2). Total num frames: 4145152. Throughput: 0: 824.5. Samples: 4146600. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:24:19,076][62436] Avg episode reward: [(0, '2904.607')] [2024-12-13 09:24:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 694.2). Total num frames: 4149248. Throughput: 0: 812.6. Samples: 4151728. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:24:24,076][62436] Avg episode reward: [(0, '2862.297')] [2024-12-13 09:24:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008104_4149248.pth... [2024-12-13 09:24:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008056_4124672.pth [2024-12-13 09:24:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 694.2). Total num frames: 4153344. Throughput: 0: 811.5. Samples: 4154380. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:24:29,076][62436] Avg episode reward: [(0, '2867.862')] [2024-12-13 09:24:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 694.2). Total num frames: 4157440. Throughput: 0: 823.3. Samples: 4158868. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:24:34,076][62436] Avg episode reward: [(0, '2832.972')] [2024-12-13 09:24:39,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 708.1). Total num frames: 4161536. Throughput: 0: 810.0. Samples: 4163772. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:24:39,078][62436] Avg episode reward: [(0, '2791.704')] [2024-12-13 09:24:39,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008128_4161536.pth... [2024-12-13 09:24:39,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008080_4136960.pth [2024-12-13 09:24:44,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 708.1). Total num frames: 4165632. Throughput: 0: 808.3. Samples: 4166500. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:24:44,078][62436] Avg episode reward: [(0, '2880.704')] [2024-12-13 09:24:49,077][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 708.1). Total num frames: 4169728. Throughput: 0: 821.5. Samples: 4171204. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:24:49,085][62436] Avg episode reward: [(0, '2913.718')] [2024-12-13 09:24:54,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 708.1). Total num frames: 4173824. Throughput: 0: 805.1. Samples: 4175784. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:24:54,076][62436] Avg episode reward: [(0, '2873.848')] [2024-12-13 09:24:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008152_4173824.pth... [2024-12-13 09:24:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008104_4149248.pth [2024-12-13 09:24:55,899][62492] Updated weights for policy 0, policy_version 8160 (0.0010) [2024-12-13 09:24:59,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 722.0). Total num frames: 4177920. Throughput: 0: 807.6. Samples: 4178580. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:24:59,076][62436] Avg episode reward: [(0, '2861.710')] [2024-12-13 09:25:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 722.0). Total num frames: 4182016. Throughput: 0: 821.2. Samples: 4183552. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:25:04,076][62436] Avg episode reward: [(0, '2799.680')] [2024-12-13 09:25:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 722.0). Total num frames: 4186112. Throughput: 0: 800.6. Samples: 4187756. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:25:09,076][62436] Avg episode reward: [(0, '2800.545')] [2024-12-13 09:25:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008176_4186112.pth... 
[2024-12-13 09:25:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008128_4161536.pth [2024-12-13 09:25:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 722.0). Total num frames: 4190208. Throughput: 0: 800.1. Samples: 4190384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:25:14,076][62436] Avg episode reward: [(0, '2977.885')] [2024-12-13 09:25:19,079][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 722.0). Total num frames: 4194304. Throughput: 0: 821.0. Samples: 4195816. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:25:19,079][62436] Avg episode reward: [(0, '3072.925')] [2024-12-13 09:25:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 735.9). Total num frames: 4198400. Throughput: 0: 800.5. Samples: 4199796. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:25:24,078][62436] Avg episode reward: [(0, '3135.751')] [2024-12-13 09:25:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008200_4198400.pth... [2024-12-13 09:25:24,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008152_4173824.pth [2024-12-13 09:25:29,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 735.9). Total num frames: 4202496. Throughput: 0: 800.0. Samples: 4202496. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:25:29,076][62436] Avg episode reward: [(0, '3217.188')] [2024-12-13 09:25:34,075][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 735.9). Total num frames: 4206592. Throughput: 0: 820.8. Samples: 4208140. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:25:34,076][62436] Avg episode reward: [(0, '3263.983')] [2024-12-13 09:25:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 735.9). Total num frames: 4210688. Throughput: 0: 804.0. Samples: 4211964. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:25:39,076][62436] Avg episode reward: [(0, '3250.663')] [2024-12-13 09:25:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008224_4210688.pth... [2024-12-13 09:25:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008176_4186112.pth [2024-12-13 09:25:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 735.9). Total num frames: 4214784. Throughput: 0: 804.0. Samples: 4214760. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:25:44,078][62436] Avg episode reward: [(0, '3337.686')] [2024-12-13 09:25:46,113][62492] Updated weights for policy 0, policy_version 8240 (0.0010) [2024-12-13 09:25:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 749.8). Total num frames: 4218880. Throughput: 0: 816.4. Samples: 4220292. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:25:49,076][62436] Avg episode reward: [(0, '3359.477')] [2024-12-13 09:25:54,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 749.8). Total num frames: 4222976. Throughput: 0: 810.2. Samples: 4224216. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:25:54,078][62436] Avg episode reward: [(0, '3463.570')] [2024-12-13 09:25:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008248_4222976.pth... [2024-12-13 09:25:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008200_4198400.pth [2024-12-13 09:25:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 749.8). Total num frames: 4227072. Throughput: 0: 813.3. Samples: 4226984. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:25:59,076][62436] Avg episode reward: [(0, '3538.931')] [2024-12-13 09:26:04,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 749.8). Total num frames: 4231168. Throughput: 0: 814.5. Samples: 4232464. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:26:04,076][62436] Avg episode reward: [(0, '3592.672')] [2024-12-13 09:26:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 4235264. Throughput: 0: 818.4. Samples: 4236624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:26:09,076][62436] Avg episode reward: [(0, '3594.899')] [2024-12-13 09:26:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008272_4235264.pth... [2024-12-13 09:26:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008224_4210688.pth [2024-12-13 09:26:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 4239360. Throughput: 0: 811.7. Samples: 4239024. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:26:14,076][62436] Avg episode reward: [(0, '3622.559')] [2024-12-13 09:26:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 4243456. Throughput: 0: 808.2. Samples: 4244508. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:26:19,076][62436] Avg episode reward: [(0, '3760.740')] [2024-12-13 09:26:19,077][62473] Saving new best policy, reward=3760.740! [2024-12-13 09:26:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 763.7). Total num frames: 4247552. Throughput: 0: 819.5. Samples: 4248844. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:26:24,079][62436] Avg episode reward: [(0, '3888.853')] [2024-12-13 09:26:24,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008296_4247552.pth... [2024-12-13 09:26:24,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008248_4222976.pth [2024-12-13 09:26:24,102][62473] Saving new best policy, reward=3888.853! [2024-12-13 09:26:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 4251648. Throughput: 0: 807.8. Samples: 4251112. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:26:29,076][62436] Avg episode reward: [(0, '3909.692')] [2024-12-13 09:26:29,077][62473] Saving new best policy, reward=3909.692! [2024-12-13 09:26:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 777.6). Total num frames: 4255744. Throughput: 0: 808.7. Samples: 4256684. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:26:34,076][62436] Avg episode reward: [(0, '3788.817')] [2024-12-13 09:26:36,352][62492] Updated weights for policy 0, policy_version 8320 (0.0010) [2024-12-13 09:26:39,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 777.5). Total num frames: 4259840. Throughput: 0: 823.8. Samples: 4261288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:26:39,081][62436] Avg episode reward: [(0, '3668.422')] [2024-12-13 09:26:39,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008320_4259840.pth... [2024-12-13 09:26:39,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008272_4235264.pth [2024-12-13 09:26:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 777.6). Total num frames: 4263936. Throughput: 0: 804.6. Samples: 4263192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:26:44,076][62436] Avg episode reward: [(0, '3609.302')] [2024-12-13 09:26:49,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4268032. Throughput: 0: 805.9. Samples: 4268728. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:26:49,076][62436] Avg episode reward: [(0, '3508.101')] [2024-12-13 09:26:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4272128. Throughput: 0: 823.8. Samples: 4273696. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:26:54,080][62436] Avg episode reward: [(0, '3498.369')] [2024-12-13 09:26:54,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008344_4272128.pth... 
[2024-12-13 09:26:54,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008296_4247552.pth [2024-12-13 09:26:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4276224. Throughput: 0: 812.9. Samples: 4275604. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:26:59,076][62436] Avg episode reward: [(0, '3449.801')] [2024-12-13 09:27:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4280320. Throughput: 0: 805.0. Samples: 4280732. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:27:04,076][62436] Avg episode reward: [(0, '3440.808')] [2024-12-13 09:27:09,081][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 4284416. Throughput: 0: 817.5. Samples: 4285636. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:27:09,082][62436] Avg episode reward: [(0, '3403.941')] [2024-12-13 09:27:09,093][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008368_4284416.pth... [2024-12-13 09:27:09,117][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008320_4259840.pth [2024-12-13 09:27:14,080][62436] Fps is (10 sec: 409.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4284416. Throughput: 0: 801.1. Samples: 4287164. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:27:14,081][62436] Avg episode reward: [(0, '3275.203')] [2024-12-13 09:27:19,076][62436] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4288512. Throughput: 0: 757.0. Samples: 4290748. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:27:19,076][62436] Avg episode reward: [(0, '3244.388')] [2024-12-13 09:27:24,076][62436] Fps is (10 sec: 1229.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4296704. Throughput: 0: 781.4. Samples: 4296448. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:27:24,076][62436] Avg episode reward: [(0, '3319.113')] [2024-12-13 09:27:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008392_4296704.pth... [2024-12-13 09:27:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008344_4272128.pth [2024-12-13 09:27:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4296704. Throughput: 0: 795.8. Samples: 4299004. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:27:29,079][62436] Avg episode reward: [(0, '3261.204')] [2024-12-13 09:27:29,656][62492] Updated weights for policy 0, policy_version 8400 (0.0010) [2024-12-13 09:27:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4300800. Throughput: 0: 757.6. Samples: 4302820. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:27:34,076][62436] Avg episode reward: [(0, '3198.273')] [2024-12-13 09:27:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4304896. Throughput: 0: 773.7. Samples: 4308512. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:27:39,076][62436] Avg episode reward: [(0, '3003.898')] [2024-12-13 09:27:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008408_4304896.pth... [2024-12-13 09:27:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008368_4284416.pth [2024-12-13 09:27:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4308992. Throughput: 0: 791.2. Samples: 4311208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:27:44,076][62436] Avg episode reward: [(0, '2928.108')] [2024-12-13 09:27:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4313088. Throughput: 0: 760.9. Samples: 4314972. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:27:49,076][62436] Avg episode reward: [(0, '3013.114')] [2024-12-13 09:27:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4317184. Throughput: 0: 775.8. Samples: 4320544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:27:54,077][62436] Avg episode reward: [(0, '3157.736')] [2024-12-13 09:27:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008432_4317184.pth... [2024-12-13 09:27:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008392_4296704.pth [2024-12-13 09:27:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4321280. Throughput: 0: 801.8. Samples: 4323240. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:27:59,076][62436] Avg episode reward: [(0, '3204.317')] [2024-12-13 09:28:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4325376. Throughput: 0: 812.4. Samples: 4327304. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:28:04,076][62436] Avg episode reward: [(0, '3187.721')] [2024-12-13 09:28:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 4329472. Throughput: 0: 803.4. Samples: 4332600. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:28:09,076][62436] Avg episode reward: [(0, '3060.138')] [2024-12-13 09:28:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008456_4329472.pth... [2024-12-13 09:28:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008408_4304896.pth [2024-12-13 09:28:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 4333568. Throughput: 0: 805.9. Samples: 4335268. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:28:14,076][62436] Avg episode reward: [(0, '3155.830')] [2024-12-13 09:28:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4337664. Throughput: 0: 817.7. Samples: 4339616. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:28:19,076][62436] Avg episode reward: [(0, '3181.412')] [2024-12-13 09:28:20,587][62492] Updated weights for policy 0, policy_version 8480 (0.0017) [2024-12-13 09:28:24,075][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4341760. Throughput: 0: 803.7. Samples: 4344680. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:28:24,076][62436] Avg episode reward: [(0, '3153.004')] [2024-12-13 09:28:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008480_4341760.pth... [2024-12-13 09:28:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008432_4317184.pth [2024-12-13 09:28:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4345856. Throughput: 0: 804.8. Samples: 4347424. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:28:29,076][62436] Avg episode reward: [(0, '3016.340')] [2024-12-13 09:28:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4349952. Throughput: 0: 823.8. Samples: 4352044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:28:34,076][62436] Avg episode reward: [(0, '3071.432')] [2024-12-13 09:28:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4354048. Throughput: 0: 806.4. Samples: 4356832. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:28:39,076][62436] Avg episode reward: [(0, '3084.928')] [2024-12-13 09:28:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008504_4354048.pth... 
[2024-12-13 09:28:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008456_4329472.pth [2024-12-13 09:28:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4358144. Throughput: 0: 804.4. Samples: 4359440. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:28:44,077][62436] Avg episode reward: [(0, '3115.190')] [2024-12-13 09:28:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4362240. Throughput: 0: 825.7. Samples: 4364460. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:28:49,076][62436] Avg episode reward: [(0, '3149.426')] [2024-12-13 09:28:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4366336. Throughput: 0: 807.6. Samples: 4368944. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:28:54,076][62436] Avg episode reward: [(0, '3179.984')] [2024-12-13 09:28:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008528_4366336.pth... [2024-12-13 09:28:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008480_4341760.pth [2024-12-13 09:28:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4370432. Throughput: 0: 809.7. Samples: 4371704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:28:59,076][62436] Avg episode reward: [(0, '3288.804')] [2024-12-13 09:29:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4374528. Throughput: 0: 828.8. Samples: 4376912. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:29:04,077][62436] Avg episode reward: [(0, '3247.410')] [2024-12-13 09:29:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4378624. Throughput: 0: 808.0. Samples: 4381040. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:29:09,076][62436] Avg episode reward: [(0, '3223.013')] [2024-12-13 09:29:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008552_4378624.pth... [2024-12-13 09:29:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008504_4354048.pth [2024-12-13 09:29:10,463][62492] Updated weights for policy 0, policy_version 8560 (0.0010) [2024-12-13 09:29:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4382720. Throughput: 0: 807.6. Samples: 4383764. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:29:14,076][62436] Avg episode reward: [(0, '3145.555')] [2024-12-13 09:29:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4386816. Throughput: 0: 828.0. Samples: 4389304. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:29:19,076][62436] Avg episode reward: [(0, '3077.482')] [2024-12-13 09:29:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4390912. Throughput: 0: 810.0. Samples: 4393284. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:29:24,076][62436] Avg episode reward: [(0, '3156.633')] [2024-12-13 09:29:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008576_4390912.pth... [2024-12-13 09:29:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008528_4366336.pth [2024-12-13 09:29:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4395008. Throughput: 0: 812.5. Samples: 4396004. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:29:29,076][62436] Avg episode reward: [(0, '3252.737')] [2024-12-13 09:29:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4399104. Throughput: 0: 827.9. Samples: 4401716. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:29:34,076][62436] Avg episode reward: [(0, '3353.985')] [2024-12-13 09:29:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4403200. Throughput: 0: 810.5. Samples: 4405416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:29:39,076][62436] Avg episode reward: [(0, '3413.182')] [2024-12-13 09:29:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008600_4403200.pth... [2024-12-13 09:29:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008552_4378624.pth [2024-12-13 09:29:44,082][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 4407296. Throughput: 0: 806.9. Samples: 4408020. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:29:44,082][62436] Avg episode reward: [(0, '3460.143')] [2024-12-13 09:29:49,082][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 4411392. Throughput: 0: 814.6. Samples: 4413572. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:29:49,083][62436] Avg episode reward: [(0, '3576.784')] [2024-12-13 09:29:54,076][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4415488. Throughput: 0: 808.4. Samples: 4417420. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:29:54,076][62436] Avg episode reward: [(0, '3547.771')] [2024-12-13 09:29:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008624_4415488.pth... [2024-12-13 09:29:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008576_4390912.pth [2024-12-13 09:29:59,076][62436] Fps is (10 sec: 819.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4419584. Throughput: 0: 804.2. Samples: 4419952. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:29:59,076][62436] Avg episode reward: [(0, '3591.464')] [2024-12-13 09:30:00,726][62492] Updated weights for policy 0, policy_version 8640 (0.0012) [2024-12-13 09:30:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4423680. Throughput: 0: 802.6. Samples: 4425420. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:30:04,076][62436] Avg episode reward: [(0, '3512.201')] [2024-12-13 09:30:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4427776. Throughput: 0: 801.7. Samples: 4429360. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:30:09,077][62436] Avg episode reward: [(0, '3486.006')] [2024-12-13 09:30:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008648_4427776.pth... [2024-12-13 09:30:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008600_4403200.pth [2024-12-13 09:30:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4431872. Throughput: 0: 795.0. Samples: 4431780. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:30:14,076][62436] Avg episode reward: [(0, '3527.557')] [2024-12-13 09:30:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4435968. Throughput: 0: 789.6. Samples: 4437248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:30:19,076][62436] Avg episode reward: [(0, '3542.419')] [2024-12-13 09:30:24,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4440064. Throughput: 0: 802.3. Samples: 4441520. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:30:24,078][62436] Avg episode reward: [(0, '3561.845')] [2024-12-13 09:30:24,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008672_4440064.pth... 
[2024-12-13 09:30:24,102][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008624_4415488.pth [2024-12-13 09:30:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4444160. Throughput: 0: 795.7. Samples: 4443820. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:30:29,076][62436] Avg episode reward: [(0, '3642.552')] [2024-12-13 09:30:34,077][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4448256. Throughput: 0: 792.4. Samples: 4449224. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:30:34,078][62436] Avg episode reward: [(0, '3632.471')] [2024-12-13 09:30:39,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4452352. Throughput: 0: 806.5. Samples: 4453712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:30:39,078][62436] Avg episode reward: [(0, '3786.559')] [2024-12-13 09:30:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008696_4452352.pth... [2024-12-13 09:30:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008648_4427776.pth [2024-12-13 09:30:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 4456448. Throughput: 0: 795.9. Samples: 4455768. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:30:44,076][62436] Avg episode reward: [(0, '4003.984')] [2024-12-13 09:30:44,077][62473] Saving new best policy, reward=4003.984! [2024-12-13 09:30:49,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 4460544. Throughput: 0: 794.4. Samples: 4461168. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:30:49,076][62436] Avg episode reward: [(0, '4029.811')] [2024-12-13 09:30:49,077][62473] Saving new best policy, reward=4029.811! 
[2024-12-13 09:30:51,543][62492] Updated weights for policy 0, policy_version 8720 (0.0012) [2024-12-13 09:30:54,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4464640. Throughput: 0: 814.0. Samples: 4465992. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:30:54,078][62436] Avg episode reward: [(0, '4074.031')] [2024-12-13 09:30:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008720_4464640.pth... [2024-12-13 09:30:54,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008672_4440064.pth [2024-12-13 09:30:54,101][62473] Saving new best policy, reward=4074.031! [2024-12-13 09:30:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4468736. Throughput: 0: 802.0. Samples: 4467868. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:30:59,076][62436] Avg episode reward: [(0, '4160.888')] [2024-12-13 09:30:59,077][62473] Saving new best policy, reward=4160.888! [2024-12-13 09:31:04,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4472832. Throughput: 0: 796.3. Samples: 4473080. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:31:04,076][62436] Avg episode reward: [(0, '4129.877')] [2024-12-13 09:31:09,081][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 4476928. Throughput: 0: 812.4. Samples: 4478080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:31:09,082][62436] Avg episode reward: [(0, '4131.324')] [2024-12-13 09:31:09,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008744_4476928.pth... [2024-12-13 09:31:09,108][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008696_4452352.pth [2024-12-13 09:31:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4481024. Throughput: 0: 803.2. Samples: 4479964. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:31:14,076][62436] Avg episode reward: [(0, '4058.311')] [2024-12-13 09:31:19,076][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4485120. Throughput: 0: 795.8. Samples: 4485032. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:31:19,076][62436] Avg episode reward: [(0, '4095.135')] [2024-12-13 09:31:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4489216. Throughput: 0: 816.0. Samples: 4490432. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:31:24,076][62436] Avg episode reward: [(0, '4035.017')] [2024-12-13 09:31:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008768_4489216.pth... [2024-12-13 09:31:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008720_4464640.pth [2024-12-13 09:31:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4493312. Throughput: 0: 811.7. Samples: 4492296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:31:29,077][62436] Avg episode reward: [(0, '4034.399')] [2024-12-13 09:31:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4497408. Throughput: 0: 799.4. Samples: 4497140. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:31:34,076][62436] Avg episode reward: [(0, '4022.056')] [2024-12-13 09:31:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4501504. Throughput: 0: 812.9. Samples: 4502572. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:31:39,076][62436] Avg episode reward: [(0, '4123.721')] [2024-12-13 09:31:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008792_4501504.pth... 
[2024-12-13 09:31:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008744_4476928.pth [2024-12-13 09:31:43,458][62492] Updated weights for policy 0, policy_version 8800 (0.0018) [2024-12-13 09:31:44,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 4505600. Throughput: 0: 816.7. Samples: 4504624. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:31:44,080][62436] Avg episode reward: [(0, '4094.249')] [2024-12-13 09:31:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4509696. Throughput: 0: 798.0. Samples: 4508992. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:31:49,076][62436] Avg episode reward: [(0, '4135.293')] [2024-12-13 09:31:54,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4509696. Throughput: 0: 768.0. Samples: 4512636. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:31:54,076][62436] Avg episode reward: [(0, '4130.616')] [2024-12-13 09:31:54,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008808_4509696.pth... [2024-12-13 09:31:54,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008768_4489216.pth [2024-12-13 09:31:59,079][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4513792. Throughput: 0: 774.9. Samples: 4514836. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:31:59,080][62436] Avg episode reward: [(0, '4141.822')] [2024-12-13 09:32:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4517888. Throughput: 0: 752.7. Samples: 4518904. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:32:04,076][62436] Avg episode reward: [(0, '4206.696')] [2024-12-13 09:32:04,077][62473] Saving new best policy, reward=4206.696! [2024-12-13 09:32:09,076][62436] Fps is (10 sec: 819.5, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 4521984. 
Throughput: 0: 757.9. Samples: 4524536. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:32:09,076][62436] Avg episode reward: [(0, '4180.504')] [2024-12-13 09:32:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008832_4521984.pth... [2024-12-13 09:32:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008792_4501504.pth [2024-12-13 09:32:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4526080. Throughput: 0: 772.5. Samples: 4527060. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:32:14,076][62436] Avg episode reward: [(0, '4091.431')] [2024-12-13 09:32:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4530176. Throughput: 0: 748.9. Samples: 4530840. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:32:19,076][62436] Avg episode reward: [(0, '4049.385')] [2024-12-13 09:32:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4534272. Throughput: 0: 752.2. Samples: 4536424. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:32:24,079][62436] Avg episode reward: [(0, '4058.932')] [2024-12-13 09:32:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008856_4534272.pth... [2024-12-13 09:32:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008808_4509696.pth [2024-12-13 09:32:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4538368. Throughput: 0: 767.2. Samples: 4539144. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:32:29,076][62436] Avg episode reward: [(0, '4018.055')] [2024-12-13 09:32:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4542464. Throughput: 0: 752.4. Samples: 4542852. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:32:34,076][62436] Avg episode reward: [(0, '4022.542')] [2024-12-13 09:32:36,599][62492] Updated weights for policy 0, policy_version 8880 (0.0010) [2024-12-13 09:32:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4546560. Throughput: 0: 797.2. Samples: 4548508. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:32:39,076][62436] Avg episode reward: [(0, '3988.490')] [2024-12-13 09:32:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008880_4546560.pth... [2024-12-13 09:32:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008832_4521984.pth [2024-12-13 09:32:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 4550656. Throughput: 0: 808.0. Samples: 4551192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:32:44,076][62436] Avg episode reward: [(0, '4000.476')] [2024-12-13 09:32:49,077][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4554752. Throughput: 0: 806.0. Samples: 4555176. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:32:49,078][62436] Avg episode reward: [(0, '4005.423')] [2024-12-13 09:32:54,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4558848. Throughput: 0: 800.1. Samples: 4560540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:32:54,077][62436] Avg episode reward: [(0, '4020.721')] [2024-12-13 09:32:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008904_4558848.pth... [2024-12-13 09:32:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008856_4534272.pth [2024-12-13 09:32:59,078][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4562944. Throughput: 0: 804.9. Samples: 4563280. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:32:59,082][62436] Avg episode reward: [(0, '3974.113')] [2024-12-13 09:33:04,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4567040. Throughput: 0: 814.8. Samples: 4567508. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:33:04,076][62436] Avg episode reward: [(0, '3972.322')] [2024-12-13 09:33:09,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4571136. Throughput: 0: 799.2. Samples: 4572388. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:33:09,076][62436] Avg episode reward: [(0, '4012.844')] [2024-12-13 09:33:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008928_4571136.pth... [2024-12-13 09:33:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008880_4546560.pth [2024-12-13 09:33:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4575232. Throughput: 0: 802.0. Samples: 4575232. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:33:14,076][62436] Avg episode reward: [(0, '3971.600')] [2024-12-13 09:33:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4579328. Throughput: 0: 821.4. Samples: 4579816. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:33:19,077][62436] Avg episode reward: [(0, '3931.787')] [2024-12-13 09:33:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4583424. Throughput: 0: 801.7. Samples: 4584584. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:33:24,076][62436] Avg episode reward: [(0, '3882.826')] [2024-12-13 09:33:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008952_4583424.pth... 
[2024-12-13 09:33:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008904_4558848.pth [2024-12-13 09:33:26,754][62492] Updated weights for policy 0, policy_version 8960 (0.0012) [2024-12-13 09:33:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4587520. Throughput: 0: 806.3. Samples: 4587476. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:33:29,076][62436] Avg episode reward: [(0, '3919.078')] [2024-12-13 09:33:34,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4591616. Throughput: 0: 826.9. Samples: 4592388. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:33:34,078][62436] Avg episode reward: [(0, '3909.585')] [2024-12-13 09:33:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4595712. Throughput: 0: 804.9. Samples: 4596760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:33:39,076][62436] Avg episode reward: [(0, '3895.378')] [2024-12-13 09:33:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008976_4595712.pth... [2024-12-13 09:33:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008928_4571136.pth [2024-12-13 09:33:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4599808. Throughput: 0: 807.1. Samples: 4599596. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:33:44,076][62436] Avg episode reward: [(0, '3897.818')] [2024-12-13 09:33:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4603904. Throughput: 0: 826.0. Samples: 4604676. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:33:49,076][62436] Avg episode reward: [(0, '3969.736')] [2024-12-13 09:33:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4608000. Throughput: 0: 808.3. Samples: 4608760. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:33:54,076][62436] Avg episode reward: [(0, '3942.222')] [2024-12-13 09:33:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009000_4608000.pth... [2024-12-13 09:33:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008952_4583424.pth [2024-12-13 09:33:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4612096. Throughput: 0: 809.2. Samples: 4611648. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:33:59,076][62436] Avg episode reward: [(0, '3860.097')] [2024-12-13 09:34:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4616192. Throughput: 0: 825.5. Samples: 4616964. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:34:04,076][62436] Avg episode reward: [(0, '3871.069')] [2024-12-13 09:34:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4620288. Throughput: 0: 803.4. Samples: 4620736. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:34:09,076][62436] Avg episode reward: [(0, '3954.137')] [2024-12-13 09:34:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009024_4620288.pth... [2024-12-13 09:34:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000008976_4595712.pth [2024-12-13 09:34:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4624384. Throughput: 0: 802.9. Samples: 4623608. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:34:14,076][62436] Avg episode reward: [(0, '3958.654')] [2024-12-13 09:34:16,988][62492] Updated weights for policy 0, policy_version 9040 (0.0011) [2024-12-13 09:34:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4628480. Throughput: 0: 814.7. Samples: 4629048. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:34:19,076][62436] Avg episode reward: [(0, '3979.493')] [2024-12-13 09:34:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4632576. Throughput: 0: 801.6. Samples: 4632832. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:34:24,076][62436] Avg episode reward: [(0, '3936.448')] [2024-12-13 09:34:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009048_4632576.pth... [2024-12-13 09:34:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009000_4608000.pth [2024-12-13 09:34:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4636672. Throughput: 0: 801.6. Samples: 4635668. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:34:29,078][62436] Avg episode reward: [(0, '3954.152')] [2024-12-13 09:34:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4640768. Throughput: 0: 812.4. Samples: 4641232. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:34:34,076][62436] Avg episode reward: [(0, '3910.430')] [2024-12-13 09:34:39,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4644864. Throughput: 0: 812.8. Samples: 4645336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:34:39,078][62436] Avg episode reward: [(0, '3927.772')] [2024-12-13 09:34:39,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009072_4644864.pth... [2024-12-13 09:34:39,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009024_4620288.pth [2024-12-13 09:34:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4648960. Throughput: 0: 804.9. Samples: 4647868. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:34:44,076][62436] Avg episode reward: [(0, '4041.277')] [2024-12-13 09:34:49,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4653056. Throughput: 0: 806.5. Samples: 4653256. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:34:49,076][62436] Avg episode reward: [(0, '3909.798')] [2024-12-13 09:34:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4657152. Throughput: 0: 819.9. Samples: 4657632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:34:54,076][62436] Avg episode reward: [(0, '3865.863')] [2024-12-13 09:34:54,094][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009096_4657152.pth... [2024-12-13 09:34:54,110][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009048_4632576.pth [2024-12-13 09:34:59,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4661248. Throughput: 0: 806.5. Samples: 4659900. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:34:59,078][62436] Avg episode reward: [(0, '3851.752')] [2024-12-13 09:35:04,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4665344. Throughput: 0: 807.8. Samples: 4665400. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:35:04,080][62436] Avg episode reward: [(0, '3850.102')] [2024-12-13 09:35:07,624][62492] Updated weights for policy 0, policy_version 9120 (0.0011) [2024-12-13 09:35:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4669440. Throughput: 0: 826.9. Samples: 4670044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:35:09,076][62436] Avg episode reward: [(0, '3799.944')] [2024-12-13 09:35:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009120_4669440.pth... 
[2024-12-13 09:35:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009072_4644864.pth [2024-12-13 09:35:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4673536. Throughput: 0: 807.1. Samples: 4671988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:35:14,076][62436] Avg episode reward: [(0, '3755.682')] [2024-12-13 09:35:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4677632. Throughput: 0: 806.0. Samples: 4677504. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:35:19,076][62436] Avg episode reward: [(0, '3782.301')] [2024-12-13 09:35:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4681728. Throughput: 0: 823.7. Samples: 4682404. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:35:24,079][62436] Avg episode reward: [(0, '3897.181')] [2024-12-13 09:35:24,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009144_4681728.pth... [2024-12-13 09:35:24,114][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009096_4657152.pth [2024-12-13 09:35:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4685824. Throughput: 0: 809.7. Samples: 4684304. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:35:29,076][62436] Avg episode reward: [(0, '3916.260')] [2024-12-13 09:35:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4689920. Throughput: 0: 809.0. Samples: 4689660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:35:34,076][62436] Avg episode reward: [(0, '3981.913')] [2024-12-13 09:35:39,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4694016. Throughput: 0: 825.5. Samples: 4694780. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:35:39,077][62436] Avg episode reward: [(0, '3992.376')] [2024-12-13 09:35:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009168_4694016.pth... [2024-12-13 09:35:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009120_4669440.pth [2024-12-13 09:35:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4698112. Throughput: 0: 816.1. Samples: 4696624. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:35:44,076][62436] Avg episode reward: [(0, '3997.311')] [2024-12-13 09:35:49,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4702208. Throughput: 0: 806.4. Samples: 4701688. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:35:49,076][62436] Avg episode reward: [(0, '4043.836')] [2024-12-13 09:35:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4706304. Throughput: 0: 822.4. Samples: 4707052. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:35:54,076][62436] Avg episode reward: [(0, '3967.372')] [2024-12-13 09:35:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009192_4706304.pth... [2024-12-13 09:35:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009144_4681728.pth [2024-12-13 09:35:59,067][62492] Updated weights for policy 0, policy_version 9200 (0.0010) [2024-12-13 09:35:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4710400. Throughput: 0: 822.3. Samples: 4708992. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:35:59,076][62436] Avg episode reward: [(0, '3902.842')] [2024-12-13 09:36:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4714496. Throughput: 0: 804.5. Samples: 4713708. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:36:04,076][62436] Avg episode reward: [(0, '3931.699')] [2024-12-13 09:36:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4718592. Throughput: 0: 817.4. Samples: 4719184. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:36:09,076][62436] Avg episode reward: [(0, '3892.224')] [2024-12-13 09:36:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009216_4718592.pth... [2024-12-13 09:36:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009168_4694016.pth [2024-12-13 09:36:14,077][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4718592. Throughput: 0: 819.9. Samples: 4721200. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:36:14,077][62436] Avg episode reward: [(0, '4045.036')] [2024-12-13 09:36:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4726784. Throughput: 0: 800.6. Samples: 4725688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:36:19,076][62436] Avg episode reward: [(0, '4032.342')] [2024-12-13 09:36:24,076][62436] Fps is (10 sec: 1228.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4730880. Throughput: 0: 806.5. Samples: 4731072. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:36:24,076][62436] Avg episode reward: [(0, '4022.230')] [2024-12-13 09:36:24,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009240_4730880.pth... [2024-12-13 09:36:24,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009192_4706304.pth [2024-12-13 09:36:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4730880. Throughput: 0: 818.0. Samples: 4733432. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:36:29,076][62436] Avg episode reward: [(0, '3978.544')] [2024-12-13 09:36:34,075][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4734976. Throughput: 0: 778.7. Samples: 4736728. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:36:34,076][62436] Avg episode reward: [(0, '4007.371')] [2024-12-13 09:36:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4739072. Throughput: 0: 758.3. Samples: 4741176. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:36:39,076][62436] Avg episode reward: [(0, '4127.090')] [2024-12-13 09:36:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009256_4739072.pth... [2024-12-13 09:36:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009216_4718592.pth [2024-12-13 09:36:44,077][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4743168. Throughput: 0: 775.4. Samples: 4743888. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:36:44,079][62436] Avg episode reward: [(0, '4102.172')] [2024-12-13 09:36:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4747264. Throughput: 0: 752.1. Samples: 4747552. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:36:49,076][62436] Avg episode reward: [(0, '4046.327')] [2024-12-13 09:36:51,709][62492] Updated weights for policy 0, policy_version 9280 (0.0010) [2024-12-13 09:36:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4751360. Throughput: 0: 755.0. Samples: 4753160. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:36:54,078][62436] Avg episode reward: [(0, '4003.419')] [2024-12-13 09:36:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009280_4751360.pth... 
[2024-12-13 09:36:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009240_4730880.pth [2024-12-13 09:36:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4755456. Throughput: 0: 769.9. Samples: 4755844. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:36:59,076][62436] Avg episode reward: [(0, '4052.490')] [2024-12-13 09:37:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4759552. Throughput: 0: 759.2. Samples: 4759852. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:37:04,076][62436] Avg episode reward: [(0, '4002.597')] [2024-12-13 09:37:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4763648. Throughput: 0: 757.1. Samples: 4765140. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:37:09,076][62436] Avg episode reward: [(0, '3918.101')] [2024-12-13 09:37:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009304_4763648.pth... [2024-12-13 09:37:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009256_4739072.pth [2024-12-13 09:37:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4767744. Throughput: 0: 763.7. Samples: 4767800. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:37:14,076][62436] Avg episode reward: [(0, '3940.938')] [2024-12-13 09:37:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4771840. Throughput: 0: 782.3. Samples: 4771932. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:37:19,076][62436] Avg episode reward: [(0, '3997.142')] [2024-12-13 09:37:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 4775936. Throughput: 0: 798.5. Samples: 4777108. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:37:24,076][62436] Avg episode reward: [(0, '4107.620')] [2024-12-13 09:37:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009328_4775936.pth... [2024-12-13 09:37:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009280_4751360.pth [2024-12-13 09:37:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4780032. Throughput: 0: 802.2. Samples: 4779984. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:37:29,076][62436] Avg episode reward: [(0, '4288.279')] [2024-12-13 09:37:29,077][62473] Saving new best policy, reward=4288.279! [2024-12-13 09:37:34,084][62436] Fps is (10 sec: 818.5, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 4784128. Throughput: 0: 816.6. Samples: 4784308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:37:34,085][62436] Avg episode reward: [(0, '4288.946')] [2024-12-13 09:37:34,086][62473] Saving new best policy, reward=4288.946! [2024-12-13 09:37:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4788224. Throughput: 0: 800.3. Samples: 4789172. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:37:39,076][62436] Avg episode reward: [(0, '4404.584')] [2024-12-13 09:37:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009352_4788224.pth... [2024-12-13 09:37:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009304_4763648.pth [2024-12-13 09:37:39,089][62473] Saving new best policy, reward=4404.584! [2024-12-13 09:37:42,045][62492] Updated weights for policy 0, policy_version 9360 (0.0011) [2024-12-13 09:37:44,076][62436] Fps is (10 sec: 819.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4792320. Throughput: 0: 804.3. Samples: 4792036. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:37:44,076][62436] Avg episode reward: [(0, '4341.909')] [2024-12-13 09:37:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4796416. Throughput: 0: 812.5. Samples: 4796416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:37:49,078][62436] Avg episode reward: [(0, '4376.555')] [2024-12-13 09:37:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4800512. Throughput: 0: 798.5. Samples: 4801072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:37:54,076][62436] Avg episode reward: [(0, '4419.802')] [2024-12-13 09:37:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009376_4800512.pth... [2024-12-13 09:37:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009328_4775936.pth [2024-12-13 09:37:54,092][62473] Saving new best policy, reward=4419.802! [2024-12-13 09:37:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4804608. Throughput: 0: 803.5. Samples: 4803956. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:37:59,076][62436] Avg episode reward: [(0, '4469.261')] [2024-12-13 09:37:59,077][62473] Saving new best policy, reward=4469.261! [2024-12-13 09:38:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4808704. Throughput: 0: 819.6. Samples: 4808812. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:38:04,076][62436] Avg episode reward: [(0, '4496.760')] [2024-12-13 09:38:04,078][62473] Saving new best policy, reward=4496.760! [2024-12-13 09:38:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4812800. Throughput: 0: 799.1. Samples: 4813068. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:38:09,078][62436] Avg episode reward: [(0, '4559.553')] [2024-12-13 09:38:09,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009400_4812800.pth... [2024-12-13 09:38:09,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009352_4788224.pth [2024-12-13 09:38:09,097][62473] Saving new best policy, reward=4559.553! [2024-12-13 09:38:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4816896. Throughput: 0: 798.1. Samples: 4815900. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:38:14,076][62436] Avg episode reward: [(0, '4618.213')] [2024-12-13 09:38:14,077][62473] Saving new best policy, reward=4618.213! [2024-12-13 09:38:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4820992. Throughput: 0: 815.4. Samples: 4820992. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:38:19,076][62436] Avg episode reward: [(0, '4602.462')] [2024-12-13 09:38:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4825088. Throughput: 0: 798.1. Samples: 4825088. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:38:24,076][62436] Avg episode reward: [(0, '4594.899')] [2024-12-13 09:38:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009424_4825088.pth... [2024-12-13 09:38:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009376_4800512.pth [2024-12-13 09:38:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4829184. Throughput: 0: 797.2. Samples: 4827912. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:38:29,076][62436] Avg episode reward: [(0, '4762.683')] [2024-12-13 09:38:29,077][62473] Saving new best policy, reward=4762.683! 
[2024-12-13 09:38:32,843][62492] Updated weights for policy 0, policy_version 9440 (0.0010) [2024-12-13 09:38:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 4833280. Throughput: 0: 818.2. Samples: 4833236. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:38:34,076][62436] Avg episode reward: [(0, '4748.106')] [2024-12-13 09:38:39,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4837376. Throughput: 0: 800.2. Samples: 4837084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:38:39,080][62436] Avg episode reward: [(0, '4754.680')] [2024-12-13 09:38:39,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009448_4837376.pth... [2024-12-13 09:38:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009400_4812800.pth [2024-12-13 09:38:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4841472. Throughput: 0: 795.5. Samples: 4839752. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:38:44,076][62436] Avg episode reward: [(0, '4761.350')] [2024-12-13 09:38:49,080][62436] Fps is (10 sec: 819.1, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 4845568. Throughput: 0: 810.5. Samples: 4845288. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:38:49,081][62436] Avg episode reward: [(0, '4847.686')] [2024-12-13 09:38:49,082][62473] Saving new best policy, reward=4847.686! [2024-12-13 09:38:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4845568. Throughput: 0: 800.4. Samples: 4849084. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:38:54,076][62436] Avg episode reward: [(0, '4818.508')] [2024-12-13 09:38:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009464_4845568.pth... 
[2024-12-13 09:38:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009424_4825088.pth [2024-12-13 09:38:59,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4853760. Throughput: 0: 795.0. Samples: 4851676. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:38:59,076][62436] Avg episode reward: [(0, '4806.798')] [2024-12-13 09:39:04,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4857856. Throughput: 0: 806.6. Samples: 4857288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:39:04,076][62436] Avg episode reward: [(0, '4725.895')] [2024-12-13 09:39:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4857856. Throughput: 0: 805.7. Samples: 4861344. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:39:09,076][62436] Avg episode reward: [(0, '4680.308')] [2024-12-13 09:39:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009488_4857856.pth... [2024-12-13 09:39:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009448_4837376.pth [2024-12-13 09:39:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4861952. Throughput: 0: 793.8. Samples: 4863632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:39:14,076][62436] Avg episode reward: [(0, '4644.912')] [2024-12-13 09:39:19,079][62436] Fps is (10 sec: 1228.3, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 4870144. Throughput: 0: 798.6. Samples: 4869176. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:39:19,080][62436] Avg episode reward: [(0, '4582.347')] [2024-12-13 09:39:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4870144. Throughput: 0: 812.0. Samples: 4873620. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:39:24,076][62436] Avg episode reward: [(0, '4662.840')] [2024-12-13 09:39:24,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009512_4870144.pth... [2024-12-13 09:39:24,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009464_4845568.pth [2024-12-13 09:39:24,835][62492] Updated weights for policy 0, policy_version 9520 (0.0010) [2024-12-13 09:39:29,076][62436] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4874240. Throughput: 0: 794.0. Samples: 4875484. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:39:29,076][62436] Avg episode reward: [(0, '4587.684')] [2024-12-13 09:39:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4878336. Throughput: 0: 790.8. Samples: 4880872. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:39:34,076][62436] Avg episode reward: [(0, '4569.361')] [2024-12-13 09:39:39,076][62436] Fps is (10 sec: 819.1, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 4882432. Throughput: 0: 812.9. Samples: 4885664. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:39:39,077][62436] Avg episode reward: [(0, '4617.854')] [2024-12-13 09:39:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009536_4882432.pth... [2024-12-13 09:39:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009488_4857856.pth [2024-12-13 09:39:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4886528. Throughput: 0: 792.2. Samples: 4887324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:39:44,076][62436] Avg episode reward: [(0, '4523.485')] [2024-12-13 09:39:49,076][62436] Fps is (10 sec: 819.3, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 4890624. Throughput: 0: 787.6. Samples: 4892728. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:39:49,076][62436] Avg episode reward: [(0, '4505.328')] [2024-12-13 09:39:54,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4894720. Throughput: 0: 803.4. Samples: 4897500. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:39:54,079][62436] Avg episode reward: [(0, '4508.724')] [2024-12-13 09:39:54,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009560_4894720.pth... [2024-12-13 09:39:54,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009512_4870144.pth [2024-12-13 09:39:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4898816. Throughput: 0: 787.5. Samples: 4899068. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:39:59,076][62436] Avg episode reward: [(0, '4477.661')] [2024-12-13 09:40:04,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4902912. Throughput: 0: 780.9. Samples: 4904316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:40:04,077][62436] Avg episode reward: [(0, '4476.614')] [2024-12-13 09:40:09,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4907008. Throughput: 0: 790.4. Samples: 4909192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:40:09,079][62436] Avg episode reward: [(0, '4463.036')] [2024-12-13 09:40:09,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009584_4907008.pth... [2024-12-13 09:40:09,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009536_4882432.pth [2024-12-13 09:40:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4911104. Throughput: 0: 788.3. Samples: 4910956. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:40:14,076][62436] Avg episode reward: [(0, '4404.673')] [2024-12-13 09:40:16,753][62492] Updated weights for policy 0, policy_version 9600 (0.0010) [2024-12-13 09:40:19,076][62436] Fps is (10 sec: 819.4, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 4915200. Throughput: 0: 780.5. Samples: 4915996. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:40:19,076][62436] Avg episode reward: [(0, '4431.368')] [2024-12-13 09:40:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4919296. Throughput: 0: 790.9. Samples: 4921252. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:40:24,076][62436] Avg episode reward: [(0, '4404.745')] [2024-12-13 09:40:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009608_4919296.pth... [2024-12-13 09:40:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009560_4894720.pth [2024-12-13 09:40:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4923392. Throughput: 0: 795.0. Samples: 4923100. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:40:29,076][62436] Avg episode reward: [(0, '4444.906')] [2024-12-13 09:40:34,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4927488. Throughput: 0: 782.6. Samples: 4927948. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:40:34,078][62436] Avg episode reward: [(0, '4386.813')] [2024-12-13 09:40:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4931584. Throughput: 0: 801.0. Samples: 4933544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:40:39,076][62436] Avg episode reward: [(0, '4433.138')] [2024-12-13 09:40:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009632_4931584.pth... 
[2024-12-13 09:40:39,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009584_4907008.pth [2024-12-13 09:40:44,077][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4935680. Throughput: 0: 808.1. Samples: 4935436. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:40:44,078][62436] Avg episode reward: [(0, '4439.769')] [2024-12-13 09:40:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4939776. Throughput: 0: 791.2. Samples: 4939920. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:40:49,076][62436] Avg episode reward: [(0, '4408.140')] [2024-12-13 09:40:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4943872. Throughput: 0: 807.0. Samples: 4945504. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:40:54,076][62436] Avg episode reward: [(0, '4421.340')] [2024-12-13 09:40:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009656_4943872.pth... [2024-12-13 09:40:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009608_4919296.pth [2024-12-13 09:40:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4947968. Throughput: 0: 814.9. Samples: 4947628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:40:59,076][62436] Avg episode reward: [(0, '4436.729')] [2024-12-13 09:41:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4952064. Throughput: 0: 793.2. Samples: 4951688. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:41:04,076][62436] Avg episode reward: [(0, '4363.259')] [2024-12-13 09:41:07,681][62492] Updated weights for policy 0, policy_version 9680 (0.0010) [2024-12-13 09:41:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 4956160. Throughput: 0: 795.4. Samples: 4957044. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:41:09,076][62436] Avg episode reward: [(0, '4405.716')] [2024-12-13 09:41:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009680_4956160.pth... [2024-12-13 09:41:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009632_4931584.pth [2024-12-13 09:41:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 4960256. Throughput: 0: 808.6. Samples: 4959488. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:41:14,076][62436] Avg episode reward: [(0, '4326.859')] [2024-12-13 09:41:19,077][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 4960256. Throughput: 0: 764.5. Samples: 4962348. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:41:19,078][62436] Avg episode reward: [(0, '4281.073')] [2024-12-13 09:41:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4964352. Throughput: 0: 747.3. Samples: 4967172. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:41:24,076][62436] Avg episode reward: [(0, '4328.075')] [2024-12-13 09:41:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009696_4964352.pth... [2024-12-13 09:41:24,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009656_4943872.pth [2024-12-13 09:41:29,078][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4968448. Throughput: 0: 763.6. Samples: 4969800. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:41:29,079][62436] Avg episode reward: [(0, '4359.148')] [2024-12-13 09:41:34,083][62436] Fps is (10 sec: 818.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4972544. Throughput: 0: 764.8. Samples: 4974344. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:41:34,084][62436] Avg episode reward: [(0, '4307.996')] [2024-12-13 09:41:39,078][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4976640. Throughput: 0: 749.2. Samples: 4979220. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:41:39,079][62436] Avg episode reward: [(0, '4294.923')] [2024-12-13 09:41:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009720_4976640.pth... [2024-12-13 09:41:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009680_4956160.pth [2024-12-13 09:41:44,076][62436] Fps is (10 sec: 819.8, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 4980736. Throughput: 0: 763.4. Samples: 4981980. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:41:44,076][62436] Avg episode reward: [(0, '4240.327')] [2024-12-13 09:41:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4984832. Throughput: 0: 779.8. Samples: 4986780. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:41:49,076][62436] Avg episode reward: [(0, '4260.735')] [2024-12-13 09:41:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4988928. Throughput: 0: 761.0. Samples: 4991288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:41:54,076][62436] Avg episode reward: [(0, '4277.380')] [2024-12-13 09:41:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009744_4988928.pth... [2024-12-13 09:41:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009696_4964352.pth [2024-12-13 09:41:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4993024. Throughput: 0: 768.1. Samples: 4994052. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:41:59,076][62436] Avg episode reward: [(0, '4324.558')] [2024-12-13 09:41:59,972][62492] Updated weights for policy 0, policy_version 9760 (0.0033) [2024-12-13 09:42:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 4997120. Throughput: 0: 818.6. Samples: 4999184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:42:04,076][62436] Avg episode reward: [(0, '4335.696')] [2024-12-13 09:42:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5001216. Throughput: 0: 804.1. Samples: 5003356. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:42:09,076][62436] Avg episode reward: [(0, '4407.307')] [2024-12-13 09:42:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009768_5001216.pth... [2024-12-13 09:42:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009720_4976640.pth [2024-12-13 09:42:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5005312. Throughput: 0: 805.3. Samples: 5006036. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:42:14,076][62436] Avg episode reward: [(0, '4329.380')] [2024-12-13 09:42:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5009408. Throughput: 0: 824.0. Samples: 5011416. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:42:19,076][62436] Avg episode reward: [(0, '4334.019')] [2024-12-13 09:42:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5013504. Throughput: 0: 803.9. Samples: 5015392. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:42:24,076][62436] Avg episode reward: [(0, '4313.044')] [2024-12-13 09:42:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009792_5013504.pth... 
[2024-12-13 09:42:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009744_4988928.pth [2024-12-13 09:42:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.5). Total num frames: 5017600. Throughput: 0: 803.9. Samples: 5018156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:42:29,076][62436] Avg episode reward: [(0, '4214.626')] [2024-12-13 09:42:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 5021696. Throughput: 0: 821.3. Samples: 5023740. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:42:34,076][62436] Avg episode reward: [(0, '4236.450')] [2024-12-13 09:42:39,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5025792. Throughput: 0: 805.8. Samples: 5027552. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:42:39,078][62436] Avg episode reward: [(0, '4246.746')] [2024-12-13 09:42:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009816_5025792.pth... [2024-12-13 09:42:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009768_5001216.pth [2024-12-13 09:42:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5029888. Throughput: 0: 806.4. Samples: 5030340. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:42:44,076][62436] Avg episode reward: [(0, '4193.097')] [2024-12-13 09:42:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5033984. Throughput: 0: 818.5. Samples: 5036016. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:42:49,076][62436] Avg episode reward: [(0, '4141.060')] [2024-12-13 09:42:50,870][62492] Updated weights for policy 0, policy_version 9840 (0.0010) [2024-12-13 09:42:54,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5038080. Throughput: 0: 810.3. Samples: 5039820. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:42:54,080][62436] Avg episode reward: [(0, '4138.575')] [2024-12-13 09:42:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009840_5038080.pth... [2024-12-13 09:42:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009792_5013504.pth [2024-12-13 09:42:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5042176. Throughput: 0: 811.8. Samples: 5042568. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:42:59,080][62436] Avg episode reward: [(0, '4134.975')] [2024-12-13 09:43:04,082][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 5046272. Throughput: 0: 819.9. Samples: 5048316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:43:04,083][62436] Avg episode reward: [(0, '4167.133')] [2024-12-13 09:43:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5050368. Throughput: 0: 821.5. Samples: 5052360. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:43:09,076][62436] Avg episode reward: [(0, '4231.739')] [2024-12-13 09:43:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009864_5050368.pth... [2024-12-13 09:43:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009816_5025792.pth [2024-12-13 09:43:14,076][62436] Fps is (10 sec: 819.8, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5054464. Throughput: 0: 813.5. Samples: 5054764. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:43:14,076][62436] Avg episode reward: [(0, '4196.426')] [2024-12-13 09:43:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5058560. Throughput: 0: 813.1. Samples: 5060328. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:43:19,076][62436] Avg episode reward: [(0, '4204.731')] [2024-12-13 09:43:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5062656. Throughput: 0: 824.5. Samples: 5064652. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:43:24,076][62436] Avg episode reward: [(0, '4205.926')] [2024-12-13 09:43:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009888_5062656.pth... [2024-12-13 09:43:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009840_5038080.pth [2024-12-13 09:43:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5066752. Throughput: 0: 809.2. Samples: 5066752. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:43:29,076][62436] Avg episode reward: [(0, '4138.384')] [2024-12-13 09:43:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5070848. Throughput: 0: 808.0. Samples: 5072376. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:43:34,076][62436] Avg episode reward: [(0, '4043.106')] [2024-12-13 09:43:39,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5074944. Throughput: 0: 826.7. Samples: 5077020. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:43:39,079][62436] Avg episode reward: [(0, '4060.407')] [2024-12-13 09:43:39,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009912_5074944.pth... [2024-12-13 09:43:39,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009864_5050368.pth [2024-12-13 09:43:41,801][62492] Updated weights for policy 0, policy_version 9920 (0.0015) [2024-12-13 09:43:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5079040. Throughput: 0: 808.5. Samples: 5078952. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:43:44,076][62436] Avg episode reward: [(0, '4035.178')] [2024-12-13 09:43:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5083136. Throughput: 0: 802.3. Samples: 5084412. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:43:49,076][62436] Avg episode reward: [(0, '4094.570')] [2024-12-13 09:43:54,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5087232. Throughput: 0: 821.6. Samples: 5089332. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:43:54,078][62436] Avg episode reward: [(0, '3986.390')] [2024-12-13 09:43:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009936_5087232.pth... [2024-12-13 09:43:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009888_5062656.pth [2024-12-13 09:43:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5091328. Throughput: 0: 810.6. Samples: 5091240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:43:59,076][62436] Avg episode reward: [(0, '4031.084')] [2024-12-13 09:44:04,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 5095424. Throughput: 0: 805.4. Samples: 5096572. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:44:04,076][62436] Avg episode reward: [(0, '4128.075')] [2024-12-13 09:44:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5099520. Throughput: 0: 826.6. Samples: 5101848. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:44:09,076][62436] Avg episode reward: [(0, '4139.344')] [2024-12-13 09:44:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009960_5099520.pth... 
[2024-12-13 09:44:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009912_5074944.pth [2024-12-13 09:44:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5103616. Throughput: 0: 819.2. Samples: 5103616. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:44:14,076][62436] Avg episode reward: [(0, '4243.214')] [2024-12-13 09:44:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5107712. Throughput: 0: 808.3. Samples: 5108748. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:44:19,076][62436] Avg episode reward: [(0, '4248.749')] [2024-12-13 09:44:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5111808. Throughput: 0: 827.0. Samples: 5114232. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:44:24,076][62436] Avg episode reward: [(0, '4188.808')] [2024-12-13 09:44:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009984_5111808.pth... [2024-12-13 09:44:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009936_5087232.pth [2024-12-13 09:44:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5115904. Throughput: 0: 821.2. Samples: 5115904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:44:29,076][62436] Avg episode reward: [(0, '4108.733')] [2024-12-13 09:44:31,691][62492] Updated weights for policy 0, policy_version 10000 (0.0010) [2024-12-13 09:44:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5120000. Throughput: 0: 810.0. Samples: 5120860. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:44:34,076][62436] Avg episode reward: [(0, '4042.453')] [2024-12-13 09:44:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5124096. Throughput: 0: 825.8. Samples: 5126492. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:44:39,076][62436] Avg episode reward: [(0, '3966.724')] [2024-12-13 09:44:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010008_5124096.pth... [2024-12-13 09:44:39,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009960_5099520.pth [2024-12-13 09:44:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5128192. Throughput: 0: 821.2. Samples: 5128192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:44:44,076][62436] Avg episode reward: [(0, '3966.724')] [2024-12-13 09:44:49,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 5132288. Throughput: 0: 810.4. Samples: 5133044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:44:49,080][62436] Avg episode reward: [(0, '4055.129')] [2024-12-13 09:44:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5136384. Throughput: 0: 821.0. Samples: 5138792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:44:54,076][62436] Avg episode reward: [(0, '4053.379')] [2024-12-13 09:44:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010032_5136384.pth... [2024-12-13 09:44:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000009984_5111808.pth [2024-12-13 09:44:59,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5140480. Throughput: 0: 825.0. Samples: 5140740. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:44:59,076][62436] Avg episode reward: [(0, '4022.709')] [2024-12-13 09:45:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5144576. Throughput: 0: 813.2. Samples: 5145340. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:45:04,076][62436] Avg episode reward: [(0, '3850.413')] [2024-12-13 09:45:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5148672. Throughput: 0: 817.0. Samples: 5150996. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:45:09,076][62436] Avg episode reward: [(0, '3897.223')] [2024-12-13 09:45:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010056_5148672.pth... [2024-12-13 09:45:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010008_5124096.pth [2024-12-13 09:45:14,080][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 5152768. Throughput: 0: 829.2. Samples: 5153220. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:45:14,081][62436] Avg episode reward: [(0, '3930.424')] [2024-12-13 09:45:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5156864. Throughput: 0: 813.3. Samples: 5157460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:45:19,076][62436] Avg episode reward: [(0, '3920.231')] [2024-12-13 09:45:21,318][62492] Updated weights for policy 0, policy_version 10080 (0.0012) [2024-12-13 09:45:24,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5160960. Throughput: 0: 815.9. Samples: 5163208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:45:24,076][62436] Avg episode reward: [(0, '3963.509')] [2024-12-13 09:45:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010080_5160960.pth... [2024-12-13 09:45:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010032_5136384.pth [2024-12-13 09:45:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5165056. Throughput: 0: 834.4. Samples: 5165740. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:45:29,079][62436] Avg episode reward: [(0, '3847.329')] [2024-12-13 09:45:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5169152. Throughput: 0: 815.6. Samples: 5169744. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:45:34,076][62436] Avg episode reward: [(0, '3823.357')] [2024-12-13 09:45:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5173248. Throughput: 0: 816.8. Samples: 5175548. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:45:39,076][62436] Avg episode reward: [(0, '3835.851')] [2024-12-13 09:45:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010104_5173248.pth... [2024-12-13 09:45:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010056_5148672.pth [2024-12-13 09:45:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5177344. Throughput: 0: 835.8. Samples: 5178352. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:45:44,076][62436] Avg episode reward: [(0, '3835.394')] [2024-12-13 09:45:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 5181440. Throughput: 0: 818.3. Samples: 5182164. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:45:49,076][62436] Avg episode reward: [(0, '3896.742')] [2024-12-13 09:45:54,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5185536. Throughput: 0: 817.6. Samples: 5187788. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:45:54,077][62436] Avg episode reward: [(0, '3856.323')] [2024-12-13 09:45:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010128_5185536.pth... 
[2024-12-13 09:45:54,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010080_5160960.pth [2024-12-13 09:45:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5189632. Throughput: 0: 808.8. Samples: 5189612. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:45:59,076][62436] Avg episode reward: [(0, '3942.981')] [2024-12-13 09:46:04,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5193728. Throughput: 0: 790.0. Samples: 5193012. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:46:04,080][62436] Avg episode reward: [(0, '3931.037')] [2024-12-13 09:46:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5197824. Throughput: 0: 770.5. Samples: 5197880. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:46:09,079][62436] Avg episode reward: [(0, '3997.346')] [2024-12-13 09:46:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010152_5197824.pth... [2024-12-13 09:46:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010104_5173248.pth [2024-12-13 09:46:13,071][62492] Updated weights for policy 0, policy_version 10160 (0.0010) [2024-12-13 09:46:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 5201920. Throughput: 0: 777.0. Samples: 5200704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:46:14,076][62436] Avg episode reward: [(0, '3939.755')] [2024-12-13 09:46:19,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5206016. Throughput: 0: 789.6. Samples: 5205276. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:46:19,079][62436] Avg episode reward: [(0, '3970.575')] [2024-12-13 09:46:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5210112. Throughput: 0: 762.9. Samples: 5209880. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:46:24,076][62436] Avg episode reward: [(0, '3993.840')] [2024-12-13 09:46:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010176_5210112.pth... [2024-12-13 09:46:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010128_5185536.pth [2024-12-13 09:46:29,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5214208. Throughput: 0: 762.0. Samples: 5212644. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:46:29,076][62436] Avg episode reward: [(0, '4125.493')] [2024-12-13 09:46:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5214208. Throughput: 0: 786.4. Samples: 5217552. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:46:34,076][62436] Avg episode reward: [(0, '4166.051')] [2024-12-13 09:46:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5218304. Throughput: 0: 758.6. Samples: 5221924. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:46:39,076][62436] Avg episode reward: [(0, '4172.489')] [2024-12-13 09:46:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010192_5218304.pth... [2024-12-13 09:46:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010152_5197824.pth [2024-12-13 09:46:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5226496. Throughput: 0: 776.5. Samples: 5224556. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:46:44,076][62436] Avg episode reward: [(0, '4189.722')] [2024-12-13 09:46:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5226496. Throughput: 0: 813.9. Samples: 5229636. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:46:49,076][62436] Avg episode reward: [(0, '4258.177')] [2024-12-13 09:46:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5230592. Throughput: 0: 802.2. Samples: 5233980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:46:54,076][62436] Avg episode reward: [(0, '4240.423')] [2024-12-13 09:46:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010216_5230592.pth... [2024-12-13 09:46:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010176_5210112.pth [2024-12-13 09:46:59,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5238784. Throughput: 0: 799.6. Samples: 5236688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:46:59,076][62436] Avg episode reward: [(0, '4330.824')] [2024-12-13 09:47:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5238784. Throughput: 0: 817.2. Samples: 5242048. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:47:04,076][62436] Avg episode reward: [(0, '4420.024')] [2024-12-13 09:47:04,336][62492] Updated weights for policy 0, policy_version 10240 (0.0016) [2024-12-13 09:47:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5242880. Throughput: 0: 805.1. Samples: 5246108. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:47:09,076][62436] Avg episode reward: [(0, '4426.775')] [2024-12-13 09:47:09,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010240_5242880.pth... [2024-12-13 09:47:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010192_5218304.pth [2024-12-13 09:47:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5246976. Throughput: 0: 804.7. Samples: 5248856. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:47:14,076][62436] Avg episode reward: [(0, '4473.816')] [2024-12-13 09:47:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 5251072. Throughput: 0: 815.3. Samples: 5254240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:47:19,076][62436] Avg episode reward: [(0, '4494.257')] [2024-12-13 09:47:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5255168. Throughput: 0: 805.2. Samples: 5258160. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:47:24,076][62436] Avg episode reward: [(0, '4567.193')] [2024-12-13 09:47:24,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010264_5255168.pth... [2024-12-13 09:47:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010216_5230592.pth [2024-12-13 09:47:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5259264. Throughput: 0: 807.3. Samples: 5260884. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:47:29,076][62436] Avg episode reward: [(0, '4579.300')] [2024-12-13 09:47:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5263360. Throughput: 0: 818.4. Samples: 5266464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:47:34,076][62436] Avg episode reward: [(0, '4597.141')] [2024-12-13 09:47:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5267456. Throughput: 0: 805.8. Samples: 5270240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:47:39,076][62436] Avg episode reward: [(0, '4534.532')] [2024-12-13 09:47:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010288_5267456.pth... 
[2024-12-13 09:47:39,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010240_5242880.pth [2024-12-13 09:47:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5271552. Throughput: 0: 804.7. Samples: 5272900. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:47:44,076][62436] Avg episode reward: [(0, '4486.778')] [2024-12-13 09:47:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5275648. Throughput: 0: 810.8. Samples: 5278532. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:47:49,076][62436] Avg episode reward: [(0, '4501.487')] [2024-12-13 09:47:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5279744. Throughput: 0: 804.6. Samples: 5282316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:47:54,076][62436] Avg episode reward: [(0, '4577.728')] [2024-12-13 09:47:54,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010312_5279744.pth... [2024-12-13 09:47:54,103][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010264_5255168.pth [2024-12-13 09:47:55,535][62492] Updated weights for policy 0, policy_version 10320 (0.0011) [2024-12-13 09:47:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5283840. Throughput: 0: 799.8. Samples: 5284848. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:47:59,079][62436] Avg episode reward: [(0, '4634.981')] [2024-12-13 09:48:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5287936. Throughput: 0: 805.3. Samples: 5290480. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:48:04,076][62436] Avg episode reward: [(0, '4599.388')] [2024-12-13 09:48:09,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 5292032. Throughput: 0: 810.9. Samples: 5294656. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:48:09,081][62436] Avg episode reward: [(0, '4638.371')] [2024-12-13 09:48:09,095][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010336_5292032.pth... [2024-12-13 09:48:09,119][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010288_5267456.pth [2024-12-13 09:48:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5296128. Throughput: 0: 801.8. Samples: 5296964. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:48:14,076][62436] Avg episode reward: [(0, '4614.527')] [2024-12-13 09:48:19,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5300224. Throughput: 0: 805.4. Samples: 5302708. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:48:19,076][62436] Avg episode reward: [(0, '4638.348')] [2024-12-13 09:48:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5304320. Throughput: 0: 821.1. Samples: 5307188. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:48:24,076][62436] Avg episode reward: [(0, '4564.408')] [2024-12-13 09:48:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010360_5304320.pth... [2024-12-13 09:48:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010312_5279744.pth [2024-12-13 09:48:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5308416. Throughput: 0: 805.6. Samples: 5309152. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:48:29,076][62436] Avg episode reward: [(0, '4502.608')] [2024-12-13 09:48:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5312512. Throughput: 0: 805.4. Samples: 5314776. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:48:34,076][62436] Avg episode reward: [(0, '4516.254')] [2024-12-13 09:48:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5316608. Throughput: 0: 827.8. Samples: 5319568. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:48:39,077][62436] Avg episode reward: [(0, '4467.059')] [2024-12-13 09:48:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010384_5316608.pth... [2024-12-13 09:48:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010336_5292032.pth [2024-12-13 09:48:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5320704. Throughput: 0: 809.1. Samples: 5321256. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:48:44,076][62436] Avg episode reward: [(0, '4478.329')] [2024-12-13 09:48:45,348][62492] Updated weights for policy 0, policy_version 10400 (0.0012) [2024-12-13 09:48:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5324800. Throughput: 0: 811.0. Samples: 5326976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:48:49,078][62436] Avg episode reward: [(0, '4464.559')] [2024-12-13 09:48:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5328896. Throughput: 0: 827.6. Samples: 5331896. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:48:54,076][62436] Avg episode reward: [(0, '4422.695')] [2024-12-13 09:48:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010408_5328896.pth... [2024-12-13 09:48:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010360_5304320.pth [2024-12-13 09:48:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5332992. Throughput: 0: 814.8. Samples: 5333628. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:48:59,076][62436] Avg episode reward: [(0, '4337.848')] [2024-12-13 09:49:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5337088. Throughput: 0: 803.6. Samples: 5338868. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:49:04,076][62436] Avg episode reward: [(0, '4289.045')] [2024-12-13 09:49:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 5341184. Throughput: 0: 822.1. Samples: 5344184. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:49:09,076][62436] Avg episode reward: [(0, '4388.604')] [2024-12-13 09:49:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010432_5341184.pth... [2024-12-13 09:49:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010384_5316608.pth [2024-12-13 09:49:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5345280. Throughput: 0: 816.6. Samples: 5345900. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:49:14,083][62436] Avg episode reward: [(0, '4443.103')] [2024-12-13 09:49:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5349376. Throughput: 0: 807.5. Samples: 5351112. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:49:19,076][62436] Avg episode reward: [(0, '4420.089')] [2024-12-13 09:49:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5353472. Throughput: 0: 821.2. Samples: 5356524. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:49:24,076][62436] Avg episode reward: [(0, '4361.479')] [2024-12-13 09:49:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010456_5353472.pth... 
[2024-12-13 09:49:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010408_5328896.pth [2024-12-13 09:49:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5357568. Throughput: 0: 822.2. Samples: 5358256. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:49:29,077][62436] Avg episode reward: [(0, '4396.530')] [2024-12-13 09:49:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5361664. Throughput: 0: 803.3. Samples: 5363124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:49:34,078][62436] Avg episode reward: [(0, '4338.836')] [2024-12-13 09:49:35,511][62492] Updated weights for policy 0, policy_version 10480 (0.0014) [2024-12-13 09:49:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5365760. Throughput: 0: 819.2. Samples: 5368760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:49:39,076][62436] Avg episode reward: [(0, '4346.628')] [2024-12-13 09:49:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010480_5365760.pth... [2024-12-13 09:49:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010432_5341184.pth [2024-12-13 09:49:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5369856. Throughput: 0: 821.6. Samples: 5370600. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:49:44,076][62436] Avg episode reward: [(0, '4397.733')] [2024-12-13 09:49:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5373952. Throughput: 0: 805.3. Samples: 5375108. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:49:49,076][62436] Avg episode reward: [(0, '4404.392')] [2024-12-13 09:49:54,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 5378048. Throughput: 0: 812.1. Samples: 5380732. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:49:54,080][62436] Avg episode reward: [(0, '4381.996')] [2024-12-13 09:49:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010504_5378048.pth... [2024-12-13 09:49:54,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010456_5353472.pth [2024-12-13 09:49:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5382144. Throughput: 0: 820.0. Samples: 5382800. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:49:59,076][62436] Avg episode reward: [(0, '4373.152')] [2024-12-13 09:50:04,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5386240. Throughput: 0: 798.8. Samples: 5387060. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:50:04,076][62436] Avg episode reward: [(0, '4414.409')] [2024-12-13 09:50:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5390336. Throughput: 0: 801.7. Samples: 5392600. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:50:09,076][62436] Avg episode reward: [(0, '4439.131')] [2024-12-13 09:50:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010528_5390336.pth... [2024-12-13 09:50:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010480_5365760.pth [2024-12-13 09:50:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5394432. Throughput: 0: 814.8. Samples: 5394920. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:50:14,076][62436] Avg episode reward: [(0, '4394.562')] [2024-12-13 09:50:19,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5398528. Throughput: 0: 796.7. Samples: 5398976. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:50:19,079][62436] Avg episode reward: [(0, '4495.093')] [2024-12-13 09:50:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5402624. Throughput: 0: 796.2. Samples: 5404592. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:50:24,079][62436] Avg episode reward: [(0, '4516.493')] [2024-12-13 09:50:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010552_5402624.pth... [2024-12-13 09:50:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010504_5378048.pth [2024-12-13 09:50:25,957][62492] Updated weights for policy 0, policy_version 10560 (0.0018) [2024-12-13 09:50:29,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5406720. Throughput: 0: 814.6. Samples: 5407256. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:50:29,076][62436] Avg episode reward: [(0, '4544.473')] [2024-12-13 09:50:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5410816. Throughput: 0: 798.4. Samples: 5411036. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:50:34,076][62436] Avg episode reward: [(0, '4441.653')] [2024-12-13 09:50:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5414912. Throughput: 0: 799.2. Samples: 5416692. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:50:39,076][62436] Avg episode reward: [(0, '4599.019')] [2024-12-13 09:50:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010576_5414912.pth... [2024-12-13 09:50:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010528_5390336.pth [2024-12-13 09:50:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5419008. Throughput: 0: 800.2. Samples: 5418808. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:50:44,077][62436] Avg episode reward: [(0, '4781.505')] [2024-12-13 09:50:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5419008. Throughput: 0: 773.1. Samples: 5421848. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:50:49,076][62436] Avg episode reward: [(0, '4801.976')] [2024-12-13 09:50:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 5423104. Throughput: 0: 759.4. Samples: 5426772. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:50:54,076][62436] Avg episode reward: [(0, '4851.679')] [2024-12-13 09:50:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010592_5423104.pth... [2024-12-13 09:50:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010552_5402624.pth [2024-12-13 09:50:54,090][62473] Saving new best policy, reward=4851.679! [2024-12-13 09:50:59,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5431296. Throughput: 0: 766.9. Samples: 5429432. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:50:59,076][62436] Avg episode reward: [(0, '4866.230')] [2024-12-13 09:50:59,077][62473] Saving new best policy, reward=4866.230! [2024-12-13 09:51:04,080][62436] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5431296. Throughput: 0: 780.0. Samples: 5434076. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:51:04,081][62436] Avg episode reward: [(0, '4866.325')] [2024-12-13 09:51:04,082][62473] Saving new best policy, reward=4866.325! [2024-12-13 09:51:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5435392. Throughput: 0: 758.9. Samples: 5438740. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:51:09,076][62436] Avg episode reward: [(0, '4859.317')] [2024-12-13 09:51:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010616_5435392.pth... [2024-12-13 09:51:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010576_5414912.pth [2024-12-13 09:51:14,076][62436] Fps is (10 sec: 1229.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5443584. Throughput: 0: 759.7. Samples: 5441444. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:51:14,076][62436] Avg episode reward: [(0, '4904.239')] [2024-12-13 09:51:14,077][62473] Saving new best policy, reward=4904.239! [2024-12-13 09:51:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 5443584. Throughput: 0: 786.9. Samples: 5446448. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:51:19,076][62436] Avg episode reward: [(0, '4915.134')] [2024-12-13 09:51:19,077][62473] Saving new best policy, reward=4915.134! [2024-12-13 09:51:19,914][62492] Updated weights for policy 0, policy_version 10640 (0.0012) [2024-12-13 09:51:24,077][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5447680. Throughput: 0: 755.3. Samples: 5450684. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:51:24,078][62436] Avg episode reward: [(0, '4888.415')] [2024-12-13 09:51:24,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010640_5447680.pth... [2024-12-13 09:51:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010592_5423104.pth [2024-12-13 09:51:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5451776. Throughput: 0: 768.2. Samples: 5453376. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:51:29,076][62436] Avg episode reward: [(0, '4951.185')] [2024-12-13 09:51:29,077][62473] Saving new best policy, reward=4951.185! 
[2024-12-13 09:51:34,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5455872. Throughput: 0: 818.9. Samples: 5458700. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:51:34,076][62436] Avg episode reward: [(0, '4882.804')] [2024-12-13 09:51:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5459968. Throughput: 0: 800.7. Samples: 5462804. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:51:39,083][62436] Avg episode reward: [(0, '4833.140')] [2024-12-13 09:51:39,092][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010664_5459968.pth... [2024-12-13 09:51:39,110][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010616_5435392.pth [2024-12-13 09:51:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5464064. Throughput: 0: 802.5. Samples: 5465544. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:51:44,076][62436] Avg episode reward: [(0, '4835.690')] [2024-12-13 09:51:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5468160. Throughput: 0: 818.0. Samples: 5470884. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:51:49,076][62436] Avg episode reward: [(0, '4805.842')] [2024-12-13 09:51:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5472256. Throughput: 0: 802.0. Samples: 5474828. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:51:54,076][62436] Avg episode reward: [(0, '4783.128')] [2024-12-13 09:51:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010688_5472256.pth... [2024-12-13 09:51:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010640_5447680.pth [2024-12-13 09:51:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5476352. Throughput: 0: 800.4. Samples: 5477464. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:51:59,076][62436] Avg episode reward: [(0, '4825.948')] [2024-12-13 09:52:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 5480448. Throughput: 0: 813.8. Samples: 5483068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:52:04,076][62436] Avg episode reward: [(0, '4823.724')] [2024-12-13 09:52:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5484544. Throughput: 0: 801.9. Samples: 5486768. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:52:09,076][62436] Avg episode reward: [(0, '4828.587')] [2024-12-13 09:52:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010712_5484544.pth... [2024-12-13 09:52:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010664_5459968.pth [2024-12-13 09:52:10,715][62492] Updated weights for policy 0, policy_version 10720 (0.0011) [2024-12-13 09:52:14,076][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5488640. Throughput: 0: 802.6. Samples: 5489492. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:52:14,077][62436] Avg episode reward: [(0, '4905.827')] [2024-12-13 09:52:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5492736. Throughput: 0: 810.5. Samples: 5495172. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:52:19,077][62436] Avg episode reward: [(0, '4955.508')] [2024-12-13 09:52:19,078][62473] Saving new best policy, reward=4955.508! [2024-12-13 09:52:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5496832. Throughput: 0: 804.9. Samples: 5499024. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:52:24,076][62436] Avg episode reward: [(0, '4890.309')] [2024-12-13 09:52:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010736_5496832.pth... 
[2024-12-13 09:52:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010688_5472256.pth [2024-12-13 09:52:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5500928. Throughput: 0: 800.1. Samples: 5501548. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:52:29,076][62436] Avg episode reward: [(0, '4798.619')] [2024-12-13 09:52:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5505024. Throughput: 0: 809.6. Samples: 5507316. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:52:34,079][62436] Avg episode reward: [(0, '4756.418')] [2024-12-13 09:52:39,077][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5509120. Throughput: 0: 814.0. Samples: 5511460. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:52:39,080][62436] Avg episode reward: [(0, '4709.369')] [2024-12-13 09:52:39,093][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010760_5509120.pth... [2024-12-13 09:52:39,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010712_5484544.pth [2024-12-13 09:52:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5513216. Throughput: 0: 807.8. Samples: 5513816. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:52:44,076][62436] Avg episode reward: [(0, '4708.717')] [2024-12-13 09:52:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5517312. Throughput: 0: 811.4. Samples: 5519580. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:52:49,076][62436] Avg episode reward: [(0, '4627.737')] [2024-12-13 09:52:54,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5521408. Throughput: 0: 827.2. Samples: 5523996. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:52:54,079][62436] Avg episode reward: [(0, '4625.057')] [2024-12-13 09:52:54,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010784_5521408.pth... [2024-12-13 09:52:54,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010736_5496832.pth [2024-12-13 09:52:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5525504. Throughput: 0: 811.0. Samples: 5525988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:52:59,076][62436] Avg episode reward: [(0, '4584.557')] [2024-12-13 09:53:00,416][62492] Updated weights for policy 0, policy_version 10800 (0.0011) [2024-12-13 09:53:04,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5529600. Throughput: 0: 812.1. Samples: 5531716. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:53:04,076][62436] Avg episode reward: [(0, '4598.084')] [2024-12-13 09:53:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5533696. Throughput: 0: 832.7. Samples: 5536496. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:53:09,076][62436] Avg episode reward: [(0, '4459.876')] [2024-12-13 09:53:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010808_5533696.pth... [2024-12-13 09:53:09,106][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010760_5509120.pth [2024-12-13 09:53:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5537792. Throughput: 0: 816.1. Samples: 5538272. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:53:14,076][62436] Avg episode reward: [(0, '4411.805')] [2024-12-13 09:53:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5541888. Throughput: 0: 815.0. Samples: 5543992. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:53:19,076][62436] Avg episode reward: [(0, '4379.041')] [2024-12-13 09:53:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5545984. Throughput: 0: 833.9. Samples: 5548984. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:53:24,076][62436] Avg episode reward: [(0, '4350.369')] [2024-12-13 09:53:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010832_5545984.pth... [2024-12-13 09:53:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010784_5521408.pth [2024-12-13 09:53:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5550080. Throughput: 0: 818.4. Samples: 5550644. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:53:29,076][62436] Avg episode reward: [(0, '4282.806')] [2024-12-13 09:53:34,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5554176. Throughput: 0: 809.9. Samples: 5556028. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:53:34,077][62436] Avg episode reward: [(0, '4156.877')] [2024-12-13 09:53:39,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5558272. Throughput: 0: 830.1. Samples: 5561348. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:53:39,089][62436] Avg episode reward: [(0, '4143.755')] [2024-12-13 09:53:39,099][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010856_5558272.pth... [2024-12-13 09:53:39,104][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010808_5533696.pth [2024-12-13 09:53:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5562368. Throughput: 0: 826.0. Samples: 5563156. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:53:44,076][62436] Avg episode reward: [(0, '4090.647')] [2024-12-13 09:53:49,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5566464. Throughput: 0: 814.4. Samples: 5568364. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:53:49,076][62436] Avg episode reward: [(0, '4038.902')] [2024-12-13 09:53:50,012][62492] Updated weights for policy 0, policy_version 10880 (0.0012) [2024-12-13 09:53:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5570560. Throughput: 0: 831.5. Samples: 5573912. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:53:54,076][62436] Avg episode reward: [(0, '4073.462')] [2024-12-13 09:53:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010880_5570560.pth... [2024-12-13 09:53:54,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010832_5545984.pth [2024-12-13 09:53:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5574656. Throughput: 0: 830.5. Samples: 5575644. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:53:59,077][62436] Avg episode reward: [(0, '4116.376')] [2024-12-13 09:54:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5578752. Throughput: 0: 812.7. Samples: 5580564. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:54:04,076][62436] Avg episode reward: [(0, '4105.938')] [2024-12-13 09:54:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5582848. Throughput: 0: 828.8. Samples: 5586280. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:54:09,076][62436] Avg episode reward: [(0, '4134.328')] [2024-12-13 09:54:09,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010904_5582848.pth... 
[2024-12-13 09:54:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010856_5558272.pth [2024-12-13 09:54:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5586944. Throughput: 0: 832.0. Samples: 5588084. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:54:14,083][62436] Avg episode reward: [(0, '4039.972')] [2024-12-13 09:54:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5591040. Throughput: 0: 815.6. Samples: 5592728. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:54:19,076][62436] Avg episode reward: [(0, '3995.804')] [2024-12-13 09:54:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5595136. Throughput: 0: 823.1. Samples: 5598384. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:54:24,076][62436] Avg episode reward: [(0, '3939.833')] [2024-12-13 09:54:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010928_5595136.pth... [2024-12-13 09:54:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010880_5570560.pth [2024-12-13 09:54:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5599232. Throughput: 0: 831.4. Samples: 5600568. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:54:29,076][62436] Avg episode reward: [(0, '3918.258')] [2024-12-13 09:54:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5603328. Throughput: 0: 812.6. Samples: 5604932. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:54:34,076][62436] Avg episode reward: [(0, '4071.294')] [2024-12-13 09:54:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5607424. Throughput: 0: 815.9. Samples: 5610628. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:54:39,076][62436] Avg episode reward: [(0, '4031.779')] [2024-12-13 09:54:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010952_5607424.pth... [2024-12-13 09:54:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010904_5582848.pth [2024-12-13 09:54:39,671][62492] Updated weights for policy 0, policy_version 10960 (0.0010) [2024-12-13 09:54:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5611520. Throughput: 0: 832.0. Samples: 5613084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:54:44,076][62436] Avg episode reward: [(0, '4065.487')] [2024-12-13 09:54:49,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5615616. Throughput: 0: 811.6. Samples: 5617084. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:54:49,076][62436] Avg episode reward: [(0, '4080.317')] [2024-12-13 09:54:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5619712. Throughput: 0: 809.5. Samples: 5622708. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:54:54,076][62436] Avg episode reward: [(0, '4053.062')] [2024-12-13 09:54:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010976_5619712.pth... [2024-12-13 09:54:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010928_5595136.pth [2024-12-13 09:54:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5623808. Throughput: 0: 829.1. Samples: 5625392. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:54:59,077][62436] Avg episode reward: [(0, '4092.165')] [2024-12-13 09:55:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5627904. Throughput: 0: 809.3. Samples: 5629148. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:55:04,076][62436] Avg episode reward: [(0, '4095.031')] [2024-12-13 09:55:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5632000. Throughput: 0: 808.4. Samples: 5634764. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:55:09,076][62436] Avg episode reward: [(0, '4096.897')] [2024-12-13 09:55:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011000_5632000.pth... [2024-12-13 09:55:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010952_5607424.pth [2024-12-13 09:55:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5636096. Throughput: 0: 821.6. Samples: 5637540. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:55:14,076][62436] Avg episode reward: [(0, '4074.494')] [2024-12-13 09:55:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5640192. Throughput: 0: 809.0. Samples: 5641336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:55:19,076][62436] Avg episode reward: [(0, '4155.544')] [2024-12-13 09:55:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5644288. Throughput: 0: 799.3. Samples: 5646596. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:55:24,076][62436] Avg episode reward: [(0, '4207.296')] [2024-12-13 09:55:24,097][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011024_5644288.pth... [2024-12-13 09:55:24,111][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000010976_5619712.pth [2024-12-13 09:55:29,080][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 5648384. Throughput: 0: 784.4. Samples: 5648384. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:55:29,080][62436] Avg episode reward: [(0, '4226.976')] [2024-12-13 09:55:33,530][62492] Updated weights for policy 0, policy_version 11040 (0.0016) [2024-12-13 09:55:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5652480. Throughput: 0: 776.5. Samples: 5652028. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:55:34,076][62436] Avg episode reward: [(0, '4224.372')] [2024-12-13 09:55:39,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5656576. Throughput: 0: 763.2. Samples: 5657052. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:55:39,079][62436] Avg episode reward: [(0, '4327.455')] [2024-12-13 09:55:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011048_5656576.pth... [2024-12-13 09:55:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011000_5632000.pth [2024-12-13 09:55:44,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 5660672. Throughput: 0: 766.7. Samples: 5659896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:55:44,081][62436] Avg episode reward: [(0, '4348.982')] [2024-12-13 09:55:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5664768. Throughput: 0: 781.5. Samples: 5664316. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:55:49,076][62436] Avg episode reward: [(0, '4363.637')] [2024-12-13 09:55:54,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5668864. Throughput: 0: 761.2. Samples: 5669016. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 09:55:54,076][62436] Avg episode reward: [(0, '4541.517')] [2024-12-13 09:55:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011072_5668864.pth... 
[2024-12-13 09:55:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011024_5644288.pth [2024-12-13 09:55:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5672960. Throughput: 0: 763.2. Samples: 5671884. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:55:59,076][62436] Avg episode reward: [(0, '4634.167')] [2024-12-13 09:56:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5672960. Throughput: 0: 780.0. Samples: 5676436. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:56:04,076][62436] Avg episode reward: [(0, '4691.656')] [2024-12-13 09:56:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5681152. Throughput: 0: 767.9. Samples: 5681152. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:56:09,076][62436] Avg episode reward: [(0, '4738.389')] [2024-12-13 09:56:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011096_5681152.pth... [2024-12-13 09:56:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011048_5656576.pth [2024-12-13 09:56:14,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5685248. Throughput: 0: 790.3. Samples: 5683944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:56:14,076][62436] Avg episode reward: [(0, '4793.236')] [2024-12-13 09:56:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5685248. Throughput: 0: 813.3. Samples: 5688628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:56:19,077][62436] Avg episode reward: [(0, '4789.984')] [2024-12-13 09:56:23,922][62492] Updated weights for policy 0, policy_version 11120 (0.0011) [2024-12-13 09:56:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5693440. Throughput: 0: 804.3. Samples: 5693244. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:56:24,076][62436] Avg episode reward: [(0, '4867.308')] [2024-12-13 09:56:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011120_5693440.pth... [2024-12-13 09:56:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011072_5668864.pth [2024-12-13 09:56:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 5697536. Throughput: 0: 802.1. Samples: 5695988. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:56:29,076][62436] Avg episode reward: [(0, '4904.761')] [2024-12-13 09:56:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5697536. Throughput: 0: 811.6. Samples: 5700836. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:56:34,076][62436] Avg episode reward: [(0, '4885.716')] [2024-12-13 09:56:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5701632. Throughput: 0: 806.6. Samples: 5705312. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 09:56:39,076][62436] Avg episode reward: [(0, '4904.266')] [2024-12-13 09:56:39,158][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011144_5705728.pth... [2024-12-13 09:56:39,165][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011096_5681152.pth [2024-12-13 09:56:44,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 5709824. Throughput: 0: 805.0. Samples: 5708108. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 09:56:44,076][62436] Avg episode reward: [(0, '4969.115')] [2024-12-13 09:56:44,077][62473] Saving new best policy, reward=4969.115! [2024-12-13 09:56:49,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5713920. Throughput: 0: 820.2. Samples: 5713344. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:56:49,076][62436] Avg episode reward: [(0, '4978.702')] [2024-12-13 09:56:49,078][62473] Saving new best policy, reward=4978.702! [2024-12-13 09:56:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5713920. Throughput: 0: 807.7. Samples: 5717500. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:56:54,076][62436] Avg episode reward: [(0, '4990.100')] [2024-12-13 09:56:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011160_5713920.pth... [2024-12-13 09:56:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011120_5693440.pth [2024-12-13 09:56:54,094][62473] Saving new best policy, reward=4990.100! [2024-12-13 09:56:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5722112. Throughput: 0: 807.1. Samples: 5720264. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:56:59,076][62436] Avg episode reward: [(0, '5098.575')] [2024-12-13 09:56:59,077][62473] Saving new best policy, reward=5098.575! [2024-12-13 09:57:04,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 5726208. Throughput: 0: 821.1. Samples: 5725576. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:57:04,076][62436] Avg episode reward: [(0, '5111.996')] [2024-12-13 09:57:04,077][62473] Saving new best policy, reward=5111.996! [2024-12-13 09:57:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5726208. Throughput: 0: 803.9. Samples: 5729420. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:57:09,076][62436] Avg episode reward: [(0, '5117.688')] [2024-12-13 09:57:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011184_5726208.pth... 
[2024-12-13 09:57:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011144_5705728.pth [2024-12-13 09:57:09,090][62473] Saving new best policy, reward=5117.688! [2024-12-13 09:57:14,055][62492] Updated weights for policy 0, policy_version 11200 (0.0011) [2024-12-13 09:57:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5734400. Throughput: 0: 803.7. Samples: 5732156. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 09:57:14,076][62436] Avg episode reward: [(0, '5149.665')] [2024-12-13 09:57:14,077][62473] Saving new best policy, reward=5149.665! [2024-12-13 09:57:19,080][62436] Fps is (10 sec: 1228.3, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 5738496. Throughput: 0: 821.5. Samples: 5737808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:57:19,080][62436] Avg episode reward: [(0, '5102.117')] [2024-12-13 09:57:24,077][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5738496. Throughput: 0: 805.8. Samples: 5741576. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:57:24,078][62436] Avg episode reward: [(0, '5098.640')] [2024-12-13 09:57:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011208_5738496.pth... [2024-12-13 09:57:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011160_5713920.pth [2024-12-13 09:57:29,076][62436] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5742592. Throughput: 0: 802.0. Samples: 5744196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:57:29,076][62436] Avg episode reward: [(0, '5061.810')] [2024-12-13 09:57:34,076][62436] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 5750784. Throughput: 0: 811.8. Samples: 5749876. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:57:34,077][62436] Avg episode reward: [(0, '4989.611')] [2024-12-13 09:57:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5750784. Throughput: 0: 811.2. Samples: 5754004. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:57:39,076][62436] Avg episode reward: [(0, '4975.633')] [2024-12-13 09:57:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011232_5750784.pth... [2024-12-13 09:57:39,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011184_5726208.pth [2024-12-13 09:57:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5754880. Throughput: 0: 799.8. Samples: 5756256. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 09:57:44,076][62436] Avg episode reward: [(0, '4952.622')] [2024-12-13 09:57:49,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 5763072. Throughput: 0: 808.3. Samples: 5761948. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:57:49,076][62436] Avg episode reward: [(0, '4955.145')] [2024-12-13 09:57:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5763072. Throughput: 0: 820.5. Samples: 5766344. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:57:54,076][62436] Avg episode reward: [(0, '5005.635')] [2024-12-13 09:57:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011256_5763072.pth... [2024-12-13 09:57:54,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011208_5738496.pth [2024-12-13 09:57:59,078][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5767168. Throughput: 0: 803.5. Samples: 5768316. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:57:59,078][62436] Avg episode reward: [(0, '5042.786')] [2024-12-13 09:58:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5771264. Throughput: 0: 802.2. Samples: 5773904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:04,076][62436] Avg episode reward: [(0, '5055.962')] [2024-12-13 09:58:04,239][62492] Updated weights for policy 0, policy_version 11280 (0.0014) [2024-12-13 09:58:09,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5775360. Throughput: 0: 822.4. Samples: 5778584. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:09,077][62436] Avg episode reward: [(0, '5052.954')] [2024-12-13 09:58:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011280_5775360.pth... [2024-12-13 09:58:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011232_5750784.pth [2024-12-13 09:58:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5779456. Throughput: 0: 804.2. Samples: 5780384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:14,076][62436] Avg episode reward: [(0, '5069.978')] [2024-12-13 09:58:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 5783552. Throughput: 0: 803.0. Samples: 5786012. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:19,076][62436] Avg episode reward: [(0, '5073.745')] [2024-12-13 09:58:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5787648. Throughput: 0: 819.1. Samples: 5790864. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:24,077][62436] Avg episode reward: [(0, '5101.439')] [2024-12-13 09:58:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011304_5787648.pth... 
[2024-12-13 09:58:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011256_5763072.pth [2024-12-13 09:58:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5791744. Throughput: 0: 806.4. Samples: 5792544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:29,076][62436] Avg episode reward: [(0, '5147.241')] [2024-12-13 09:58:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5795840. Throughput: 0: 799.7. Samples: 5797936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:34,076][62436] Avg episode reward: [(0, '5178.600')] [2024-12-13 09:58:34,077][62473] Saving new best policy, reward=5178.600! [2024-12-13 09:58:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5799936. Throughput: 0: 816.1. Samples: 5803068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:39,076][62436] Avg episode reward: [(0, '5176.660')] [2024-12-13 09:58:39,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011328_5799936.pth... [2024-12-13 09:58:39,102][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011280_5775360.pth [2024-12-13 09:58:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5804032. Throughput: 0: 810.5. Samples: 5804788. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:44,076][62436] Avg episode reward: [(0, '5162.148')] [2024-12-13 09:58:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5808128. Throughput: 0: 805.2. Samples: 5810136. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:49,076][62436] Avg episode reward: [(0, '5140.437')] [2024-12-13 09:58:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5812224. Throughput: 0: 819.4. Samples: 5815456. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:54,076][62436] Avg episode reward: [(0, '5039.009')] [2024-12-13 09:58:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011352_5812224.pth... [2024-12-13 09:58:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011304_5787648.pth [2024-12-13 09:58:55,267][62492] Updated weights for policy 0, policy_version 11360 (0.0010) [2024-12-13 09:58:59,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5816320. Throughput: 0: 818.1. Samples: 5817200. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:58:59,080][62436] Avg episode reward: [(0, '5017.134')] [2024-12-13 09:59:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5820416. Throughput: 0: 803.3. Samples: 5822160. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:59:04,076][62436] Avg episode reward: [(0, '5000.249')] [2024-12-13 09:59:09,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5824512. Throughput: 0: 819.3. Samples: 5827732. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:59:09,076][62436] Avg episode reward: [(0, '5050.632')] [2024-12-13 09:59:09,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011376_5824512.pth... [2024-12-13 09:59:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011328_5799936.pth [2024-12-13 09:59:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5828608. Throughput: 0: 819.9. Samples: 5829440. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:59:14,076][62436] Avg episode reward: [(0, '4978.477')] [2024-12-13 09:59:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5832704. Throughput: 0: 806.4. Samples: 5834224. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 09:59:19,076][62436] Avg episode reward: [(0, '5002.243')] [2024-12-13 09:59:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5836800. Throughput: 0: 816.7. Samples: 5839820. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:59:24,078][62436] Avg episode reward: [(0, '4982.055')] [2024-12-13 09:59:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011400_5836800.pth... [2024-12-13 09:59:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011352_5812224.pth [2024-12-13 09:59:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5840896. Throughput: 0: 823.4. Samples: 5841840. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:59:29,076][62436] Avg episode reward: [(0, '5014.602')] [2024-12-13 09:59:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5844992. Throughput: 0: 804.4. Samples: 5846332. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:59:34,076][62436] Avg episode reward: [(0, '5110.175')] [2024-12-13 09:59:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5849088. Throughput: 0: 810.9. Samples: 5851948. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:59:39,076][62436] Avg episode reward: [(0, '5125.250')] [2024-12-13 09:59:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011424_5849088.pth... [2024-12-13 09:59:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011376_5824512.pth [2024-12-13 09:59:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5853184. Throughput: 0: 821.6. Samples: 5854168. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:59:44,078][62436] Avg episode reward: [(0, '5119.958')] [2024-12-13 09:59:46,491][62492] Updated weights for policy 0, policy_version 11440 (0.0010) [2024-12-13 09:59:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5857280. Throughput: 0: 803.5. Samples: 5858316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 09:59:49,076][62436] Avg episode reward: [(0, '5151.668')] [2024-12-13 09:59:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5861376. Throughput: 0: 803.5. Samples: 5863888. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:59:54,076][62436] Avg episode reward: [(0, '5124.188')] [2024-12-13 09:59:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011448_5861376.pth... [2024-12-13 09:59:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011400_5836800.pth [2024-12-13 09:59:59,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5865472. Throughput: 0: 822.1. Samples: 5866436. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 09:59:59,077][62436] Avg episode reward: [(0, '5135.349')] [2024-12-13 10:00:04,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5869568. Throughput: 0: 798.5. Samples: 5870156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:00:04,077][62436] Avg episode reward: [(0, '5143.500')] [2024-12-13 10:00:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5873664. Throughput: 0: 770.6. Samples: 5874496. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:00:09,078][62436] Avg episode reward: [(0, '5127.306')] [2024-12-13 10:00:09,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011472_5873664.pth... 
[2024-12-13 10:00:09,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011424_5849088.pth [2024-12-13 10:00:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5877760. Throughput: 0: 770.1. Samples: 5876496. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:00:14,076][62436] Avg episode reward: [(0, '5168.931')] [2024-12-13 10:00:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5877760. Throughput: 0: 753.4. Samples: 5880236. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:00:19,076][62436] Avg episode reward: [(0, '5172.618')] [2024-12-13 10:00:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5881856. Throughput: 0: 747.7. Samples: 5885596. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:00:24,077][62436] Avg episode reward: [(0, '5172.893')] [2024-12-13 10:00:24,146][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011496_5885952.pth... [2024-12-13 10:00:24,159][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011448_5861376.pth [2024-12-13 10:00:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5885952. Throughput: 0: 758.2. Samples: 5888288. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:00:29,076][62436] Avg episode reward: [(0, '5138.887')] [2024-12-13 10:00:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5890048. Throughput: 0: 753.4. Samples: 5892220. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:00:34,076][62436] Avg episode reward: [(0, '5118.595')] [2024-12-13 10:00:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5894144. Throughput: 0: 743.8. Samples: 5897360. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:00:39,076][62436] Avg episode reward: [(0, '5090.129')] [2024-12-13 10:00:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011512_5894144.pth... [2024-12-13 10:00:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011472_5873664.pth [2024-12-13 10:00:39,702][62492] Updated weights for policy 0, policy_version 11520 (0.0010) [2024-12-13 10:00:44,075][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5898240. Throughput: 0: 747.1. Samples: 5900056. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:00:44,076][62436] Avg episode reward: [(0, '5049.815')] [2024-12-13 10:00:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5902336. Throughput: 0: 759.5. Samples: 5904332. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:00:49,076][62436] Avg episode reward: [(0, '5011.795')] [2024-12-13 10:00:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5906432. Throughput: 0: 770.7. Samples: 5909176. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:00:54,076][62436] Avg episode reward: [(0, '5053.042')] [2024-12-13 10:00:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011536_5906432.pth... [2024-12-13 10:00:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011496_5885952.pth [2024-12-13 10:00:59,077][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5910528. Throughput: 0: 783.9. Samples: 5911772. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:00:59,078][62436] Avg episode reward: [(0, '5048.587')] [2024-12-13 10:01:04,078][62436] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5914624. Throughput: 0: 804.1. Samples: 5916424. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:01:04,079][62436] Avg episode reward: [(0, '5088.549')] [2024-12-13 10:01:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 5918720. Throughput: 0: 788.5. Samples: 5921080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:01:09,076][62436] Avg episode reward: [(0, '5123.632')] [2024-12-13 10:01:09,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011560_5918720.pth... [2024-12-13 10:01:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011512_5894144.pth [2024-12-13 10:01:14,075][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 5922816. Throughput: 0: 787.9. Samples: 5923744. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:01:14,076][62436] Avg episode reward: [(0, '5080.473')] [2024-12-13 10:01:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5926912. Throughput: 0: 811.0. Samples: 5928716. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:01:19,080][62436] Avg episode reward: [(0, '5055.844')] [2024-12-13 10:01:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5931008. Throughput: 0: 791.6. Samples: 5932984. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:01:24,076][62436] Avg episode reward: [(0, '5026.688')] [2024-12-13 10:01:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011584_5931008.pth... [2024-12-13 10:01:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011536_5906432.pth [2024-12-13 10:01:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5935104. Throughput: 0: 793.0. Samples: 5935740. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:01:29,076][62436] Avg episode reward: [(0, '4928.519')] [2024-12-13 10:01:30,308][62492] Updated weights for policy 0, policy_version 11600 (0.0010) [2024-12-13 10:01:34,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5939200. Throughput: 0: 815.6. Samples: 5941036. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:01:34,078][62436] Avg episode reward: [(0, '4909.890')] [2024-12-13 10:01:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5943296. Throughput: 0: 799.3. Samples: 5945144. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:01:39,076][62436] Avg episode reward: [(0, '4870.438')] [2024-12-13 10:01:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011608_5943296.pth... [2024-12-13 10:01:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011560_5918720.pth [2024-12-13 10:01:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5947392. Throughput: 0: 801.9. Samples: 5947856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:01:44,076][62436] Avg episode reward: [(0, '4734.456')] [2024-12-13 10:01:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5951488. Throughput: 0: 823.7. Samples: 5953488. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:01:49,076][62436] Avg episode reward: [(0, '4686.477')] [2024-12-13 10:01:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5955584. Throughput: 0: 803.0. Samples: 5957216. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:01:54,076][62436] Avg episode reward: [(0, '4556.443')] [2024-12-13 10:01:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011632_5955584.pth... 
[2024-12-13 10:01:54,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011584_5931008.pth [2024-12-13 10:01:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5959680. Throughput: 0: 804.2. Samples: 5959932. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:01:59,076][62436] Avg episode reward: [(0, '4419.374')] [2024-12-13 10:02:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5963776. Throughput: 0: 818.9. Samples: 5965568. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:02:04,076][62436] Avg episode reward: [(0, '4332.987')] [2024-12-13 10:02:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5967872. Throughput: 0: 808.3. Samples: 5969356. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:02:09,076][62436] Avg episode reward: [(0, '4366.776')] [2024-12-13 10:02:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011656_5967872.pth... [2024-12-13 10:02:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011608_5943296.pth [2024-12-13 10:02:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5971968. Throughput: 0: 805.1. Samples: 5971968. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:02:14,076][62436] Avg episode reward: [(0, '4332.391')] [2024-12-13 10:02:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5976064. Throughput: 0: 814.5. Samples: 5977688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:02:19,076][62436] Avg episode reward: [(0, '4324.807')] [2024-12-13 10:02:20,994][62492] Updated weights for policy 0, policy_version 11680 (0.0010) [2024-12-13 10:02:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5980160. Throughput: 0: 813.9. Samples: 5981768. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:02:24,076][62436] Avg episode reward: [(0, '4318.669')] [2024-12-13 10:02:24,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011680_5980160.pth... [2024-12-13 10:02:24,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011632_5955584.pth [2024-12-13 10:02:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5984256. Throughput: 0: 805.1. Samples: 5984084. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:02:29,076][62436] Avg episode reward: [(0, '4349.804')] [2024-12-13 10:02:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5988352. Throughput: 0: 801.3. Samples: 5989548. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:02:34,076][62436] Avg episode reward: [(0, '4307.411')] [2024-12-13 10:02:39,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 5992448. Throughput: 0: 815.5. Samples: 5993916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:02:39,079][62436] Avg episode reward: [(0, '4286.422')] [2024-12-13 10:02:39,093][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011704_5992448.pth... [2024-12-13 10:02:39,110][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011656_5967872.pth [2024-12-13 10:02:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 5996544. Throughput: 0: 802.0. Samples: 5996020. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:02:44,076][62436] Avg episode reward: [(0, '4283.741')] [2024-12-13 10:02:49,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6000640. Throughput: 0: 800.4. Samples: 6001584. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:02:49,076][62436] Avg episode reward: [(0, '4215.366')] [2024-12-13 10:02:54,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6004736. Throughput: 0: 819.5. Samples: 6006236. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:02:54,079][62436] Avg episode reward: [(0, '4254.108')] [2024-12-13 10:02:54,094][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011728_6004736.pth... [2024-12-13 10:02:54,107][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011680_5980160.pth [2024-12-13 10:02:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6008832. Throughput: 0: 803.0. Samples: 6008104. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:02:59,076][62436] Avg episode reward: [(0, '4236.450')] [2024-12-13 10:03:04,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6012928. Throughput: 0: 793.8. Samples: 6013408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:03:04,076][62436] Avg episode reward: [(0, '4260.712')] [2024-12-13 10:03:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6017024. Throughput: 0: 813.4. Samples: 6018372. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:03:09,076][62436] Avg episode reward: [(0, '4230.526')] [2024-12-13 10:03:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011752_6017024.pth... [2024-12-13 10:03:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011704_5992448.pth [2024-12-13 10:03:13,107][62492] Updated weights for policy 0, policy_version 11760 (0.0013) [2024-12-13 10:03:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6021120. Throughput: 0: 803.5. Samples: 6020240. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:03:14,076][62436] Avg episode reward: [(0, '4227.822')] [2024-12-13 10:03:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6025216. Throughput: 0: 794.0. Samples: 6025280. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:03:19,076][62436] Avg episode reward: [(0, '4472.817')] [2024-12-13 10:03:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6029312. Throughput: 0: 814.7. Samples: 6030576. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:03:24,076][62436] Avg episode reward: [(0, '4421.432')] [2024-12-13 10:03:24,094][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011776_6029312.pth... [2024-12-13 10:03:24,107][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011728_6004736.pth [2024-12-13 10:03:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6033408. Throughput: 0: 808.8. Samples: 6032416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:03:29,076][62436] Avg episode reward: [(0, '4531.092')] [2024-12-13 10:03:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6037504. Throughput: 0: 790.5. Samples: 6037156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:03:34,076][62436] Avg episode reward: [(0, '4585.996')] [2024-12-13 10:03:39,077][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6041600. Throughput: 0: 808.5. Samples: 6042616. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:03:39,081][62436] Avg episode reward: [(0, '4604.967')] [2024-12-13 10:03:39,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011800_6041600.pth... 
[2024-12-13 10:03:39,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011752_6017024.pth [2024-12-13 10:03:44,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 6045696. Throughput: 0: 811.8. Samples: 6044636. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:03:44,080][62436] Avg episode reward: [(0, '4667.889')] [2024-12-13 10:03:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6049792. Throughput: 0: 796.4. Samples: 6049244. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:03:49,076][62436] Avg episode reward: [(0, '4668.715')] [2024-12-13 10:03:54,078][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6053888. Throughput: 0: 806.2. Samples: 6054652. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:03:54,079][62436] Avg episode reward: [(0, '4756.518')] [2024-12-13 10:03:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011824_6053888.pth... [2024-12-13 10:03:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011776_6029312.pth [2024-12-13 10:03:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6057984. Throughput: 0: 814.4. Samples: 6056888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:03:59,076][62436] Avg episode reward: [(0, '4773.082')] [2024-12-13 10:04:03,812][62492] Updated weights for policy 0, policy_version 11840 (0.0012) [2024-12-13 10:04:04,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6062080. Throughput: 0: 796.2. Samples: 6061108. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:04:04,076][62436] Avg episode reward: [(0, '4788.153')] [2024-12-13 10:04:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6066176. Throughput: 0: 800.4. Samples: 6066592. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:04:09,076][62436] Avg episode reward: [(0, '4852.139')] [2024-12-13 10:04:09,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011848_6066176.pth... [2024-12-13 10:04:09,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011800_6041600.pth [2024-12-13 10:04:14,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6070272. Throughput: 0: 815.1. Samples: 6069096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:04:14,078][62436] Avg episode reward: [(0, '4829.549')] [2024-12-13 10:04:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6074368. Throughput: 0: 800.8. Samples: 6073192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:04:19,076][62436] Avg episode reward: [(0, '4825.851')] [2024-12-13 10:04:24,077][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6078464. Throughput: 0: 802.8. Samples: 6078744. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:04:24,078][62436] Avg episode reward: [(0, '4838.848')] [2024-12-13 10:04:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011872_6078464.pth... [2024-12-13 10:04:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011824_6053888.pth [2024-12-13 10:04:29,081][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 6082560. Throughput: 0: 818.1. Samples: 6081452. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:04:29,082][62436] Avg episode reward: [(0, '4939.312')] [2024-12-13 10:04:34,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6082560. Throughput: 0: 799.6. Samples: 6085228. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:04:34,076][62436] Avg episode reward: [(0, '4979.666')] [2024-12-13 10:04:39,076][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6090752. Throughput: 0: 802.3. Samples: 6090752. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:04:39,076][62436] Avg episode reward: [(0, '4945.814')] [2024-12-13 10:04:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011896_6090752.pth... [2024-12-13 10:04:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011848_6066176.pth [2024-12-13 10:04:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 6094848. Throughput: 0: 813.0. Samples: 6093472. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:04:44,076][62436] Avg episode reward: [(0, '4900.020')] [2024-12-13 10:04:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6094848. Throughput: 0: 800.4. Samples: 6097124. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:04:49,076][62436] Avg episode reward: [(0, '4886.833')] [2024-12-13 10:04:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 6098944. Throughput: 0: 767.3. Samples: 6101120. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:04:54,077][62436] Avg episode reward: [(0, '4794.904')] [2024-12-13 10:04:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011912_6098944.pth... [2024-12-13 10:04:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011872_6078464.pth [2024-12-13 10:04:56,325][62492] Updated weights for policy 0, policy_version 11920 (0.0018) [2024-12-13 10:04:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6103040. Throughput: 0: 760.1. Samples: 6103300. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:04:59,076][62436] Avg episode reward: [(0, '4679.082')] [2024-12-13 10:05:04,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6107136. Throughput: 0: 754.8. Samples: 6107160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:05:04,076][62436] Avg episode reward: [(0, '4643.392')] [2024-12-13 10:05:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6111232. Throughput: 0: 746.6. Samples: 6112340. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:05:09,076][62436] Avg episode reward: [(0, '4663.790')] [2024-12-13 10:05:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011936_6111232.pth... [2024-12-13 10:05:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011896_6090752.pth [2024-12-13 10:05:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6115328. Throughput: 0: 750.5. Samples: 6115220. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:05:14,076][62436] Avg episode reward: [(0, '4511.613')] [2024-12-13 10:05:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6119424. Throughput: 0: 759.9. Samples: 6119424. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:05:19,088][62436] Avg episode reward: [(0, '4503.659')] [2024-12-13 10:05:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 6123520. Throughput: 0: 747.5. Samples: 6124388. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:05:24,076][62436] Avg episode reward: [(0, '4515.359')] [2024-12-13 10:05:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011960_6123520.pth... 
[2024-12-13 10:05:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011912_6098944.pth [2024-12-13 10:05:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 6127616. Throughput: 0: 748.9. Samples: 6127172. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:05:29,076][62436] Avg episode reward: [(0, '4481.227')] [2024-12-13 10:05:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6131712. Throughput: 0: 768.1. Samples: 6131688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:05:34,076][62436] Avg episode reward: [(0, '4459.006')] [2024-12-13 10:05:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6135808. Throughput: 0: 781.3. Samples: 6136276. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:05:39,076][62436] Avg episode reward: [(0, '4441.371')] [2024-12-13 10:05:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011984_6135808.pth... [2024-12-13 10:05:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011936_6111232.pth [2024-12-13 10:05:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6139904. Throughput: 0: 797.2. Samples: 6139172. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:05:44,076][62436] Avg episode reward: [(0, '4443.808')] [2024-12-13 10:05:48,044][62492] Updated weights for policy 0, policy_version 12000 (0.0011) [2024-12-13 10:05:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6144000. Throughput: 0: 816.7. Samples: 6143912. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:05:49,081][62436] Avg episode reward: [(0, '4497.899')] [2024-12-13 10:05:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6148096. Throughput: 0: 796.8. Samples: 6148196. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:05:54,076][62436] Avg episode reward: [(0, '4503.884')] [2024-12-13 10:05:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012008_6148096.pth... [2024-12-13 10:05:54,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011960_6123520.pth [2024-12-13 10:05:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6152192. Throughput: 0: 796.4. Samples: 6151060. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:05:59,076][62436] Avg episode reward: [(0, '4547.369')] [2024-12-13 10:06:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6156288. Throughput: 0: 817.0. Samples: 6156188. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:06:04,076][62436] Avg episode reward: [(0, '4553.985')] [2024-12-13 10:06:09,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6160384. Throughput: 0: 797.5. Samples: 6160276. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:06:09,078][62436] Avg episode reward: [(0, '4523.036')] [2024-12-13 10:06:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012032_6160384.pth... [2024-12-13 10:06:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000011984_6135808.pth [2024-12-13 10:06:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6164480. Throughput: 0: 796.1. Samples: 6162996. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:06:14,076][62436] Avg episode reward: [(0, '4591.445')] [2024-12-13 10:06:19,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6168576. Throughput: 0: 815.7. Samples: 6168396. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:06:19,076][62436] Avg episode reward: [(0, '4583.371')] [2024-12-13 10:06:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6168576. Throughput: 0: 800.7. Samples: 6172308. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:06:24,077][62436] Avg episode reward: [(0, '4514.814')] [2024-12-13 10:06:24,129][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012056_6172672.pth... [2024-12-13 10:06:24,135][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012008_6148096.pth [2024-12-13 10:06:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6176768. Throughput: 0: 796.0. Samples: 6174992. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:06:29,076][62436] Avg episode reward: [(0, '4588.803')] [2024-12-13 10:06:34,076][62436] Fps is (10 sec: 1228.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6180864. Throughput: 0: 814.4. Samples: 6180560. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:06:34,077][62436] Avg episode reward: [(0, '4621.249')] [2024-12-13 10:06:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6180864. Throughput: 0: 801.9. Samples: 6184280. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:06:39,076][62436] Avg episode reward: [(0, '4623.121')] [2024-12-13 10:06:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012072_6180864.pth... [2024-12-13 10:06:39,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012032_6160384.pth [2024-12-13 10:06:39,527][62492] Updated weights for policy 0, policy_version 12080 (0.0021) [2024-12-13 10:06:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6189056. Throughput: 0: 797.4. Samples: 6186944. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:06:44,076][62436] Avg episode reward: [(0, '4676.537')] [2024-12-13 10:06:49,077][62436] Fps is (10 sec: 1228.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6193152. Throughput: 0: 809.0. Samples: 6192596. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:06:49,078][62436] Avg episode reward: [(0, '4687.225')] [2024-12-13 10:06:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6193152. Throughput: 0: 808.0. Samples: 6196636. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:06:54,077][62436] Avg episode reward: [(0, '4687.225')] [2024-12-13 10:06:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012096_6193152.pth... [2024-12-13 10:06:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012056_6172672.pth [2024-12-13 10:06:59,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6197248. Throughput: 0: 799.5. Samples: 6198972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:06:59,076][62436] Avg episode reward: [(0, '4769.615')] [2024-12-13 10:07:04,076][62436] Fps is (10 sec: 1228.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6205440. Throughput: 0: 801.5. Samples: 6204464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:07:04,076][62436] Avg episode reward: [(0, '4792.368')] [2024-12-13 10:07:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 6205440. Throughput: 0: 813.2. Samples: 6208900. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:07:09,076][62436] Avg episode reward: [(0, '4839.000')] [2024-12-13 10:07:09,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012120_6205440.pth... 
[2024-12-13 10:07:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012072_6180864.pth [2024-12-13 10:07:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6209536. Throughput: 0: 798.7. Samples: 6210932. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:07:14,076][62436] Avg episode reward: [(0, '4799.093')] [2024-12-13 10:07:19,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6217728. Throughput: 0: 798.9. Samples: 6216508. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:07:19,076][62436] Avg episode reward: [(0, '4835.149')] [2024-12-13 10:07:24,081][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 6217728. Throughput: 0: 818.9. Samples: 6221136. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:07:24,082][62436] Avg episode reward: [(0, '4892.819')] [2024-12-13 10:07:24,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012144_6217728.pth... [2024-12-13 10:07:24,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012096_6193152.pth [2024-12-13 10:07:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6221824. Throughput: 0: 797.5. Samples: 6222832. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:07:29,076][62436] Avg episode reward: [(0, '4855.915')] [2024-12-13 10:07:29,988][62492] Updated weights for policy 0, policy_version 12160 (0.0018) [2024-12-13 10:07:34,076][62436] Fps is (10 sec: 819.7, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6225920. Throughput: 0: 795.3. Samples: 6228384. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:07:34,076][62436] Avg episode reward: [(0, '4826.339')] [2024-12-13 10:07:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6230016. Throughput: 0: 812.3. Samples: 6233188. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:07:39,076][62436] Avg episode reward: [(0, '4826.331')] [2024-12-13 10:07:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012168_6230016.pth... [2024-12-13 10:07:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012120_6205440.pth [2024-12-13 10:07:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6234112. Throughput: 0: 796.5. Samples: 6234816. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:07:44,076][62436] Avg episode reward: [(0, '4819.976')] [2024-12-13 10:07:49,078][62436] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6238208. Throughput: 0: 792.4. Samples: 6240124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:07:49,079][62436] Avg episode reward: [(0, '4769.031')] [2024-12-13 10:07:54,082][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 6242304. Throughput: 0: 809.6. Samples: 6245336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:07:54,083][62436] Avg episode reward: [(0, '4744.178')] [2024-12-13 10:07:54,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012192_6242304.pth... [2024-12-13 10:07:54,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012144_6217728.pth [2024-12-13 10:07:59,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6246400. Throughput: 0: 802.8. Samples: 6247056. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:07:59,076][62436] Avg episode reward: [(0, '4820.096')] [2024-12-13 10:08:04,076][62436] Fps is (10 sec: 819.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6250496. Throughput: 0: 790.6. Samples: 6252084. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:08:04,076][62436] Avg episode reward: [(0, '4741.166')] [2024-12-13 10:08:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6254592. Throughput: 0: 810.6. Samples: 6257608. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:08:09,076][62436] Avg episode reward: [(0, '4686.098')] [2024-12-13 10:08:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012216_6254592.pth... [2024-12-13 10:08:09,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012168_6230016.pth [2024-12-13 10:08:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6258688. Throughput: 0: 810.2. Samples: 6259292. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:08:14,076][62436] Avg episode reward: [(0, '4686.097')] [2024-12-13 10:08:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6262784. Throughput: 0: 793.0. Samples: 6264068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:08:19,076][62436] Avg episode reward: [(0, '4673.742')] [2024-12-13 10:08:20,689][62492] Updated weights for policy 0, policy_version 12240 (0.0010) [2024-12-13 10:08:24,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 6266880. Throughput: 0: 811.2. Samples: 6269692. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:08:24,077][62436] Avg episode reward: [(0, '4647.869')] [2024-12-13 10:08:24,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012240_6266880.pth... [2024-12-13 10:08:24,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012192_6242304.pth [2024-12-13 10:08:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6270976. Throughput: 0: 812.9. Samples: 6271396. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:08:29,076][62436] Avg episode reward: [(0, '4661.251')] [2024-12-13 10:08:34,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6275072. Throughput: 0: 798.4. Samples: 6276048. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:08:34,076][62436] Avg episode reward: [(0, '4584.425')] [2024-12-13 10:08:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6279168. Throughput: 0: 807.7. Samples: 6281676. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:08:39,078][62436] Avg episode reward: [(0, '4572.888')] [2024-12-13 10:08:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012264_6279168.pth... [2024-12-13 10:08:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012216_6254592.pth [2024-12-13 10:08:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6283264. Throughput: 0: 811.6. Samples: 6283576. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:08:44,076][62436] Avg episode reward: [(0, '4552.263')] [2024-12-13 10:08:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6287360. Throughput: 0: 796.6. Samples: 6287932. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:08:49,076][62436] Avg episode reward: [(0, '4561.760')] [2024-12-13 10:08:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 6291456. Throughput: 0: 795.6. Samples: 6293408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:08:54,076][62436] Avg episode reward: [(0, '4539.637')] [2024-12-13 10:08:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012288_6291456.pth... 
[2024-12-13 10:08:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012240_6266880.pth [2024-12-13 10:08:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6295552. Throughput: 0: 807.7. Samples: 6295640. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:08:59,079][62436] Avg episode reward: [(0, '4521.379')] [2024-12-13 10:09:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6299648. Throughput: 0: 790.7. Samples: 6299648. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:04,076][62436] Avg episode reward: [(0, '4484.238')] [2024-12-13 10:09:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6303744. Throughput: 0: 788.2. Samples: 6305160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:09,076][62436] Avg episode reward: [(0, '4534.186')] [2024-12-13 10:09:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012312_6303744.pth... [2024-12-13 10:09:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012264_6279168.pth [2024-12-13 10:09:11,901][62492] Updated weights for policy 0, policy_version 12320 (0.0011) [2024-12-13 10:09:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6307840. Throughput: 0: 808.7. Samples: 6307788. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:14,076][62436] Avg episode reward: [(0, '4498.712')] [2024-12-13 10:09:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6311936. Throughput: 0: 790.2. Samples: 6311608. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:19,076][62436] Avg episode reward: [(0, '4505.951')] [2024-12-13 10:09:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6316032. Throughput: 0: 785.2. Samples: 6317008. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:24,076][62436] Avg episode reward: [(0, '4625.011')] [2024-12-13 10:09:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012336_6316032.pth... [2024-12-13 10:09:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012288_6291456.pth [2024-12-13 10:09:29,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6320128. Throughput: 0: 803.3. Samples: 6319724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:29,081][62436] Avg episode reward: [(0, '4540.813')] [2024-12-13 10:09:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 6320128. Throughput: 0: 776.7. Samples: 6322884. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:34,080][62436] Avg episode reward: [(0, '4548.742')] [2024-12-13 10:09:39,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 6324224. Throughput: 0: 744.8. Samples: 6326924. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:39,076][62436] Avg episode reward: [(0, '4608.604')] [2024-12-13 10:09:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012352_6324224.pth... [2024-12-13 10:09:39,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012312_6303744.pth [2024-12-13 10:09:44,076][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6328320. Throughput: 0: 754.9. Samples: 6329612. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:44,077][62436] Avg episode reward: [(0, '4478.748')] [2024-12-13 10:09:49,077][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6332416. Throughput: 0: 765.7. Samples: 6334104. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:49,078][62436] Avg episode reward: [(0, '4469.108')] [2024-12-13 10:09:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6336512. Throughput: 0: 747.8. Samples: 6338812. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:54,076][62436] Avg episode reward: [(0, '4532.826')] [2024-12-13 10:09:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012376_6336512.pth... [2024-12-13 10:09:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012336_6316032.pth [2024-12-13 10:09:59,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6340608. Throughput: 0: 748.1. Samples: 6341452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:09:59,076][62436] Avg episode reward: [(0, '4591.213')] [2024-12-13 10:10:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6344704. Throughput: 0: 769.4. Samples: 6346232. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:10:04,076][62436] Avg episode reward: [(0, '4615.396')] [2024-12-13 10:10:06,769][62492] Updated weights for policy 0, policy_version 12400 (0.0012) [2024-12-13 10:10:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6348800. Throughput: 0: 745.3. Samples: 6350548. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:10:09,076][62436] Avg episode reward: [(0, '4702.020')] [2024-12-13 10:10:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012400_6348800.pth... [2024-12-13 10:10:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012352_6324224.pth [2024-12-13 10:10:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6352896. Throughput: 0: 743.5. Samples: 6353180. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:10:14,076][62436] Avg episode reward: [(0, '4632.256')] [2024-12-13 10:10:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6356992. Throughput: 0: 787.0. Samples: 6358300. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:10:19,076][62436] Avg episode reward: [(0, '4656.645')] [2024-12-13 10:10:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6361088. Throughput: 0: 785.6. Samples: 6362276. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:10:24,076][62436] Avg episode reward: [(0, '4689.829')] [2024-12-13 10:10:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012424_6361088.pth... [2024-12-13 10:10:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012376_6336512.pth [2024-12-13 10:10:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 6365184. Throughput: 0: 788.7. Samples: 6365104. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:10:29,076][62436] Avg episode reward: [(0, '4687.370')] [2024-12-13 10:10:34,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 791.4). Total num frames: 6369280. Throughput: 0: 807.1. Samples: 6370424. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:10:34,081][62436] Avg episode reward: [(0, '4739.464')] [2024-12-13 10:10:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6373376. Throughput: 0: 787.2. Samples: 6374236. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:10:39,076][62436] Avg episode reward: [(0, '4786.154')] [2024-12-13 10:10:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012448_6373376.pth... 
[2024-12-13 10:10:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012400_6348800.pth [2024-12-13 10:10:44,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6377472. Throughput: 0: 790.8. Samples: 6377036. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:10:44,076][62436] Avg episode reward: [(0, '4853.920')] [2024-12-13 10:10:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6381568. Throughput: 0: 803.4. Samples: 6382384. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:10:49,076][62436] Avg episode reward: [(0, '4836.769')] [2024-12-13 10:10:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6385664. Throughput: 0: 793.1. Samples: 6386236. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:10:54,078][62436] Avg episode reward: [(0, '4868.561')] [2024-12-13 10:10:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012472_6385664.pth... [2024-12-13 10:10:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012424_6361088.pth [2024-12-13 10:10:57,785][62492] Updated weights for policy 0, policy_version 12480 (0.0012) [2024-12-13 10:10:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6389760. Throughput: 0: 793.0. Samples: 6388864. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:10:59,076][62436] Avg episode reward: [(0, '4809.057')] [2024-12-13 10:11:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6393856. Throughput: 0: 798.1. Samples: 6394216. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:11:04,076][62436] Avg episode reward: [(0, '4838.048')] [2024-12-13 10:11:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6397952. Throughput: 0: 802.2. Samples: 6398376. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:11:09,086][62436] Avg episode reward: [(0, '4828.824')] [2024-12-13 10:11:09,102][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012496_6397952.pth... [2024-12-13 10:11:09,114][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012448_6373376.pth [2024-12-13 10:11:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6402048. Throughput: 0: 792.0. Samples: 6400744. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:11:14,076][62436] Avg episode reward: [(0, '4863.030')] [2024-12-13 10:11:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6406144. Throughput: 0: 794.2. Samples: 6406160. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:11:19,076][62436] Avg episode reward: [(0, '4861.075')] [2024-12-13 10:11:24,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6410240. Throughput: 0: 808.6. Samples: 6410624. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:11:24,079][62436] Avg episode reward: [(0, '5060.235')] [2024-12-13 10:11:24,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012520_6410240.pth... [2024-12-13 10:11:24,105][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012472_6385664.pth [2024-12-13 10:11:29,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6414336. Throughput: 0: 793.7. Samples: 6412752. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:11:29,077][62436] Avg episode reward: [(0, '5070.044')] [2024-12-13 10:11:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 6418432. Throughput: 0: 798.0. Samples: 6418292. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:11:34,076][62436] Avg episode reward: [(0, '5085.006')] [2024-12-13 10:11:39,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6422528. Throughput: 0: 812.5. Samples: 6422800. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:11:39,076][62436] Avg episode reward: [(0, '5085.036')] [2024-12-13 10:11:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012544_6422528.pth... [2024-12-13 10:11:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012496_6397952.pth [2024-12-13 10:11:44,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6426624. Throughput: 0: 796.4. Samples: 6424704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:11:44,078][62436] Avg episode reward: [(0, '5135.235')] [2024-12-13 10:11:48,293][62492] Updated weights for policy 0, policy_version 12560 (0.0010) [2024-12-13 10:11:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6430720. Throughput: 0: 800.4. Samples: 6430232. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:11:49,076][62436] Avg episode reward: [(0, '5137.003')] [2024-12-13 10:11:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6434816. Throughput: 0: 814.3. Samples: 6435020. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:11:54,076][62436] Avg episode reward: [(0, '5074.812')] [2024-12-13 10:11:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012568_6434816.pth... [2024-12-13 10:11:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012520_6410240.pth [2024-12-13 10:11:59,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 6434816. Throughput: 0: 803.3. Samples: 6436892. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:11:59,077][62436] Avg episode reward: [(0, '5146.695')] [2024-12-13 10:12:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6443008. Throughput: 0: 802.3. Samples: 6442264. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:12:04,076][62436] Avg episode reward: [(0, '5210.851')] [2024-12-13 10:12:04,077][62473] Saving new best policy, reward=5210.851! [2024-12-13 10:12:09,076][62436] Fps is (10 sec: 1228.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6447104. Throughput: 0: 815.7. Samples: 6447328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:12:09,076][62436] Avg episode reward: [(0, '5174.410')] [2024-12-13 10:12:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012592_6447104.pth... [2024-12-13 10:12:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012544_6422528.pth [2024-12-13 10:12:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 6447104. Throughput: 0: 809.5. Samples: 6449180. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:12:14,076][62436] Avg episode reward: [(0, '5198.824')] [2024-12-13 10:12:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6455296. Throughput: 0: 798.1. Samples: 6454208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:12:19,076][62436] Avg episode reward: [(0, '5231.761')] [2024-12-13 10:12:19,077][62473] Saving new best policy, reward=5231.761! [2024-12-13 10:12:24,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6459392. Throughput: 0: 817.6. Samples: 6459592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:12:24,076][62436] Avg episode reward: [(0, '5225.865')] [2024-12-13 10:12:24,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012616_6459392.pth... 
[2024-12-13 10:12:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012568_6434816.pth [2024-12-13 10:12:29,079][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6459392. Throughput: 0: 817.8. Samples: 6461504. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:12:29,079][62436] Avg episode reward: [(0, '5250.062')] [2024-12-13 10:12:29,080][62473] Saving new best policy, reward=5250.062! [2024-12-13 10:12:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6463488. Throughput: 0: 798.2. Samples: 6466152. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:12:34,076][62436] Avg episode reward: [(0, '5230.567')] [2024-12-13 10:12:38,543][62492] Updated weights for policy 0, policy_version 12640 (0.0010) [2024-12-13 10:12:39,076][62436] Fps is (10 sec: 1229.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6471680. Throughput: 0: 816.1. Samples: 6471744. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:12:39,076][62436] Avg episode reward: [(0, '5225.461')] [2024-12-13 10:12:39,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012640_6471680.pth... [2024-12-13 10:12:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012592_6447104.pth [2024-12-13 10:12:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 6471680. Throughput: 0: 821.9. Samples: 6473876. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:12:44,076][62436] Avg episode reward: [(0, '5286.268')] [2024-12-13 10:12:44,078][62473] Saving new best policy, reward=5286.268! [2024-12-13 10:12:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6475776. Throughput: 0: 802.1. Samples: 6478360. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:12:49,076][62436] Avg episode reward: [(0, '5250.073')] [2024-12-13 10:12:54,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6483968. Throughput: 0: 814.2. Samples: 6483968. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:12:54,076][62436] Avg episode reward: [(0, '5189.646')] [2024-12-13 10:12:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012664_6483968.pth... [2024-12-13 10:12:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012616_6459392.pth [2024-12-13 10:12:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6483968. Throughput: 0: 825.5. Samples: 6486328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:12:59,076][62436] Avg episode reward: [(0, '5199.842')] [2024-12-13 10:13:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6488064. Throughput: 0: 800.9. Samples: 6490248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:13:04,076][62436] Avg episode reward: [(0, '5199.842')] [2024-12-13 10:13:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6492160. Throughput: 0: 806.6. Samples: 6495888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:13:09,076][62436] Avg episode reward: [(0, '5154.243')] [2024-12-13 10:13:09,163][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012688_6496256.pth... [2024-12-13 10:13:09,169][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012640_6471680.pth [2024-12-13 10:13:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6496256. Throughput: 0: 822.3. Samples: 6498504. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:13:14,077][62436] Avg episode reward: [(0, '5157.452')] [2024-12-13 10:13:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6500352. Throughput: 0: 803.9. Samples: 6502328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:13:19,076][62436] Avg episode reward: [(0, '5150.637')] [2024-12-13 10:13:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6504448. Throughput: 0: 805.2. Samples: 6507980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:13:24,076][62436] Avg episode reward: [(0, '5105.709')] [2024-12-13 10:13:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012704_6504448.pth... [2024-12-13 10:13:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012664_6483968.pth [2024-12-13 10:13:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6508544. Throughput: 0: 818.9. Samples: 6510728. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:13:29,076][62436] Avg episode reward: [(0, '5099.689')] [2024-12-13 10:13:29,618][62492] Updated weights for policy 0, policy_version 12720 (0.0018) [2024-12-13 10:13:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6512640. Throughput: 0: 803.3. Samples: 6514508. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:13:34,076][62436] Avg episode reward: [(0, '5158.709')] [2024-12-13 10:13:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6516736. Throughput: 0: 803.5. Samples: 6520124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:13:39,076][62436] Avg episode reward: [(0, '5192.071')] [2024-12-13 10:13:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012728_6516736.pth... 
[2024-12-13 10:13:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012688_6496256.pth [2024-12-13 10:13:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6520832. Throughput: 0: 811.4. Samples: 6522840. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:13:44,076][62436] Avg episode reward: [(0, '5226.515')] [2024-12-13 10:13:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6524928. Throughput: 0: 812.9. Samples: 6526828. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:13:49,076][62436] Avg episode reward: [(0, '5251.505')] [2024-12-13 10:13:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6529024. Throughput: 0: 808.1. Samples: 6532252. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:13:54,076][62436] Avg episode reward: [(0, '5325.866')] [2024-12-13 10:13:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012752_6529024.pth... [2024-12-13 10:13:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012704_6504448.pth [2024-12-13 10:13:54,090][62473] Saving new best policy, reward=5325.866! [2024-12-13 10:13:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6533120. Throughput: 0: 811.1. Samples: 6535004. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:13:59,076][62436] Avg episode reward: [(0, '5306.404')] [2024-12-13 10:14:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6537216. Throughput: 0: 817.2. Samples: 6539104. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:14:04,076][62436] Avg episode reward: [(0, '5282.056')] [2024-12-13 10:14:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6541312. Throughput: 0: 809.1. Samples: 6544388. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:14:09,076][62436] Avg episode reward: [(0, '5273.430')] [2024-12-13 10:14:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012776_6541312.pth... [2024-12-13 10:14:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012728_6516736.pth [2024-12-13 10:14:14,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6545408. Throughput: 0: 805.7. Samples: 6546984. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:14:14,078][62436] Avg episode reward: [(0, '5313.203')] [2024-12-13 10:14:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6549504. Throughput: 0: 791.6. Samples: 6550132. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:14:19,076][62436] Avg episode reward: [(0, '5314.083')] [2024-12-13 10:14:22,803][62492] Updated weights for policy 0, policy_version 12800 (0.0010) [2024-12-13 10:14:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 6553600. Throughput: 0: 760.5. Samples: 6554348. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:14:24,076][62436] Avg episode reward: [(0, '5306.270')] [2024-12-13 10:14:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012800_6553600.pth... [2024-12-13 10:14:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012752_6529024.pth [2024-12-13 10:14:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6557696. Throughput: 0: 765.1. Samples: 6557268. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:14:29,076][62436] Avg episode reward: [(0, '5294.131')] [2024-12-13 10:14:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6561792. Throughput: 0: 786.7. Samples: 6562228. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:14:34,076][62436] Avg episode reward: [(0, '5269.128')] [2024-12-13 10:14:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6565888. Throughput: 0: 759.7. Samples: 6566440. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:14:39,076][62436] Avg episode reward: [(0, '5302.406')] [2024-12-13 10:14:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012824_6565888.pth... [2024-12-13 10:14:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012776_6541312.pth [2024-12-13 10:14:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6569984. Throughput: 0: 763.0. Samples: 6569340. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:14:44,076][62436] Avg episode reward: [(0, '5307.841')] [2024-12-13 10:14:49,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 6574080. Throughput: 0: 789.6. Samples: 6574640. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:14:49,080][62436] Avg episode reward: [(0, '5377.650')] [2024-12-13 10:14:49,081][62473] Saving new best policy, reward=5377.650! [2024-12-13 10:14:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6578176. Throughput: 0: 758.4. Samples: 6578516. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:14:54,076][62436] Avg episode reward: [(0, '5378.634')] [2024-12-13 10:14:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012848_6578176.pth... [2024-12-13 10:14:54,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012800_6553600.pth [2024-12-13 10:14:54,099][62473] Saving new best policy, reward=5378.634! [2024-12-13 10:14:59,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6582272. Throughput: 0: 766.2. Samples: 6581460. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:14:59,076][62436] Avg episode reward: [(0, '5424.373')] [2024-12-13 10:14:59,077][62473] Saving new best policy, reward=5424.373! [2024-12-13 10:15:04,082][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 6586368. Throughput: 0: 817.0. Samples: 6586904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:15:04,083][62436] Avg episode reward: [(0, '5454.710')] [2024-12-13 10:15:04,084][62473] Saving new best policy, reward=5454.710! [2024-12-13 10:15:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6590464. Throughput: 0: 805.0. Samples: 6590572. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:15:09,076][62436] Avg episode reward: [(0, '5421.479')] [2024-12-13 10:15:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012872_6590464.pth... [2024-12-13 10:15:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012824_6565888.pth [2024-12-13 10:15:12,994][62492] Updated weights for policy 0, policy_version 12880 (0.0010) [2024-12-13 10:15:14,076][62436] Fps is (10 sec: 819.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6594560. Throughput: 0: 802.2. Samples: 6593368. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:15:14,076][62436] Avg episode reward: [(0, '5396.763')] [2024-12-13 10:15:19,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6598656. Throughput: 0: 812.0. Samples: 6598768. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:15:19,077][62436] Avg episode reward: [(0, '5421.199')] [2024-12-13 10:15:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6602752. Throughput: 0: 807.5. Samples: 6602776. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:15:24,076][62436] Avg episode reward: [(0, '5385.402')] [2024-12-13 10:15:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012896_6602752.pth... [2024-12-13 10:15:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012848_6578176.pth [2024-12-13 10:15:29,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6606848. Throughput: 0: 800.6. Samples: 6605368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:15:29,076][62436] Avg episode reward: [(0, '5351.769')] [2024-12-13 10:15:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6610944. Throughput: 0: 805.0. Samples: 6610860. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:15:34,076][62436] Avg episode reward: [(0, '5333.036')] [2024-12-13 10:15:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6615040. Throughput: 0: 813.1. Samples: 6615104. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:15:39,076][62436] Avg episode reward: [(0, '5332.969')] [2024-12-13 10:15:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012920_6615040.pth... [2024-12-13 10:15:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012872_6590464.pth [2024-12-13 10:15:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6619136. Throughput: 0: 798.7. Samples: 6617400. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:15:44,076][62436] Avg episode reward: [(0, '5237.270')] [2024-12-13 10:15:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 6623232. Throughput: 0: 802.8. Samples: 6623024. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:15:49,080][62436] Avg episode reward: [(0, '5086.107')] [2024-12-13 10:15:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6627328. Throughput: 0: 819.1. Samples: 6627432. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:15:54,076][62436] Avg episode reward: [(0, '4982.258')] [2024-12-13 10:15:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012944_6627328.pth... [2024-12-13 10:15:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012896_6602752.pth [2024-12-13 10:15:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6631424. Throughput: 0: 805.4. Samples: 6629612. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:15:59,076][62436] Avg episode reward: [(0, '4897.798')] [2024-12-13 10:16:03,109][62492] Updated weights for policy 0, policy_version 12960 (0.0011) [2024-12-13 10:16:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 6635520. Throughput: 0: 808.5. Samples: 6635152. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:16:04,076][62436] Avg episode reward: [(0, '4859.309')] [2024-12-13 10:16:09,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 6639616. Throughput: 0: 822.8. Samples: 6639804. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:16:09,081][62436] Avg episode reward: [(0, '4745.528')] [2024-12-13 10:16:09,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012968_6639616.pth... [2024-12-13 10:16:09,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012920_6615040.pth [2024-12-13 10:16:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6643712. Throughput: 0: 806.9. Samples: 6641680. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:16:14,076][62436] Avg episode reward: [(0, '4617.158')] [2024-12-13 10:16:19,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6647808. Throughput: 0: 809.9. Samples: 6647304. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:16:19,076][62436] Avg episode reward: [(0, '4546.251')] [2024-12-13 10:16:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6651904. Throughput: 0: 824.0. Samples: 6652184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:16:24,079][62436] Avg episode reward: [(0, '4506.418')] [2024-12-13 10:16:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012992_6651904.pth... [2024-12-13 10:16:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012944_6627328.pth [2024-12-13 10:16:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6651904. Throughput: 0: 814.9. Samples: 6654072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:16:29,076][62436] Avg episode reward: [(0, '4532.163')] [2024-12-13 10:16:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6660096. Throughput: 0: 808.2. Samples: 6659392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:16:34,076][62436] Avg episode reward: [(0, '4532.100')] [2024-12-13 10:16:39,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6664192. Throughput: 0: 823.4. Samples: 6664484. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:16:39,076][62436] Avg episode reward: [(0, '4530.932')] [2024-12-13 10:16:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013016_6664192.pth... 
[2024-12-13 10:16:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012968_6639616.pth [2024-12-13 10:16:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6664192. Throughput: 0: 817.8. Samples: 6666412. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:16:44,076][62436] Avg episode reward: [(0, '4532.622')] [2024-12-13 10:16:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6672384. Throughput: 0: 809.5. Samples: 6671580. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:16:49,076][62436] Avg episode reward: [(0, '4594.844')] [2024-12-13 10:16:52,984][62492] Updated weights for policy 0, policy_version 13040 (0.0011) [2024-12-13 10:16:54,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6676480. Throughput: 0: 825.8. Samples: 6676960. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:16:54,076][62436] Avg episode reward: [(0, '4587.984')] [2024-12-13 10:16:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013040_6676480.pth... [2024-12-13 10:16:54,104][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000012992_6651904.pth [2024-12-13 10:16:59,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6676480. Throughput: 0: 826.3. Samples: 6678864. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:16:59,076][62436] Avg episode reward: [(0, '4526.922')] [2024-12-13 10:17:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6684672. Throughput: 0: 806.8. Samples: 6683608. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:17:04,076][62436] Avg episode reward: [(0, '4538.690')] [2024-12-13 10:17:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 6688768. Throughput: 0: 822.0. Samples: 6689176. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:17:09,076][62436] Avg episode reward: [(0, '4570.520')] [2024-12-13 10:17:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013064_6688768.pth... [2024-12-13 10:17:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013016_6664192.pth [2024-12-13 10:17:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6688768. Throughput: 0: 827.2. Samples: 6691296. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:17:14,076][62436] Avg episode reward: [(0, '4572.283')] [2024-12-13 10:17:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6696960. Throughput: 0: 807.3. Samples: 6695720. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:17:19,076][62436] Avg episode reward: [(0, '4613.613')] [2024-12-13 10:17:24,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6701056. Throughput: 0: 816.3. Samples: 6701216. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:17:24,076][62436] Avg episode reward: [(0, '4774.969')] [2024-12-13 10:17:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013088_6701056.pth... [2024-12-13 10:17:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013040_6676480.pth [2024-12-13 10:17:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6701056. Throughput: 0: 827.3. Samples: 6703640. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:17:29,076][62436] Avg episode reward: [(0, '4825.508')] [2024-12-13 10:17:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6705152. Throughput: 0: 806.8. Samples: 6707888. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:17:34,076][62436] Avg episode reward: [(0, '4948.812')] [2024-12-13 10:17:39,079][62436] Fps is (10 sec: 1228.3, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 6713344. Throughput: 0: 810.9. Samples: 6713452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:17:39,080][62436] Avg episode reward: [(0, '4949.317')] [2024-12-13 10:17:39,095][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013112_6713344.pth... [2024-12-13 10:17:39,102][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013064_6688768.pth [2024-12-13 10:17:44,017][62492] Updated weights for policy 0, policy_version 13120 (0.0010) [2024-12-13 10:17:44,078][62436] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 6717440. Throughput: 0: 827.2. Samples: 6716088. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:17:44,080][62436] Avg episode reward: [(0, '4980.773')] [2024-12-13 10:17:49,076][62436] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 6717440. Throughput: 0: 811.7. Samples: 6720136. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:17:49,076][62436] Avg episode reward: [(0, '5006.297')] [2024-12-13 10:17:54,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6725632. Throughput: 0: 810.1. Samples: 6725632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:17:54,076][62436] Avg episode reward: [(0, '5037.496')] [2024-12-13 10:17:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013136_6725632.pth... [2024-12-13 10:17:54,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013088_6701056.pth [2024-12-13 10:17:59,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 6729728. Throughput: 0: 825.8. Samples: 6728456. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:17:59,076][62436] Avg episode reward: [(0, '5092.163')] [2024-12-13 10:18:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6729728. Throughput: 0: 810.9. Samples: 6732212. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:18:04,076][62436] Avg episode reward: [(0, '5086.776')] [2024-12-13 10:18:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6737920. Throughput: 0: 814.4. Samples: 6737864. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:18:09,076][62436] Avg episode reward: [(0, '4954.932')] [2024-12-13 10:18:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013160_6737920.pth... [2024-12-13 10:18:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013112_6713344.pth [2024-12-13 10:18:14,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 6742016. Throughput: 0: 822.6. Samples: 6740656. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:18:14,076][62436] Avg episode reward: [(0, '4940.456')] [2024-12-13 10:18:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6742016. Throughput: 0: 812.1. Samples: 6744432. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:18:19,076][62436] Avg episode reward: [(0, '4928.384')] [2024-12-13 10:18:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6750208. Throughput: 0: 814.2. Samples: 6750088. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:18:24,076][62436] Avg episode reward: [(0, '4827.273')] [2024-12-13 10:18:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013184_6750208.pth... 
[2024-12-13 10:18:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013136_6725632.pth [2024-12-13 10:18:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 6754304. Throughput: 0: 818.4. Samples: 6752916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:18:29,076][62436] Avg episode reward: [(0, '4758.805')] [2024-12-13 10:18:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6754304. Throughput: 0: 817.1. Samples: 6756904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:18:34,076][62436] Avg episode reward: [(0, '4618.855')] [2024-12-13 10:18:34,664][62492] Updated weights for policy 0, policy_version 13200 (0.0010) [2024-12-13 10:18:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 6758400. Throughput: 0: 811.6. Samples: 6762152. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:18:39,076][62436] Avg episode reward: [(0, '4571.138')] [2024-12-13 10:18:39,153][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013208_6762496.pth... [2024-12-13 10:18:39,159][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013160_6737920.pth [2024-12-13 10:18:44,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6766592. Throughput: 0: 808.1. Samples: 6764820. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:18:44,076][62436] Avg episode reward: [(0, '4544.503')] [2024-12-13 10:18:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6766592. Throughput: 0: 820.0. Samples: 6769112. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:18:49,076][62436] Avg episode reward: [(0, '4482.595')] [2024-12-13 10:18:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6770688. Throughput: 0: 805.1. Samples: 6774092. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:18:54,076][62436] Avg episode reward: [(0, '4366.999')] [2024-12-13 10:18:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013224_6770688.pth... [2024-12-13 10:18:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013184_6750208.pth [2024-12-13 10:18:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6774784. Throughput: 0: 781.2. Samples: 6775808. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:18:59,079][62436] Avg episode reward: [(0, '4309.749')] [2024-12-13 10:19:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6778880. Throughput: 0: 779.7. Samples: 6779520. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:19:04,076][62436] Avg episode reward: [(0, '4323.248')] [2024-12-13 10:19:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6782976. Throughput: 0: 760.0. Samples: 6784288. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:19:09,076][62436] Avg episode reward: [(0, '4307.488')] [2024-12-13 10:19:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013248_6782976.pth... [2024-12-13 10:19:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013208_6762496.pth [2024-12-13 10:19:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6787072. Throughput: 0: 759.0. Samples: 6787072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:19:14,076][62436] Avg episode reward: [(0, '4366.200')] [2024-12-13 10:19:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6791168. Throughput: 0: 783.5. Samples: 6792160. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:19:19,077][62436] Avg episode reward: [(0, '4365.187')] [2024-12-13 10:19:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6795264. Throughput: 0: 764.2. Samples: 6796540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:19:24,076][62436] Avg episode reward: [(0, '4305.993')] [2024-12-13 10:19:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013272_6795264.pth... [2024-12-13 10:19:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013224_6770688.pth [2024-12-13 10:19:26,615][62492] Updated weights for policy 0, policy_version 13280 (0.0010) [2024-12-13 10:19:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6799360. Throughput: 0: 767.6. Samples: 6799360. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:19:29,076][62436] Avg episode reward: [(0, '4183.494')] [2024-12-13 10:19:34,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6803456. Throughput: 0: 788.6. Samples: 6804600. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:19:34,079][62436] Avg episode reward: [(0, '4168.282')] [2024-12-13 10:19:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6807552. Throughput: 0: 770.5. Samples: 6808764. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:19:39,076][62436] Avg episode reward: [(0, '4102.903')] [2024-12-13 10:19:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013296_6807552.pth... [2024-12-13 10:19:39,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013248_6782976.pth [2024-12-13 10:19:44,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 6811648. Throughput: 0: 796.4. Samples: 6811648. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:19:44,076][62436] Avg episode reward: [(0, '4192.954')] [2024-12-13 10:19:49,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6815744. Throughput: 0: 836.5. Samples: 6817164. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:19:49,077][62436] Avg episode reward: [(0, '4151.752')] [2024-12-13 10:19:54,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6819840. Throughput: 0: 816.1. Samples: 6821016. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:19:54,078][62436] Avg episode reward: [(0, '4208.352')] [2024-12-13 10:19:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013320_6819840.pth... [2024-12-13 10:19:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013272_6795264.pth [2024-12-13 10:19:59,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6823936. Throughput: 0: 817.8. Samples: 6823872. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:19:59,076][62436] Avg episode reward: [(0, '4211.008')] [2024-12-13 10:20:04,082][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 6828032. Throughput: 0: 826.7. Samples: 6829368. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:20:04,089][62436] Avg episode reward: [(0, '4223.122')] [2024-12-13 10:20:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6832128. Throughput: 0: 812.4. Samples: 6833100. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:20:09,076][62436] Avg episode reward: [(0, '4263.044')] [2024-12-13 10:20:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013344_6832128.pth... 
[2024-12-13 10:20:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013296_6807552.pth [2024-12-13 10:20:14,076][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6836224. Throughput: 0: 814.6. Samples: 6836016. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:20:14,076][62436] Avg episode reward: [(0, '4212.685')] [2024-12-13 10:20:16,274][62492] Updated weights for policy 0, policy_version 13360 (0.0010) [2024-12-13 10:20:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6840320. Throughput: 0: 822.2. Samples: 6841596. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:20:19,076][62436] Avg episode reward: [(0, '4222.106')] [2024-12-13 10:20:24,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6844416. Throughput: 0: 819.1. Samples: 6845628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:20:24,079][62436] Avg episode reward: [(0, '4187.599')] [2024-12-13 10:20:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013368_6844416.pth... [2024-12-13 10:20:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013320_6819840.pth [2024-12-13 10:20:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6848512. Throughput: 0: 815.3. Samples: 6848336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:20:29,076][62436] Avg episode reward: [(0, '4238.560')] [2024-12-13 10:20:34,077][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6852608. Throughput: 0: 814.6. Samples: 6853824. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:20:34,078][62436] Avg episode reward: [(0, '4139.648')] [2024-12-13 10:20:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6856704. Throughput: 0: 825.3. Samples: 6858152. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:20:39,076][62436] Avg episode reward: [(0, '4161.768')] [2024-12-13 10:20:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013392_6856704.pth... [2024-12-13 10:20:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013344_6832128.pth [2024-12-13 10:20:44,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6860800. Throughput: 0: 815.6. Samples: 6860576. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:20:44,076][62436] Avg episode reward: [(0, '4178.581')] [2024-12-13 10:20:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6864896. Throughput: 0: 816.8. Samples: 6866120. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:20:49,076][62436] Avg episode reward: [(0, '4229.082')] [2024-12-13 10:20:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6868992. Throughput: 0: 835.4. Samples: 6870692. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:20:54,076][62436] Avg episode reward: [(0, '4241.456')] [2024-12-13 10:20:54,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013416_6868992.pth... [2024-12-13 10:20:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013368_6844416.pth [2024-12-13 10:20:59,077][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6873088. Throughput: 0: 817.0. Samples: 6872784. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:20:59,078][62436] Avg episode reward: [(0, '4365.262')] [2024-12-13 10:21:04,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 6877184. Throughput: 0: 814.0. Samples: 6878228. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:21:04,077][62436] Avg episode reward: [(0, '4418.948')] [2024-12-13 10:21:05,906][62492] Updated weights for policy 0, policy_version 13440 (0.0011) [2024-12-13 10:21:09,081][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 6881280. Throughput: 0: 831.3. Samples: 6883040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:21:09,082][62436] Avg episode reward: [(0, '4422.519')] [2024-12-13 10:21:09,092][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013440_6881280.pth... [2024-12-13 10:21:09,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013392_6856704.pth [2024-12-13 10:21:14,075][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6885376. Throughput: 0: 814.0. Samples: 6884964. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:21:14,076][62436] Avg episode reward: [(0, '4582.291')] [2024-12-13 10:21:19,076][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6889472. Throughput: 0: 814.3. Samples: 6890464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:21:19,076][62436] Avg episode reward: [(0, '4614.234')] [2024-12-13 10:21:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6893568. Throughput: 0: 832.0. Samples: 6895592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:21:24,078][62436] Avg episode reward: [(0, '4608.209')] [2024-12-13 10:21:24,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013464_6893568.pth... [2024-12-13 10:21:24,104][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013416_6868992.pth [2024-12-13 10:21:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6897664. Throughput: 0: 820.5. Samples: 6897500. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:21:29,076][62436] Avg episode reward: [(0, '4691.434')] [2024-12-13 10:21:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6901760. Throughput: 0: 811.8. Samples: 6902652. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:21:34,076][62436] Avg episode reward: [(0, '4734.584')] [2024-12-13 10:21:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6905856. Throughput: 0: 832.9. Samples: 6908172. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:21:39,076][62436] Avg episode reward: [(0, '4782.864')] [2024-12-13 10:21:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013488_6905856.pth... [2024-12-13 10:21:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013440_6881280.pth [2024-12-13 10:21:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6909952. Throughput: 0: 826.0. Samples: 6909952. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:21:44,077][62436] Avg episode reward: [(0, '4781.090')] [2024-12-13 10:21:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6914048. Throughput: 0: 814.6. Samples: 6914884. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:21:49,076][62436] Avg episode reward: [(0, '4932.816')] [2024-12-13 10:21:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6918144. Throughput: 0: 832.6. Samples: 6920500. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:21:54,080][62436] Avg episode reward: [(0, '5001.320')] [2024-12-13 10:21:54,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013512_6918144.pth... 
[2024-12-13 10:21:54,114][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013464_6893568.pth [2024-12-13 10:21:56,542][62492] Updated weights for policy 0, policy_version 13520 (0.0010) [2024-12-13 10:21:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6922240. Throughput: 0: 829.5. Samples: 6922292. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:21:59,076][62436] Avg episode reward: [(0, '5084.691')] [2024-12-13 10:22:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6926336. Throughput: 0: 809.8. Samples: 6926904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:22:04,076][62436] Avg episode reward: [(0, '5171.187')] [2024-12-13 10:22:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 6930432. Throughput: 0: 823.6. Samples: 6932656. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:22:09,076][62436] Avg episode reward: [(0, '5178.335')] [2024-12-13 10:22:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013536_6930432.pth... [2024-12-13 10:22:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013488_6905856.pth [2024-12-13 10:22:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6934528. Throughput: 0: 827.5. Samples: 6934736. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:22:14,076][62436] Avg episode reward: [(0, '5131.401')] [2024-12-13 10:22:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6938624. Throughput: 0: 810.6. Samples: 6939128. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:22:19,076][62436] Avg episode reward: [(0, '5170.720')] [2024-12-13 10:22:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6942720. Throughput: 0: 813.2. Samples: 6944768. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:22:24,076][62436] Avg episode reward: [(0, '5204.250')] [2024-12-13 10:22:24,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013560_6942720.pth... [2024-12-13 10:22:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013512_6918144.pth [2024-12-13 10:22:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6946816. Throughput: 0: 825.2. Samples: 6947084. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:22:29,076][62436] Avg episode reward: [(0, '5236.180')] [2024-12-13 10:22:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6950912. Throughput: 0: 807.9. Samples: 6951240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:22:34,076][62436] Avg episode reward: [(0, '5306.021')] [2024-12-13 10:22:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6955008. Throughput: 0: 811.0. Samples: 6956996. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:22:39,078][62436] Avg episode reward: [(0, '5318.135')] [2024-12-13 10:22:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013584_6955008.pth... [2024-12-13 10:22:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013536_6930432.pth [2024-12-13 10:22:44,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6959104. Throughput: 0: 828.1. Samples: 6959560. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:22:44,080][62436] Avg episode reward: [(0, '5328.908')] [2024-12-13 10:22:47,282][62492] Updated weights for policy 0, policy_version 13600 (0.0011) [2024-12-13 10:22:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6963200. Throughput: 0: 812.9. Samples: 6963484. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:22:49,076][62436] Avg episode reward: [(0, '5341.205')] [2024-12-13 10:22:54,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6967296. Throughput: 0: 811.2. Samples: 6969160. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:22:54,076][62436] Avg episode reward: [(0, '5340.392')] [2024-12-13 10:22:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013608_6967296.pth... [2024-12-13 10:22:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013560_6942720.pth [2024-12-13 10:22:59,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6971392. Throughput: 0: 826.0. Samples: 6971908. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:22:59,079][62436] Avg episode reward: [(0, '5326.831')] [2024-12-13 10:23:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6975488. Throughput: 0: 814.6. Samples: 6975784. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:23:04,076][62436] Avg episode reward: [(0, '5388.710')] [2024-12-13 10:23:09,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6979584. Throughput: 0: 811.8. Samples: 6981300. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:23:09,076][62436] Avg episode reward: [(0, '5402.504')] [2024-12-13 10:23:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013632_6979584.pth... [2024-12-13 10:23:09,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013584_6955008.pth [2024-12-13 10:23:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6983680. Throughput: 0: 821.8. Samples: 6984064. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:23:14,076][62436] Avg episode reward: [(0, '5302.505')] [2024-12-13 10:23:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6987776. Throughput: 0: 823.1. Samples: 6988280. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:23:19,076][62436] Avg episode reward: [(0, '5276.117')] [2024-12-13 10:23:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 6991872. Throughput: 0: 810.9. Samples: 6993488. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:23:24,076][62436] Avg episode reward: [(0, '5240.108')] [2024-12-13 10:23:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013656_6991872.pth... [2024-12-13 10:23:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013608_6967296.pth [2024-12-13 10:23:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6995968. Throughput: 0: 815.2. Samples: 6996240. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:23:29,076][62436] Avg episode reward: [(0, '5240.208')] [2024-12-13 10:23:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7000064. Throughput: 0: 827.7. Samples: 7000732. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:23:34,076][62436] Avg episode reward: [(0, '5234.389')] [2024-12-13 10:23:37,112][62492] Updated weights for policy 0, policy_version 13680 (0.0012) [2024-12-13 10:23:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7004160. Throughput: 0: 808.3. Samples: 7005532. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:23:39,076][62436] Avg episode reward: [(0, '5234.389')] [2024-12-13 10:23:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013680_7004160.pth... 
[2024-12-13 10:23:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013632_6979584.pth [2024-12-13 10:23:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7008256. Throughput: 0: 791.2. Samples: 7007508. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:23:44,076][62436] Avg episode reward: [(0, '5207.461')] [2024-12-13 10:23:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7008256. Throughput: 0: 788.7. Samples: 7011276. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:23:49,076][62436] Avg episode reward: [(0, '5123.792')] [2024-12-13 10:23:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7012352. Throughput: 0: 768.8. Samples: 7015896. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:23:54,076][62436] Avg episode reward: [(0, '5128.190')] [2024-12-13 10:23:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013696_7012352.pth... [2024-12-13 10:23:54,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013656_6991872.pth [2024-12-13 10:23:59,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7020544. Throughput: 0: 768.7. Samples: 7018656. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:23:59,076][62436] Avg episode reward: [(0, '5142.875')] [2024-12-13 10:24:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7020544. Throughput: 0: 787.9. Samples: 7023736. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:24:04,076][62436] Avg episode reward: [(0, '5181.189')] [2024-12-13 10:24:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7024640. Throughput: 0: 765.4. Samples: 7027932. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:24:09,076][62436] Avg episode reward: [(0, '5171.422')] [2024-12-13 10:24:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013720_7024640.pth... [2024-12-13 10:24:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013680_7004160.pth [2024-12-13 10:24:14,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7032832. Throughput: 0: 764.6. Samples: 7030648. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:24:14,076][62436] Avg episode reward: [(0, '5133.732')] [2024-12-13 10:24:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7032832. Throughput: 0: 784.0. Samples: 7036012. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:24:19,076][62436] Avg episode reward: [(0, '5134.737')] [2024-12-13 10:24:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7036928. Throughput: 0: 766.9. Samples: 7040044. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:24:24,076][62436] Avg episode reward: [(0, '5193.726')] [2024-12-13 10:24:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013744_7036928.pth... [2024-12-13 10:24:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013696_7012352.pth [2024-12-13 10:24:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7041024. Throughput: 0: 784.8. Samples: 7042824. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:24:29,076][62436] Avg episode reward: [(0, '5203.503')] [2024-12-13 10:24:29,189][62492] Updated weights for policy 0, policy_version 13760 (0.0011) [2024-12-13 10:24:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7045120. Throughput: 0: 822.0. Samples: 7048264. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:24:34,076][62436] Avg episode reward: [(0, '5191.963')] [2024-12-13 10:24:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7049216. Throughput: 0: 804.8. Samples: 7052112. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:24:39,076][62436] Avg episode reward: [(0, '5255.012')] [2024-12-13 10:24:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013768_7049216.pth... [2024-12-13 10:24:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013720_7024640.pth [2024-12-13 10:24:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7053312. Throughput: 0: 804.5. Samples: 7054860. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:24:44,076][62436] Avg episode reward: [(0, '5297.437')] [2024-12-13 10:24:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7057408. Throughput: 0: 819.2. Samples: 7060600. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:24:49,076][62436] Avg episode reward: [(0, '5282.605')] [2024-12-13 10:24:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7061504. Throughput: 0: 809.5. Samples: 7064360. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:24:54,076][62436] Avg episode reward: [(0, '5284.865')] [2024-12-13 10:24:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013792_7061504.pth... [2024-12-13 10:24:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013744_7036928.pth [2024-12-13 10:24:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7065600. Throughput: 0: 808.4. Samples: 7067028. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:24:59,080][62436] Avg episode reward: [(0, '5284.865')] [2024-12-13 10:25:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7069696. Throughput: 0: 810.1. Samples: 7072468. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:25:04,076][62436] Avg episode reward: [(0, '5270.608')] [2024-12-13 10:25:09,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 7073792. Throughput: 0: 801.1. Samples: 7076096. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:25:09,081][62436] Avg episode reward: [(0, '5331.802')] [2024-12-13 10:25:09,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013816_7073792.pth... [2024-12-13 10:25:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013768_7049216.pth [2024-12-13 10:25:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7077888. Throughput: 0: 791.1. Samples: 7078424. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:25:14,076][62436] Avg episode reward: [(0, '5405.666')] [2024-12-13 10:25:19,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7081984. Throughput: 0: 797.7. Samples: 7084160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:25:19,076][62436] Avg episode reward: [(0, '5450.533')] [2024-12-13 10:25:19,716][62492] Updated weights for policy 0, policy_version 13840 (0.0013) [2024-12-13 10:25:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7086080. Throughput: 0: 811.2. Samples: 7088616. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:25:24,076][62436] Avg episode reward: [(0, '5504.529')] [2024-12-13 10:25:24,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013840_7086080.pth... 
[2024-12-13 10:25:24,104][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013792_7061504.pth [2024-12-13 10:25:24,105][62473] Saving new best policy, reward=5504.529! [2024-12-13 10:25:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7090176. Throughput: 0: 794.2. Samples: 7090600. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:25:29,076][62436] Avg episode reward: [(0, '5531.681')] [2024-12-13 10:25:29,077][62473] Saving new best policy, reward=5531.681! [2024-12-13 10:25:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7094272. Throughput: 0: 791.5. Samples: 7096216. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:25:34,076][62436] Avg episode reward: [(0, '5574.367')] [2024-12-13 10:25:34,077][62473] Saving new best policy, reward=5574.367! [2024-12-13 10:25:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7098368. Throughput: 0: 813.1. Samples: 7100948. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:25:39,079][62436] Avg episode reward: [(0, '5586.739')] [2024-12-13 10:25:39,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013864_7098368.pth... [2024-12-13 10:25:39,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013816_7073792.pth [2024-12-13 10:25:39,100][62473] Saving new best policy, reward=5586.739! [2024-12-13 10:25:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7102464. Throughput: 0: 793.5. Samples: 7102736. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:25:44,076][62436] Avg episode reward: [(0, '5628.076')] [2024-12-13 10:25:44,077][62473] Saving new best policy, reward=5628.076! [2024-12-13 10:25:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7106560. Throughput: 0: 797.7. Samples: 7108364. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:25:49,076][62436] Avg episode reward: [(0, '5567.187')] [2024-12-13 10:25:54,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7110656. Throughput: 0: 830.6. Samples: 7113472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:25:54,077][62436] Avg episode reward: [(0, '5570.810')] [2024-12-13 10:25:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013888_7110656.pth... [2024-12-13 10:25:54,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013840_7086080.pth [2024-12-13 10:25:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7114752. Throughput: 0: 816.9. Samples: 7115184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:25:59,076][62436] Avg episode reward: [(0, '5600.459')] [2024-12-13 10:26:04,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7118848. Throughput: 0: 808.9. Samples: 7120560. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:26:04,076][62436] Avg episode reward: [(0, '5587.400')] [2024-12-13 10:26:09,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 7122944. Throughput: 0: 829.5. Samples: 7125944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:26:09,078][62436] Avg episode reward: [(0, '5606.427')] [2024-12-13 10:26:09,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013912_7122944.pth... [2024-12-13 10:26:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013864_7098368.pth [2024-12-13 10:26:10,675][62492] Updated weights for policy 0, policy_version 13920 (0.0011) [2024-12-13 10:26:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7127040. Throughput: 0: 823.6. Samples: 7127660. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:26:14,076][62436] Avg episode reward: [(0, '5606.530')] [2024-12-13 10:26:19,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7131136. Throughput: 0: 812.8. Samples: 7132792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:26:19,076][62436] Avg episode reward: [(0, '5584.506')] [2024-12-13 10:26:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7135232. Throughput: 0: 831.2. Samples: 7138352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:26:24,076][62436] Avg episode reward: [(0, '5606.013')] [2024-12-13 10:26:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013936_7135232.pth... [2024-12-13 10:26:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013888_7110656.pth [2024-12-13 10:26:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7139328. Throughput: 0: 829.7. Samples: 7140072. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:26:29,076][62436] Avg episode reward: [(0, '5574.062')] [2024-12-13 10:26:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7143424. Throughput: 0: 813.3. Samples: 7144964. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:26:34,076][62436] Avg episode reward: [(0, '5581.391')] [2024-12-13 10:26:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7147520. Throughput: 0: 825.0. Samples: 7150596. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:26:39,076][62436] Avg episode reward: [(0, '5578.851')] [2024-12-13 10:26:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013960_7147520.pth... 
[2024-12-13 10:26:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013912_7122944.pth [2024-12-13 10:26:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7151616. Throughput: 0: 829.0. Samples: 7152488. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:26:44,076][62436] Avg episode reward: [(0, '5577.748')] [2024-12-13 10:26:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7155712. Throughput: 0: 811.6. Samples: 7157080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:26:49,076][62436] Avg episode reward: [(0, '5557.397')] [2024-12-13 10:26:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7159808. Throughput: 0: 817.9. Samples: 7162748. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:26:54,078][62436] Avg episode reward: [(0, '5559.163')] [2024-12-13 10:26:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013984_7159808.pth... [2024-12-13 10:26:54,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013936_7135232.pth [2024-12-13 10:26:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7163904. Throughput: 0: 826.4. Samples: 7164848. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:26:59,076][62436] Avg episode reward: [(0, '5559.163')] [2024-12-13 10:27:01,428][62492] Updated weights for policy 0, policy_version 14000 (0.0009) [2024-12-13 10:27:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7168000. Throughput: 0: 807.6. Samples: 7169136. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:04,076][62436] Avg episode reward: [(0, '5588.413')] [2024-12-13 10:27:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7172096. Throughput: 0: 808.5. Samples: 7174736. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:09,076][62436] Avg episode reward: [(0, '5614.436')] [2024-12-13 10:27:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014008_7172096.pth... [2024-12-13 10:27:09,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013960_7147520.pth [2024-12-13 10:27:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7176192. Throughput: 0: 823.6. Samples: 7177136. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:14,077][62436] Avg episode reward: [(0, '5615.602')] [2024-12-13 10:27:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7180288. Throughput: 0: 804.4. Samples: 7181164. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:19,076][62436] Avg episode reward: [(0, '5668.817')] [2024-12-13 10:27:19,077][62473] Saving new best policy, reward=5668.817! [2024-12-13 10:27:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7184384. Throughput: 0: 803.7. Samples: 7186764. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:24,076][62436] Avg episode reward: [(0, '5681.878')] [2024-12-13 10:27:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014032_7184384.pth... [2024-12-13 10:27:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000013984_7159808.pth [2024-12-13 10:27:24,089][62473] Saving new best policy, reward=5681.878! [2024-12-13 10:27:29,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7188480. Throughput: 0: 822.9. Samples: 7189520. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:29,078][62436] Avg episode reward: [(0, '5650.500')] [2024-12-13 10:27:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7192576. Throughput: 0: 803.3. Samples: 7193228. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:34,076][62436] Avg episode reward: [(0, '5635.932')] [2024-12-13 10:27:39,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7196672. Throughput: 0: 801.3. Samples: 7198808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:39,076][62436] Avg episode reward: [(0, '5627.572')] [2024-12-13 10:27:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014056_7196672.pth... [2024-12-13 10:27:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014008_7172096.pth [2024-12-13 10:27:44,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7200768. Throughput: 0: 816.0. Samples: 7201568. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:44,077][62436] Avg episode reward: [(0, '5711.284')] [2024-12-13 10:27:44,078][62473] Saving new best policy, reward=5711.284! [2024-12-13 10:27:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7204864. Throughput: 0: 809.9. Samples: 7205580. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:49,076][62436] Avg episode reward: [(0, '5705.950')] [2024-12-13 10:27:51,657][62492] Updated weights for policy 0, policy_version 14080 (0.0009) [2024-12-13 10:27:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7208960. Throughput: 0: 801.3. Samples: 7210796. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:54,076][62436] Avg episode reward: [(0, '5755.843')] [2024-12-13 10:27:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014080_7208960.pth... [2024-12-13 10:27:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014032_7184384.pth [2024-12-13 10:27:54,090][62473] Saving new best policy, reward=5755.843! [2024-12-13 10:27:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). 
Total num frames: 7213056. Throughput: 0: 808.8. Samples: 7213532. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:27:59,076][62436] Avg episode reward: [(0, '5715.382')] [2024-12-13 10:28:04,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7217152. Throughput: 0: 812.1. Samples: 7217708. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:28:04,078][62436] Avg episode reward: [(0, '5658.811')] [2024-12-13 10:28:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7221248. Throughput: 0: 800.4. Samples: 7222780. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:28:09,076][62436] Avg episode reward: [(0, '5654.894')] [2024-12-13 10:28:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014104_7221248.pth... [2024-12-13 10:28:09,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014056_7196672.pth [2024-12-13 10:28:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7225344. Throughput: 0: 798.5. Samples: 7225452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:28:14,076][62436] Avg episode reward: [(0, '5653.536')] [2024-12-13 10:28:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7229440. Throughput: 0: 818.5. Samples: 7230060. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:28:19,076][62436] Avg episode reward: [(0, '5608.867')] [2024-12-13 10:28:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7233536. Throughput: 0: 784.9. Samples: 7234128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:28:24,076][62436] Avg episode reward: [(0, '5643.767')] [2024-12-13 10:28:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014128_7233536.pth... 
[2024-12-13 10:28:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014080_7208960.pth [2024-12-13 10:28:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7237632. Throughput: 0: 766.1. Samples: 7236044. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:28:29,076][62436] Avg episode reward: [(0, '5650.222')] [2024-12-13 10:28:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 7237632. Throughput: 0: 779.6. Samples: 7240664. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:28:34,076][62436] Avg episode reward: [(0, '5649.514')] [2024-12-13 10:28:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 7241728. Throughput: 0: 762.6. Samples: 7245112. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:28:39,076][62436] Avg episode reward: [(0, '5646.587')] [2024-12-13 10:28:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014144_7241728.pth... [2024-12-13 10:28:39,103][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014104_7221248.pth [2024-12-13 10:28:43,839][62492] Updated weights for policy 0, policy_version 14160 (0.0011) [2024-12-13 10:28:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7249920. Throughput: 0: 761.7. Samples: 7247808. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:28:44,077][62436] Avg episode reward: [(0, '5622.089')] [2024-12-13 10:28:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7249920. Throughput: 0: 784.2. Samples: 7252996. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:28:49,076][62436] Avg episode reward: [(0, '5552.050')] [2024-12-13 10:28:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 7254016. Throughput: 0: 766.6. Samples: 7257276. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:28:54,076][62436] Avg episode reward: [(0, '5455.455')] [2024-12-13 10:28:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014168_7254016.pth... [2024-12-13 10:28:54,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014128_7233536.pth [2024-12-13 10:28:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7258112. Throughput: 0: 767.1. Samples: 7259972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:28:59,076][62436] Avg episode reward: [(0, '5404.249')] [2024-12-13 10:29:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 7262208. Throughput: 0: 783.3. Samples: 7265308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:04,076][62436] Avg episode reward: [(0, '5369.095')] [2024-12-13 10:29:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 7266304. Throughput: 0: 780.6. Samples: 7269256. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:09,076][62436] Avg episode reward: [(0, '5367.970')] [2024-12-13 10:29:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014192_7266304.pth... [2024-12-13 10:29:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014144_7241728.pth [2024-12-13 10:29:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7270400. Throughput: 0: 796.9. Samples: 7271904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:14,076][62436] Avg episode reward: [(0, '5407.607')] [2024-12-13 10:29:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7274496. Throughput: 0: 817.0. Samples: 7277428. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:19,076][62436] Avg episode reward: [(0, '5328.515')] [2024-12-13 10:29:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7278592. Throughput: 0: 801.6. Samples: 7281184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:24,076][62436] Avg episode reward: [(0, '5304.306')] [2024-12-13 10:29:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014216_7278592.pth... [2024-12-13 10:29:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014168_7254016.pth [2024-12-13 10:29:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7282688. Throughput: 0: 803.6. Samples: 7283972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:29,076][62436] Avg episode reward: [(0, '5321.975')] [2024-12-13 10:29:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7286784. Throughput: 0: 815.5. Samples: 7289692. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:34,076][62436] Avg episode reward: [(0, '5262.850')] [2024-12-13 10:29:34,402][62492] Updated weights for policy 0, policy_version 14240 (0.0013) [2024-12-13 10:29:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7290880. Throughput: 0: 807.0. Samples: 7293592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:39,084][62436] Avg episode reward: [(0, '5277.690')] [2024-12-13 10:29:39,100][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014240_7290880.pth... [2024-12-13 10:29:39,111][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014192_7266304.pth [2024-12-13 10:29:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7294976. Throughput: 0: 802.9. Samples: 7296104. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:44,076][62436] Avg episode reward: [(0, '5279.297')] [2024-12-13 10:29:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7299072. Throughput: 0: 809.8. Samples: 7301748. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:49,076][62436] Avg episode reward: [(0, '5236.365')] [2024-12-13 10:29:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7303168. Throughput: 0: 816.4. Samples: 7305992. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:29:54,076][62436] Avg episode reward: [(0, '5236.190')] [2024-12-13 10:29:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014264_7303168.pth... [2024-12-13 10:29:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014216_7278592.pth [2024-12-13 10:29:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7307264. Throughput: 0: 806.8. Samples: 7308208. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:29:59,076][62436] Avg episode reward: [(0, '5242.887')] [2024-12-13 10:30:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7311360. Throughput: 0: 809.8. Samples: 7313868. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:30:04,076][62436] Avg episode reward: [(0, '5345.234')] [2024-12-13 10:30:09,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7315456. Throughput: 0: 824.5. Samples: 7318288. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:30:09,081][62436] Avg episode reward: [(0, '5388.632')] [2024-12-13 10:30:09,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014288_7315456.pth... 
[2024-12-13 10:30:09,102][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014240_7290880.pth [2024-12-13 10:30:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7319552. Throughput: 0: 805.2. Samples: 7320208. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:30:14,076][62436] Avg episode reward: [(0, '5423.528')] [2024-12-13 10:30:19,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7323648. Throughput: 0: 803.7. Samples: 7325860. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:30:19,076][62436] Avg episode reward: [(0, '5473.842')] [2024-12-13 10:30:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7327744. Throughput: 0: 822.8. Samples: 7330616. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:30:24,076][62436] Avg episode reward: [(0, '5470.306')] [2024-12-13 10:30:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014312_7327744.pth... [2024-12-13 10:30:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014264_7303168.pth [2024-12-13 10:30:25,892][62492] Updated weights for policy 0, policy_version 14320 (0.0010) [2024-12-13 10:30:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7331840. Throughput: 0: 804.3. Samples: 7332296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:30:29,076][62436] Avg episode reward: [(0, '5473.451')] [2024-12-13 10:30:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7335936. Throughput: 0: 805.0. Samples: 7337972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:30:34,076][62436] Avg episode reward: [(0, '5497.724')] [2024-12-13 10:30:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7340032. Throughput: 0: 818.8. Samples: 7342836. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:30:39,076][62436] Avg episode reward: [(0, '5546.819')] [2024-12-13 10:30:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014336_7340032.pth... [2024-12-13 10:30:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014288_7315456.pth [2024-12-13 10:30:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7344128. Throughput: 0: 806.2. Samples: 7344488. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:30:44,076][62436] Avg episode reward: [(0, '5577.864')] [2024-12-13 10:30:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7348224. Throughput: 0: 804.4. Samples: 7350068. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:30:49,076][62436] Avg episode reward: [(0, '5534.532')] [2024-12-13 10:30:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7352320. Throughput: 0: 823.3. Samples: 7355336. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:30:54,076][62436] Avg episode reward: [(0, '5504.122')] [2024-12-13 10:30:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014360_7352320.pth... [2024-12-13 10:30:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014312_7327744.pth [2024-12-13 10:30:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7356416. Throughput: 0: 819.5. Samples: 7357084. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:30:59,076][62436] Avg episode reward: [(0, '5492.524')] [2024-12-13 10:31:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7360512. Throughput: 0: 810.2. Samples: 7362320. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:31:04,079][62436] Avg episode reward: [(0, '5480.785')] [2024-12-13 10:31:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7364608. Throughput: 0: 826.7. Samples: 7367816. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:31:09,076][62436] Avg episode reward: [(0, '5471.348')] [2024-12-13 10:31:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014384_7364608.pth... [2024-12-13 10:31:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014336_7340032.pth [2024-12-13 10:31:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7368704. Throughput: 0: 826.9. Samples: 7369508. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:31:14,076][62436] Avg episode reward: [(0, '5453.184')] [2024-12-13 10:31:15,760][62492] Updated weights for policy 0, policy_version 14400 (0.0010) [2024-12-13 10:31:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7372800. Throughput: 0: 812.8. Samples: 7374548. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:31:19,076][62436] Avg episode reward: [(0, '5409.134')] [2024-12-13 10:31:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7376896. Throughput: 0: 831.6. Samples: 7380256. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:31:24,076][62436] Avg episode reward: [(0, '5484.642')] [2024-12-13 10:31:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014408_7376896.pth... [2024-12-13 10:31:24,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014360_7352320.pth [2024-12-13 10:31:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7380992. Throughput: 0: 830.7. Samples: 7381868. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:31:29,076][62436] Avg episode reward: [(0, '5464.215')] [2024-12-13 10:31:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7385088. Throughput: 0: 815.9. Samples: 7386784. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:31:34,078][62436] Avg episode reward: [(0, '5498.687')] [2024-12-13 10:31:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7389184. Throughput: 0: 825.3. Samples: 7392476. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:31:39,076][62436] Avg episode reward: [(0, '5458.122')] [2024-12-13 10:31:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014432_7389184.pth... [2024-12-13 10:31:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014384_7364608.pth [2024-12-13 10:31:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7393280. Throughput: 0: 827.6. Samples: 7394328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:31:44,076][62436] Avg episode reward: [(0, '5433.217')] [2024-12-13 10:31:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7397376. Throughput: 0: 814.8. Samples: 7398988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:31:49,076][62436] Avg episode reward: [(0, '5452.912')] [2024-12-13 10:31:54,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 7401472. Throughput: 0: 818.4. Samples: 7404648. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:31:54,080][62436] Avg episode reward: [(0, '5420.840')] [2024-12-13 10:31:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014456_7401472.pth... 
[2024-12-13 10:31:54,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014408_7376896.pth [2024-12-13 10:31:59,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7405568. Throughput: 0: 827.7. Samples: 7406756. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:31:59,078][62436] Avg episode reward: [(0, '5367.404')] [2024-12-13 10:32:04,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7409664. Throughput: 0: 812.1. Samples: 7411092. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:32:04,076][62436] Avg episode reward: [(0, '5368.318')] [2024-12-13 10:32:05,521][62492] Updated weights for policy 0, policy_version 14480 (0.0014) [2024-12-13 10:32:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7413760. Throughput: 0: 811.6. Samples: 7416776. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:32:09,076][62436] Avg episode reward: [(0, '5336.175')] [2024-12-13 10:32:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014480_7413760.pth... [2024-12-13 10:32:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014432_7389184.pth [2024-12-13 10:32:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7417856. Throughput: 0: 828.7. Samples: 7419160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:32:14,077][62436] Avg episode reward: [(0, '5254.292')] [2024-12-13 10:32:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7421952. Throughput: 0: 809.9. Samples: 7423228. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:32:19,076][62436] Avg episode reward: [(0, '5267.025')] [2024-12-13 10:32:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7426048. Throughput: 0: 810.9. Samples: 7428968. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:32:24,076][62436] Avg episode reward: [(0, '5280.024')] [2024-12-13 10:32:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014504_7426048.pth... [2024-12-13 10:32:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014456_7401472.pth [2024-12-13 10:32:29,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7430144. Throughput: 0: 828.5. Samples: 7431612. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:32:29,077][62436] Avg episode reward: [(0, '5290.123')] [2024-12-13 10:32:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7434240. Throughput: 0: 808.7. Samples: 7435380. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:32:34,076][62436] Avg episode reward: [(0, '5276.670')] [2024-12-13 10:32:39,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7438336. Throughput: 0: 809.7. Samples: 7441080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:32:39,076][62436] Avg episode reward: [(0, '5278.117')] [2024-12-13 10:32:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014528_7438336.pth... [2024-12-13 10:32:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014480_7413760.pth [2024-12-13 10:32:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7442432. Throughput: 0: 825.0. Samples: 7443880. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:32:44,077][62436] Avg episode reward: [(0, '5279.869')] [2024-12-13 10:32:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7446528. Throughput: 0: 819.2. Samples: 7447956. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:32:49,078][62436] Avg episode reward: [(0, '5260.932')] [2024-12-13 10:32:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 7450624. Throughput: 0: 814.7. Samples: 7453436. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:32:54,076][62436] Avg episode reward: [(0, '5357.575')] [2024-12-13 10:32:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014552_7450624.pth... [2024-12-13 10:32:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014504_7426048.pth [2024-12-13 10:32:54,999][62492] Updated weights for policy 0, policy_version 14560 (0.0010) [2024-12-13 10:32:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7454720. Throughput: 0: 824.4. Samples: 7456256. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:32:59,076][62436] Avg episode reward: [(0, '5418.004')] [2024-12-13 10:33:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7458816. Throughput: 0: 829.1. Samples: 7460536. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:33:04,076][62436] Avg episode reward: [(0, '5435.108')] [2024-12-13 10:33:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7462912. Throughput: 0: 803.5. Samples: 7465124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:33:09,076][62436] Avg episode reward: [(0, '5396.918')] [2024-12-13 10:33:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014576_7462912.pth... [2024-12-13 10:33:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014528_7438336.pth [2024-12-13 10:33:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7467008. Throughput: 0: 785.3. Samples: 7466948. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:33:14,076][62436] Avg episode reward: [(0, '5360.095')] [2024-12-13 10:33:19,084][62436] Fps is (10 sec: 818.5, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 7471104. Throughput: 0: 793.4. Samples: 7471092. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:33:19,085][62436] Avg episode reward: [(0, '5360.047')] [2024-12-13 10:33:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7475200. Throughput: 0: 773.1. Samples: 7475868. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:33:24,080][62436] Avg episode reward: [(0, '5338.152')] [2024-12-13 10:33:24,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014600_7475200.pth... [2024-12-13 10:33:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014552_7450624.pth [2024-12-13 10:33:29,078][62436] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7479296. Throughput: 0: 775.7. Samples: 7478788. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:33:29,079][62436] Avg episode reward: [(0, '5327.423')] [2024-12-13 10:33:34,084][62436] Fps is (10 sec: 818.5, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 7483392. Throughput: 0: 787.3. Samples: 7483392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:33:34,085][62436] Avg episode reward: [(0, '5357.301')] [2024-12-13 10:33:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7487488. Throughput: 0: 766.8. Samples: 7487944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:33:39,076][62436] Avg episode reward: [(0, '5331.018')] [2024-12-13 10:33:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014624_7487488.pth... 
[2024-12-13 10:33:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014576_7462912.pth [2024-12-13 10:33:44,076][62436] Fps is (10 sec: 819.9, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7491584. Throughput: 0: 768.2. Samples: 7490824. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:33:44,076][62436] Avg episode reward: [(0, '5335.845')] [2024-12-13 10:33:47,501][62492] Updated weights for policy 0, policy_version 14640 (0.0019) [2024-12-13 10:33:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7495680. Throughput: 0: 783.1. Samples: 7495776. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:33:49,076][62436] Avg episode reward: [(0, '5442.122')] [2024-12-13 10:33:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7499776. Throughput: 0: 776.4. Samples: 7500060. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:33:54,076][62436] Avg episode reward: [(0, '5442.489')] [2024-12-13 10:33:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014648_7499776.pth... [2024-12-13 10:33:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014600_7475200.pth [2024-12-13 10:33:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7503872. Throughput: 0: 800.5. Samples: 7502972. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:33:59,076][62436] Avg episode reward: [(0, '5485.504')] [2024-12-13 10:34:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7507968. Throughput: 0: 822.5. Samples: 7508096. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:34:04,076][62436] Avg episode reward: [(0, '5538.667')] [2024-12-13 10:34:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7512064. Throughput: 0: 804.4. Samples: 7512064. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:34:09,076][62436] Avg episode reward: [(0, '5555.038')] [2024-12-13 10:34:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014672_7512064.pth... [2024-12-13 10:34:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014624_7487488.pth [2024-12-13 10:34:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7516160. Throughput: 0: 803.2. Samples: 7514928. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:34:14,076][62436] Avg episode reward: [(0, '5596.182')] [2024-12-13 10:34:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 7520256. Throughput: 0: 823.2. Samples: 7520428. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:34:19,076][62436] Avg episode reward: [(0, '5575.622')] [2024-12-13 10:34:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7524352. Throughput: 0: 805.8. Samples: 7524204. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:34:24,076][62436] Avg episode reward: [(0, '5604.658')] [2024-12-13 10:34:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014696_7524352.pth... [2024-12-13 10:34:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014648_7499776.pth [2024-12-13 10:34:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7528448. Throughput: 0: 802.7. Samples: 7526944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:34:29,076][62436] Avg episode reward: [(0, '5555.880')] [2024-12-13 10:34:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 7532544. Throughput: 0: 814.8. Samples: 7532440. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:34:34,076][62436] Avg episode reward: [(0, '5538.718')] [2024-12-13 10:34:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7532544. Throughput: 0: 808.9. Samples: 7536460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:34:39,076][62436] Avg episode reward: [(0, '5515.778')] [2024-12-13 10:34:39,208][62492] Updated weights for policy 0, policy_version 14720 (0.0011) [2024-12-13 10:34:39,215][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014720_7536640.pth... [2024-12-13 10:34:39,221][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014672_7512064.pth [2024-12-13 10:34:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7540736. Throughput: 0: 795.5. Samples: 7538768. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:34:44,076][62436] Avg episode reward: [(0, '5538.554')] [2024-12-13 10:34:49,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7544832. Throughput: 0: 807.6. Samples: 7544440. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:34:49,076][62436] Avg episode reward: [(0, '5491.014')] [2024-12-13 10:34:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7544832. Throughput: 0: 815.7. Samples: 7548772. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:34:54,076][62436] Avg episode reward: [(0, '5519.645')] [2024-12-13 10:34:54,151][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014744_7548928.pth... [2024-12-13 10:34:54,165][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014696_7524352.pth [2024-12-13 10:34:59,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7553024. Throughput: 0: 798.5. Samples: 7550864. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:34:59,078][62436] Avg episode reward: [(0, '5541.677')] [2024-12-13 10:35:04,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7557120. Throughput: 0: 801.5. Samples: 7556496. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:35:04,076][62436] Avg episode reward: [(0, '5610.297')] [2024-12-13 10:35:09,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7557120. Throughput: 0: 816.3. Samples: 7560936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:35:09,076][62436] Avg episode reward: [(0, '5597.099')] [2024-12-13 10:35:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014760_7557120.pth... [2024-12-13 10:35:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014720_7536640.pth [2024-12-13 10:35:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7561216. Throughput: 0: 796.4. Samples: 7562780. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:35:14,076][62436] Avg episode reward: [(0, '5601.755')] [2024-12-13 10:35:19,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7569408. Throughput: 0: 799.6. Samples: 7568420. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:35:19,076][62436] Avg episode reward: [(0, '5728.560')] [2024-12-13 10:35:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7569408. Throughput: 0: 818.8. Samples: 7573304. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:35:24,076][62436] Avg episode reward: [(0, '5673.036')] [2024-12-13 10:35:24,219][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014792_7573504.pth... 
[2024-12-13 10:35:24,232][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014744_7548928.pth [2024-12-13 10:35:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7573504. Throughput: 0: 806.2. Samples: 7575048. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:35:29,077][62436] Avg episode reward: [(0, '5695.872')] [2024-12-13 10:35:29,593][62492] Updated weights for policy 0, policy_version 14800 (0.0018) [2024-12-13 10:35:34,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7581696. Throughput: 0: 801.2. Samples: 7580492. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:35:34,076][62436] Avg episode reward: [(0, '5569.850')] [2024-12-13 10:35:39,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 7585792. Throughput: 0: 819.7. Samples: 7585660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:35:39,076][62436] Avg episode reward: [(0, '5568.008')] [2024-12-13 10:35:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014816_7585792.pth... [2024-12-13 10:35:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014760_7557120.pth [2024-12-13 10:35:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7585792. Throughput: 0: 812.2. Samples: 7587412. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:35:44,076][62436] Avg episode reward: [(0, '5564.787')] [2024-12-13 10:35:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7589888. Throughput: 0: 803.0. Samples: 7592632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:35:49,076][62436] Avg episode reward: [(0, '5555.379')] [2024-12-13 10:35:54,078][62436] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 7598080. Throughput: 0: 825.4. Samples: 7598080. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:35:54,078][62436] Avg episode reward: [(0, '5608.058')] [2024-12-13 10:35:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014840_7598080.pth... [2024-12-13 10:35:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014792_7573504.pth [2024-12-13 10:35:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 7598080. Throughput: 0: 825.0. Samples: 7599904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:35:59,076][62436] Avg episode reward: [(0, '5607.009')] [2024-12-13 10:36:04,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7602176. Throughput: 0: 807.2. Samples: 7604744. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:36:04,076][62436] Avg episode reward: [(0, '5606.072')] [2024-12-13 10:36:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 7610368. Throughput: 0: 822.0. Samples: 7610296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:36:09,076][62436] Avg episode reward: [(0, '5590.707')] [2024-12-13 10:36:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014864_7610368.pth... [2024-12-13 10:36:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014816_7585792.pth [2024-12-13 10:36:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7610368. Throughput: 0: 823.9. Samples: 7612124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:36:14,082][62436] Avg episode reward: [(0, '5562.448')] [2024-12-13 10:36:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7614464. Throughput: 0: 803.5. Samples: 7616648. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:36:19,076][62436] Avg episode reward: [(0, '5640.932')] [2024-12-13 10:36:19,729][62492] Updated weights for policy 0, policy_version 14880 (0.0011) [2024-12-13 10:36:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7618560. Throughput: 0: 814.1. Samples: 7622296. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:36:24,077][62436] Avg episode reward: [(0, '5655.309')] [2024-12-13 10:36:24,097][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014888_7622656.pth... [2024-12-13 10:36:24,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014840_7598080.pth [2024-12-13 10:36:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7622656. Throughput: 0: 823.8. Samples: 7624484. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:36:29,076][62436] Avg episode reward: [(0, '5714.741')] [2024-12-13 10:36:34,078][62436] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7626752. Throughput: 0: 803.9. Samples: 7628808. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:36:34,079][62436] Avg episode reward: [(0, '5747.058')] [2024-12-13 10:36:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7630848. Throughput: 0: 808.0. Samples: 7634440. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:36:39,076][62436] Avg episode reward: [(0, '5808.097')] [2024-12-13 10:36:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014904_7630848.pth... [2024-12-13 10:36:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014864_7610368.pth [2024-12-13 10:36:39,091][62473] Saving new best policy, reward=5808.097! [2024-12-13 10:36:44,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7634944. Throughput: 0: 824.0. Samples: 7636984. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:36:44,076][62436] Avg episode reward: [(0, '5712.166')] [2024-12-13 10:36:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7639040. Throughput: 0: 805.7. Samples: 7641000. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:36:49,076][62436] Avg episode reward: [(0, '5713.201')] [2024-12-13 10:36:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 7643136. Throughput: 0: 809.6. Samples: 7646728. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:36:54,076][62436] Avg episode reward: [(0, '5681.607')] [2024-12-13 10:36:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014928_7643136.pth... [2024-12-13 10:36:54,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014888_7622656.pth [2024-12-13 10:36:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7647232. Throughput: 0: 827.6. Samples: 7649368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:36:59,076][62436] Avg episode reward: [(0, '5710.319')] [2024-12-13 10:37:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7651328. Throughput: 0: 811.3. Samples: 7653156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:37:04,076][62436] Avg episode reward: [(0, '5715.093')] [2024-12-13 10:37:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7655424. Throughput: 0: 809.3. Samples: 7658716. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:37:09,076][62436] Avg episode reward: [(0, '5740.486')] [2024-12-13 10:37:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014952_7655424.pth... 
[2024-12-13 10:37:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014904_7630848.pth [2024-12-13 10:37:09,649][62492] Updated weights for policy 0, policy_version 14960 (0.0010) [2024-12-13 10:37:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7659520. Throughput: 0: 820.6. Samples: 7661412. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:37:14,076][62436] Avg episode reward: [(0, '5708.182')] [2024-12-13 10:37:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7663616. Throughput: 0: 809.6. Samples: 7665236. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:37:19,076][62436] Avg episode reward: [(0, '5708.754')] [2024-12-13 10:37:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7667712. Throughput: 0: 808.1. Samples: 7670804. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:37:24,076][62436] Avg episode reward: [(0, '5706.656')] [2024-12-13 10:37:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014976_7667712.pth... [2024-12-13 10:37:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014928_7643136.pth [2024-12-13 10:37:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7671808. Throughput: 0: 813.9. Samples: 7673612. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:37:29,079][62436] Avg episode reward: [(0, '5757.422')] [2024-12-13 10:37:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7675904. Throughput: 0: 814.6. Samples: 7677656. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:37:34,076][62436] Avg episode reward: [(0, '5816.106')] [2024-12-13 10:37:34,077][62473] Saving new best policy, reward=5816.106! [2024-12-13 10:37:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7680000. 
Throughput: 0: 805.0. Samples: 7682952. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:37:39,076][62436] Avg episode reward: [(0, '5821.111')] [2024-12-13 10:37:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015000_7680000.pth... [2024-12-13 10:37:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014952_7655424.pth [2024-12-13 10:37:39,088][62473] Saving new best policy, reward=5821.111! [2024-12-13 10:37:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7684096. Throughput: 0: 806.3. Samples: 7685652. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:37:44,076][62436] Avg episode reward: [(0, '5833.277')] [2024-12-13 10:37:44,077][62473] Saving new best policy, reward=5833.277! [2024-12-13 10:37:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7688192. Throughput: 0: 818.1. Samples: 7689972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:37:49,078][62436] Avg episode reward: [(0, '5834.671')] [2024-12-13 10:37:49,081][62473] Saving new best policy, reward=5834.671! [2024-12-13 10:37:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7692288. Throughput: 0: 801.2. Samples: 7694772. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:37:54,076][62436] Avg episode reward: [(0, '5838.519')] [2024-12-13 10:37:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015024_7692288.pth... [2024-12-13 10:37:54,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000014976_7667712.pth [2024-12-13 10:37:54,101][62473] Saving new best policy, reward=5838.519! [2024-12-13 10:37:59,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7696384. Throughput: 0: 780.6. Samples: 7696540. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:37:59,079][62436] Avg episode reward: [(0, '5838.006')] [2024-12-13 10:38:02,569][62492] Updated weights for policy 0, policy_version 15040 (0.0011) [2024-12-13 10:38:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7700480. Throughput: 0: 783.4. Samples: 7700488. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:38:04,076][62436] Avg episode reward: [(0, '5788.513')] [2024-12-13 10:38:09,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7704576. Throughput: 0: 765.6. Samples: 7705256. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:38:09,076][62436] Avg episode reward: [(0, '5838.859')] [2024-12-13 10:38:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015048_7704576.pth... [2024-12-13 10:38:09,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015000_7680000.pth [2024-12-13 10:38:09,097][62473] Saving new best policy, reward=5838.859! [2024-12-13 10:38:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7708672. Throughput: 0: 767.7. Samples: 7708156. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:38:14,076][62436] Avg episode reward: [(0, '5922.130')] [2024-12-13 10:38:14,077][62473] Saving new best policy, reward=5922.130! [2024-12-13 10:38:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7712768. Throughput: 0: 785.9. Samples: 7713020. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:38:19,076][62436] Avg episode reward: [(0, '5922.708')] [2024-12-13 10:38:19,077][62473] Saving new best policy, reward=5922.708! [2024-12-13 10:38:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7716864. Throughput: 0: 768.3. Samples: 7717524. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:38:24,076][62436] Avg episode reward: [(0, '5831.319')] [2024-12-13 10:38:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015072_7716864.pth... [2024-12-13 10:38:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015024_7692288.pth [2024-12-13 10:38:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7720960. Throughput: 0: 772.4. Samples: 7720412. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:38:29,076][62436] Avg episode reward: [(0, '5853.452')] [2024-12-13 10:38:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7725056. Throughput: 0: 786.6. Samples: 7725368. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:38:34,076][62436] Avg episode reward: [(0, '5801.380')] [2024-12-13 10:38:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7729152. Throughput: 0: 772.7. Samples: 7729544. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:38:39,076][62436] Avg episode reward: [(0, '5846.323')] [2024-12-13 10:38:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015096_7729152.pth... [2024-12-13 10:38:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015048_7704576.pth [2024-12-13 10:38:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7733248. Throughput: 0: 797.6. Samples: 7732432. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:38:44,076][62436] Avg episode reward: [(0, '5798.245')] [2024-12-13 10:38:49,082][62436] Fps is (10 sec: 818.6, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 7737344. Throughput: 0: 826.6. Samples: 7737692. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:38:49,083][62436] Avg episode reward: [(0, '5747.750')] [2024-12-13 10:38:53,319][62492] Updated weights for policy 0, policy_version 15120 (0.0018) [2024-12-13 10:38:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7741440. Throughput: 0: 809.9. Samples: 7741700. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:38:54,076][62436] Avg episode reward: [(0, '5733.370')] [2024-12-13 10:38:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015120_7741440.pth... [2024-12-13 10:38:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015072_7716864.pth [2024-12-13 10:38:59,076][62436] Fps is (10 sec: 819.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7745536. Throughput: 0: 809.8. Samples: 7744596. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:38:59,076][62436] Avg episode reward: [(0, '5761.565')] [2024-12-13 10:39:04,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7749632. Throughput: 0: 820.9. Samples: 7749960. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:39:04,076][62436] Avg episode reward: [(0, '5798.547')] [2024-12-13 10:39:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7753728. Throughput: 0: 804.4. Samples: 7753720. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:39:09,076][62436] Avg episode reward: [(0, '5798.547')] [2024-12-13 10:39:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015144_7753728.pth... [2024-12-13 10:39:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015096_7729152.pth [2024-12-13 10:39:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7757824. Throughput: 0: 801.5. Samples: 7756480. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:39:14,080][62436] Avg episode reward: [(0, '5706.721')] [2024-12-13 10:39:19,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7761920. Throughput: 0: 812.3. Samples: 7761920. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:39:19,077][62436] Avg episode reward: [(0, '5687.313')] [2024-12-13 10:39:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7766016. Throughput: 0: 810.5. Samples: 7766016. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:39:24,076][62436] Avg episode reward: [(0, '5656.894')] [2024-12-13 10:39:24,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015168_7766016.pth... [2024-12-13 10:39:24,103][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015120_7741440.pth [2024-12-13 10:39:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7770112. Throughput: 0: 803.2. Samples: 7768576. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:39:29,079][62436] Avg episode reward: [(0, '5598.489')] [2024-12-13 10:39:34,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7774208. Throughput: 0: 811.5. Samples: 7774204. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:39:34,077][62436] Avg episode reward: [(0, '5536.400')] [2024-12-13 10:39:39,078][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7778304. Throughput: 0: 817.5. Samples: 7778492. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:39:39,079][62436] Avg episode reward: [(0, '5449.424')] [2024-12-13 10:39:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015192_7778304.pth... 
[2024-12-13 10:39:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015144_7753728.pth [2024-12-13 10:39:43,509][62492] Updated weights for policy 0, policy_version 15200 (0.0011) [2024-12-13 10:39:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7782400. Throughput: 0: 802.5. Samples: 7780708. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:39:44,076][62436] Avg episode reward: [(0, '5354.779')] [2024-12-13 10:39:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 7786496. Throughput: 0: 807.8. Samples: 7786312. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:39:49,076][62436] Avg episode reward: [(0, '5383.039')] [2024-12-13 10:39:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7790592. Throughput: 0: 826.2. Samples: 7790900. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:39:54,076][62436] Avg episode reward: [(0, '5392.249')] [2024-12-13 10:39:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015216_7790592.pth... [2024-12-13 10:39:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015168_7766016.pth [2024-12-13 10:39:59,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7794688. Throughput: 0: 808.3. Samples: 7792856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:39:59,078][62436] Avg episode reward: [(0, '5339.462')] [2024-12-13 10:40:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7798784. Throughput: 0: 812.6. Samples: 7798488. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:40:04,076][62436] Avg episode reward: [(0, '5394.225')] [2024-12-13 10:40:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7802880. Throughput: 0: 828.9. Samples: 7803316. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:40:09,076][62436] Avg episode reward: [(0, '5390.361')] [2024-12-13 10:40:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015240_7802880.pth... [2024-12-13 10:40:09,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015192_7778304.pth [2024-12-13 10:40:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7806976. Throughput: 0: 814.4. Samples: 7805220. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:40:14,076][62436] Avg episode reward: [(0, '5448.017')] [2024-12-13 10:40:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7811072. Throughput: 0: 809.5. Samples: 7810632. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:40:19,076][62436] Avg episode reward: [(0, '5391.503')] [2024-12-13 10:40:24,077][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7815168. Throughput: 0: 832.3. Samples: 7815944. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:40:24,078][62436] Avg episode reward: [(0, '5387.444')] [2024-12-13 10:40:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015264_7815168.pth... [2024-12-13 10:40:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015216_7790592.pth [2024-12-13 10:40:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7819264. Throughput: 0: 825.9. Samples: 7817872. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:40:29,076][62436] Avg episode reward: [(0, '5446.451')] [2024-12-13 10:40:33,255][62492] Updated weights for policy 0, policy_version 15280 (0.0009) [2024-12-13 10:40:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7823360. Throughput: 0: 811.9. Samples: 7822848. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:40:34,076][62436] Avg episode reward: [(0, '5524.593')] [2024-12-13 10:40:39,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7827456. Throughput: 0: 832.7. Samples: 7828372. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:40:39,078][62436] Avg episode reward: [(0, '5518.416')] [2024-12-13 10:40:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015288_7827456.pth... [2024-12-13 10:40:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015240_7802880.pth [2024-12-13 10:40:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7831552. Throughput: 0: 831.8. Samples: 7830284. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:40:44,076][62436] Avg episode reward: [(0, '5465.506')] [2024-12-13 10:40:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7835648. Throughput: 0: 808.0. Samples: 7834848. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:40:49,076][62436] Avg episode reward: [(0, '5438.662')] [2024-12-13 10:40:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7839744. Throughput: 0: 823.8. Samples: 7840388. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:40:54,076][62436] Avg episode reward: [(0, '5398.326')] [2024-12-13 10:40:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015312_7839744.pth... [2024-12-13 10:40:54,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015264_7815168.pth [2024-12-13 10:40:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7843840. Throughput: 0: 832.1. Samples: 7842664. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:40:59,076][62436] Avg episode reward: [(0, '5411.282')] [2024-12-13 10:41:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7847936. Throughput: 0: 809.2. Samples: 7847044. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:41:04,076][62436] Avg episode reward: [(0, '5450.143')] [2024-12-13 10:41:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7852032. Throughput: 0: 813.3. Samples: 7852540. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:41:09,076][62436] Avg episode reward: [(0, '5544.609')] [2024-12-13 10:41:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015336_7852032.pth... [2024-12-13 10:41:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015288_7827456.pth [2024-12-13 10:41:14,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 7856128. Throughput: 0: 827.2. Samples: 7855100. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:41:14,081][62436] Avg episode reward: [(0, '5510.526')] [2024-12-13 10:41:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7860224. Throughput: 0: 806.0. Samples: 7859116. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:41:19,076][62436] Avg episode reward: [(0, '5542.520')] [2024-12-13 10:41:23,269][62492] Updated weights for policy 0, policy_version 15360 (0.0010) [2024-12-13 10:41:24,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7864320. Throughput: 0: 806.1. Samples: 7864644. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:41:24,076][62436] Avg episode reward: [(0, '5496.113')] [2024-12-13 10:41:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015360_7864320.pth... 
[2024-12-13 10:41:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015312_7839744.pth [2024-12-13 10:41:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7868416. Throughput: 0: 827.7. Samples: 7867532. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:41:29,076][62436] Avg episode reward: [(0, '5506.918')] [2024-12-13 10:41:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7872512. Throughput: 0: 807.7. Samples: 7871196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:41:34,076][62436] Avg episode reward: [(0, '5560.499')] [2024-12-13 10:41:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7876608. Throughput: 0: 807.0. Samples: 7876704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:41:39,076][62436] Avg episode reward: [(0, '5437.315')] [2024-12-13 10:41:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015384_7876608.pth... [2024-12-13 10:41:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015336_7852032.pth [2024-12-13 10:41:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7880704. Throughput: 0: 819.7. Samples: 7879552. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:41:44,083][62436] Avg episode reward: [(0, '5405.844')] [2024-12-13 10:41:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7880704. Throughput: 0: 810.4. Samples: 7883512. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:41:49,076][62436] Avg episode reward: [(0, '5355.001')] [2024-12-13 10:41:54,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7888896. Throughput: 0: 806.4. Samples: 7888828. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:41:54,078][62436] Avg episode reward: [(0, '5290.990')] [2024-12-13 10:41:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015408_7888896.pth... [2024-12-13 10:41:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015360_7864320.pth [2024-12-13 10:41:59,077][62436] Fps is (10 sec: 1228.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7892992. Throughput: 0: 811.6. Samples: 7891620. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:41:59,078][62436] Avg episode reward: [(0, '5289.270')] [2024-12-13 10:42:04,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7892992. Throughput: 0: 815.9. Samples: 7895832. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:42:04,076][62436] Avg episode reward: [(0, '5257.046')] [2024-12-13 10:42:09,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7897088. Throughput: 0: 803.0. Samples: 7900780. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:42:09,077][62436] Avg episode reward: [(0, '5256.004')] [2024-12-13 10:42:09,186][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015432_7901184.pth... [2024-12-13 10:42:09,192][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015384_7876608.pth [2024-12-13 10:42:13,607][62492] Updated weights for policy 0, policy_version 15440 (0.0017) [2024-12-13 10:42:14,076][62436] Fps is (10 sec: 1228.7, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 7905280. Throughput: 0: 799.3. Samples: 7903500. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:42:14,077][62436] Avg episode reward: [(0, '5260.850')] [2024-12-13 10:42:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7905280. Throughput: 0: 818.8. Samples: 7908044. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:42:19,076][62436] Avg episode reward: [(0, '5273.179')] [2024-12-13 10:42:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7909376. Throughput: 0: 804.2. Samples: 7912892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:42:24,076][62436] Avg episode reward: [(0, '5243.360')] [2024-12-13 10:42:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015448_7909376.pth... [2024-12-13 10:42:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015408_7888896.pth [2024-12-13 10:42:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 7917568. Throughput: 0: 801.8. Samples: 7915632. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:42:29,076][62436] Avg episode reward: [(0, '5185.181')] [2024-12-13 10:42:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7917568. Throughput: 0: 821.9. Samples: 7920496. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:42:34,079][62436] Avg episode reward: [(0, '5184.105')] [2024-12-13 10:42:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7921664. Throughput: 0: 793.4. Samples: 7924528. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:42:39,076][62436] Avg episode reward: [(0, '5217.698')] [2024-12-13 10:42:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015472_7921664.pth... [2024-12-13 10:42:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015432_7901184.pth [2024-12-13 10:42:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7925760. Throughput: 0: 767.6. Samples: 7926160. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:42:44,076][62436] Avg episode reward: [(0, '5268.116')] [2024-12-13 10:42:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7929856. Throughput: 0: 778.0. Samples: 7930840. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:42:49,076][62436] Avg episode reward: [(0, '5267.332')] [2024-12-13 10:42:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 7933952. Throughput: 0: 762.3. Samples: 7935084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:42:54,076][62436] Avg episode reward: [(0, '5184.043')] [2024-12-13 10:42:54,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015496_7933952.pth... [2024-12-13 10:42:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015448_7909376.pth [2024-12-13 10:42:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 7938048. Throughput: 0: 765.1. Samples: 7937928. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:42:59,076][62436] Avg episode reward: [(0, '5239.192')] [2024-12-13 10:43:04,082][62436] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 7942144. Throughput: 0: 782.8. Samples: 7943276. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:43:04,083][62436] Avg episode reward: [(0, '5243.328')] [2024-12-13 10:43:07,680][62492] Updated weights for policy 0, policy_version 15520 (0.0013) [2024-12-13 10:43:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7946240. Throughput: 0: 760.5. Samples: 7947116. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:43:09,076][62436] Avg episode reward: [(0, '5267.711')] [2024-12-13 10:43:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015520_7946240.pth... 
[2024-12-13 10:43:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015472_7921664.pth [2024-12-13 10:43:14,076][62436] Fps is (10 sec: 819.8, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7950336. Throughput: 0: 762.4. Samples: 7949940. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:43:14,076][62436] Avg episode reward: [(0, '5455.102')] [2024-12-13 10:43:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7954432. Throughput: 0: 776.7. Samples: 7955448. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:43:19,076][62436] Avg episode reward: [(0, '5456.420')] [2024-12-13 10:43:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7958528. Throughput: 0: 769.3. Samples: 7959148. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:43:24,076][62436] Avg episode reward: [(0, '5421.864')] [2024-12-13 10:43:24,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015544_7958528.pth... [2024-12-13 10:43:24,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015496_7933952.pth [2024-12-13 10:43:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 7962624. Throughput: 0: 798.1. Samples: 7962076. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:43:29,076][62436] Avg episode reward: [(0, '5421.357')] [2024-12-13 10:43:34,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7966720. Throughput: 0: 817.2. Samples: 7967616. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:43:34,077][62436] Avg episode reward: [(0, '5549.274')] [2024-12-13 10:43:39,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7970816. Throughput: 0: 813.4. Samples: 7971688. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:43:39,079][62436] Avg episode reward: [(0, '5572.170')] [2024-12-13 10:43:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015568_7970816.pth... [2024-12-13 10:43:39,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015520_7946240.pth [2024-12-13 10:43:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7974912. Throughput: 0: 806.7. Samples: 7974228. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:43:44,076][62436] Avg episode reward: [(0, '5655.181')] [2024-12-13 10:43:49,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7979008. Throughput: 0: 811.3. Samples: 7979780. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:43:49,076][62436] Avg episode reward: [(0, '5720.145')] [2024-12-13 10:43:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7983104. Throughput: 0: 820.2. Samples: 7984024. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:43:54,076][62436] Avg episode reward: [(0, '5707.911')] [2024-12-13 10:43:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015592_7983104.pth... [2024-12-13 10:43:54,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015544_7958528.pth [2024-12-13 10:43:57,562][62492] Updated weights for policy 0, policy_version 15600 (0.0009) [2024-12-13 10:43:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7987200. Throughput: 0: 810.7. Samples: 7986420. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:43:59,076][62436] Avg episode reward: [(0, '5708.437')] [2024-12-13 10:44:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 7991296. Throughput: 0: 808.2. Samples: 7991816. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:44:04,076][62436] Avg episode reward: [(0, '5657.361')] [2024-12-13 10:44:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7995392. Throughput: 0: 826.8. Samples: 7996352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:44:09,076][62436] Avg episode reward: [(0, '5628.190')] [2024-12-13 10:44:09,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015616_7995392.pth... [2024-12-13 10:44:09,102][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015568_7970816.pth [2024-12-13 10:44:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 7999488. Throughput: 0: 806.8. Samples: 7998384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:44:14,076][62436] Avg episode reward: [(0, '5666.521')] [2024-12-13 10:44:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8003584. Throughput: 0: 803.8. Samples: 8003788. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:44:19,077][62436] Avg episode reward: [(0, '5655.691')] [2024-12-13 10:44:24,081][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 8007680. Throughput: 0: 820.0. Samples: 8008588. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:44:24,089][62436] Avg episode reward: [(0, '5716.325')] [2024-12-13 10:44:24,098][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015640_8007680.pth... [2024-12-13 10:44:24,107][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015592_7983104.pth [2024-12-13 10:44:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8011776. Throughput: 0: 805.8. Samples: 8010488. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:44:29,076][62436] Avg episode reward: [(0, '5723.609')] [2024-12-13 10:44:34,076][62436] Fps is (10 sec: 819.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8015872. Throughput: 0: 803.3. Samples: 8015928. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:44:34,076][62436] Avg episode reward: [(0, '5739.428')] [2024-12-13 10:44:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8019968. Throughput: 0: 820.7. Samples: 8020956. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:44:39,079][62436] Avg episode reward: [(0, '5739.293')] [2024-12-13 10:44:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015664_8019968.pth... [2024-12-13 10:44:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015616_7995392.pth [2024-12-13 10:44:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8024064. Throughput: 0: 809.2. Samples: 8022832. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:44:44,076][62436] Avg episode reward: [(0, '5782.715')] [2024-12-13 10:44:47,778][62492] Updated weights for policy 0, policy_version 15680 (0.0010) [2024-12-13 10:44:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8028160. Throughput: 0: 806.2. Samples: 8028096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:44:49,076][62436] Avg episode reward: [(0, '5830.518')] [2024-12-13 10:44:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8032256. Throughput: 0: 825.4. Samples: 8033496. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:44:54,084][62436] Avg episode reward: [(0, '5811.877')] [2024-12-13 10:44:54,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015688_8032256.pth... 
[2024-12-13 10:44:54,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015640_8007680.pth [2024-12-13 10:44:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8036352. Throughput: 0: 822.8. Samples: 8035412. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:44:59,076][62436] Avg episode reward: [(0, '5782.325')] [2024-12-13 10:45:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8040448. Throughput: 0: 813.2. Samples: 8040384. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:45:04,077][62436] Avg episode reward: [(0, '5679.788')] [2024-12-13 10:45:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8044544. Throughput: 0: 822.9. Samples: 8045616. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:45:09,076][62436] Avg episode reward: [(0, '5715.134')] [2024-12-13 10:45:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015712_8044544.pth... [2024-12-13 10:45:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015664_8019968.pth [2024-12-13 10:45:14,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8048640. Throughput: 0: 823.9. Samples: 8047564. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:45:14,076][62436] Avg episode reward: [(0, '5715.177')] [2024-12-13 10:45:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8052736. Throughput: 0: 805.2. Samples: 8052164. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:45:19,076][62436] Avg episode reward: [(0, '5716.428')] [2024-12-13 10:45:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 8056832. Throughput: 0: 814.2. Samples: 8057596. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:45:24,076][62436] Avg episode reward: [(0, '5729.998')] [2024-12-13 10:45:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015736_8056832.pth... [2024-12-13 10:45:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015688_8032256.pth [2024-12-13 10:45:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8060928. Throughput: 0: 821.8. Samples: 8059812. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:45:29,076][62436] Avg episode reward: [(0, '5736.876')] [2024-12-13 10:45:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8065024. Throughput: 0: 802.6. Samples: 8064212. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:45:34,076][62436] Avg episode reward: [(0, '5734.482')] [2024-12-13 10:45:37,926][62492] Updated weights for policy 0, policy_version 15760 (0.0013) [2024-12-13 10:45:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8069120. Throughput: 0: 805.4. Samples: 8069740. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:45:39,076][62436] Avg episode reward: [(0, '5727.737')] [2024-12-13 10:45:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015760_8069120.pth... [2024-12-13 10:45:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015712_8044544.pth [2024-12-13 10:45:44,077][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8073216. Throughput: 0: 817.6. Samples: 8072204. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:45:44,078][62436] Avg episode reward: [(0, '5746.792')] [2024-12-13 10:45:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8077312. Throughput: 0: 801.1. Samples: 8076432. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:45:49,076][62436] Avg episode reward: [(0, '5801.585')] [2024-12-13 10:45:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8081408. Throughput: 0: 808.3. Samples: 8081988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:45:54,076][62436] Avg episode reward: [(0, '5811.298')] [2024-12-13 10:45:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015784_8081408.pth... [2024-12-13 10:45:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015736_8056832.pth [2024-12-13 10:45:59,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8085504. Throughput: 0: 826.9. Samples: 8084776. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:45:59,078][62436] Avg episode reward: [(0, '5765.224')] [2024-12-13 10:46:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8089600. Throughput: 0: 811.6. Samples: 8088688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:46:04,076][62436] Avg episode reward: [(0, '5760.908')] [2024-12-13 10:46:09,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8093696. Throughput: 0: 814.2. Samples: 8094236. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:46:09,076][62436] Avg episode reward: [(0, '5761.997')] [2024-12-13 10:46:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015808_8093696.pth... [2024-12-13 10:46:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015760_8069120.pth [2024-12-13 10:46:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8097792. Throughput: 0: 828.8. Samples: 8097108. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:46:14,076][62436] Avg episode reward: [(0, '5792.993')] [2024-12-13 10:46:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8101888. Throughput: 0: 816.8. Samples: 8100968. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:46:19,077][62436] Avg episode reward: [(0, '5843.422')] [2024-12-13 10:46:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8105984. Throughput: 0: 816.9. Samples: 8106500. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:46:24,076][62436] Avg episode reward: [(0, '5848.572')] [2024-12-13 10:46:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015832_8105984.pth... [2024-12-13 10:46:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015784_8081408.pth [2024-12-13 10:46:27,439][62492] Updated weights for policy 0, policy_version 15840 (0.0010) [2024-12-13 10:46:29,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8110080. Throughput: 0: 827.5. Samples: 8109440. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:46:29,076][62436] Avg episode reward: [(0, '5836.554')] [2024-12-13 10:46:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8114176. Throughput: 0: 823.2. Samples: 8113476. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:46:34,076][62436] Avg episode reward: [(0, '5842.202')] [2024-12-13 10:46:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8118272. Throughput: 0: 815.5. Samples: 8118684. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:46:39,076][62436] Avg episode reward: [(0, '5842.142')] [2024-12-13 10:46:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015856_8118272.pth... 
[2024-12-13 10:46:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015808_8093696.pth [2024-12-13 10:46:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8122368. Throughput: 0: 817.3. Samples: 8121552. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:46:44,076][62436] Avg episode reward: [(0, '5817.581')] [2024-12-13 10:46:49,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8126464. Throughput: 0: 829.2. Samples: 8126004. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:46:49,078][62436] Avg episode reward: [(0, '5750.990')] [2024-12-13 10:46:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8130560. Throughput: 0: 814.5. Samples: 8130888. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:46:54,082][62436] Avg episode reward: [(0, '5733.577')] [2024-12-13 10:46:54,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015880_8130560.pth... [2024-12-13 10:46:54,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015832_8105984.pth [2024-12-13 10:46:59,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8134656. Throughput: 0: 815.6. Samples: 8133812. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:46:59,077][62436] Avg episode reward: [(0, '5791.544')] [2024-12-13 10:47:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8138752. Throughput: 0: 830.4. Samples: 8138336. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:47:04,076][62436] Avg episode reward: [(0, '5816.982')] [2024-12-13 10:47:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8142848. Throughput: 0: 813.3. Samples: 8143100. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:47:09,076][62436] Avg episode reward: [(0, '5817.037')] [2024-12-13 10:47:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015904_8142848.pth... [2024-12-13 10:47:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015856_8118272.pth [2024-12-13 10:47:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8146944. Throughput: 0: 812.2. Samples: 8145988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:47:14,076][62436] Avg episode reward: [(0, '5813.745')] [2024-12-13 10:47:18,372][62492] Updated weights for policy 0, policy_version 15920 (0.0010) [2024-12-13 10:47:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8151040. Throughput: 0: 828.6. Samples: 8150764. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:47:19,076][62436] Avg episode reward: [(0, '5809.611')] [2024-12-13 10:47:24,078][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8151040. Throughput: 0: 792.0. Samples: 8154324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:47:24,080][62436] Avg episode reward: [(0, '5782.152')] [2024-12-13 10:47:24,091][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015920_8151040.pth... [2024-12-13 10:47:24,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015880_8130560.pth [2024-12-13 10:47:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8155136. Throughput: 0: 770.1. Samples: 8156208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:47:29,076][62436] Avg episode reward: [(0, '5741.615')] [2024-12-13 10:47:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8159232. Throughput: 0: 784.9. Samples: 8161324. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:47:34,076][62436] Avg episode reward: [(0, '5712.267')] [2024-12-13 10:47:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8163328. Throughput: 0: 774.9. Samples: 8165760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:47:39,076][62436] Avg episode reward: [(0, '5744.135')] [2024-12-13 10:47:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015944_8163328.pth... [2024-12-13 10:47:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015904_8142848.pth [2024-12-13 10:47:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8167424. Throughput: 0: 769.2. Samples: 8168424. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:47:44,076][62436] Avg episode reward: [(0, '5784.815')] [2024-12-13 10:47:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 8171520. Throughput: 0: 788.0. Samples: 8173796. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:47:49,079][62436] Avg episode reward: [(0, '5819.085')] [2024-12-13 10:47:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8175616. Throughput: 0: 773.7. Samples: 8177916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:47:54,076][62436] Avg episode reward: [(0, '5818.534')] [2024-12-13 10:47:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015968_8175616.pth... [2024-12-13 10:47:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015920_8151040.pth [2024-12-13 10:47:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8179712. Throughput: 0: 772.3. Samples: 8180740. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:47:59,076][62436] Avg episode reward: [(0, '5918.338')] [2024-12-13 10:48:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8183808. Throughput: 0: 789.2. Samples: 8186280. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:48:04,076][62436] Avg episode reward: [(0, '5956.032')] [2024-12-13 10:48:04,077][62473] Saving new best policy, reward=5956.032! [2024-12-13 10:48:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8187904. Throughput: 0: 798.1. Samples: 8190236. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:48:09,076][62436] Avg episode reward: [(0, '5935.241')] [2024-12-13 10:48:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015992_8187904.pth... [2024-12-13 10:48:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015944_8163328.pth [2024-12-13 10:48:10,601][62492] Updated weights for policy 0, policy_version 16000 (0.0010) [2024-12-13 10:48:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8192000. Throughput: 0: 815.2. Samples: 8192892. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:48:14,076][62436] Avg episode reward: [(0, '5946.319')] [2024-12-13 10:48:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8196096. Throughput: 0: 830.5. Samples: 8198696. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:48:19,076][62436] Avg episode reward: [(0, '5904.401')] [2024-12-13 10:48:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8200192. Throughput: 0: 815.2. Samples: 8202444. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:48:24,076][62436] Avg episode reward: [(0, '5895.308')] [2024-12-13 10:48:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016016_8200192.pth... 
[2024-12-13 10:48:24,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015968_8175616.pth [2024-12-13 10:48:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8204288. Throughput: 0: 817.3. Samples: 8205204. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:48:29,078][62436] Avg episode reward: [(0, '5881.387')] [2024-12-13 10:48:34,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8208384. Throughput: 0: 823.6. Samples: 8210860. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:48:34,076][62436] Avg episode reward: [(0, '5847.239')] [2024-12-13 10:48:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8212480. Throughput: 0: 820.7. Samples: 8214848. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:48:39,076][62436] Avg episode reward: [(0, '5844.828')] [2024-12-13 10:48:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016040_8212480.pth... [2024-12-13 10:48:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000015992_8187904.pth [2024-12-13 10:48:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8216576. Throughput: 0: 812.2. Samples: 8217288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:48:44,076][62436] Avg episode reward: [(0, '5874.900')] [2024-12-13 10:48:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8220672. Throughput: 0: 816.5. Samples: 8223024. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:48:49,076][62436] Avg episode reward: [(0, '5971.779')] [2024-12-13 10:48:49,077][62473] Saving new best policy, reward=5971.779! [2024-12-13 10:48:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8224768. Throughput: 0: 823.5. Samples: 8227292. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:48:54,076][62436] Avg episode reward: [(0, '5999.741')] [2024-12-13 10:48:54,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016064_8224768.pth... [2024-12-13 10:48:54,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016016_8200192.pth [2024-12-13 10:48:54,103][62473] Saving new best policy, reward=5999.741! [2024-12-13 10:48:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8228864. Throughput: 0: 814.8. Samples: 8229560. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:48:59,076][62436] Avg episode reward: [(0, '5981.508')] [2024-12-13 10:49:00,227][62492] Updated weights for policy 0, policy_version 16080 (0.0010) [2024-12-13 10:49:04,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8232960. Throughput: 0: 812.7. Samples: 8235268. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:49:04,078][62436] Avg episode reward: [(0, '5967.283')] [2024-12-13 10:49:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8237056. Throughput: 0: 828.5. Samples: 8239728. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:49:09,076][62436] Avg episode reward: [(0, '5965.130')] [2024-12-13 10:49:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016088_8237056.pth... [2024-12-13 10:49:09,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016040_8212480.pth [2024-12-13 10:49:14,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8241152. Throughput: 0: 810.7. Samples: 8241684. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:49:14,076][62436] Avg episode reward: [(0, '5968.256')] [2024-12-13 10:49:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8245248. Throughput: 0: 807.9. Samples: 8247216. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:49:19,076][62436] Avg episode reward: [(0, '5970.218')] [2024-12-13 10:49:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8249344. Throughput: 0: 827.3. Samples: 8252076. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:49:24,076][62436] Avg episode reward: [(0, '5996.246')] [2024-12-13 10:49:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016112_8249344.pth... [2024-12-13 10:49:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016064_8224768.pth [2024-12-13 10:49:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8253440. Throughput: 0: 811.5. Samples: 8253804. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:49:29,076][62436] Avg episode reward: [(0, '6050.420')] [2024-12-13 10:49:29,077][62473] Saving new best policy, reward=6050.420! [2024-12-13 10:49:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8257536. Throughput: 0: 807.9. Samples: 8259380. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:49:34,078][62436] Avg episode reward: [(0, '6047.362')] [2024-12-13 10:49:39,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8261632. Throughput: 0: 826.3. Samples: 8264480. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:49:39,079][62436] Avg episode reward: [(0, '6036.848')] [2024-12-13 10:49:39,100][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016136_8261632.pth... [2024-12-13 10:49:39,112][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016088_8237056.pth [2024-12-13 10:49:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8265728. Throughput: 0: 811.8. Samples: 8266092. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:49:44,076][62436] Avg episode reward: [(0, '6036.116')] [2024-12-13 10:49:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8269824. Throughput: 0: 806.8. Samples: 8271572. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:49:49,078][62436] Avg episode reward: [(0, '6041.338')] [2024-12-13 10:49:50,203][62492] Updated weights for policy 0, policy_version 16160 (0.0010) [2024-12-13 10:49:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8273920. Throughput: 0: 824.0. Samples: 8276808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:49:54,076][62436] Avg episode reward: [(0, '6035.162')] [2024-12-13 10:49:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016160_8273920.pth... [2024-12-13 10:49:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016112_8249344.pth [2024-12-13 10:49:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8278016. Throughput: 0: 817.2. Samples: 8278456. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:49:59,076][62436] Avg episode reward: [(0, '6005.386')] [2024-12-13 10:50:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8282112. Throughput: 0: 809.7. Samples: 8283652. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:50:04,076][62436] Avg episode reward: [(0, '5970.087')] [2024-12-13 10:50:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8286208. Throughput: 0: 822.0. Samples: 8289068. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:50:09,076][62436] Avg episode reward: [(0, '5943.203')] [2024-12-13 10:50:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016184_8286208.pth... 
[2024-12-13 10:50:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016136_8261632.pth [2024-12-13 10:50:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8290304. Throughput: 0: 820.7. Samples: 8290736. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:50:14,076][62436] Avg episode reward: [(0, '5954.481')] [2024-12-13 10:50:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8294400. Throughput: 0: 807.9. Samples: 8295736. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:50:19,076][62436] Avg episode reward: [(0, '5963.105')] [2024-12-13 10:50:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8298496. Throughput: 0: 820.6. Samples: 8301404. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:50:24,076][62436] Avg episode reward: [(0, '5986.611')] [2024-12-13 10:50:24,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016208_8298496.pth... [2024-12-13 10:50:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016160_8273920.pth [2024-12-13 10:50:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8302592. Throughput: 0: 824.1. Samples: 8303176. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:50:29,076][62436] Avg episode reward: [(0, '5963.668')] [2024-12-13 10:50:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8306688. Throughput: 0: 807.0. Samples: 8307888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:50:34,078][62436] Avg episode reward: [(0, '5951.339')] [2024-12-13 10:50:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8310784. Throughput: 0: 818.4. Samples: 8313636. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:50:39,076][62436] Avg episode reward: [(0, '5922.324')] [2024-12-13 10:50:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016232_8310784.pth... [2024-12-13 10:50:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016184_8286208.pth [2024-12-13 10:50:40,358][62492] Updated weights for policy 0, policy_version 16240 (0.0010) [2024-12-13 10:50:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8314880. Throughput: 0: 827.4. Samples: 8315688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:50:44,076][62436] Avg episode reward: [(0, '5924.274')] [2024-12-13 10:50:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8318976. Throughput: 0: 807.4. Samples: 8319984. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:50:49,076][62436] Avg episode reward: [(0, '5896.058')] [2024-12-13 10:50:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8323072. Throughput: 0: 811.6. Samples: 8325592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:50:54,076][62436] Avg episode reward: [(0, '5871.029')] [2024-12-13 10:50:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016256_8323072.pth... [2024-12-13 10:50:54,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016208_8298496.pth [2024-12-13 10:50:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8327168. Throughput: 0: 828.6. Samples: 8328024. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:50:59,076][62436] Avg episode reward: [(0, '5820.909')] [2024-12-13 10:51:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8331264. Throughput: 0: 807.7. Samples: 8332084. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:51:04,076][62436] Avg episode reward: [(0, '5793.476')] [2024-12-13 10:51:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8335360. Throughput: 0: 809.0. Samples: 8337808. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:51:09,076][62436] Avg episode reward: [(0, '5768.133')] [2024-12-13 10:51:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016280_8335360.pth... [2024-12-13 10:51:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016232_8310784.pth [2024-12-13 10:51:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8339456. Throughput: 0: 830.6. Samples: 8340552. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:51:14,076][62436] Avg episode reward: [(0, '5769.362')] [2024-12-13 10:51:19,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8343552. Throughput: 0: 808.4. Samples: 8344268. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:51:19,076][62436] Avg episode reward: [(0, '5742.724')] [2024-12-13 10:51:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8347648. Throughput: 0: 809.3. Samples: 8350056. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 10:51:24,076][62436] Avg episode reward: [(0, '5755.826')] [2024-12-13 10:51:24,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016304_8347648.pth... [2024-12-13 10:51:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016256_8323072.pth [2024-12-13 10:51:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8351744. Throughput: 0: 822.9. Samples: 8352720. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:51:29,076][62436] Avg episode reward: [(0, '5762.520')] [2024-12-13 10:51:31,604][62492] Updated weights for policy 0, policy_version 16320 (0.0014) [2024-12-13 10:51:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8355840. Throughput: 0: 810.8. Samples: 8356472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:51:34,076][62436] Avg episode reward: [(0, '5770.726')] [2024-12-13 10:51:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8359936. Throughput: 0: 804.4. Samples: 8361792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:51:39,076][62436] Avg episode reward: [(0, '5737.140')] [2024-12-13 10:51:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016328_8359936.pth... [2024-12-13 10:51:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016280_8335360.pth [2024-12-13 10:51:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8364032. Throughput: 0: 811.4. Samples: 8364536. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:51:44,076][62436] Avg episode reward: [(0, '5741.953')] [2024-12-13 10:51:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8368128. Throughput: 0: 813.2. Samples: 8368680. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:51:49,076][62436] Avg episode reward: [(0, '5734.118')] [2024-12-13 10:51:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8372224. Throughput: 0: 802.4. Samples: 8373916. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:51:54,076][62436] Avg episode reward: [(0, '5794.999')] [2024-12-13 10:51:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016352_8372224.pth... 
[2024-12-13 10:51:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016304_8347648.pth [2024-12-13 10:51:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8376320. Throughput: 0: 801.5. Samples: 8376620. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:51:59,076][62436] Avg episode reward: [(0, '5828.410')] [2024-12-13 10:52:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8380416. Throughput: 0: 816.9. Samples: 8381028. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:52:04,076][62436] Avg episode reward: [(0, '5834.398')] [2024-12-13 10:52:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8384512. Throughput: 0: 765.7. Samples: 8384512. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:52:09,076][62436] Avg episode reward: [(0, '5835.896')] [2024-12-13 10:52:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016376_8384512.pth... [2024-12-13 10:52:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016328_8359936.pth [2024-12-13 10:52:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8388608. Throughput: 0: 758.8. Samples: 8386868. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:52:14,076][62436] Avg episode reward: [(0, '5823.775')] [2024-12-13 10:52:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8388608. Throughput: 0: 785.0. Samples: 8391796. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:52:19,076][62436] Avg episode reward: [(0, '5802.299')] [2024-12-13 10:52:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8392704. Throughput: 0: 767.5. Samples: 8396328. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:52:24,076][62436] Avg episode reward: [(0, '5809.139')] [2024-12-13 10:52:24,201][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016400_8396800.pth... [2024-12-13 10:52:24,212][62492] Updated weights for policy 0, policy_version 16400 (0.0012) [2024-12-13 10:52:24,216][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016352_8372224.pth [2024-12-13 10:52:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8400896. Throughput: 0: 769.0. Samples: 8399140. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:52:29,076][62436] Avg episode reward: [(0, '5824.342')] [2024-12-13 10:52:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8400896. Throughput: 0: 788.3. Samples: 8404152. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:52:34,076][62436] Avg episode reward: [(0, '5839.050')] [2024-12-13 10:52:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8404992. Throughput: 0: 768.4. Samples: 8408496. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:52:39,076][62436] Avg episode reward: [(0, '5869.662')] [2024-12-13 10:52:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016416_8404992.pth... [2024-12-13 10:52:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016376_8384512.pth [2024-12-13 10:52:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8413184. Throughput: 0: 769.4. Samples: 8411244. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:52:44,076][62436] Avg episode reward: [(0, '5890.029')] [2024-12-13 10:52:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8413184. Throughput: 0: 787.6. Samples: 8416472. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:52:49,076][62436] Avg episode reward: [(0, '5920.648')] [2024-12-13 10:52:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8417280. Throughput: 0: 803.4. Samples: 8420664. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:52:54,076][62436] Avg episode reward: [(0, '5937.457')] [2024-12-13 10:52:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016440_8417280.pth... [2024-12-13 10:52:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016400_8396800.pth [2024-12-13 10:52:59,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8425472. Throughput: 0: 810.8. Samples: 8423356. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:52:59,076][62436] Avg episode reward: [(0, '6016.123')] [2024-12-13 10:53:04,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8429568. Throughput: 0: 827.2. Samples: 8429020. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:53:04,076][62436] Avg episode reward: [(0, '6020.687')] [2024-12-13 10:53:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8429568. Throughput: 0: 811.9. Samples: 8432864. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:53:09,076][62436] Avg episode reward: [(0, '6074.403')] [2024-12-13 10:53:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016464_8429568.pth... [2024-12-13 10:53:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016416_8404992.pth [2024-12-13 10:53:09,093][62473] Saving new best policy, reward=6074.403! [2024-12-13 10:53:14,055][62492] Updated weights for policy 0, policy_version 16480 (0.0011) [2024-12-13 10:53:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8437760. Throughput: 0: 810.1. Samples: 8435596. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:53:14,076][62436] Avg episode reward: [(0, '6073.619')] [2024-12-13 10:53:19,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8441856. Throughput: 0: 824.3. Samples: 8441244. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:53:19,076][62436] Avg episode reward: [(0, '6088.658')] [2024-12-13 10:53:19,077][62473] Saving new best policy, reward=6088.658! [2024-12-13 10:53:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8441856. Throughput: 0: 815.0. Samples: 8445172. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:53:24,076][62436] Avg episode reward: [(0, '6115.916')] [2024-12-13 10:53:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016488_8441856.pth... [2024-12-13 10:53:24,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016440_8417280.pth [2024-12-13 10:53:24,097][62473] Saving new best policy, reward=6115.916! [2024-12-13 10:53:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8445952. Throughput: 0: 809.4. Samples: 8447668. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:53:29,076][62436] Avg episode reward: [(0, '6112.680')] [2024-12-13 10:53:34,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8454144. Throughput: 0: 822.5. Samples: 8453484. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:53:34,076][62436] Avg episode reward: [(0, '6119.501')] [2024-12-13 10:53:34,077][62473] Saving new best policy, reward=6119.501! [2024-12-13 10:53:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8454144. Throughput: 0: 826.2. Samples: 8457844. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:53:39,076][62436] Avg episode reward: [(0, '6174.280')] [2024-12-13 10:53:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016512_8454144.pth... [2024-12-13 10:53:39,104][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016464_8429568.pth [2024-12-13 10:53:39,105][62473] Saving new best policy, reward=6174.280! [2024-12-13 10:53:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8458240. Throughput: 0: 815.6. Samples: 8460060. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:53:44,076][62436] Avg episode reward: [(0, '6169.858')] [2024-12-13 10:53:49,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8466432. Throughput: 0: 814.1. Samples: 8465656. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:53:49,076][62436] Avg episode reward: [(0, '6142.127')] [2024-12-13 10:53:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8466432. Throughput: 0: 827.8. Samples: 8470116. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:53:54,076][62436] Avg episode reward: [(0, '6150.064')] [2024-12-13 10:53:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016536_8466432.pth... [2024-12-13 10:53:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016488_8441856.pth [2024-12-13 10:53:59,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8470528. Throughput: 0: 810.5. Samples: 8472068. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:53:59,076][62436] Avg episode reward: [(0, '6149.731')] [2024-12-13 10:54:03,742][62492] Updated weights for policy 0, policy_version 16560 (0.0011) [2024-12-13 10:54:04,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8478720. Throughput: 0: 811.3. Samples: 8477752. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:54:04,076][62436] Avg episode reward: [(0, '6131.496')] [2024-12-13 10:54:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8478720. Throughput: 0: 829.8. Samples: 8482512. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:54:09,076][62436] Avg episode reward: [(0, '6148.730')] [2024-12-13 10:54:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016560_8478720.pth... [2024-12-13 10:54:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016512_8454144.pth [2024-12-13 10:54:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8482816. Throughput: 0: 813.0. Samples: 8484252. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:54:14,076][62436] Avg episode reward: [(0, '6180.779')] [2024-12-13 10:54:14,077][62473] Saving new best policy, reward=6180.779! [2024-12-13 10:54:19,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8491008. Throughput: 0: 809.1. Samples: 8489892. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:54:19,076][62436] Avg episode reward: [(0, '6216.464')] [2024-12-13 10:54:19,077][62473] Saving new best policy, reward=6216.464! [2024-12-13 10:54:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8491008. Throughput: 0: 822.0. Samples: 8494836. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:54:24,076][62436] Avg episode reward: [(0, '6215.651')] [2024-12-13 10:54:24,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016584_8491008.pth... [2024-12-13 10:54:24,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016536_8466432.pth [2024-12-13 10:54:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8495104. Throughput: 0: 811.2. Samples: 8496564. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:54:29,076][62436] Avg episode reward: [(0, '6215.515')] [2024-12-13 10:54:34,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8503296. Throughput: 0: 809.5. Samples: 8502084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:54:34,076][62436] Avg episode reward: [(0, '6226.366')] [2024-12-13 10:54:34,077][62473] Saving new best policy, reward=6226.366! [2024-12-13 10:54:39,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8507392. Throughput: 0: 825.8. Samples: 8507276. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:54:39,076][62436] Avg episode reward: [(0, '6218.920')] [2024-12-13 10:54:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016616_8507392.pth... [2024-12-13 10:54:39,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016560_8478720.pth [2024-12-13 10:54:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8507392. Throughput: 0: 821.6. Samples: 8509040. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:54:44,076][62436] Avg episode reward: [(0, '6210.173')] [2024-12-13 10:54:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8511488. Throughput: 0: 811.4. Samples: 8514264. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:54:49,076][62436] Avg episode reward: [(0, '6220.102')] [2024-12-13 10:54:54,066][62492] Updated weights for policy 0, policy_version 16640 (0.0013) [2024-12-13 10:54:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8515584. Throughput: 0: 822.7. Samples: 8519532. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 10:54:54,083][62436] Avg episode reward: [(0, '6216.663')] [2024-12-13 10:54:54,105][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016640_8519680.pth... 
[2024-12-13 10:54:54,113][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016584_8491008.pth [2024-12-13 10:54:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8519680. Throughput: 0: 822.9. Samples: 8521284. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:54:59,076][62436] Avg episode reward: [(0, '6216.147')] [2024-12-13 10:55:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8523776. Throughput: 0: 813.4. Samples: 8526496. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:55:04,076][62436] Avg episode reward: [(0, '6213.693')] [2024-12-13 10:55:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8531968. Throughput: 0: 824.0. Samples: 8531916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:55:09,076][62436] Avg episode reward: [(0, '6211.382')] [2024-12-13 10:55:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016664_8531968.pth... [2024-12-13 10:55:09,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016616_8507392.pth [2024-12-13 10:55:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8531968. Throughput: 0: 825.7. Samples: 8533720. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:55:14,079][62436] Avg episode reward: [(0, '6211.597')] [2024-12-13 10:55:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8536064. Throughput: 0: 815.5. Samples: 8538780. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:55:19,076][62436] Avg episode reward: [(0, '6207.745')] [2024-12-13 10:55:24,077][62436] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 8544256. Throughput: 0: 824.9. Samples: 8544396. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:55:24,078][62436] Avg episode reward: [(0, '6215.783')] [2024-12-13 10:55:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016688_8544256.pth... [2024-12-13 10:55:24,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016640_8519680.pth [2024-12-13 10:55:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8544256. Throughput: 0: 829.9. Samples: 8546384. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:55:29,076][62436] Avg episode reward: [(0, '6162.266')] [2024-12-13 10:55:34,076][62436] Fps is (10 sec: 409.7, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8548352. Throughput: 0: 817.7. Samples: 8551060. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:55:34,077][62436] Avg episode reward: [(0, '6159.214')] [2024-12-13 10:55:39,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8556544. Throughput: 0: 824.0. Samples: 8556612. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:55:39,079][62436] Avg episode reward: [(0, '6161.725')] [2024-12-13 10:55:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016712_8556544.pth... [2024-12-13 10:55:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016664_8531968.pth [2024-12-13 10:55:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8556544. Throughput: 0: 836.2. Samples: 8558912. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:55:44,076][62436] Avg episode reward: [(0, '6153.635')] [2024-12-13 10:55:44,601][62492] Updated weights for policy 0, policy_version 16720 (0.0010) [2024-12-13 10:55:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8560640. Throughput: 0: 817.2. Samples: 8563272. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:55:49,076][62436] Avg episode reward: [(0, '6148.736')] [2024-12-13 10:55:54,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8568832. Throughput: 0: 820.4. Samples: 8568832. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:55:54,076][62436] Avg episode reward: [(0, '6183.850')] [2024-12-13 10:55:54,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016736_8568832.pth... [2024-12-13 10:55:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016688_8544256.pth [2024-12-13 10:55:59,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8568832. Throughput: 0: 835.7. Samples: 8571328. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:55:59,081][62436] Avg episode reward: [(0, '6183.236')] [2024-12-13 10:56:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8572928. Throughput: 0: 812.1. Samples: 8575324. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:56:04,079][62436] Avg episode reward: [(0, '6177.615')] [2024-12-13 10:56:09,076][62436] Fps is (10 sec: 1229.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8581120. Throughput: 0: 813.3. Samples: 8580992. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:56:09,076][62436] Avg episode reward: [(0, '6217.699')] [2024-12-13 10:56:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016760_8581120.pth... [2024-12-13 10:56:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016712_8556544.pth [2024-12-13 10:56:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8581120. Throughput: 0: 829.8. Samples: 8583724. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:56:14,077][62436] Avg episode reward: [(0, '6165.610')] [2024-12-13 10:56:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8585216. Throughput: 0: 808.7. Samples: 8587452. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:56:19,076][62436] Avg episode reward: [(0, '5995.444')] [2024-12-13 10:56:24,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8593408. Throughput: 0: 812.7. Samples: 8593184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:56:24,076][62436] Avg episode reward: [(0, '6001.961')] [2024-12-13 10:56:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016784_8593408.pth... [2024-12-13 10:56:24,086][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016736_8568832.pth [2024-12-13 10:56:29,075][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8597504. Throughput: 0: 823.2. Samples: 8595956. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:56:29,076][62436] Avg episode reward: [(0, '6001.587')] [2024-12-13 10:56:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8597504. Throughput: 0: 817.1. Samples: 8600040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:56:34,076][62436] Avg episode reward: [(0, '5998.223')] [2024-12-13 10:56:34,647][62492] Updated weights for policy 0, policy_version 16800 (0.0010) [2024-12-13 10:56:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8605696. Throughput: 0: 814.4. Samples: 8605480. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:56:39,076][62436] Avg episode reward: [(0, '5933.409')] [2024-12-13 10:56:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016808_8605696.pth... 
[2024-12-13 10:56:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016760_8581120.pth [2024-12-13 10:56:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8609792. Throughput: 0: 819.1. Samples: 8608184. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:56:44,076][62436] Avg episode reward: [(0, '5851.724')] [2024-12-13 10:56:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8609792. Throughput: 0: 818.6. Samples: 8612160. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:56:49,078][62436] Avg episode reward: [(0, '5851.704')] [2024-12-13 10:56:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8613888. Throughput: 0: 764.7. Samples: 8615404. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:56:54,076][62436] Avg episode reward: [(0, '5852.510')] [2024-12-13 10:56:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016824_8613888.pth... [2024-12-13 10:56:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016784_8593408.pth [2024-12-13 10:56:59,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8617984. Throughput: 0: 764.6. Samples: 8618132. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:56:59,076][62436] Avg episode reward: [(0, '5850.930')] [2024-12-13 10:57:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8622080. Throughput: 0: 803.2. Samples: 8623596. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:57:04,076][62436] Avg episode reward: [(0, '5853.940')] [2024-12-13 10:57:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8626176. Throughput: 0: 764.0. Samples: 8627564. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:57:09,076][62436] Avg episode reward: [(0, '5833.066')] [2024-12-13 10:57:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016848_8626176.pth... [2024-12-13 10:57:09,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016808_8605696.pth [2024-12-13 10:57:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8630272. Throughput: 0: 765.0. Samples: 8630380. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:57:14,076][62436] Avg episode reward: [(0, '5839.658')] [2024-12-13 10:57:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8634368. Throughput: 0: 798.5. Samples: 8635972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:57:19,076][62436] Avg episode reward: [(0, '5841.993')] [2024-12-13 10:57:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8638464. Throughput: 0: 762.7. Samples: 8639800. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:57:24,081][62436] Avg episode reward: [(0, '5838.363')] [2024-12-13 10:57:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016872_8638464.pth... [2024-12-13 10:57:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016824_8613888.pth [2024-12-13 10:57:26,653][62492] Updated weights for policy 0, policy_version 16880 (0.0010) [2024-12-13 10:57:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 8642560. Throughput: 0: 763.9. Samples: 8642560. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:57:29,076][62436] Avg episode reward: [(0, '5894.251')] [2024-12-13 10:57:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8646656. Throughput: 0: 800.4. Samples: 8648180. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:57:34,076][62436] Avg episode reward: [(0, '5896.325')] [2024-12-13 10:57:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8650752. Throughput: 0: 820.1. Samples: 8652308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:57:39,076][62436] Avg episode reward: [(0, '5913.230')] [2024-12-13 10:57:39,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016896_8650752.pth... [2024-12-13 10:57:39,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016848_8626176.pth [2024-12-13 10:57:44,077][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 819.2). Total num frames: 8654848. Throughput: 0: 814.1. Samples: 8654768. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:57:44,079][62436] Avg episode reward: [(0, '5932.011')] [2024-12-13 10:57:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8658944. Throughput: 0: 814.9. Samples: 8660268. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:57:49,076][62436] Avg episode reward: [(0, '5838.467')] [2024-12-13 10:57:54,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8663040. Throughput: 0: 815.2. Samples: 8664248. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:57:54,076][62436] Avg episode reward: [(0, '5842.515')] [2024-12-13 10:57:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016920_8663040.pth... [2024-12-13 10:57:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016872_8638464.pth [2024-12-13 10:57:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8667136. Throughput: 0: 802.8. Samples: 8666504. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:57:59,076][62436] Avg episode reward: [(0, '5854.827')] [2024-12-13 10:58:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8671232. Throughput: 0: 801.2. Samples: 8672028. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 10:58:04,076][62436] Avg episode reward: [(0, '5825.993')] [2024-12-13 10:58:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8675328. Throughput: 0: 817.0. Samples: 8676564. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:58:09,076][62436] Avg episode reward: [(0, '5842.715')] [2024-12-13 10:58:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016944_8675328.pth... [2024-12-13 10:58:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016896_8650752.pth [2024-12-13 10:58:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8679424. Throughput: 0: 803.1. Samples: 8678700. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:58:14,076][62436] Avg episode reward: [(0, '5884.171')] [2024-12-13 10:58:16,877][62492] Updated weights for policy 0, policy_version 16960 (0.0009) [2024-12-13 10:58:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8683520. Throughput: 0: 801.1. Samples: 8684228. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 10:58:19,076][62436] Avg episode reward: [(0, '5834.148')] [2024-12-13 10:58:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8687616. Throughput: 0: 815.6. Samples: 8689012. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:58:24,077][62436] Avg episode reward: [(0, '5822.775')] [2024-12-13 10:58:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016968_8687616.pth... 
[2024-12-13 10:58:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016920_8663040.pth [2024-12-13 10:58:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8691712. Throughput: 0: 803.7. Samples: 8690932. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:58:29,076][62436] Avg episode reward: [(0, '5870.836')] [2024-12-13 10:58:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8695808. Throughput: 0: 802.1. Samples: 8696364. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 10:58:34,076][62436] Avg episode reward: [(0, '5890.566')] [2024-12-13 10:58:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8699904. Throughput: 0: 826.8. Samples: 8701452. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:58:39,076][62436] Avg episode reward: [(0, '5890.566')] [2024-12-13 10:58:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016992_8699904.pth... [2024-12-13 10:58:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016944_8675328.pth [2024-12-13 10:58:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8704000. Throughput: 0: 819.3. Samples: 8703372. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:58:44,076][62436] Avg episode reward: [(0, '5927.762')] [2024-12-13 10:58:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8708096. Throughput: 0: 811.8. Samples: 8708560. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 10:58:49,076][62436] Avg episode reward: [(0, '5915.757')] [2024-12-13 10:58:54,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8712192. Throughput: 0: 831.0. Samples: 8713960. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:58:54,080][62436] Avg episode reward: [(0, '5909.463')] [2024-12-13 10:58:54,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017016_8712192.pth... [2024-12-13 10:58:54,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016968_8687616.pth [2024-12-13 10:58:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8716288. Throughput: 0: 826.5. Samples: 8715892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:58:59,076][62436] Avg episode reward: [(0, '5939.046')] [2024-12-13 10:59:04,079][62436] Fps is (10 sec: 819.0, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 8720384. Throughput: 0: 810.2. Samples: 8720692. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:59:04,080][62436] Avg episode reward: [(0, '5942.022')] [2024-12-13 10:59:06,590][62492] Updated weights for policy 0, policy_version 17040 (0.0010) [2024-12-13 10:59:09,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8724480. Throughput: 0: 831.0. Samples: 8726408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:59:09,079][62436] Avg episode reward: [(0, '5940.185')] [2024-12-13 10:59:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017040_8724480.pth... [2024-12-13 10:59:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000016992_8699904.pth [2024-12-13 10:59:14,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8728576. Throughput: 0: 831.4. Samples: 8728344. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:59:14,076][62436] Avg episode reward: [(0, '5889.018')] [2024-12-13 10:59:19,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8732672. Throughput: 0: 811.6. Samples: 8732888. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:59:19,076][62436] Avg episode reward: [(0, '5802.668')] [2024-12-13 10:59:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8736768. Throughput: 0: 826.1. Samples: 8738628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:59:24,076][62436] Avg episode reward: [(0, '5787.631')] [2024-12-13 10:59:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017064_8736768.pth... [2024-12-13 10:59:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017016_8712192.pth [2024-12-13 10:59:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8740864. Throughput: 0: 831.0. Samples: 8740768. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:59:29,079][62436] Avg episode reward: [(0, '5788.788')] [2024-12-13 10:59:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8744960. Throughput: 0: 810.7. Samples: 8745040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:59:34,076][62436] Avg episode reward: [(0, '5894.731')] [2024-12-13 10:59:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8749056. Throughput: 0: 813.7. Samples: 8750576. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:59:39,076][62436] Avg episode reward: [(0, '5893.610')] [2024-12-13 10:59:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017088_8749056.pth... [2024-12-13 10:59:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017040_8724480.pth [2024-12-13 10:59:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8753152. Throughput: 0: 824.1. Samples: 8752976. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:59:44,079][62436] Avg episode reward: [(0, '5853.340')] [2024-12-13 10:59:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8757248. Throughput: 0: 807.4. Samples: 8757020. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 10:59:49,076][62436] Avg episode reward: [(0, '5865.683')] [2024-12-13 10:59:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8761344. Throughput: 0: 802.1. Samples: 8762500. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:59:54,076][62436] Avg episode reward: [(0, '5815.915')] [2024-12-13 10:59:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017112_8761344.pth... [2024-12-13 10:59:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017064_8736768.pth [2024-12-13 10:59:57,118][62492] Updated weights for policy 0, policy_version 17120 (0.0012) [2024-12-13 10:59:59,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8765440. Throughput: 0: 820.0. Samples: 8765244. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 10:59:59,078][62436] Avg episode reward: [(0, '5814.759')] [2024-12-13 11:00:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 8769536. Throughput: 0: 802.0. Samples: 8768980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:00:04,076][62436] Avg episode reward: [(0, '5872.398')] [2024-12-13 11:00:09,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8773632. Throughput: 0: 793.7. Samples: 8774344. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:00:09,076][62436] Avg episode reward: [(0, '5892.716')] [2024-12-13 11:00:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017136_8773632.pth... 
[2024-12-13 11:00:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017088_8749056.pth [2024-12-13 11:00:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8777728. Throughput: 0: 809.1. Samples: 8777176. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:00:14,076][62436] Avg episode reward: [(0, '6039.040')] [2024-12-13 11:00:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8781824. Throughput: 0: 797.1. Samples: 8780908. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:00:19,076][62436] Avg episode reward: [(0, '6039.729')] [2024-12-13 11:00:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8785920. Throughput: 0: 788.6. Samples: 8786064. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:00:24,076][62436] Avg episode reward: [(0, '6059.212')] [2024-12-13 11:00:24,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017160_8785920.pth... [2024-12-13 11:00:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017112_8761344.pth [2024-12-13 11:00:29,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8790016. Throughput: 0: 799.1. Samples: 8788940. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:00:29,080][62436] Avg episode reward: [(0, '6064.189')] [2024-12-13 11:00:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8790016. Throughput: 0: 800.5. Samples: 8793044. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:00:34,076][62436] Avg episode reward: [(0, '6064.845')] [2024-12-13 11:00:39,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8798208. Throughput: 0: 791.6. Samples: 8798120. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:00:39,076][62436] Avg episode reward: [(0, '6102.697')] [2024-12-13 11:00:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017184_8798208.pth... [2024-12-13 11:00:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017136_8773632.pth [2024-12-13 11:00:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8802304. Throughput: 0: 791.3. Samples: 8800852. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:00:44,076][62436] Avg episode reward: [(0, '6109.375')] [2024-12-13 11:00:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8802304. Throughput: 0: 806.5. Samples: 8805272. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:00:49,076][62436] Avg episode reward: [(0, '6128.626')] [2024-12-13 11:00:49,687][62492] Updated weights for policy 0, policy_version 17200 (0.0011) [2024-12-13 11:00:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8806400. Throughput: 0: 792.6. Samples: 8810012. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:00:54,076][62436] Avg episode reward: [(0, '6128.516')] [2024-12-13 11:00:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017200_8806400.pth... [2024-12-13 11:00:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017160_8785920.pth [2024-12-13 11:00:59,079][62436] Fps is (10 sec: 1228.3, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8814592. Throughput: 0: 788.1. Samples: 8812644. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:00:59,080][62436] Avg episode reward: [(0, '6065.790')] [2024-12-13 11:01:04,076][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8814592. Throughput: 0: 811.4. Samples: 8817420. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:01:04,077][62436] Avg episode reward: [(0, '6101.376')] [2024-12-13 11:01:09,076][62436] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8818688. Throughput: 0: 783.9. Samples: 8821340. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:01:09,076][62436] Avg episode reward: [(0, '6130.947')] [2024-12-13 11:01:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017224_8818688.pth... [2024-12-13 11:01:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017184_8798208.pth [2024-12-13 11:01:14,078][62436] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8822784. Throughput: 0: 781.0. Samples: 8824084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:01:14,079][62436] Avg episode reward: [(0, '6152.367')] [2024-12-13 11:01:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8826880. Throughput: 0: 804.4. Samples: 8829240. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:01:19,076][62436] Avg episode reward: [(0, '6211.543')] [2024-12-13 11:01:24,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8830976. Throughput: 0: 781.3. Samples: 8833280. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:01:24,076][62436] Avg episode reward: [(0, '6227.861')] [2024-12-13 11:01:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017248_8830976.pth... [2024-12-13 11:01:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017200_8806400.pth [2024-12-13 11:01:24,091][62473] Saving new best policy, reward=6227.861! [2024-12-13 11:01:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 8835072. Throughput: 0: 764.0. Samples: 8835232. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:01:29,076][62436] Avg episode reward: [(0, '6226.730')] [2024-12-13 11:01:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 8839168. Throughput: 0: 762.5. Samples: 8839584. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:01:34,079][62436] Avg episode reward: [(0, '6214.558')] [2024-12-13 11:01:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8843264. Throughput: 0: 738.9. Samples: 8843264. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:01:39,076][62436] Avg episode reward: [(0, '6203.292')] [2024-12-13 11:01:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017272_8843264.pth... [2024-12-13 11:01:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017224_8818688.pth [2024-12-13 11:01:43,056][62492] Updated weights for policy 0, policy_version 17280 (0.0011) [2024-12-13 11:01:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 8847360. Throughput: 0: 744.5. Samples: 8846144. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:01:44,076][62436] Avg episode reward: [(0, '6211.980')] [2024-12-13 11:01:49,083][62436] Fps is (10 sec: 818.6, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 8851456. Throughput: 0: 756.2. Samples: 8851456. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:01:49,084][62436] Avg episode reward: [(0, '6190.156')] [2024-12-13 11:01:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8855552. Throughput: 0: 757.7. Samples: 8855436. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:01:54,076][62436] Avg episode reward: [(0, '6236.291')] [2024-12-13 11:01:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017296_8855552.pth... 
[2024-12-13 11:01:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017248_8830976.pth [2024-12-13 11:01:54,092][62473] Saving new best policy, reward=6236.291! [2024-12-13 11:01:59,076][62436] Fps is (10 sec: 819.8, 60 sec: 751.0, 300 sec: 805.3). Total num frames: 8859648. Throughput: 0: 753.3. Samples: 8857980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:01:59,076][62436] Avg episode reward: [(0, '6257.087')] [2024-12-13 11:01:59,077][62473] Saving new best policy, reward=6257.087! [2024-12-13 11:02:04,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8863744. Throughput: 0: 764.8. Samples: 8863660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:02:04,079][62436] Avg episode reward: [(0, '6323.357')] [2024-12-13 11:02:04,080][62473] Saving new best policy, reward=6323.357! [2024-12-13 11:02:09,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8867840. Throughput: 0: 766.9. Samples: 8867792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:02:09,080][62436] Avg episode reward: [(0, '6324.307')] [2024-12-13 11:02:09,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017320_8867840.pth... [2024-12-13 11:02:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017272_8843264.pth [2024-12-13 11:02:09,107][62473] Saving new best policy, reward=6324.307! [2024-12-13 11:02:14,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8871936. Throughput: 0: 774.2. Samples: 8870072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:02:14,076][62436] Avg episode reward: [(0, '6325.875')] [2024-12-13 11:02:14,079][62473] Saving new best policy, reward=6325.875! [2024-12-13 11:02:19,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8876032. Throughput: 0: 802.7. Samples: 8875704. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:02:19,076][62436] Avg episode reward: [(0, '6325.930')] [2024-12-13 11:02:19,077][62473] Saving new best policy, reward=6325.930! [2024-12-13 11:02:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8880128. Throughput: 0: 818.5. Samples: 8880096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:02:24,076][62436] Avg episode reward: [(0, '6325.298')] [2024-12-13 11:02:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017344_8880128.pth... [2024-12-13 11:02:24,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017296_8855552.pth [2024-12-13 11:02:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8884224. Throughput: 0: 800.5. Samples: 8882168. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:02:29,076][62436] Avg episode reward: [(0, '6348.202')] [2024-12-13 11:02:29,079][62473] Saving new best policy, reward=6348.202! [2024-12-13 11:02:33,279][62492] Updated weights for policy 0, policy_version 17360 (0.0010) [2024-12-13 11:02:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8888320. Throughput: 0: 808.1. Samples: 8887816. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:02:34,076][62436] Avg episode reward: [(0, '6351.909')] [2024-12-13 11:02:34,077][62473] Saving new best policy, reward=6351.909! [2024-12-13 11:02:39,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8892416. Throughput: 0: 821.7. Samples: 8892416. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:02:39,079][62436] Avg episode reward: [(0, '6353.253')] [2024-12-13 11:02:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017368_8892416.pth... 
[2024-12-13 11:02:39,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017320_8867840.pth [2024-12-13 11:02:39,113][62473] Saving new best policy, reward=6353.253! [2024-12-13 11:02:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8896512. Throughput: 0: 806.4. Samples: 8894268. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:02:44,076][62436] Avg episode reward: [(0, '6333.489')] [2024-12-13 11:02:49,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 8900608. Throughput: 0: 804.7. Samples: 8899868. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:02:49,076][62436] Avg episode reward: [(0, '6337.474')] [2024-12-13 11:02:54,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8904704. Throughput: 0: 820.0. Samples: 8904692. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:02:54,077][62436] Avg episode reward: [(0, '6340.156')] [2024-12-13 11:02:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017392_8904704.pth... [2024-12-13 11:02:54,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017344_8880128.pth [2024-12-13 11:02:59,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8904704. Throughput: 0: 807.5. Samples: 8906408. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:02:59,076][62436] Avg episode reward: [(0, '6300.243')] [2024-12-13 11:03:04,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8912896. Throughput: 0: 803.7. Samples: 8911872. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:03:04,079][62436] Avg episode reward: [(0, '6362.524')] [2024-12-13 11:03:04,080][62473] Saving new best policy, reward=6362.524! [2024-12-13 11:03:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8916992. Throughput: 0: 819.9. 
Samples: 8916992. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:03:09,076][62436] Avg episode reward: [(0, '6363.992')] [2024-12-13 11:03:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017416_8916992.pth... [2024-12-13 11:03:09,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017368_8892416.pth [2024-12-13 11:03:09,096][62473] Saving new best policy, reward=6363.992! [2024-12-13 11:03:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8916992. Throughput: 0: 815.5. Samples: 8918864. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:03:14,076][62436] Avg episode reward: [(0, '6367.508')] [2024-12-13 11:03:14,077][62473] Saving new best policy, reward=6367.508! [2024-12-13 11:03:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8925184. Throughput: 0: 805.7. Samples: 8924072. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:03:19,078][62436] Avg episode reward: [(0, '6368.688')] [2024-12-13 11:03:19,079][62473] Saving new best policy, reward=6368.688! [2024-12-13 11:03:23,361][62492] Updated weights for policy 0, policy_version 17440 (0.0010) [2024-12-13 11:03:24,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8929280. Throughput: 0: 823.3. Samples: 8929460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:03:24,076][62436] Avg episode reward: [(0, '6389.227')] [2024-12-13 11:03:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017440_8929280.pth... [2024-12-13 11:03:24,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017392_8904704.pth [2024-12-13 11:03:24,091][62473] Saving new best policy, reward=6389.227! [2024-12-13 11:03:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8929280. Throughput: 0: 823.5. Samples: 8931324. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:03:29,076][62436] Avg episode reward: [(0, '6345.871')] [2024-12-13 11:03:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8937472. Throughput: 0: 810.1. Samples: 8936324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:03:34,076][62436] Avg episode reward: [(0, '6312.928')] [2024-12-13 11:03:39,080][62436] Fps is (10 sec: 1228.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8941568. Throughput: 0: 826.1. Samples: 8941868. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:03:39,083][62436] Avg episode reward: [(0, '6261.739')] [2024-12-13 11:03:39,099][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017464_8941568.pth... [2024-12-13 11:03:39,105][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017416_8916992.pth [2024-12-13 11:03:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8941568. Throughput: 0: 832.3. Samples: 8943860. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:03:44,076][62436] Avg episode reward: [(0, '6261.511')] [2024-12-13 11:03:49,076][62436] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8945664. Throughput: 0: 811.9. Samples: 8948408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:03:49,076][62436] Avg episode reward: [(0, '6287.506')] [2024-12-13 11:03:54,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8953856. Throughput: 0: 820.4. Samples: 8953912. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:03:54,076][62436] Avg episode reward: [(0, '6248.691')] [2024-12-13 11:03:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017488_8953856.pth... 
[2024-12-13 11:03:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017440_8929280.pth [2024-12-13 11:03:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 8953856. Throughput: 0: 828.9. Samples: 8956164. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:03:59,076][62436] Avg episode reward: [(0, '6248.691')] [2024-12-13 11:04:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8957952. Throughput: 0: 809.0. Samples: 8960476. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:04:04,076][62436] Avg episode reward: [(0, '6197.092')] [2024-12-13 11:04:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8966144. Throughput: 0: 815.2. Samples: 8966144. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:04:09,076][62436] Avg episode reward: [(0, '6111.172')] [2024-12-13 11:04:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017512_8966144.pth... [2024-12-13 11:04:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017464_8941568.pth [2024-12-13 11:04:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 8966144. Throughput: 0: 827.5. Samples: 8968560. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:04:14,076][62436] Avg episode reward: [(0, '6108.555')] [2024-12-13 11:04:14,437][62492] Updated weights for policy 0, policy_version 17520 (0.0010) [2024-12-13 11:04:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8970240. Throughput: 0: 807.2. Samples: 8972648. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:04:19,076][62436] Avg episode reward: [(0, '6106.782')] [2024-12-13 11:04:24,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 8978432. Throughput: 0: 811.0. Samples: 8978360. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:04:24,076][62436] Avg episode reward: [(0, '6084.313')] [2024-12-13 11:04:24,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017536_8978432.pth... [2024-12-13 11:04:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017488_8953856.pth [2024-12-13 11:04:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 8978432. Throughput: 0: 826.2. Samples: 8981040. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:04:29,076][62436] Avg episode reward: [(0, '6077.921')] [2024-12-13 11:04:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8982528. Throughput: 0: 809.9. Samples: 8984852. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:04:34,076][62436] Avg episode reward: [(0, '6082.357')] [2024-12-13 11:04:39,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 8990720. Throughput: 0: 813.8. Samples: 8990532. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:04:39,076][62436] Avg episode reward: [(0, '6067.961')] [2024-12-13 11:04:39,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017560_8990720.pth... [2024-12-13 11:04:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017512_8966144.pth [2024-12-13 11:04:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 805.3). Total num frames: 8994816. Throughput: 0: 825.0. Samples: 8993288. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:04:44,076][62436] Avg episode reward: [(0, '6064.583')] [2024-12-13 11:04:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 8994816. Throughput: 0: 812.9. Samples: 8997056. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:04:49,076][62436] Avg episode reward: [(0, '6061.388')] [2024-12-13 11:04:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 8998912. Throughput: 0: 806.4. Samples: 9002432. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:04:54,076][62436] Avg episode reward: [(0, '6098.531')] [2024-12-13 11:04:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017576_8998912.pth... [2024-12-13 11:04:54,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017536_8978432.pth [2024-12-13 11:04:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9003008. Throughput: 0: 811.9. Samples: 9005096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:04:59,076][62436] Avg episode reward: [(0, '6098.698')] [2024-12-13 11:05:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9007104. Throughput: 0: 812.0. Samples: 9009188. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:04,076][62436] Avg episode reward: [(0, '6068.924')] [2024-12-13 11:05:05,419][62492] Updated weights for policy 0, policy_version 17600 (0.0010) [2024-12-13 11:05:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9011200. Throughput: 0: 796.0. Samples: 9014180. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:09,076][62436] Avg episode reward: [(0, '6062.757')] [2024-12-13 11:05:09,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017600_9011200.pth... [2024-12-13 11:05:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017560_8990720.pth [2024-12-13 11:05:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9015296. Throughput: 0: 796.4. Samples: 9016876. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:14,076][62436] Avg episode reward: [(0, '6061.787')] [2024-12-13 11:05:19,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9019392. Throughput: 0: 806.2. Samples: 9021132. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:19,079][62436] Avg episode reward: [(0, '6062.154')] [2024-12-13 11:05:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9023488. Throughput: 0: 787.6. Samples: 9025972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:24,076][62436] Avg episode reward: [(0, '6096.130')] [2024-12-13 11:05:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017624_9023488.pth... [2024-12-13 11:05:24,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017576_8998912.pth [2024-12-13 11:05:29,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9027584. Throughput: 0: 785.8. Samples: 9028648. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:29,076][62436] Avg episode reward: [(0, '6091.949')] [2024-12-13 11:05:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9031680. Throughput: 0: 805.0. Samples: 9033280. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:34,076][62436] Avg episode reward: [(0, '6132.199')] [2024-12-13 11:05:39,077][62436] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9035776. Throughput: 0: 788.4. Samples: 9037912. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:39,078][62436] Avg episode reward: [(0, '6129.122')] [2024-12-13 11:05:39,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017648_9035776.pth... 
[2024-12-13 11:05:39,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017600_9011200.pth [2024-12-13 11:05:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9039872. Throughput: 0: 787.3. Samples: 9040524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:44,076][62436] Avg episode reward: [(0, '6125.130')] [2024-12-13 11:05:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9043968. Throughput: 0: 802.7. Samples: 9045308. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:49,077][62436] Avg episode reward: [(0, '6123.598')] [2024-12-13 11:05:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9048064. Throughput: 0: 783.9. Samples: 9049456. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:54,076][62436] Avg episode reward: [(0, '6121.517')] [2024-12-13 11:05:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017672_9048064.pth... [2024-12-13 11:05:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017624_9023488.pth [2024-12-13 11:05:56,747][62492] Updated weights for policy 0, policy_version 17680 (0.0012) [2024-12-13 11:05:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9052160. Throughput: 0: 783.4. Samples: 9052128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:05:59,079][62436] Avg episode reward: [(0, '6215.128')] [2024-12-13 11:06:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9056256. Throughput: 0: 804.3. Samples: 9057324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:06:04,076][62436] Avg episode reward: [(0, '6210.690')] [2024-12-13 11:06:09,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9060352. Throughput: 0: 764.0. Samples: 9060352. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:06:09,079][62436] Avg episode reward: [(0, '6237.090')] [2024-12-13 11:06:09,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017696_9060352.pth... [2024-12-13 11:06:09,106][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017648_9035776.pth [2024-12-13 11:06:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9060352. Throughput: 0: 742.8. Samples: 9062072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:06:14,076][62436] Avg episode reward: [(0, '6268.297')] [2024-12-13 11:06:19,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9068544. Throughput: 0: 759.9. Samples: 9067476. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:06:19,076][62436] Avg episode reward: [(0, '6266.842')] [2024-12-13 11:06:24,079][62436] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9068544. Throughput: 0: 749.0. Samples: 9071620. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:06:24,080][62436] Avg episode reward: [(0, '6270.766')] [2024-12-13 11:06:24,089][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017712_9068544.pth... [2024-12-13 11:06:24,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017672_9048064.pth [2024-12-13 11:06:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9072640. Throughput: 0: 737.0. Samples: 9073688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:06:29,081][62436] Avg episode reward: [(0, '6267.250')] [2024-12-13 11:06:34,076][62436] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9076736. Throughput: 0: 752.5. Samples: 9079172. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:06:34,076][62436] Avg episode reward: [(0, '6255.279')] [2024-12-13 11:06:39,078][62436] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9080832. Throughput: 0: 759.3. Samples: 9083628. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:06:39,079][62436] Avg episode reward: [(0, '6254.329')] [2024-12-13 11:06:39,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017736_9080832.pth... [2024-12-13 11:06:39,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017696_9060352.pth [2024-12-13 11:06:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.5). Total num frames: 9084928. Throughput: 0: 738.0. Samples: 9085340. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:06:44,076][62436] Avg episode reward: [(0, '6282.979')] [2024-12-13 11:06:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9089024. Throughput: 0: 743.9. Samples: 9090800. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:06:49,076][62436] Avg episode reward: [(0, '6282.672')] [2024-12-13 11:06:50,299][62492] Updated weights for policy 0, policy_version 17760 (0.0010) [2024-12-13 11:06:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9093120. Throughput: 0: 782.8. Samples: 9095576. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:06:54,077][62436] Avg episode reward: [(0, '6278.429')] [2024-12-13 11:06:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017760_9093120.pth... [2024-12-13 11:06:54,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017712_9068544.pth [2024-12-13 11:06:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9097216. Throughput: 0: 782.4. Samples: 9097280. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:06:59,076][62436] Avg episode reward: [(0, '6274.303')] [2024-12-13 11:07:04,078][62436] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9101312. Throughput: 0: 779.6. Samples: 9102560. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:07:04,079][62436] Avg episode reward: [(0, '6270.184')] [2024-12-13 11:07:09,080][62436] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9105408. Throughput: 0: 803.5. Samples: 9107780. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:07:09,081][62436] Avg episode reward: [(0, '6271.939')] [2024-12-13 11:07:09,095][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017784_9105408.pth... [2024-12-13 11:07:09,101][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017736_9080832.pth [2024-12-13 11:07:14,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9109504. Throughput: 0: 798.0. Samples: 9109596. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:07:14,076][62436] Avg episode reward: [(0, '6167.555')] [2024-12-13 11:07:19,076][62436] Fps is (10 sec: 819.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9113600. Throughput: 0: 786.1. Samples: 9114548. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:07:19,076][62436] Avg episode reward: [(0, '6165.046')] [2024-12-13 11:07:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 791.4). Total num frames: 9117696. Throughput: 0: 807.8. Samples: 9119976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:07:24,076][62436] Avg episode reward: [(0, '6175.222')] [2024-12-13 11:07:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017808_9117696.pth... 
[2024-12-13 11:07:24,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017760_9093120.pth [2024-12-13 11:07:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9121792. Throughput: 0: 809.3. Samples: 9121760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:07:29,076][62436] Avg episode reward: [(0, '6148.347')] [2024-12-13 11:07:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9125888. Throughput: 0: 789.9. Samples: 9126344. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:07:34,076][62436] Avg episode reward: [(0, '6092.789')] [2024-12-13 11:07:39,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9129984. Throughput: 0: 809.3. Samples: 9131996. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:07:39,084][62436] Avg episode reward: [(0, '6094.508')] [2024-12-13 11:07:39,095][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017832_9129984.pth... [2024-12-13 11:07:39,105][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017784_9105408.pth [2024-12-13 11:07:42,021][62492] Updated weights for policy 0, policy_version 17840 (0.0010) [2024-12-13 11:07:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9134080. Throughput: 0: 815.6. Samples: 9133984. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:07:44,076][62436] Avg episode reward: [(0, '6150.922')] [2024-12-13 11:07:49,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9138176. Throughput: 0: 796.7. Samples: 9138408. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:07:49,076][62436] Avg episode reward: [(0, '6112.675')] [2024-12-13 11:07:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9142272. Throughput: 0: 804.0. Samples: 9143956. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:07:54,076][62436] Avg episode reward: [(0, '6079.375')] [2024-12-13 11:07:54,080][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017856_9142272.pth... [2024-12-13 11:07:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017808_9117696.pth [2024-12-13 11:07:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9146368. Throughput: 0: 814.6. Samples: 9146252. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:07:59,076][62436] Avg episode reward: [(0, '6079.867')] [2024-12-13 11:08:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9150464. Throughput: 0: 797.1. Samples: 9150416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:08:04,076][62436] Avg episode reward: [(0, '6081.101')] [2024-12-13 11:08:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 9154560. Throughput: 0: 800.1. Samples: 9155980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:08:09,076][62436] Avg episode reward: [(0, '6077.590')] [2024-12-13 11:08:09,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017880_9154560.pth... [2024-12-13 11:08:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017832_9129984.pth [2024-12-13 11:08:14,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9158656. Throughput: 0: 816.3. Samples: 9158496. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:08:14,077][62436] Avg episode reward: [(0, '6078.973')] [2024-12-13 11:08:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9162752. Throughput: 0: 802.1. Samples: 9162440. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:08:19,076][62436] Avg episode reward: [(0, '6079.223')] [2024-12-13 11:08:24,079][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9166848. Throughput: 0: 796.0. Samples: 9167820. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:08:24,080][62436] Avg episode reward: [(0, '6078.312')] [2024-12-13 11:08:24,093][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017904_9166848.pth... [2024-12-13 11:08:24,105][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017856_9142272.pth [2024-12-13 11:08:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9170944. Throughput: 0: 814.6. Samples: 9170640. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:08:29,076][62436] Avg episode reward: [(0, '6078.686')] [2024-12-13 11:08:33,489][62492] Updated weights for policy 0, policy_version 17920 (0.0012) [2024-12-13 11:08:34,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9175040. Throughput: 0: 798.8. Samples: 9174356. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:08:34,076][62436] Avg episode reward: [(0, '6120.208')] [2024-12-13 11:08:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9179136. Throughput: 0: 796.1. Samples: 9179780. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:08:39,076][62436] Avg episode reward: [(0, '6121.875')] [2024-12-13 11:08:39,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017928_9179136.pth... [2024-12-13 11:08:39,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017880_9154560.pth [2024-12-13 11:08:44,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9183232. Throughput: 0: 808.4. Samples: 9182628. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:08:44,076][62436] Avg episode reward: [(0, '6126.370')] [2024-12-13 11:08:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9187328. Throughput: 0: 802.6. Samples: 9186532. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:08:49,076][62436] Avg episode reward: [(0, '6126.212')] [2024-12-13 11:08:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9191424. Throughput: 0: 791.5. Samples: 9191596. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:08:54,076][62436] Avg episode reward: [(0, '6136.483')] [2024-12-13 11:08:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017952_9191424.pth... [2024-12-13 11:08:54,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017904_9166848.pth [2024-12-13 11:08:59,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9195520. Throughput: 0: 799.3. Samples: 9194468. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:08:59,080][62436] Avg episode reward: [(0, '6144.021')] [2024-12-13 11:09:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 9195520. Throughput: 0: 805.1. Samples: 9198668. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:09:04,076][62436] Avg episode reward: [(0, '6149.600')] [2024-12-13 11:09:09,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9203712. Throughput: 0: 796.8. Samples: 9203672. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:09:09,077][62436] Avg episode reward: [(0, '6183.753')] [2024-12-13 11:09:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017976_9203712.pth... 
[2024-12-13 11:09:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017928_9179136.pth [2024-12-13 11:09:14,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9207808. Throughput: 0: 794.8. Samples: 9206408. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:09:14,076][62436] Avg episode reward: [(0, '6294.190')] [2024-12-13 11:09:19,078][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 9207808. Throughput: 0: 810.6. Samples: 9210836. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:09:19,079][62436] Avg episode reward: [(0, '6294.127')] [2024-12-13 11:09:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 9211904. Throughput: 0: 795.0. Samples: 9215556. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:09:24,076][62436] Avg episode reward: [(0, '6298.103')] [2024-12-13 11:09:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017992_9211904.pth... [2024-12-13 11:09:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017952_9191424.pth [2024-12-13 11:09:24,223][62492] Updated weights for policy 0, policy_version 18000 (0.0010) [2024-12-13 11:09:29,076][62436] Fps is (10 sec: 1229.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9220096. Throughput: 0: 790.7. Samples: 9218208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:09:29,076][62436] Avg episode reward: [(0, '6327.356')] [2024-12-13 11:09:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 9220096. Throughput: 0: 812.2. Samples: 9223080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:09:34,076][62436] Avg episode reward: [(0, '6330.342')] [2024-12-13 11:09:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 777.5). Total num frames: 9224192. Throughput: 0: 796.8. Samples: 9227452. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:09:39,076][62436] Avg episode reward: [(0, '6387.663')] [2024-12-13 11:09:39,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018016_9224192.pth... [2024-12-13 11:09:39,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017976_9203712.pth [2024-12-13 11:09:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9232384. Throughput: 0: 793.5. Samples: 9230172. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:09:44,076][62436] Avg episode reward: [(0, '6410.486')] [2024-12-13 11:09:44,077][62473] Saving new best policy, reward=6410.486! [2024-12-13 11:09:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9232384. Throughput: 0: 815.1. Samples: 9235348. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:09:49,076][62436] Avg episode reward: [(0, '6413.847')] [2024-12-13 11:09:49,079][62473] Saving new best policy, reward=6413.847! [2024-12-13 11:09:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9236480. Throughput: 0: 794.3. Samples: 9239416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:09:54,076][62436] Avg episode reward: [(0, '6414.909')] [2024-12-13 11:09:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018040_9236480.pth... [2024-12-13 11:09:54,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000017992_9211904.pth [2024-12-13 11:09:54,095][62473] Saving new best policy, reward=6414.909! [2024-12-13 11:09:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 9240576. Throughput: 0: 794.2. Samples: 9242148. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:09:59,076][62436] Avg episode reward: [(0, '6345.527')] [2024-12-13 11:10:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). 
Total num frames: 9244672. Throughput: 0: 814.5. Samples: 9247484. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:10:04,076][62436] Avg episode reward: [(0, '6312.143')] [2024-12-13 11:10:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9248768. Throughput: 0: 793.7. Samples: 9251272. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:10:09,076][62436] Avg episode reward: [(0, '6315.718')] [2024-12-13 11:10:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018064_9248768.pth... [2024-12-13 11:10:09,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018016_9224192.pth [2024-12-13 11:10:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9252864. Throughput: 0: 793.2. Samples: 9253900. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:10:14,076][62436] Avg episode reward: [(0, '6345.317')] [2024-12-13 11:10:14,971][62492] Updated weights for policy 0, policy_version 18080 (0.0013) [2024-12-13 11:10:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9256960. Throughput: 0: 808.1. Samples: 9259444. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:10:19,076][62436] Avg episode reward: [(0, '6315.023')] [2024-12-13 11:10:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9261056. Throughput: 0: 791.1. Samples: 9263052. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:10:24,076][62436] Avg episode reward: [(0, '6313.068')] [2024-12-13 11:10:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018088_9261056.pth... [2024-12-13 11:10:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018040_9236480.pth [2024-12-13 11:10:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9265152. Throughput: 0: 790.5. Samples: 9265744. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:10:29,076][62436] Avg episode reward: [(0, '6265.310')] [2024-12-13 11:10:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9269248. Throughput: 0: 799.6. Samples: 9271332. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:10:34,076][62436] Avg episode reward: [(0, '6264.769')] [2024-12-13 11:10:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9273344. Throughput: 0: 797.0. Samples: 9275280. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:10:39,076][62436] Avg episode reward: [(0, '6264.834')] [2024-12-13 11:10:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018112_9273344.pth... [2024-12-13 11:10:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018064_9248768.pth [2024-12-13 11:10:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9277440. Throughput: 0: 783.1. Samples: 9277388. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:10:44,076][62436] Avg episode reward: [(0, '6265.195')] [2024-12-13 11:10:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9281536. Throughput: 0: 756.1. Samples: 9281508. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:10:49,076][62436] Avg episode reward: [(0, '6269.046')] [2024-12-13 11:10:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9285632. Throughput: 0: 765.2. Samples: 9285704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:10:54,076][62436] Avg episode reward: [(0, '6266.694')] [2024-12-13 11:10:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018136_9285632.pth... 
[2024-12-13 11:10:54,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018088_9261056.pth [2024-12-13 11:10:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9289728. Throughput: 0: 757.4. Samples: 9287984. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:10:59,076][62436] Avg episode reward: [(0, '6270.212')] [2024-12-13 11:11:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9293824. Throughput: 0: 761.1. Samples: 9293692. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:11:04,078][62436] Avg episode reward: [(0, '6270.401')] [2024-12-13 11:11:08,255][62492] Updated weights for policy 0, policy_version 18160 (0.0010) [2024-12-13 11:11:09,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9297920. Throughput: 0: 779.7. Samples: 9298140. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:11:09,078][62436] Avg episode reward: [(0, '6270.394')] [2024-12-13 11:11:09,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018160_9297920.pth... [2024-12-13 11:11:09,097][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018112_9273344.pth [2024-12-13 11:11:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9302016. Throughput: 0: 763.8. Samples: 9300116. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:11:14,076][62436] Avg episode reward: [(0, '6266.754')] [2024-12-13 11:11:19,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9306112. Throughput: 0: 766.4. Samples: 9305820. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:11:19,076][62436] Avg episode reward: [(0, '6266.004')] [2024-12-13 11:11:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9310208. Throughput: 0: 781.7. Samples: 9310456. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:11:24,076][62436] Avg episode reward: [(0, '6225.258')] [2024-12-13 11:11:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018184_9310208.pth... [2024-12-13 11:11:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018136_9285632.pth [2024-12-13 11:11:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9314304. Throughput: 0: 777.0. Samples: 9312352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:11:29,076][62436] Avg episode reward: [(0, '6222.636')] [2024-12-13 11:11:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9318400. Throughput: 0: 807.1. Samples: 9317828. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:11:34,076][62436] Avg episode reward: [(0, '6230.617')] [2024-12-13 11:11:39,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9322496. Throughput: 0: 820.9. Samples: 9322644. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:11:39,078][62436] Avg episode reward: [(0, '6229.147')] [2024-12-13 11:11:39,090][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018208_9322496.pth... [2024-12-13 11:11:39,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018160_9297920.pth [2024-12-13 11:11:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9322496. Throughput: 0: 811.7. Samples: 9324512. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:11:44,076][62436] Avg episode reward: [(0, '6194.722')] [2024-12-13 11:11:49,080][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 9330688. Throughput: 0: 801.2. Samples: 9329748. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:11:49,081][62436] Avg episode reward: [(0, '6169.135')] [2024-12-13 11:11:54,076][62436] Fps is (10 sec: 1228.7, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9334784. Throughput: 0: 816.5. Samples: 9334880. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:11:54,077][62436] Avg episode reward: [(0, '6143.237')] [2024-12-13 11:11:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018232_9334784.pth... [2024-12-13 11:11:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018184_9310208.pth [2024-12-13 11:11:59,076][62436] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9334784. Throughput: 0: 814.0. Samples: 9336748. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:11:59,076][62436] Avg episode reward: [(0, '6143.236')] [2024-12-13 11:11:59,692][62492] Updated weights for policy 0, policy_version 18240 (0.0020) [2024-12-13 11:12:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9338880. Throughput: 0: 796.4. Samples: 9341660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:12:04,076][62436] Avg episode reward: [(0, '6143.558')] [2024-12-13 11:12:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9347072. Throughput: 0: 814.8. Samples: 9347120. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:12:09,076][62436] Avg episode reward: [(0, '6144.382')] [2024-12-13 11:12:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018256_9347072.pth... [2024-12-13 11:12:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018208_9322496.pth [2024-12-13 11:12:14,077][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9347072. Throughput: 0: 813.8. Samples: 9348976. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:12:14,078][62436] Avg episode reward: [(0, '6144.808')] [2024-12-13 11:12:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9351168. Throughput: 0: 792.4. Samples: 9353488. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:12:19,076][62436] Avg episode reward: [(0, '6144.375')] [2024-12-13 11:12:24,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9355264. Throughput: 0: 807.8. Samples: 9358996. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:12:24,076][62436] Avg episode reward: [(0, '6141.330')] [2024-12-13 11:12:24,167][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018280_9359360.pth... [2024-12-13 11:12:24,179][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018232_9334784.pth [2024-12-13 11:12:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9359360. Throughput: 0: 812.5. Samples: 9361076. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:12:29,076][62436] Avg episode reward: [(0, '6174.541')] [2024-12-13 11:12:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9363456. Throughput: 0: 789.3. Samples: 9365264. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:12:34,076][62436] Avg episode reward: [(0, '6173.842')] [2024-12-13 11:12:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9367552. Throughput: 0: 800.0. Samples: 9370880. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:12:39,076][62436] Avg episode reward: [(0, '6096.984')] [2024-12-13 11:12:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018296_9367552.pth... 
[2024-12-13 11:12:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018256_9347072.pth [2024-12-13 11:12:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9371648. Throughput: 0: 808.4. Samples: 9373128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:12:44,076][62436] Avg episode reward: [(0, '6036.590')] [2024-12-13 11:12:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 9375744. Throughput: 0: 787.4. Samples: 9377092. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:12:49,076][62436] Avg episode reward: [(0, '6038.745')] [2024-12-13 11:12:50,624][62492] Updated weights for policy 0, policy_version 18320 (0.0010) [2024-12-13 11:12:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9379840. Throughput: 0: 791.2. Samples: 9382724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:12:54,076][62436] Avg episode reward: [(0, '6030.841')] [2024-12-13 11:12:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018320_9379840.pth... [2024-12-13 11:12:54,104][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018280_9359360.pth [2024-12-13 11:12:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9383936. Throughput: 0: 809.3. Samples: 9385392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:12:59,076][62436] Avg episode reward: [(0, '5970.748')] [2024-12-13 11:13:04,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9388032. Throughput: 0: 793.7. Samples: 9389208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:13:04,078][62436] Avg episode reward: [(0, '5970.599')] [2024-12-13 11:13:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9392128. Throughput: 0: 796.2. Samples: 9394824. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:13:09,076][62436] Avg episode reward: [(0, '5934.486')] [2024-12-13 11:13:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018344_9392128.pth... [2024-12-13 11:13:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018296_9367552.pth [2024-12-13 11:13:14,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9396224. Throughput: 0: 808.4. Samples: 9397452. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:13:14,076][62436] Avg episode reward: [(0, '5938.360')] [2024-12-13 11:13:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9400320. Throughput: 0: 799.3. Samples: 9401232. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:13:19,076][62436] Avg episode reward: [(0, '5880.594')] [2024-12-13 11:13:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9404416. Throughput: 0: 797.8. Samples: 9406780. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:13:24,076][62436] Avg episode reward: [(0, '5852.071')] [2024-12-13 11:13:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018368_9404416.pth... [2024-12-13 11:13:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018320_9379840.pth [2024-12-13 11:13:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9408512. Throughput: 0: 808.5. Samples: 9409512. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:13:29,076][62436] Avg episode reward: [(0, '5854.227')] [2024-12-13 11:13:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9412608. Throughput: 0: 807.6. Samples: 9413432. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:13:34,076][62436] Avg episode reward: [(0, '5856.345')] [2024-12-13 11:13:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9416704. Throughput: 0: 804.6. Samples: 9418932. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:13:39,076][62436] Avg episode reward: [(0, '5866.788')] [2024-12-13 11:13:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018392_9416704.pth... [2024-12-13 11:13:39,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018344_9392128.pth [2024-12-13 11:13:40,748][62492] Updated weights for policy 0, policy_version 18400 (0.0012) [2024-12-13 11:13:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9420800. Throughput: 0: 804.4. Samples: 9421592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:13:44,076][62436] Avg episode reward: [(0, '5907.498')] [2024-12-13 11:13:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9424896. Throughput: 0: 810.1. Samples: 9425660. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:13:49,076][62436] Avg episode reward: [(0, '5909.189')] [2024-12-13 11:13:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9428992. Throughput: 0: 802.1. Samples: 9430920. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:13:54,078][62436] Avg episode reward: [(0, '5967.276')] [2024-12-13 11:13:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018416_9428992.pth... [2024-12-13 11:13:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018368_9404416.pth [2024-12-13 11:13:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9433088. Throughput: 0: 805.2. Samples: 9433688. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:13:59,076][62436] Avg episode reward: [(0, '5948.569')] [2024-12-13 11:14:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9437184. Throughput: 0: 818.0. Samples: 9438044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:14:04,076][62436] Avg episode reward: [(0, '5947.034')] [2024-12-13 11:14:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9441280. Throughput: 0: 808.4. Samples: 9443160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:14:09,076][62436] Avg episode reward: [(0, '5897.670')] [2024-12-13 11:14:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018440_9441280.pth... [2024-12-13 11:14:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018392_9416704.pth [2024-12-13 11:14:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9445376. Throughput: 0: 807.6. Samples: 9445852. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:14:14,076][62436] Avg episode reward: [(0, '5901.605')] [2024-12-13 11:14:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9449472. Throughput: 0: 823.3. Samples: 9450480. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:14:19,076][62436] Avg episode reward: [(0, '5901.235')] [2024-12-13 11:14:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9453568. Throughput: 0: 805.9. Samples: 9455196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:14:24,076][62436] Avg episode reward: [(0, '5926.509')] [2024-12-13 11:14:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018464_9453568.pth... 
[2024-12-13 11:14:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018416_9428992.pth [2024-12-13 11:14:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9457664. Throughput: 0: 808.0. Samples: 9457952. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:14:29,076][62436] Avg episode reward: [(0, '6065.622')] [2024-12-13 11:14:30,797][62492] Updated weights for policy 0, policy_version 18480 (0.0012) [2024-12-13 11:14:34,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9461760. Throughput: 0: 828.4. Samples: 9462940. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:14:34,078][62436] Avg episode reward: [(0, '6022.154')] [2024-12-13 11:14:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9465856. Throughput: 0: 811.4. Samples: 9467432. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2024-12-13 11:14:39,076][62436] Avg episode reward: [(0, '5990.944')] [2024-12-13 11:14:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018488_9465856.pth... [2024-12-13 11:14:39,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018440_9441280.pth [2024-12-13 11:14:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9469952. Throughput: 0: 810.6. Samples: 9470164. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:14:44,076][62436] Avg episode reward: [(0, '5997.000')] [2024-12-13 11:14:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9474048. Throughput: 0: 832.4. Samples: 9475500. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:14:49,076][62436] Avg episode reward: [(0, '6049.812')] [2024-12-13 11:14:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9478144. Throughput: 0: 809.5. Samples: 9479588. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:14:54,076][62436] Avg episode reward: [(0, '6048.609')] [2024-12-13 11:14:54,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018512_9478144.pth... [2024-12-13 11:14:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018464_9453568.pth [2024-12-13 11:14:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9482240. Throughput: 0: 810.8. Samples: 9482336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:14:59,076][62436] Avg episode reward: [(0, '6041.626')] [2024-12-13 11:15:04,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9486336. Throughput: 0: 829.8. Samples: 9487820. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:15:04,077][62436] Avg episode reward: [(0, '6038.661')] [2024-12-13 11:15:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9490432. Throughput: 0: 809.3. Samples: 9491616. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:15:09,076][62436] Avg episode reward: [(0, '6031.280')] [2024-12-13 11:15:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018536_9490432.pth... [2024-12-13 11:15:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018488_9465856.pth [2024-12-13 11:15:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9494528. Throughput: 0: 811.5. Samples: 9494468. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:15:14,077][62436] Avg episode reward: [(0, '6032.015')] [2024-12-13 11:15:19,080][62436] Fps is (10 sec: 818.8, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 9498624. Throughput: 0: 823.3. Samples: 9499992. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:15:19,081][62436] Avg episode reward: [(0, '6095.704')] [2024-12-13 11:15:22,941][62492] Updated weights for policy 0, policy_version 18560 (0.0013) [2024-12-13 11:15:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9502720. Throughput: 0: 792.4. Samples: 9503092. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:15:24,077][62436] Avg episode reward: [(0, '6078.560')] [2024-12-13 11:15:24,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018560_9502720.pth... [2024-12-13 11:15:24,100][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018512_9478144.pth [2024-12-13 11:15:29,076][62436] Fps is (10 sec: 409.8, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9502720. Throughput: 0: 764.5. Samples: 9504568. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:15:29,077][62436] Avg episode reward: [(0, '6080.643')] [2024-12-13 11:15:34,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9510912. Throughput: 0: 768.5. Samples: 9510084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:15:34,077][62436] Avg episode reward: [(0, '6125.456')] [2024-12-13 11:15:39,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9515008. Throughput: 0: 789.9. Samples: 9515132. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:15:39,076][62436] Avg episode reward: [(0, '6152.831')] [2024-12-13 11:15:39,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018584_9515008.pth... [2024-12-13 11:15:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018536_9490432.pth [2024-12-13 11:15:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9515008. Throughput: 0: 771.6. Samples: 9517056. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:15:44,076][62436] Avg episode reward: [(0, '6153.363')] [2024-12-13 11:15:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9523200. Throughput: 0: 765.6. Samples: 9522272. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:15:49,076][62436] Avg episode reward: [(0, '6139.897')] [2024-12-13 11:15:54,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9527296. Throughput: 0: 798.8. Samples: 9527564. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:15:54,076][62436] Avg episode reward: [(0, '6175.461')] [2024-12-13 11:15:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018608_9527296.pth... [2024-12-13 11:15:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018560_9502720.pth [2024-12-13 11:15:59,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9527296. Throughput: 0: 777.7. Samples: 9529464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:15:59,076][62436] Avg episode reward: [(0, '6176.824')] [2024-12-13 11:16:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9535488. Throughput: 0: 763.1. Samples: 9534328. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:16:04,076][62436] Avg episode reward: [(0, '6218.570')] [2024-12-13 11:16:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9539584. Throughput: 0: 813.2. Samples: 9539688. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:16:09,076][62436] Avg episode reward: [(0, '6211.491')] [2024-12-13 11:16:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018632_9539584.pth... 
[2024-12-13 11:16:09,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018584_9515008.pth [2024-12-13 11:16:14,077][62436] Fps is (10 sec: 409.5, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9539584. Throughput: 0: 824.6. Samples: 9541676. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:16:14,078][62436] Avg episode reward: [(0, '6210.822')] [2024-12-13 11:16:14,929][62492] Updated weights for policy 0, policy_version 18640 (0.0012) [2024-12-13 11:16:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 751.0, 300 sec: 791.4). Total num frames: 9543680. Throughput: 0: 804.5. Samples: 9546284. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:16:19,076][62436] Avg episode reward: [(0, '6167.707')] [2024-12-13 11:16:24,076][62436] Fps is (10 sec: 1229.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9551872. Throughput: 0: 816.4. Samples: 9551868. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:16:24,076][62436] Avg episode reward: [(0, '6128.925')] [2024-12-13 11:16:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018656_9551872.pth... [2024-12-13 11:16:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018608_9527296.pth [2024-12-13 11:16:29,075][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9551872. Throughput: 0: 822.8. Samples: 9554084. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:16:29,076][62436] Avg episode reward: [(0, '6065.996')] [2024-12-13 11:16:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9555968. Throughput: 0: 803.7. Samples: 9558440. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:16:34,076][62436] Avg episode reward: [(0, '6047.164')] [2024-12-13 11:16:39,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9564160. Throughput: 0: 812.0. Samples: 9564104. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:16:39,076][62436] Avg episode reward: [(0, '6083.197')] [2024-12-13 11:16:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018680_9564160.pth... [2024-12-13 11:16:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018632_9539584.pth [2024-12-13 11:16:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 791.4). Total num frames: 9564160. Throughput: 0: 822.0. Samples: 9566456. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:16:44,076][62436] Avg episode reward: [(0, '6088.656')] [2024-12-13 11:16:49,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9568256. Throughput: 0: 807.6. Samples: 9570672. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:16:49,076][62436] Avg episode reward: [(0, '6074.324')] [2024-12-13 11:16:54,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9576448. Throughput: 0: 814.3. Samples: 9576332. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:16:54,076][62436] Avg episode reward: [(0, '6076.507')] [2024-12-13 11:16:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018704_9576448.pth... [2024-12-13 11:16:54,089][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018656_9551872.pth [2024-12-13 11:16:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9576448. Throughput: 0: 828.9. Samples: 9578976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:16:59,076][62436] Avg episode reward: [(0, '6156.858')] [2024-12-13 11:17:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9580544. Throughput: 0: 814.7. Samples: 9582944. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:04,076][62436] Avg episode reward: [(0, '6158.192')] [2024-12-13 11:17:04,519][62492] Updated weights for policy 0, policy_version 18720 (0.0012) [2024-12-13 11:17:09,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9588736. Throughput: 0: 815.3. Samples: 9588556. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:09,076][62436] Avg episode reward: [(0, '6163.565')] [2024-12-13 11:17:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018728_9588736.pth... [2024-12-13 11:17:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018680_9564160.pth [2024-12-13 11:17:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9588736. Throughput: 0: 823.2. Samples: 9591128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:14,076][62436] Avg episode reward: [(0, '6184.418')] [2024-12-13 11:17:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9592832. Throughput: 0: 811.3. Samples: 9594948. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:19,076][62436] Avg episode reward: [(0, '6187.212')] [2024-12-13 11:17:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9596928. Throughput: 0: 809.5. Samples: 9600532. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:24,076][62436] Avg episode reward: [(0, '6204.810')] [2024-12-13 11:17:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018744_9596928.pth... [2024-12-13 11:17:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018704_9576448.pth [2024-12-13 11:17:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9601024. Throughput: 0: 817.3. Samples: 9603236. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:29,076][62436] Avg episode reward: [(0, '6204.534')] [2024-12-13 11:17:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9605120. Throughput: 0: 814.6. Samples: 9607328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:34,076][62436] Avg episode reward: [(0, '6163.734')] [2024-12-13 11:17:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9609216. Throughput: 0: 808.2. Samples: 9612700. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:39,077][62436] Avg episode reward: [(0, '6196.912')] [2024-12-13 11:17:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018768_9609216.pth... [2024-12-13 11:17:39,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018728_9588736.pth [2024-12-13 11:17:44,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 9617408. Throughput: 0: 810.0. Samples: 9615424. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:44,078][62436] Avg episode reward: [(0, '6176.956')] [2024-12-13 11:17:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9617408. Throughput: 0: 818.8. Samples: 9619792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:49,077][62436] Avg episode reward: [(0, '6177.281')] [2024-12-13 11:17:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9621504. Throughput: 0: 806.8. Samples: 9624864. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:54,076][62436] Avg episode reward: [(0, '6184.899')] [2024-12-13 11:17:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018792_9621504.pth... 
[2024-12-13 11:17:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018744_9596928.pth [2024-12-13 11:17:54,523][62492] Updated weights for policy 0, policy_version 18800 (0.0012) [2024-12-13 11:17:59,076][62436] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 9629696. Throughput: 0: 810.0. Samples: 9627580. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:17:59,076][62436] Avg episode reward: [(0, '6182.958')] [2024-12-13 11:18:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9629696. Throughput: 0: 828.8. Samples: 9632244. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:04,076][62436] Avg episode reward: [(0, '6203.294')] [2024-12-13 11:18:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9633792. Throughput: 0: 810.8. Samples: 9637016. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:09,076][62436] Avg episode reward: [(0, '6239.294')] [2024-12-13 11:18:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018816_9633792.pth... [2024-12-13 11:18:09,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018768_9609216.pth [2024-12-13 11:18:14,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 9641984. Throughput: 0: 811.6. Samples: 9639756. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:14,076][62436] Avg episode reward: [(0, '6285.927')] [2024-12-13 11:18:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9641984. Throughput: 0: 830.0. Samples: 9644680. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:19,076][62436] Avg episode reward: [(0, '6283.761')] [2024-12-13 11:18:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9646080. Throughput: 0: 811.6. Samples: 9649220. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:24,076][62436] Avg episode reward: [(0, '6326.693')] [2024-12-13 11:18:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018840_9646080.pth... [2024-12-13 11:18:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018792_9621504.pth [2024-12-13 11:18:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9650176. Throughput: 0: 811.9. Samples: 9651960. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:29,077][62436] Avg episode reward: [(0, '6454.906')] [2024-12-13 11:18:29,120][62473] Saving new best policy, reward=6454.906! [2024-12-13 11:18:34,076][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9654272. Throughput: 0: 829.4. Samples: 9657116. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:34,078][62436] Avg episode reward: [(0, '6455.726')] [2024-12-13 11:18:34,079][62473] Saving new best policy, reward=6455.726! [2024-12-13 11:18:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9658368. Throughput: 0: 811.6. Samples: 9661388. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:39,076][62436] Avg episode reward: [(0, '6404.415')] [2024-12-13 11:18:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018864_9658368.pth... [2024-12-13 11:18:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018816_9633792.pth [2024-12-13 11:18:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9662464. Throughput: 0: 811.9. Samples: 9664116. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:44,076][62436] Avg episode reward: [(0, '6409.441')] [2024-12-13 11:18:44,260][62492] Updated weights for policy 0, policy_version 18880 (0.0030) [2024-12-13 11:18:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). 
Total num frames: 9666560. Throughput: 0: 826.8. Samples: 9669452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:49,076][62436] Avg episode reward: [(0, '6408.500')] [2024-12-13 11:18:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9670656. Throughput: 0: 809.0. Samples: 9673420. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:54,076][62436] Avg episode reward: [(0, '6418.858')] [2024-12-13 11:18:54,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018888_9670656.pth... [2024-12-13 11:18:54,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018840_9646080.pth [2024-12-13 11:18:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9674752. Throughput: 0: 810.2. Samples: 9676216. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:18:59,076][62436] Avg episode reward: [(0, '6412.397')] [2024-12-13 11:19:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9678848. Throughput: 0: 821.9. Samples: 9681664. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:19:04,076][62436] Avg episode reward: [(0, '6409.143')] [2024-12-13 11:19:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9682944. Throughput: 0: 806.4. Samples: 9685508. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:19:09,076][62436] Avg episode reward: [(0, '6409.143')] [2024-12-13 11:19:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018912_9682944.pth... [2024-12-13 11:19:09,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018864_9658368.pth [2024-12-13 11:19:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9687040. Throughput: 0: 804.9. Samples: 9688180. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:19:14,076][62436] Avg episode reward: [(0, '6413.526')] [2024-12-13 11:19:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9691136. Throughput: 0: 814.7. Samples: 9693776. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:19:19,077][62436] Avg episode reward: [(0, '6410.301')] [2024-12-13 11:19:24,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9695232. Throughput: 0: 802.4. Samples: 9697500. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:19:24,079][62436] Avg episode reward: [(0, '6412.594')] [2024-12-13 11:19:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018936_9695232.pth... [2024-12-13 11:19:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018888_9670656.pth [2024-12-13 11:19:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9699328. Throughput: 0: 801.7. Samples: 9700192. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:19:29,076][62436] Avg episode reward: [(0, '6310.244')] [2024-12-13 11:19:34,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9703424. Throughput: 0: 808.3. Samples: 9705828. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:19:34,077][62436] Avg episode reward: [(0, '6287.701')] [2024-12-13 11:19:35,148][62492] Updated weights for policy 0, policy_version 18960 (0.0014) [2024-12-13 11:19:39,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9707520. Throughput: 0: 807.3. Samples: 9709748. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:19:39,078][62436] Avg episode reward: [(0, '6253.934')] [2024-12-13 11:19:39,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018960_9707520.pth... 
[2024-12-13 11:19:39,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018912_9682944.pth [2024-12-13 11:19:44,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9711616. Throughput: 0: 799.6. Samples: 9712196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:19:44,076][62436] Avg episode reward: [(0, '6245.501')] [2024-12-13 11:19:49,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9715712. Throughput: 0: 804.2. Samples: 9717852. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:19:49,076][62436] Avg episode reward: [(0, '6270.752')] [2024-12-13 11:19:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9719808. Throughput: 0: 807.6. Samples: 9721852. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:19:54,080][62436] Avg episode reward: [(0, '6268.352')] [2024-12-13 11:19:54,086][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018984_9719808.pth... [2024-12-13 11:19:54,105][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018936_9695232.pth [2024-12-13 11:19:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9723904. Throughput: 0: 782.0. Samples: 9723368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:19:59,076][62436] Avg episode reward: [(0, '6269.221')] [2024-12-13 11:20:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9728000. Throughput: 0: 758.3. Samples: 9727900. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:20:04,077][62436] Avg episode reward: [(0, '6267.397')] [2024-12-13 11:20:09,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9732096. Throughput: 0: 780.5. Samples: 9732624. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:20:09,080][62436] Avg episode reward: [(0, '6215.875')] [2024-12-13 11:20:09,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019008_9732096.pth... [2024-12-13 11:20:09,098][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018960_9707520.pth [2024-12-13 11:20:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9736192. Throughput: 0: 763.4. Samples: 9734544. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2024-12-13 11:20:14,076][62436] Avg episode reward: [(0, '6160.949')] [2024-12-13 11:20:19,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9740288. Throughput: 0: 757.9. Samples: 9739932. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:20:19,076][62436] Avg episode reward: [(0, '6126.321')] [2024-12-13 11:20:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9744384. Throughput: 0: 782.5. Samples: 9744960. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:20:24,076][62436] Avg episode reward: [(0, '6091.068')] [2024-12-13 11:20:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019032_9744384.pth... [2024-12-13 11:20:24,099][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000018984_9719808.pth [2024-12-13 11:20:29,011][62492] Updated weights for policy 0, policy_version 19040 (0.0012) [2024-12-13 11:20:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9748480. Throughput: 0: 770.3. Samples: 9746860. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:20:29,076][62436] Avg episode reward: [(0, '6091.068')] [2024-12-13 11:20:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9752576. Throughput: 0: 758.3. Samples: 9751976. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:20:34,076][62436] Avg episode reward: [(0, '6122.903')] [2024-12-13 11:20:39,077][62436] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9756672. Throughput: 0: 793.8. Samples: 9757576. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:20:39,078][62436] Avg episode reward: [(0, '6122.981')] [2024-12-13 11:20:39,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019056_9756672.pth... [2024-12-13 11:20:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019008_9732096.pth [2024-12-13 11:20:44,084][62436] Fps is (10 sec: 818.5, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 9760768. Throughput: 0: 803.4. Samples: 9759528. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:20:44,084][62436] Avg episode reward: [(0, '6120.621')] [2024-12-13 11:20:49,076][62436] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9764864. Throughput: 0: 808.5. Samples: 9764284. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:20:49,076][62436] Avg episode reward: [(0, '6068.898')] [2024-12-13 11:20:54,076][62436] Fps is (10 sec: 819.9, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9768960. Throughput: 0: 826.1. Samples: 9769796. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:20:54,078][62436] Avg episode reward: [(0, '6062.011')] [2024-12-13 11:20:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019080_9768960.pth... [2024-12-13 11:20:54,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019032_9744384.pth [2024-12-13 11:20:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9773056. Throughput: 0: 829.0. Samples: 9771848. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:20:59,076][62436] Avg episode reward: [(0, '6007.909')] [2024-12-13 11:21:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9777152. Throughput: 0: 809.8. Samples: 9776372. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:21:04,076][62436] Avg episode reward: [(0, '5954.453')] [2024-12-13 11:21:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9781248. Throughput: 0: 819.1. Samples: 9781820. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:21:09,076][62436] Avg episode reward: [(0, '5948.788')] [2024-12-13 11:21:09,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019104_9781248.pth... [2024-12-13 11:21:09,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019056_9756672.pth [2024-12-13 11:21:14,083][62436] Fps is (10 sec: 818.6, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 9785344. Throughput: 0: 829.0. Samples: 9784172. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:21:14,084][62436] Avg episode reward: [(0, '5915.905')] [2024-12-13 11:21:18,771][62492] Updated weights for policy 0, policy_version 19120 (0.0016) [2024-12-13 11:21:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9789440. Throughput: 0: 810.8. Samples: 9788464. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:21:19,076][62436] Avg episode reward: [(0, '5958.323')] [2024-12-13 11:21:24,076][62436] Fps is (10 sec: 819.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9793536. Throughput: 0: 808.0. Samples: 9793936. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:21:24,076][62436] Avg episode reward: [(0, '5994.399')] [2024-12-13 11:21:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019128_9793536.pth... 
[2024-12-13 11:21:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019080_9768960.pth [2024-12-13 11:21:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9797632. Throughput: 0: 822.3. Samples: 9796524. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:21:29,077][62436] Avg episode reward: [(0, '6042.165')] [2024-12-13 11:21:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9801728. Throughput: 0: 806.5. Samples: 9800576. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:21:34,076][62436] Avg episode reward: [(0, '6099.702')] [2024-12-13 11:21:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9805824. Throughput: 0: 808.0. Samples: 9806156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:21:39,076][62436] Avg episode reward: [(0, '6096.599')] [2024-12-13 11:21:39,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019152_9805824.pth... [2024-12-13 11:21:39,087][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019104_9781248.pth [2024-12-13 11:21:44,078][62436] Fps is (10 sec: 819.0, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 9809920. Throughput: 0: 825.1. Samples: 9808980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:21:44,078][62436] Avg episode reward: [(0, '5976.507')] [2024-12-13 11:21:49,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9814016. Throughput: 0: 810.8. Samples: 9812856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:21:49,076][62436] Avg episode reward: [(0, '5928.963')] [2024-12-13 11:21:54,076][62436] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9818112. Throughput: 0: 810.8. Samples: 9818304. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:21:54,076][62436] Avg episode reward: [(0, '5928.734')] [2024-12-13 11:21:54,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019176_9818112.pth... [2024-12-13 11:21:54,110][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019128_9793536.pth [2024-12-13 11:21:59,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 9822208. Throughput: 0: 822.0. Samples: 9821160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:21:59,080][62436] Avg episode reward: [(0, '5926.214')] [2024-12-13 11:22:04,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9822208. Throughput: 0: 812.6. Samples: 9825032. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:22:04,077][62436] Avg episode reward: [(0, '5922.066')] [2024-12-13 11:22:08,531][62492] Updated weights for policy 0, policy_version 19200 (0.0010) [2024-12-13 11:22:09,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9830400. Throughput: 0: 811.9. Samples: 9830472. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:22:09,076][62436] Avg episode reward: [(0, '6012.456')] [2024-12-13 11:22:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019200_9830400.pth... [2024-12-13 11:22:09,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019152_9805824.pth [2024-12-13 11:22:14,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 9834496. Throughput: 0: 818.5. Samples: 9833356. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:22:14,076][62436] Avg episode reward: [(0, '6037.341')] [2024-12-13 11:22:19,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9834496. Throughput: 0: 820.8. Samples: 9837512. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:22:19,076][62436] Avg episode reward: [(0, '6033.942')] [2024-12-13 11:22:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9842688. Throughput: 0: 808.3. Samples: 9842528. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:22:24,076][62436] Avg episode reward: [(0, '6043.579')] [2024-12-13 11:22:24,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019224_9842688.pth... [2024-12-13 11:22:24,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019176_9818112.pth [2024-12-13 11:22:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9846784. Throughput: 0: 805.5. Samples: 9845224. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:22:29,076][62436] Avg episode reward: [(0, '5987.345')] [2024-12-13 11:22:34,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9846784. Throughput: 0: 818.7. Samples: 9849696. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2024-12-13 11:22:34,076][62436] Avg episode reward: [(0, '5980.694')] [2024-12-13 11:22:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9850880. Throughput: 0: 805.7. Samples: 9854560. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:22:39,076][62436] Avg episode reward: [(0, '5977.560')] [2024-12-13 11:22:39,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019240_9850880.pth... [2024-12-13 11:22:39,093][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019200_9830400.pth [2024-12-13 11:22:44,075][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9859072. Throughput: 0: 801.7. Samples: 9857232. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:22:44,076][62436] Avg episode reward: [(0, '6042.561')] [2024-12-13 11:22:49,077][62436] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9859072. Throughput: 0: 821.1. Samples: 9861984. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2024-12-13 11:22:49,078][62436] Avg episode reward: [(0, '6042.400')] [2024-12-13 11:22:54,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9863168. Throughput: 0: 802.6. Samples: 9866588. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:22:54,077][62436] Avg episode reward: [(0, '6067.987')] [2024-12-13 11:22:54,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019264_9863168.pth... [2024-12-13 11:22:54,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019224_9842688.pth [2024-12-13 11:22:58,889][62492] Updated weights for policy 0, policy_version 19280 (0.0009) [2024-12-13 11:22:59,076][62436] Fps is (10 sec: 1229.0, 60 sec: 819.3, 300 sec: 819.2). Total num frames: 9871360. Throughput: 0: 796.8. Samples: 9869212. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:22:59,076][62436] Avg episode reward: [(0, '6090.628')] [2024-12-13 11:23:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9871360. Throughput: 0: 819.0. Samples: 9874368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:23:04,077][62436] Avg episode reward: [(0, '6137.461')] [2024-12-13 11:23:09,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 791.4). Total num frames: 9875456. Throughput: 0: 803.6. Samples: 9878692. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:23:09,076][62436] Avg episode reward: [(0, '6140.794')] [2024-12-13 11:23:09,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019288_9875456.pth... 
[2024-12-13 11:23:09,091][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019240_9850880.pth [2024-12-13 11:23:14,076][62436] Fps is (10 sec: 1228.9, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9883648. Throughput: 0: 804.2. Samples: 9881412. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:23:14,076][62436] Avg episode reward: [(0, '6173.428')] [2024-12-13 11:23:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9883648. Throughput: 0: 826.1. Samples: 9886872. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:23:19,076][62436] Avg episode reward: [(0, '6162.026')] [2024-12-13 11:23:24,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9887744. Throughput: 0: 808.1. Samples: 9890924. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:23:24,076][62436] Avg episode reward: [(0, '6184.550')] [2024-12-13 11:23:24,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019312_9887744.pth... [2024-12-13 11:23:24,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019264_9863168.pth [2024-12-13 11:23:29,076][62436] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 9895936. Throughput: 0: 810.5. Samples: 9893704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:23:29,076][62436] Avg episode reward: [(0, '6175.524')] [2024-12-13 11:23:34,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 9900032. Throughput: 0: 831.3. Samples: 9899392. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:23:34,076][62436] Avg episode reward: [(0, '6181.065')] [2024-12-13 11:23:39,076][62436] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9900032. Throughput: 0: 812.9. Samples: 9903168. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:23:39,076][62436] Avg episode reward: [(0, '6185.505')] [2024-12-13 11:23:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019336_9900032.pth... [2024-12-13 11:23:39,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019288_9875456.pth [2024-12-13 11:23:44,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9904128. Throughput: 0: 815.2. Samples: 9905896. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:23:44,077][62436] Avg episode reward: [(0, '6199.783')] [2024-12-13 11:23:48,833][62492] Updated weights for policy 0, policy_version 19360 (0.0010) [2024-12-13 11:23:49,076][62436] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 9912320. Throughput: 0: 826.8. Samples: 9911572. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:23:49,077][62436] Avg episode reward: [(0, '6308.855')] [2024-12-13 11:23:54,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9912320. Throughput: 0: 815.3. Samples: 9915380. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:23:54,076][62436] Avg episode reward: [(0, '6271.448')] [2024-12-13 11:23:54,081][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019360_9912320.pth... [2024-12-13 11:23:54,088][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019312_9887744.pth [2024-12-13 11:23:59,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9916416. Throughput: 0: 812.3. Samples: 9917964. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:23:59,076][62436] Avg episode reward: [(0, '6249.352')] [2024-12-13 11:24:04,076][62436] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 9924608. Throughput: 0: 817.7. Samples: 9923668. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:24:04,076][62436] Avg episode reward: [(0, '6261.595')] [2024-12-13 11:24:09,082][62436] Fps is (10 sec: 818.6, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 9924608. Throughput: 0: 820.9. Samples: 9927868. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:24:09,083][62436] Avg episode reward: [(0, '6256.588')] [2024-12-13 11:24:09,095][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019384_9924608.pth... [2024-12-13 11:24:09,109][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019336_9900032.pth [2024-12-13 11:24:14,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9928704. Throughput: 0: 809.4. Samples: 9930128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:24:14,076][62436] Avg episode reward: [(0, '6268.829')] [2024-12-13 11:24:19,076][62436] Fps is (10 sec: 1229.6, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 9936896. Throughput: 0: 807.6. Samples: 9935736. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:24:19,076][62436] Avg episode reward: [(0, '6231.584')] [2024-12-13 11:24:24,077][62436] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9936896. Throughput: 0: 821.4. Samples: 9940132. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:24:24,078][62436] Avg episode reward: [(0, '6213.561')] [2024-12-13 11:24:24,085][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019408_9936896.pth... [2024-12-13 11:24:24,090][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019360_9912320.pth [2024-12-13 11:24:29,076][62436] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9940992. Throughput: 0: 804.2. Samples: 9942084. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:24:29,079][62436] Avg episode reward: [(0, '6258.394')] [2024-12-13 11:24:34,076][62436] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9945088. Throughput: 0: 769.3. Samples: 9946188. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2024-12-13 11:24:34,076][62436] Avg episode reward: [(0, '6249.204')] [2024-12-13 11:24:39,080][62436] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 805.3). Total num frames: 9949184. Throughput: 0: 778.1. Samples: 9950396. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:24:39,080][62436] Avg episode reward: [(0, '6178.490')] [2024-12-13 11:24:39,087][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019432_9949184.pth... [2024-12-13 11:24:39,092][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019384_9924608.pth [2024-12-13 11:24:42,700][62492] Updated weights for policy 0, policy_version 19440 (0.0010) [2024-12-13 11:24:44,079][62436] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9953280. Throughput: 0: 763.6. Samples: 9952328. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:24:44,079][62436] Avg episode reward: [(0, '6185.629')] [2024-12-13 11:24:49,076][62436] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9957376. Throughput: 0: 760.4. Samples: 9957888. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2024-12-13 11:24:49,076][62436] Avg episode reward: [(0, '6198.534')] [2024-12-13 11:24:54,076][62436] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9961472. Throughput: 0: 779.1. Samples: 9962924. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:24:54,076][62436] Avg episode reward: [(0, '6207.258')] [2024-12-13 11:24:54,088][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019456_9961472.pth... 
[2024-12-13 11:24:54,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019408_9936896.pth [2024-12-13 11:24:59,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9965568. Throughput: 0: 770.8. Samples: 9964816. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:24:59,076][62436] Avg episode reward: [(0, '6148.118')] [2024-12-13 11:25:04,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9969664. Throughput: 0: 759.7. Samples: 9969924. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:25:04,076][62436] Avg episode reward: [(0, '6072.485')] [2024-12-13 11:25:09,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 9973760. Throughput: 0: 782.3. Samples: 9975332. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:25:09,076][62436] Avg episode reward: [(0, '6071.431')] [2024-12-13 11:25:09,083][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019480_9973760.pth... [2024-12-13 11:25:09,096][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019432_9949184.pth [2024-12-13 11:25:14,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9977856. Throughput: 0: 782.0. Samples: 9977276. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:25:14,076][62436] Avg episode reward: [(0, '5964.293')] [2024-12-13 11:25:19,076][62436] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 805.3). Total num frames: 9981952. Throughput: 0: 794.0. Samples: 9981920. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-13 11:25:19,076][62436] Avg episode reward: [(0, '5975.697')] [2024-12-13 11:25:24,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9986048. Throughput: 0: 824.1. Samples: 9987476. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:25:24,076][62436] Avg episode reward: [(0, '5990.745')] [2024-12-13 11:25:24,084][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019504_9986048.pth... [2024-12-13 11:25:24,095][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019456_9961472.pth [2024-12-13 11:25:29,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9990144. Throughput: 0: 825.9. Samples: 9989492. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:25:29,077][62436] Avg episode reward: [(0, '5994.014')] [2024-12-13 11:25:32,937][62492] Updated weights for policy 0, policy_version 19520 (0.0009) [2024-12-13 11:25:34,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 9994240. Throughput: 0: 803.2. Samples: 9994032. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:25:34,076][62436] Avg episode reward: [(0, '6002.991')] [2024-12-13 11:25:39,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.3, 300 sec: 805.3). Total num frames: 9998336. Throughput: 0: 815.6. Samples: 9999624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:25:39,076][62436] Avg episode reward: [(0, '6050.774')] [2024-12-13 11:25:39,082][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019528_9998336.pth... [2024-12-13 11:25:39,094][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019480_9973760.pth [2024-12-13 11:25:44,076][62436] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 805.3). Total num frames: 10002432. Throughput: 0: 823.9. Samples: 10001892. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2024-12-13 11:25:44,078][62436] Avg episode reward: [(0, '6012.737')] [2024-12-13 11:25:47,986][62473] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000 [2024-12-13 11:25:47,989][62473] Stopping Batcher_0... [2024-12-13 11:25:47,990][62473] Loop batcher_evt_loop terminating... 
[2024-12-13 11:25:47,992][62436] Component Batcher_0 stopped! [2024-12-13 11:25:47,994][62436] Component RolloutWorker_w3 stopped! [2024-12-13 11:25:47,994][62436] Component RolloutWorker_w6 stopped! [2024-12-13 11:25:47,994][62436] Component RolloutWorker_w1 stopped! [2024-12-13 11:25:47,995][62436] Component RolloutWorker_w0 stopped! [2024-12-13 11:25:47,997][62436] Component RolloutWorker_w4 stopped! [2024-12-13 11:25:47,997][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019544_10006528.pth... [2024-12-13 11:25:47,997][62436] Component RolloutWorker_w7 stopped! [2024-12-13 11:25:48,003][62473] Removing ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019504_9986048.pth [2024-12-13 11:25:48,005][62473] Saving ./train_dir_humamoid/Ant/checkpoint_p0/checkpoint_000019544_10006528.pth... [2024-12-13 11:25:48,015][62436] Component RolloutWorker_w5 stopped! [2024-12-13 11:25:48,016][62473] Stopping LearnerWorker_p0... [2024-12-13 11:25:48,016][62436] Component LearnerWorker_p0 stopped! [2024-12-13 11:25:48,016][62473] Loop learner_proc0_evt_loop terminating... [2024-12-13 11:25:48,017][62436] Component RolloutWorker_w2 stopped! [2024-12-13 11:25:48,019][62494] Stopping RolloutWorker_w4... [2024-12-13 11:25:48,019][62490] Stopping RolloutWorker_w3... [2024-12-13 11:25:48,009][62488] Stopping RolloutWorker_w1... [2024-12-13 11:25:48,007][62493] Stopping RolloutWorker_w6... [2024-12-13 11:25:48,021][62491] Stopping RolloutWorker_w7... [2024-12-13 11:25:48,047][62490] Loop rollout_proc3_evt_loop terminating... [2024-12-13 11:25:48,018][62489] Stopping RolloutWorker_w5... [2024-12-13 11:25:48,049][62488] Loop rollout_proc1_evt_loop terminating... [2024-12-13 11:25:48,061][62491] Loop rollout_proc7_evt_loop terminating... [2024-12-13 11:25:48,022][62487] Stopping RolloutWorker_w0... [2024-12-13 11:25:48,069][62487] Loop rollout_proc0_evt_loop terminating... [2024-12-13 11:25:48,058][62489] Loop rollout_proc5_evt_loop terminating... 
[2024-12-13 11:25:48,018][62486] Stopping RolloutWorker_w2...
[2024-12-13 11:25:48,080][62486] Loop rollout_proc2_evt_loop terminating...
[2024-12-13 11:25:48,051][62493] Loop rollout_proc6_evt_loop terminating...
[2024-12-13 11:25:48,083][62494] Loop rollout_proc4_evt_loop terminating...
[2024-12-13 11:25:48,299][62492] Weights refcount: 2 0
[2024-12-13 11:25:48,313][62436] Component InferenceWorker_p0-w0 stopped!
[2024-12-13 11:25:48,314][62436] Waiting for process learner_proc0 to stop...
[2024-12-13 11:25:48,313][62492] Stopping InferenceWorker_p0-w0...
[2024-12-13 11:25:48,315][62492] Loop inference_proc0-0_evt_loop terminating...
[2024-12-13 11:25:50,605][62436] Waiting for process inference_proc0-0 to join...
[2024-12-13 11:25:50,607][62436] Waiting for process rollout_proc0 to join...
[2024-12-13 11:25:55,078][62436] Waiting for process rollout_proc1 to join...
[2024-12-13 11:25:55,310][62436] Waiting for process rollout_proc2 to join...
[2024-12-13 11:25:55,312][62436] Waiting for process rollout_proc3 to join...
[2024-12-13 11:25:55,317][62436] Waiting for process rollout_proc4 to join...
[2024-12-13 11:25:55,319][62436] Waiting for process rollout_proc5 to join...
[2024-12-13 11:25:55,339][62436] Waiting for process rollout_proc6 to join...
[2024-12-13 11:25:55,341][62436] Waiting for process rollout_proc7 to join...
[2024-12-13 11:25:55,342][62436] Batcher 0 profile tree view:
batching: 14.5067, releasing_batches: 2.8802
[2024-12-13 11:25:55,344][62436] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0012
  wait_policy_total: 8858.7291
update_model: 103.2282
  weight_update: 0.0010
one_step: 0.0027
  handle_policy_step: 3037.3943
    deserialize: 112.1482, stack: 36.8268, obs_to_device_normalize: 606.7685, forward: 1562.1253, send_messages: 200.4705
    prepare_outputs: 283.0249
      to_cpu: 37.4340
[2024-12-13 11:25:55,345][62436] Learner 0 profile tree view:
misc: 0.0136, prepare_batch: 83.8834
train: 247.9682
  epoch_init: 0.0851, minibatch_init: 2.7941, losses_postprocess: 3.1693, kl_divergence: 0.9983, after_optimizer: 4.2852
  calculate_losses: 101.2483
    losses_init: 0.0923, forward_head: 42.4489, bptt_initial: 0.5377, bptt: 0.6817, tail: 28.9032, advantages_returns: 2.1625, losses: 22.8205
  update: 131.3361
    clip: 12.6118
[2024-12-13 11:25:55,346][62436] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 2.0046, enqueue_policy_requests: 1514.5409, env_step: 8520.4749, overhead: 617.2293, complete_rollouts: 9.3567
save_policy_outputs: 343.5115
  split_output_tensors: 135.5807
[2024-12-13 11:25:55,346][62436] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 1.7211, enqueue_policy_requests: 1462.3961, env_step: 8398.6923, overhead: 610.2173, complete_rollouts: 10.6109
save_policy_outputs: 349.5194
  split_output_tensors: 138.2964
[2024-12-13 11:25:55,347][62436] Loop Runner_EvtLoop terminating...
[2024-12-13 11:25:55,347][62436] Runner profile tree view:
main_loop: 12447.0713
[2024-12-13 11:25:55,348][62436] Collected {0: 10006528}, FPS: 803.9