diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -971,3 +971,2028 @@ main_loop: 1146.9944 [2023-02-27 11:08:02,663][00394] Avg episode rewards: #0: 4.332, true rewards: #0: 4.032 [2023-02-27 11:08:02,665][00394] Avg episode reward: 4.332, avg true_objective: 4.032 [2023-02-27 11:08:21,442][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2023-02-27 11:08:25,610][00394] The model has been pushed to https://huggingface.co./Clawoo/rl_course_vizdoom_health_gathering_supreme +[2023-02-27 11:09:47,580][00394] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json +[2023-02-27 11:09:47,582][00394] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json +[2023-02-27 11:09:47,584][00394] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line +[2023-02-27 11:09:47,588][00394] Overriding arg 'train_dir' with value 'train_dir' passed from command line +[2023-02-27 11:09:47,590][00394] Overriding arg 'num_workers' with value 1 passed from command line +[2023-02-27 11:09:47,592][00394] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! +[2023-02-27 11:09:47,594][00394] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! +[2023-02-27 11:09:47,595][00394] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! +[2023-02-27 11:09:47,596][00394] Adding new argument 'no_render'=True that is not in the saved config file! +[2023-02-27 11:09:47,597][00394] Adding new argument 'save_video'=True that is not in the saved config file! +[2023-02-27 11:09:47,599][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2023-02-27 11:09:47,600][00394] Adding new argument 'video_name'=None that is not in the saved config file! +[2023-02-27 11:09:47,601][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2023-02-27 11:09:47,603][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2023-02-27 11:09:47,604][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2023-02-27 11:09:47,605][00394] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2023-02-27 11:09:47,606][00394] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2023-02-27 11:09:47,608][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2023-02-27 11:09:47,609][00394] Adding new argument 'train_script'=None that is not in the saved config file! +[2023-02-27 11:09:47,610][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2023-02-27 11:09:47,611][00394] Using frameskip 1 and render_action_repeat=4 for evaluation +[2023-02-27 11:09:47,648][00394] RunningMeanStd input shape: (3, 72, 128) +[2023-02-27 11:09:47,650][00394] RunningMeanStd input shape: (1,) +[2023-02-27 11:09:47,668][00394] ConvEncoder: input_channels=3 +[2023-02-27 11:09:47,711][00394] Conv encoder output size: 512 +[2023-02-27 11:09:47,713][00394] Policy head output size: 512 +[2023-02-27 11:09:47,736][00394] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth... 
+[2023-02-27 11:09:48,209][00394] Num frames 100...
+[2023-02-27 11:09:48,342][00394] Num frames 200...
+[2023-02-27 11:09:48,460][00394] Num frames 300...
+[2023-02-27 11:09:48,581][00394] Num frames 400...
+[2023-02-27 11:09:48,701][00394] Num frames 500...
+[2023-02-27 11:09:48,826][00394] Num frames 600...
+[2023-02-27 11:09:48,953][00394] Num frames 700...
+[2023-02-27 11:09:49,094][00394] Num frames 800...
+[2023-02-27 11:09:49,212][00394] Num frames 900...
+[2023-02-27 11:09:49,338][00394] Num frames 1000...
+[2023-02-27 11:09:49,467][00394] Num frames 1100...
+[2023-02-27 11:09:49,588][00394] Num frames 1200...
+[2023-02-27 11:09:49,720][00394] Num frames 1300...
+[2023-02-27 11:09:49,841][00394] Num frames 1400...
+[2023-02-27 11:09:49,971][00394] Num frames 1500...
+[2023-02-27 11:09:50,094][00394] Num frames 1600...
+[2023-02-27 11:09:50,218][00394] Num frames 1700...
+[2023-02-27 11:09:50,350][00394] Num frames 1800...
+[2023-02-27 11:09:50,468][00394] Num frames 1900...
+[2023-02-27 11:09:50,591][00394] Num frames 2000...
+[2023-02-27 11:09:50,720][00394] Num frames 2100...
+[2023-02-27 11:09:50,772][00394] Avg episode rewards: #0: 64.998, true rewards: #0: 21.000
+[2023-02-27 11:09:50,774][00394] Avg episode reward: 64.998, avg true_objective: 21.000
+[2023-02-27 11:09:50,897][00394] Num frames 2200...
+[2023-02-27 11:09:51,029][00394] Num frames 2300...
+[2023-02-27 11:09:51,151][00394] Num frames 2400...
+[2023-02-27 11:09:51,277][00394] Num frames 2500...
+[2023-02-27 11:09:51,405][00394] Num frames 2600...
+[2023-02-27 11:09:51,520][00394] Num frames 2700...
+[2023-02-27 11:09:51,637][00394] Num frames 2800...
+[2023-02-27 11:09:51,759][00394] Num frames 2900...
+[2023-02-27 11:09:51,874][00394] Num frames 3000...
+[2023-02-27 11:09:51,993][00394] Num frames 3100...
+[2023-02-27 11:09:52,112][00394] Num frames 3200...
+[2023-02-27 11:09:52,238][00394] Num frames 3300...
+[2023-02-27 11:09:52,373][00394] Num frames 3400...
+[2023-02-27 11:09:52,490][00394] Num frames 3500...
+[2023-02-27 11:09:52,615][00394] Num frames 3600...
+[2023-02-27 11:09:52,733][00394] Num frames 3700...
+[2023-02-27 11:09:52,857][00394] Num frames 3800...
+[2023-02-27 11:09:52,985][00394] Num frames 3900...
+[2023-02-27 11:09:53,107][00394] Num frames 4000...
+[2023-02-27 11:09:53,234][00394] Num frames 4100...
+[2023-02-27 11:09:53,372][00394] Num frames 4200...
+[2023-02-27 11:09:53,426][00394] Avg episode rewards: #0: 63.499, true rewards: #0: 21.000
+[2023-02-27 11:09:53,429][00394] Avg episode reward: 63.499, avg true_objective: 21.000
+[2023-02-27 11:09:53,550][00394] Num frames 4300...
+[2023-02-27 11:09:53,673][00394] Num frames 4400...
+[2023-02-27 11:09:53,793][00394] Num frames 4500...
+[2023-02-27 11:09:53,918][00394] Num frames 4600...
+[2023-02-27 11:09:54,036][00394] Num frames 4700...
+[2023-02-27 11:09:54,153][00394] Num frames 4800...
+[2023-02-27 11:09:54,282][00394] Num frames 4900...
+[2023-02-27 11:09:54,417][00394] Num frames 5000...
+[2023-02-27 11:09:54,534][00394] Num frames 5100...
+[2023-02-27 11:09:54,657][00394] Num frames 5200...
+[2023-02-27 11:09:54,772][00394] Num frames 5300...
+[2023-02-27 11:09:54,904][00394] Num frames 5400...
+[2023-02-27 11:09:55,034][00394] Num frames 5500...
+[2023-02-27 11:09:55,153][00394] Num frames 5600...
+[2023-02-27 11:09:55,282][00394] Num frames 5700...
+[2023-02-27 11:09:55,414][00394] Num frames 5800...
+[2023-02-27 11:09:55,539][00394] Num frames 5900...
+[2023-02-27 11:09:55,658][00394] Num frames 6000...
+[2023-02-27 11:09:55,779][00394] Num frames 6100...
+[2023-02-27 11:09:55,942][00394] Num frames 6200...
+[2023-02-27 11:09:56,116][00394] Num frames 6300...
+[2023-02-27 11:09:56,171][00394] Avg episode rewards: #0: 62.665, true rewards: #0: 21.000
+[2023-02-27 11:09:56,173][00394] Avg episode reward: 62.665, avg true_objective: 21.000
+[2023-02-27 11:09:56,344][00394] Num frames 6400...
+[2023-02-27 11:09:56,526][00394] Num frames 6500...
+[2023-02-27 11:09:56,692][00394] Num frames 6600...
+[2023-02-27 11:09:56,868][00394] Num frames 6700...
+[2023-02-27 11:09:57,041][00394] Num frames 6800...
+[2023-02-27 11:09:57,212][00394] Num frames 6900...
+[2023-02-27 11:09:57,399][00394] Num frames 7000...
+[2023-02-27 11:09:57,574][00394] Num frames 7100...
+[2023-02-27 11:09:57,736][00394] Num frames 7200...
+[2023-02-27 11:09:57,904][00394] Num frames 7300...
+[2023-02-27 11:09:58,082][00394] Num frames 7400...
+[2023-02-27 11:09:58,248][00394] Num frames 7500...
+[2023-02-27 11:09:58,431][00394] Num frames 7600...
+[2023-02-27 11:09:58,620][00394] Num frames 7700...
+[2023-02-27 11:09:58,791][00394] Num frames 7800...
+[2023-02-27 11:09:58,980][00394] Num frames 7900...
+[2023-02-27 11:09:59,157][00394] Num frames 8000...
+[2023-02-27 11:09:59,340][00394] Num frames 8100...
+[2023-02-27 11:09:59,508][00394] Num frames 8200...
+[2023-02-27 11:09:59,664][00394] Num frames 8300...
+[2023-02-27 11:09:59,794][00394] Num frames 8400...
+[2023-02-27 11:09:59,850][00394] Avg episode rewards: #0: 63.249, true rewards: #0: 21.000
+[2023-02-27 11:09:59,852][00394] Avg episode reward: 63.249, avg true_objective: 21.000
+[2023-02-27 11:09:59,976][00394] Num frames 8500...
+[2023-02-27 11:10:00,090][00394] Num frames 8600...
+[2023-02-27 11:10:00,210][00394] Num frames 8700...
+[2023-02-27 11:10:00,335][00394] Num frames 8800...
+[2023-02-27 11:10:00,455][00394] Num frames 8900...
+[2023-02-27 11:10:00,576][00394] Num frames 9000...
+[2023-02-27 11:10:00,701][00394] Num frames 9100...
+[2023-02-27 11:10:00,819][00394] Num frames 9200...
+[2023-02-27 11:10:00,942][00394] Num frames 9300...
+[2023-02-27 11:10:01,064][00394] Num frames 9400...
+[2023-02-27 11:10:01,185][00394] Num frames 9500...
+[2023-02-27 11:10:01,308][00394] Num frames 9600...
+[2023-02-27 11:10:01,434][00394] Num frames 9700...
+[2023-02-27 11:10:01,549][00394] Num frames 9800...
+[2023-02-27 11:10:01,680][00394] Num frames 9900...
+[2023-02-27 11:10:01,800][00394] Num frames 10000...
+[2023-02-27 11:10:01,927][00394] Num frames 10100...
+[2023-02-27 11:10:02,050][00394] Num frames 10200...
+[2023-02-27 11:10:02,165][00394] Num frames 10300...
+[2023-02-27 11:10:02,302][00394] Num frames 10400...
+[2023-02-27 11:10:02,427][00394] Num frames 10500...
+[2023-02-27 11:10:02,479][00394] Avg episode rewards: #0: 61.399, true rewards: #0: 21.000
+[2023-02-27 11:10:02,482][00394] Avg episode reward: 61.399, avg true_objective: 21.000
+[2023-02-27 11:10:02,600][00394] Num frames 10600...
+[2023-02-27 11:10:02,730][00394] Num frames 10700...
+[2023-02-27 11:10:02,852][00394] Num frames 10800...
+[2023-02-27 11:10:02,967][00394] Num frames 10900...
+[2023-02-27 11:10:03,085][00394] Num frames 11000...
+[2023-02-27 11:10:03,201][00394] Num frames 11100...
+[2023-02-27 11:10:03,317][00394] Num frames 11200...
+[2023-02-27 11:10:03,443][00394] Num frames 11300...
+[2023-02-27 11:10:03,559][00394] Num frames 11400...
+[2023-02-27 11:10:03,693][00394] Num frames 11500...
+[2023-02-27 11:10:03,810][00394] Num frames 11600...
+[2023-02-27 11:10:03,933][00394] Num frames 11700...
+[2023-02-27 11:10:04,057][00394] Num frames 11800...
+[2023-02-27 11:10:04,185][00394] Num frames 11900...
+[2023-02-27 11:10:04,311][00394] Num frames 12000...
+[2023-02-27 11:10:04,437][00394] Num frames 12100...
+[2023-02-27 11:10:04,565][00394] Num frames 12200...
+[2023-02-27 11:10:04,688][00394] Num frames 12300...
+[2023-02-27 11:10:04,810][00394] Num frames 12400...
+[2023-02-27 11:10:04,934][00394] Num frames 12500...
+[2023-02-27 11:10:05,065][00394] Num frames 12600...
+[2023-02-27 11:10:05,117][00394] Avg episode rewards: #0: 61.332, true rewards: #0: 21.000
+[2023-02-27 11:10:05,119][00394] Avg episode reward: 61.332, avg true_objective: 21.000
+[2023-02-27 11:10:05,240][00394] Num frames 12700...
+[2023-02-27 11:10:05,359][00394] Num frames 12800...
+[2023-02-27 11:10:05,483][00394] Num frames 12900...
+[2023-02-27 11:10:05,600][00394] Num frames 13000...
+[2023-02-27 11:10:05,720][00394] Num frames 13100...
+[2023-02-27 11:10:05,847][00394] Num frames 13200...
+[2023-02-27 11:10:05,965][00394] Num frames 13300...
+[2023-02-27 11:10:06,090][00394] Num frames 13400...
+[2023-02-27 11:10:06,209][00394] Num frames 13500...
+[2023-02-27 11:10:06,346][00394] Num frames 13600...
+[2023-02-27 11:10:06,477][00394] Num frames 13700...
+[2023-02-27 11:10:06,593][00394] Num frames 13800...
+[2023-02-27 11:10:06,723][00394] Num frames 13900...
+[2023-02-27 11:10:06,840][00394] Num frames 14000...
+[2023-02-27 11:10:06,965][00394] Num frames 14100...
+[2023-02-27 11:10:07,084][00394] Num frames 14200...
+[2023-02-27 11:10:07,211][00394] Num frames 14300...
+[2023-02-27 11:10:07,375][00394] Avg episode rewards: #0: 59.844, true rewards: #0: 20.560
+[2023-02-27 11:10:07,377][00394] Avg episode reward: 59.844, avg true_objective: 20.560
+[2023-02-27 11:10:07,390][00394] Num frames 14400...
+[2023-02-27 11:10:07,507][00394] Num frames 14500...
+[2023-02-27 11:10:07,627][00394] Num frames 14600...
+[2023-02-27 11:10:07,747][00394] Num frames 14700...
+[2023-02-27 11:10:07,870][00394] Num frames 14800...
+[2023-02-27 11:10:07,987][00394] Num frames 14900...
+[2023-02-27 11:10:08,107][00394] Num frames 15000...
+[2023-02-27 11:10:08,224][00394] Num frames 15100...
+[2023-02-27 11:10:08,353][00394] Num frames 15200...
+[2023-02-27 11:10:08,470][00394] Num frames 15300...
+[2023-02-27 11:10:08,588][00394] Num frames 15400...
+[2023-02-27 11:10:08,706][00394] Num frames 15500...
+[2023-02-27 11:10:08,840][00394] Num frames 15600...
+[2023-02-27 11:10:08,963][00394] Num frames 15700...
+[2023-02-27 11:10:09,083][00394] Num frames 15800...
+[2023-02-27 11:10:09,203][00394] Num frames 15900...
+[2023-02-27 11:10:09,336][00394] Num frames 16000...
+[2023-02-27 11:10:09,455][00394] Num frames 16100...
+[2023-02-27 11:10:09,581][00394] Num frames 16200...
+[2023-02-27 11:10:09,744][00394] Num frames 16300...
+[2023-02-27 11:10:09,920][00394] Num frames 16400...
+[2023-02-27 11:10:10,136][00394] Avg episode rewards: #0: 60.739, true rewards: #0: 20.615
+[2023-02-27 11:10:10,138][00394] Avg episode reward: 60.739, avg true_objective: 20.615
+[2023-02-27 11:10:10,154][00394] Num frames 16500...
+[2023-02-27 11:10:10,325][00394] Num frames 16600...
+[2023-02-27 11:10:10,494][00394] Num frames 16700...
+[2023-02-27 11:10:10,664][00394] Num frames 16800...
+[2023-02-27 11:10:10,845][00394] Num frames 16900...
+[2023-02-27 11:10:11,021][00394] Num frames 17000...
+[2023-02-27 11:10:11,189][00394] Num frames 17100...
+[2023-02-27 11:10:11,361][00394] Num frames 17200...
+[2023-02-27 11:10:11,546][00394] Num frames 17300...
+[2023-02-27 11:10:11,719][00394] Num frames 17400...
+[2023-02-27 11:10:11,898][00394] Num frames 17500...
+[2023-02-27 11:10:12,071][00394] Num frames 17600...
+[2023-02-27 11:10:12,245][00394] Num frames 17700...
+[2023-02-27 11:10:12,424][00394] Num frames 17800...
+[2023-02-27 11:10:12,599][00394] Num frames 17900...
+[2023-02-27 11:10:12,772][00394] Num frames 18000...
+[2023-02-27 11:10:12,969][00394] Num frames 18100...
+[2023-02-27 11:10:13,144][00394] Num frames 18200...
+[2023-02-27 11:10:13,316][00394] Num frames 18300...
+[2023-02-27 11:10:13,446][00394] Num frames 18400...
+[2023-02-27 11:10:13,566][00394] Num frames 18500...
+[2023-02-27 11:10:13,739][00394] Avg episode rewards: #0: 61.212, true rewards: #0: 20.658
+[2023-02-27 11:10:13,741][00394] Avg episode reward: 61.212, avg true_objective: 20.658
+[2023-02-27 11:10:13,754][00394] Num frames 18600...
+[2023-02-27 11:10:13,881][00394] Num frames 18700...
+[2023-02-27 11:10:14,010][00394] Num frames 18800...
+[2023-02-27 11:10:14,133][00394] Num frames 18900...
+[2023-02-27 11:10:14,250][00394] Num frames 19000...
+[2023-02-27 11:10:14,374][00394] Num frames 19100...
+[2023-02-27 11:10:14,492][00394] Num frames 19200...
+[2023-02-27 11:10:14,611][00394] Num frames 19300...
+[2023-02-27 11:10:14,735][00394] Num frames 19400...
+[2023-02-27 11:10:14,853][00394] Num frames 19500...
+[2023-02-27 11:10:14,993][00394] Num frames 19600...
+[2023-02-27 11:10:15,117][00394] Num frames 19700...
+[2023-02-27 11:10:15,244][00394] Num frames 19800...
+[2023-02-27 11:10:15,373][00394] Num frames 19900...
+[2023-02-27 11:10:15,492][00394] Num frames 20000...
+[2023-02-27 11:10:15,612][00394] Num frames 20100...
+[2023-02-27 11:10:15,729][00394] Num frames 20200...
+[2023-02-27 11:10:15,859][00394] Num frames 20300...
+[2023-02-27 11:10:15,983][00394] Num frames 20400...
+[2023-02-27 11:10:16,110][00394] Num frames 20500...
+[2023-02-27 11:10:16,229][00394] Num frames 20600...
+[2023-02-27 11:10:16,405][00394] Avg episode rewards: #0: 61.591, true rewards: #0: 20.692
+[2023-02-27 11:10:16,407][00394] Avg episode reward: 61.591, avg true_objective: 20.692
+[2023-02-27 11:12:28,617][00394] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4!
+[2023-02-27 11:16:16,032][00394] Environment doom_basic already registered, overwriting...
+[2023-02-27 11:16:16,038][00394] Environment doom_two_colors_easy already registered, overwriting...
+[2023-02-27 11:16:16,040][00394] Environment doom_two_colors_hard already registered, overwriting...
+[2023-02-27 11:16:16,042][00394] Environment doom_dm already registered, overwriting...
+[2023-02-27 11:16:16,044][00394] Environment doom_dwango5 already registered, overwriting...
+[2023-02-27 11:16:16,045][00394] Environment doom_my_way_home_flat_actions already registered, overwriting...
+[2023-02-27 11:16:16,047][00394] Environment doom_defend_the_center_flat_actions already registered, overwriting...
+[2023-02-27 11:16:16,048][00394] Environment doom_my_way_home already registered, overwriting...
+[2023-02-27 11:16:16,050][00394] Environment doom_deadly_corridor already registered, overwriting...
+[2023-02-27 11:16:16,051][00394] Environment doom_defend_the_center already registered, overwriting...
+[2023-02-27 11:16:16,052][00394] Environment doom_defend_the_line already registered, overwriting...
+[2023-02-27 11:16:16,056][00394] Environment doom_health_gathering already registered, overwriting...
+[2023-02-27 11:16:16,057][00394] Environment doom_health_gathering_supreme already registered, overwriting...
+[2023-02-27 11:16:16,058][00394] Environment doom_battle already registered, overwriting...
+[2023-02-27 11:16:16,059][00394] Environment doom_battle2 already registered, overwriting...
+[2023-02-27 11:16:16,061][00394] Environment doom_duel_bots already registered, overwriting...
+[2023-02-27 11:16:16,062][00394] Environment doom_deathmatch_bots already registered, overwriting...
+[2023-02-27 11:16:16,063][00394] Environment doom_duel already registered, overwriting...
+[2023-02-27 11:16:16,065][00394] Environment doom_deathmatch_full already registered, overwriting...
+[2023-02-27 11:16:16,066][00394] Environment doom_benchmark already registered, overwriting...
+[2023-02-27 11:16:16,067][00394] register_encoder_factory:
+[2023-02-27 11:16:16,105][00394] Loading legacy config file train_dir/doom_deathmatch_bots_2222/cfg.json instead of train_dir/doom_deathmatch_bots_2222/config.json
+[2023-02-27 11:16:16,107][00394] Loading existing experiment configuration from train_dir/doom_deathmatch_bots_2222/config.json
+[2023-02-27 11:16:16,108][00394] Overriding arg 'experiment' with value 'doom_deathmatch_bots_2222' passed from command line
+[2023-02-27 11:16:16,110][00394] Overriding arg 'train_dir' with value 'train_dir' passed from command line
+[2023-02-27 11:16:16,111][00394] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-02-27 11:16:16,114][00394] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file!
+[2023-02-27 11:16:16,115][00394] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file!
+[2023-02-27 11:16:16,116][00394] Adding new argument 'env_gpu_observations'=True that is not in the saved config file!
+[2023-02-27 11:16:16,118][00394] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-02-27 11:16:16,119][00394] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-02-27 11:16:16,120][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-27 11:16:16,121][00394] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-02-27 11:16:16,123][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-27 11:16:16,124][00394] Adding new argument 'max_num_episodes'=1 that is not in the saved config file!
+[2023-02-27 11:16:16,125][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2023-02-27 11:16:16,127][00394] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2023-02-27 11:16:16,128][00394] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-02-27 11:16:16,129][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-02-27 11:16:16,131][00394] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-02-27 11:16:16,132][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-02-27 11:16:16,133][00394] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-02-27 11:16:16,177][00394] Port 40300 is available
+[2023-02-27 11:16:16,179][00394] Using port 40300
+[2023-02-27 11:16:16,183][00394] RunningMeanStd input shape: (23,)
+[2023-02-27 11:16:16,186][00394] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-27 11:16:16,189][00394] RunningMeanStd input shape: (1,)
+[2023-02-27 11:16:16,204][00394] ConvEncoder: input_channels=3
+[2023-02-27 11:16:16,251][00394] Conv encoder output size: 512
+[2023-02-27 11:16:16,254][00394] Policy head output size: 512
+[2023-02-27 11:16:16,300][00394] Loading state from checkpoint train_dir/doom_deathmatch_bots_2222/checkpoint_p0/checkpoint_000282220_2311946240.pth...
+[2023-02-27 11:16:16,336][00394] Using port 40300 on host...
+[2023-02-27 11:16:16,669][00394] Initialized w:0 v:0 player:0
+[2023-02-27 11:16:16,841][00394] Num frames 100...
+[2023-02-27 11:16:17,008][00394] Num frames 200...
+[2023-02-27 11:16:17,189][00394] Num frames 300...
+[2023-02-27 11:16:17,377][00394] Num frames 400...
+[2023-02-27 11:16:17,553][00394] Num frames 500...
+[2023-02-27 11:16:17,722][00394] Num frames 600...
+[2023-02-27 11:16:17,887][00394] Num frames 700...
+[2023-02-27 11:16:18,056][00394] Num frames 800...
+[2023-02-27 11:16:18,236][00394] Num frames 900...
+[2023-02-27 11:16:18,419][00394] Num frames 1000...
+[2023-02-27 11:16:18,586][00394] Num frames 1100...
+[2023-02-27 11:16:18,760][00394] Num frames 1200...
+[2023-02-27 11:16:18,940][00394] Num frames 1300...
+[2023-02-27 11:16:19,126][00394] Num frames 1400...
+[2023-02-27 11:16:19,305][00394] Num frames 1500...
+[2023-02-27 11:16:19,478][00394] Num frames 1600...
+[2023-02-27 11:16:19,647][00394] Num frames 1700...
+[2023-02-27 11:16:19,829][00394] Num frames 1800...
+[2023-02-27 11:16:20,003][00394] Num frames 1900...
+[2023-02-27 11:16:20,170][00394] Num frames 2000...
+[2023-02-27 11:16:20,358][00394] Num frames 2100...
+[2023-02-27 11:16:20,533][00394] Num frames 2200...
+[2023-02-27 11:16:20,706][00394] Num frames 2300...
+[2023-02-27 11:16:20,892][00394] Num frames 2400...
+[2023-02-27 11:16:21,061][00394] Num frames 2500...
+[2023-02-27 11:16:21,234][00394] Num frames 2600...
+[2023-02-27 11:16:21,437][00394] Num frames 2700...
+[2023-02-27 11:16:21,608][00394] Num frames 2800...
+[2023-02-27 11:16:21,769][00394] Num frames 2900...
+[2023-02-27 11:16:21,941][00394] Num frames 3000...
+[2023-02-27 11:16:22,181][00394] Num frames 3100...
+[2023-02-27 11:16:22,459][00394] Num frames 3200...
+[2023-02-27 11:16:22,697][00394] Num frames 3300...
+[2023-02-27 11:16:22,940][00394] Num frames 3400...
+[2023-02-27 11:16:23,191][00394] Num frames 3500...
+[2023-02-27 11:16:23,483][00394] Num frames 3600...
+[2023-02-27 11:16:23,733][00394] Num frames 3700...
+[2023-02-27 11:16:23,993][00394] Num frames 3800...
+[2023-02-27 11:16:24,238][00394] Num frames 3900...
+[2023-02-27 11:16:24,494][00394] Num frames 4000...
+[2023-02-27 11:16:24,738][00394] Num frames 4100...
+[2023-02-27 11:16:24,991][00394] Num frames 4200...
+[2023-02-27 11:16:25,241][00394] Num frames 4300...
+[2023-02-27 11:16:25,494][00394] Num frames 4400...
+[2023-02-27 11:16:25,753][00394] Num frames 4500...
+[2023-02-27 11:16:25,944][00394] Num frames 4600...
+[2023-02-27 11:16:26,127][00394] Num frames 4700...
+[2023-02-27 11:16:26,305][00394] Num frames 4800...
+[2023-02-27 11:16:26,483][00394] Num frames 4900...
+[2023-02-27 11:16:26,680][00394] Num frames 5000...
+[2023-02-27 11:16:26,850][00394] Num frames 5100...
+[2023-02-27 11:16:27,029][00394] Num frames 5200...
+[2023-02-27 11:16:27,203][00394] Num frames 5300...
+[2023-02-27 11:16:27,381][00394] Num frames 5400...
+[2023-02-27 11:16:27,565][00394] Num frames 5500...
+[2023-02-27 11:16:27,739][00394] Num frames 5600...
+[2023-02-27 11:16:27,919][00394] Num frames 5700...
+[2023-02-27 11:16:28,091][00394] Num frames 5800...
+[2023-02-27 11:16:28,258][00394] Num frames 5900...
+[2023-02-27 11:16:28,436][00394] Num frames 6000...
+[2023-02-27 11:16:28,623][00394] Num frames 6100...
+[2023-02-27 11:16:28,801][00394] Num frames 6200...
+[2023-02-27 11:16:28,968][00394] Num frames 6300...
+[2023-02-27 11:16:29,150][00394] Num frames 6400...
+[2023-02-27 11:16:29,327][00394] Num frames 6500...
+[2023-02-27 11:16:29,502][00394] Num frames 6600...
+[2023-02-27 11:16:29,686][00394] Num frames 6700...
+[2023-02-27 11:16:29,855][00394] Num frames 6800...
+[2023-02-27 11:16:30,026][00394] Num frames 6900...
+[2023-02-27 11:16:30,204][00394] Num frames 7000...
+[2023-02-27 11:16:30,384][00394] Num frames 7100...
+[2023-02-27 11:16:30,556][00394] Num frames 7200...
+[2023-02-27 11:16:30,734][00394] Num frames 7300...
+[2023-02-27 11:16:30,904][00394] Num frames 7400...
+[2023-02-27 11:16:31,084][00394] Num frames 7500...
+[2023-02-27 11:16:31,267][00394] Num frames 7600...
+[2023-02-27 11:16:31,442][00394] Num frames 7700...
+[2023-02-27 11:16:31,651][00394] Num frames 7800...
+[2023-02-27 11:16:31,830][00394] Num frames 7900...
+[2023-02-27 11:16:32,004][00394] Num frames 8000...
+[2023-02-27 11:16:32,179][00394] Num frames 8100...
+[2023-02-27 11:16:32,360][00394] Num frames 8200...
+[2023-02-27 11:16:32,540][00394] Num frames 8300...
+[2023-02-27 11:16:32,715][00394] DAMAGECOUNT value on done: 6533.0
+[2023-02-27 11:16:32,717][00394] Sum rewards: 91.524, reward structure: {'DEATHCOUNT': '-12.750', 'HEALTH': '-5.080', 'AMMO5': '0.007', 'AMMO2': '0.022', 'AMMO4': '0.107', 'AMMO3': '0.204', 'WEAPON4': '0.300', 'WEAPON5': '0.300', 'weapon5': '0.654', 'weapon4': '1.066', 'WEAPON3': '1.500', 'weapon2': '1.656', 'HITCOUNT': '3.430', 'weapon3': '12.508', 'DAMAGECOUNT': '19.599', 'FRAGCOUNT': '68.000'}
+[2023-02-27 11:16:32,781][00394] Avg episode rewards: #0: 91.519, true rewards: #0: 68.000
+[2023-02-27 11:16:32,783][00394] Avg episode reward: 91.519, avg true_objective: 68.000
+[2023-02-27 11:16:32,790][00394] Num frames 8400...
+[2023-02-27 11:17:26,161][00394] Replay video saved to train_dir/doom_deathmatch_bots_2222/replay.mp4!
+[2023-02-27 11:23:23,959][00394] Environment doom_basic already registered, overwriting...
+[2023-02-27 11:23:23,961][00394] Environment doom_two_colors_easy already registered, overwriting...
+[2023-02-27 11:23:23,963][00394] Environment doom_two_colors_hard already registered, overwriting...
+[2023-02-27 11:23:23,966][00394] Environment doom_dm already registered, overwriting...
+[2023-02-27 11:23:23,968][00394] Environment doom_dwango5 already registered, overwriting...
+[2023-02-27 11:23:23,969][00394] Environment doom_my_way_home_flat_actions already registered, overwriting...
+[2023-02-27 11:23:23,970][00394] Environment doom_defend_the_center_flat_actions already registered, overwriting...
+[2023-02-27 11:23:23,971][00394] Environment doom_my_way_home already registered, overwriting...
+[2023-02-27 11:23:23,974][00394] Environment doom_deadly_corridor already registered, overwriting...
+[2023-02-27 11:23:23,975][00394] Environment doom_defend_the_center already registered, overwriting...
+[2023-02-27 11:23:23,978][00394] Environment doom_defend_the_line already registered, overwriting...
+[2023-02-27 11:23:23,979][00394] Environment doom_health_gathering already registered, overwriting...
+[2023-02-27 11:23:23,981][00394] Environment doom_health_gathering_supreme already registered, overwriting...
+[2023-02-27 11:23:23,984][00394] Environment doom_battle already registered, overwriting...
+[2023-02-27 11:23:23,986][00394] Environment doom_battle2 already registered, overwriting...
+[2023-02-27 11:23:23,987][00394] Environment doom_duel_bots already registered, overwriting...
+[2023-02-27 11:23:23,988][00394] Environment doom_deathmatch_bots already registered, overwriting...
+[2023-02-27 11:23:23,992][00394] Environment doom_duel already registered, overwriting...
+[2023-02-27 11:23:23,993][00394] Environment doom_deathmatch_full already registered, overwriting...
+[2023-02-27 11:23:23,994][00394] Environment doom_benchmark already registered, overwriting...
+[2023-02-27 11:23:23,995][00394] register_encoder_factory:
+[2023-02-27 11:23:24,031][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-02-27 11:23:24,042][00394] Experiment dir /content/train_dir/default_experiment already exists!
+[2023-02-27 11:23:24,043][00394] Resuming existing experiment from /content/train_dir/default_experiment...
+[2023-02-27 11:23:24,045][00394] Weights and Biases integration disabled
+[2023-02-27 11:23:24,050][00394] Environment var CUDA_VISIBLE_DEVICES is 0
+
+[2023-02-27 11:23:26,034][00394] Starting experiment with the following configuration:
+help=False
+algo=APPO
+env=doom_health_gathering_supreme
+experiment=default_experiment
+train_dir=/content/train_dir
+restart_behavior=resume
+device=gpu
+seed=None
+num_policies=1
+async_rl=True
+serial_mode=False
+batched_sampling=False
+num_batches_to_accumulate=2
+worker_num_splits=2
+policy_workers_per_policy=1
+max_policy_lag=1000
+num_workers=8
+num_envs_per_worker=4
+batch_size=1024
+num_batches_per_epoch=1
+num_epochs=1
+rollout=32
+recurrence=32
+shuffle_minibatches=False
+gamma=0.99
+reward_scale=1.0
+reward_clip=1000.0
+value_bootstrap=False
+normalize_returns=True
+exploration_loss_coeff=0.001
+value_loss_coeff=0.5
+kl_loss_coeff=0.0
+exploration_loss=symmetric_kl
+gae_lambda=0.95
+ppo_clip_ratio=0.1
+ppo_clip_value=0.2
+with_vtrace=False
+vtrace_rho=1.0
+vtrace_c=1.0
+optimizer=adam
+adam_eps=1e-06
+adam_beta1=0.9
+adam_beta2=0.999
+max_grad_norm=4.0
+learning_rate=0.0001
+lr_schedule=constant
+lr_schedule_kl_threshold=0.008
+lr_adaptive_min=1e-06
+lr_adaptive_max=0.01
+obs_subtract_mean=0.0
+obs_scale=255.0
+normalize_input=True
+normalize_input_keys=None
+decorrelate_experience_max_seconds=0
+decorrelate_envs_on_one_worker=True
+actor_worker_gpus=[]
+set_workers_cpu_affinity=True
+force_envs_single_thread=False
+default_niceness=0
+log_to_file=True
+experiment_summaries_interval=10
+flush_summaries_interval=30
+stats_avg=100
+summaries_use_frameskip=True
+heartbeat_interval=20
+heartbeat_reporting_interval=600
+train_for_env_steps=4000000
+train_for_seconds=10000000000
+save_every_sec=120
+keep_checkpoints=2
+load_checkpoint_kind=latest
+save_milestones_sec=-1
+save_best_every_sec=5
+save_best_metric=reward
+save_best_after=100000
+benchmark=False
+encoder_mlp_layers=[512, 512]
+encoder_conv_architecture=convnet_simple
+encoder_conv_mlp_layers=[512]
+use_rnn=True
+rnn_size=512
+rnn_type=gru
+rnn_num_layers=1
+decoder_mlp_layers=[]
+nonlinearity=elu
+policy_initialization=orthogonal
+policy_init_gain=1.0
+actor_critic_share_weights=True
+adaptive_stddev=True
+continuous_tanh_scale=0.0
+initial_stddev=1.0
+use_env_info_cache=False
+env_gpu_actions=False
+env_gpu_observations=True
+env_frameskip=4
+env_framestack=1
+pixel_format=CHW
+use_record_episode_statistics=False
+with_wandb=False
+wandb_user=None
+wandb_project=sample_factory
+wandb_group=None
+wandb_job_type=SF
+wandb_tags=[]
+with_pbt=False
+pbt_mix_policies_in_one_env=True
+pbt_period_env_steps=5000000
+pbt_start_mutation=20000000
+pbt_replace_fraction=0.3
+pbt_mutation_rate=0.15
+pbt_replace_reward_gap=0.1
+pbt_replace_reward_gap_absolute=1e-06
+pbt_optimize_gamma=False
+pbt_target_objective=true_objective
+pbt_perturb_min=1.1
+pbt_perturb_max=1.5
+num_agents=-1
+num_humans=0
+num_bots=-1
+start_bot_difficulty=None
+timelimit=None
+res_w=128
+res_h=72
+wide_aspect_ratio=False
+eval_env_frameskip=1
+fps=35
+command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
+cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
+git_hash=unknown
+git_repo_name=not a git repository
+[2023-02-27 11:23:26,037][00394] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2023-02-27 11:23:26,041][00394] Rollout worker 0 uses device cpu
+[2023-02-27 11:23:26,045][00394] Rollout worker 1 uses device cpu
+[2023-02-27 11:23:26,046][00394] Rollout worker 2 uses device cpu
+[2023-02-27 11:23:26,048][00394] Rollout worker 3 uses device cpu
+[2023-02-27 11:23:26,049][00394] Rollout worker 4 uses device cpu
+[2023-02-27 11:23:26,051][00394] Rollout worker 5 uses device cpu
+[2023-02-27 11:23:26,052][00394] Rollout worker 6 uses device cpu
+[2023-02-27 11:23:26,054][00394] Rollout worker 7 uses device cpu
+[2023-02-27 11:23:26,172][00394] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-27 11:23:26,173][00394] InferenceWorker_p0-w0: min num requests: 2
+[2023-02-27 11:23:26,211][00394] Starting all processes...
+[2023-02-27 11:23:26,213][00394] Starting process learner_proc0
+[2023-02-27 11:23:26,371][00394] Starting all processes...
+[2023-02-27 11:23:26,378][00394] Starting process inference_proc0-0
+[2023-02-27 11:23:26,378][00394] Starting process rollout_proc0
+[2023-02-27 11:23:26,381][00394] Starting process rollout_proc1
+[2023-02-27 11:23:26,381][00394] Starting process rollout_proc2
+[2023-02-27 11:23:26,381][00394] Starting process rollout_proc3
+[2023-02-27 11:23:26,381][00394] Starting process rollout_proc4
+[2023-02-27 11:23:26,381][00394] Starting process rollout_proc5
+[2023-02-27 11:23:26,381][00394] Starting process rollout_proc6
+[2023-02-27 11:23:26,381][00394] Starting process rollout_proc7
+[2023-02-27 11:23:34,354][23894] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-27 11:23:34,361][23894] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2023-02-27 11:23:34,414][23894] Num visible devices: 1
+[2023-02-27 11:23:34,451][23894] Starting seed is not provided
+[2023-02-27 11:23:34,452][23894] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-27 11:23:34,453][23894] Initializing actor-critic model on device cuda:0
+[2023-02-27 11:23:34,454][23894] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-27 11:23:34,461][23894] RunningMeanStd input shape: (1,)
+[2023-02-27 11:23:34,562][23894] ConvEncoder: input_channels=3
+[2023-02-27 11:23:35,408][23894] Conv encoder output size: 512
+[2023-02-27 11:23:35,418][23894] Policy head output size: 512
+[2023-02-27 11:23:35,440][23908] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-27 11:23:35,445][23908] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2023-02-27 11:23:35,517][23908] Num visible devices: 1
+[2023-02-27 11:23:35,539][23894] Created Actor Critic model with architecture:
+[2023-02-27 11:23:35,567][23894] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2023-02-27 11:23:36,088][23909] Worker 0 uses CPU cores [0]
+[2023-02-27 11:23:37,035][23910] Worker 1 uses CPU cores [1]
+[2023-02-27 11:23:37,240][23917] Worker 3 uses CPU cores [1]
+[2023-02-27 11:23:37,626][23921] Worker 2 uses CPU cores [0]
+[2023-02-27 11:23:37,892][23923] Worker 4 uses CPU cores [0]
+[2023-02-27 11:23:38,086][23925] Worker 5 uses CPU cores [1]
+[2023-02-27 11:23:38,140][23931] Worker 7 uses CPU cores [1]
+[2023-02-27 11:23:38,395][23933] Worker 6 uses CPU cores [0]
+[2023-02-27 11:23:41,048][23894] Using optimizer
+[2023-02-27 11:23:41,049][23894] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2023-02-27 11:23:41,085][23894] Loading model from checkpoint
+[2023-02-27 11:23:41,089][23894] Loaded experiment state at self.train_step=978, self.env_steps=4005888
+[2023-02-27 11:23:41,090][23894] Initialized policy 0 weights for model version 978
+[2023-02-27 11:23:41,092][23894] LearnerWorker_p0 finished initialization!
+[2023-02-27 11:23:41,094][23894] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-27 11:23:41,345][23908] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-27 11:23:41,346][23908] RunningMeanStd input shape: (1,)
+[2023-02-27 11:23:41,360][23908] ConvEncoder: input_channels=3
+[2023-02-27 11:23:41,469][23908] Conv encoder output size: 512
+[2023-02-27 11:23:41,470][23908] Policy head output size: 512
+[2023-02-27 11:23:43,850][00394] Inference worker 0-0 is ready!
+[2023-02-27 11:23:43,853][00394] All inference workers are ready! Signal rollout workers to start!
+[2023-02-27 11:23:43,973][23909] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-27 11:23:43,979][23923] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-27 11:23:43,995][23921] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-27 11:23:44,014][23933] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-27 11:23:44,053][00394] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-27 11:23:44,101][23910] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-27 11:23:44,115][23931] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-27 11:23:44,118][23917] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-27 11:23:44,137][23925] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-27 11:23:45,000][23931] Decorrelating experience for 0 frames...
+[2023-02-27 11:23:45,002][23910] Decorrelating experience for 0 frames...
+[2023-02-27 11:23:45,573][23933] Decorrelating experience for 0 frames...
+[2023-02-27 11:23:45,578][23921] Decorrelating experience for 0 frames...
+[2023-02-27 11:23:45,581][23909] Decorrelating experience for 0 frames...
+[2023-02-27 11:23:45,583][23923] Decorrelating experience for 0 frames...
+[2023-02-27 11:23:45,817][23910] Decorrelating experience for 32 frames...
+[2023-02-27 11:23:45,824][23931] Decorrelating experience for 32 frames...
+[2023-02-27 11:23:46,164][00394] Heartbeat connected on Batcher_0
+[2023-02-27 11:23:46,171][00394] Heartbeat connected on LearnerWorker_p0
+[2023-02-27 11:23:46,223][00394] Heartbeat connected on InferenceWorker_p0-w0
+[2023-02-27 11:23:46,526][23923] Decorrelating experience for 32 frames...
+[2023-02-27 11:23:46,527][23909] Decorrelating experience for 32 frames...
+[2023-02-27 11:23:46,910][23925] Decorrelating experience for 0 frames...
+[2023-02-27 11:23:47,222][23910] Decorrelating experience for 64 frames...
+[2023-02-27 11:23:47,233][23923] Decorrelating experience for 64 frames...
+[2023-02-27 11:23:47,811][23921] Decorrelating experience for 32 frames...
+[2023-02-27 11:23:47,852][23931] Decorrelating experience for 64 frames...
+[2023-02-27 11:23:48,202][23923] Decorrelating experience for 96 frames...
+[2023-02-27 11:23:48,256][23925] Decorrelating experience for 32 frames...
+[2023-02-27 11:23:48,384][00394] Heartbeat connected on RolloutWorker_w4
+[2023-02-27 11:23:48,555][23910] Decorrelating experience for 96 frames...
+[2023-02-27 11:23:48,811][00394] Heartbeat connected on RolloutWorker_w1
+[2023-02-27 11:23:49,051][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-27 11:23:49,270][23931] Decorrelating experience for 96 frames...
+[2023-02-27 11:23:49,477][00394] Heartbeat connected on RolloutWorker_w7
+[2023-02-27 11:23:49,631][23925] Decorrelating experience for 64 frames...
+[2023-02-27 11:23:49,780][23933] Decorrelating experience for 32 frames...
+[2023-02-27 11:23:49,899][23921] Decorrelating experience for 64 frames...
+[2023-02-27 11:23:51,415][23925] Decorrelating experience for 96 frames...
+[2023-02-27 11:23:51,489][23917] Decorrelating experience for 0 frames...
+[2023-02-27 11:23:51,819][00394] Heartbeat connected on RolloutWorker_w5
+[2023-02-27 11:23:52,861][23921] Decorrelating experience for 96 frames...
+[2023-02-27 11:23:52,868][23933] Decorrelating experience for 64 frames...
+[2023-02-27 11:23:53,735][00394] Heartbeat connected on RolloutWorker_w2
+[2023-02-27 11:23:54,054][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 119.6. Samples: 1196. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-27 11:23:54,065][00394] Avg episode reward: [(0, '2.480')]
+[2023-02-27 11:23:56,929][23917] Decorrelating experience for 32 frames...
+[2023-02-27 11:23:57,516][23894] Signal inference workers to stop experience collection...
+[2023-02-27 11:23:57,537][23908] InferenceWorker_p0-w0: stopping experience collection
+[2023-02-27 11:23:57,618][23909] Decorrelating experience for 64 frames...
+[2023-02-27 11:23:58,371][23917] Decorrelating experience for 64 frames...
+[2023-02-27 11:23:58,937][23933] Decorrelating experience for 96 frames...
+[2023-02-27 11:23:59,051][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 157.2. Samples: 2358. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-27 11:23:59,058][00394] Avg episode reward: [(0, '3.160')]
+[2023-02-27 11:23:59,201][23917] Decorrelating experience for 96 frames...
+[2023-02-27 11:23:59,194][00394] Heartbeat connected on RolloutWorker_w6
+[2023-02-27 11:23:59,303][00394] Heartbeat connected on RolloutWorker_w3
+[2023-02-27 11:23:59,443][23909] Decorrelating experience for 96 frames...
+[2023-02-27 11:23:59,503][00394] Heartbeat connected on RolloutWorker_w0
+[2023-02-27 11:24:01,278][23894] Signal inference workers to resume experience collection...
+[2023-02-27 11:24:01,281][23894] Stopping Batcher_0...
+[2023-02-27 11:24:01,282][23894] Loop batcher_evt_loop terminating...
+[2023-02-27 11:24:01,325][23908] Weights refcount: 2 0
+[2023-02-27 11:24:01,342][23908] Stopping InferenceWorker_p0-w0...
+[2023-02-27 11:24:01,343][23908] Loop inference_proc0-0_evt_loop terminating...
+[2023-02-27 11:24:01,375][00394] Component Batcher_0 stopped!
+[2023-02-27 11:24:01,378][00394] Component InferenceWorker_p0-w0 stopped!
+[2023-02-27 11:24:01,583][00394] Component RolloutWorker_w7 stopped!
+[2023-02-27 11:24:01,582][23931] Stopping RolloutWorker_w7...
+[2023-02-27 11:24:01,595][23931] Loop rollout_proc7_evt_loop terminating...
+[2023-02-27 11:24:01,603][23925] Stopping RolloutWorker_w5...
+[2023-02-27 11:24:01,605][00394] Component RolloutWorker_w5 stopped!
+[2023-02-27 11:24:01,614][23910] Stopping RolloutWorker_w1...
+[2023-02-27 11:24:01,615][23910] Loop rollout_proc1_evt_loop terminating...
+[2023-02-27 11:24:01,615][00394] Component RolloutWorker_w1 stopped!
+[2023-02-27 11:24:01,604][23925] Loop rollout_proc5_evt_loop terminating...
+[2023-02-27 11:24:01,626][23917] Stopping RolloutWorker_w3...
+[2023-02-27 11:24:01,626][23917] Loop rollout_proc3_evt_loop terminating...
+[2023-02-27 11:24:01,626][00394] Component RolloutWorker_w3 stopped!
+[2023-02-27 11:24:01,636][00394] Component RolloutWorker_w2 stopped!
+[2023-02-27 11:24:01,646][00394] Component RolloutWorker_w4 stopped!
+[2023-02-27 11:24:01,646][23923] Stopping RolloutWorker_w4...
+[2023-02-27 11:24:01,640][23921] Stopping RolloutWorker_w2...
+[2023-02-27 11:24:01,648][23923] Loop rollout_proc4_evt_loop terminating...
+[2023-02-27 11:24:01,660][23921] Loop rollout_proc2_evt_loop terminating...
+[2023-02-27 11:24:01,676][00394] Component RolloutWorker_w6 stopped!
+[2023-02-27 11:24:01,679][23933] Stopping RolloutWorker_w6...
+[2023-02-27 11:24:01,688][00394] Component RolloutWorker_w0 stopped!
+[2023-02-27 11:24:01,691][23909] Stopping RolloutWorker_w0...
+[2023-02-27 11:24:01,684][23933] Loop rollout_proc6_evt_loop terminating...
+[2023-02-27 11:24:01,696][23909] Loop rollout_proc0_evt_loop terminating...
+[2023-02-27 11:24:04,479][23894] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
+[2023-02-27 11:24:04,614][23894] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000915_3747840.pth
+[2023-02-27 11:24:04,618][23894] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
+[2023-02-27 11:24:04,751][23894] Stopping LearnerWorker_p0...
+[2023-02-27 11:24:04,752][23894] Loop learner_proc0_evt_loop terminating...
+[2023-02-27 11:24:04,751][00394] Component LearnerWorker_p0 stopped!
+[2023-02-27 11:24:04,755][00394] Waiting for process learner_proc0 to stop...
+[2023-02-27 11:24:05,883][00394] Waiting for process inference_proc0-0 to join...
+[2023-02-27 11:24:05,887][00394] Waiting for process rollout_proc0 to join...
+[2023-02-27 11:24:05,890][00394] Waiting for process rollout_proc1 to join...
+[2023-02-27 11:24:05,893][00394] Waiting for process rollout_proc2 to join...
+[2023-02-27 11:24:05,895][00394] Waiting for process rollout_proc3 to join...
+[2023-02-27 11:24:05,900][00394] Waiting for process rollout_proc4 to join...
+[2023-02-27 11:24:05,902][00394] Waiting for process rollout_proc5 to join...
+[2023-02-27 11:24:05,903][00394] Waiting for process rollout_proc6 to join...
+[2023-02-27 11:24:05,905][00394] Waiting for process rollout_proc7 to join...
+[2023-02-27 11:24:05,907][00394] Batcher 0 profile tree view:
+batching: 0.0452, releasing_batches: 0.0024
+[2023-02-27 11:24:05,915][00394] InferenceWorker_p0-w0 profile tree view:
+update_model: 0.0912
+wait_policy: 0.0046
+  wait_policy_total: 8.8212
+one_step: 0.0031
+  handle_policy_step: 4.4875
+    deserialize: 0.0534, stack: 0.0127, obs_to_device_normalize: 0.4584, forward: 3.3266, send_messages: 0.1138
+    prepare_outputs: 0.3910
+      to_cpu: 0.2344
+[2023-02-27 11:24:05,917][00394] Learner 0 profile tree view:
+misc: 0.0000, prepare_batch: 9.0379
+train: 2.3827
+  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0006, kl_divergence: 0.0035, after_optimizer: 0.0410
+  calculate_losses: 0.4591
+    losses_init: 0.0000, forward_head: 0.1491, bptt_initial: 0.2404, tail: 0.0125, advantages_returns: 0.0064, losses: 0.0396
+    bptt: 0.0090
+      bptt_forward_core: 0.0088
+  update: 1.8570
+    clip: 0.0439
+[2023-02-27 11:24:05,919][00394] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0006
+[2023-02-27 11:24:05,920][00394] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.0015, enqueue_policy_requests: 1.6415, env_step: 4.3093, overhead: 0.2027, complete_rollouts: 0.0084
+save_policy_outputs: 0.1547
+  split_output_tensors: 0.0689
+[2023-02-27 11:24:05,927][00394] Loop Runner_EvtLoop terminating...
+[2023-02-27 11:24:05,929][00394] Runner profile tree view:
+main_loop: 39.7182
+[2023-02-27 11:24:05,933][00394] Collected {0: 4014080}, FPS: 206.3
+[2023-02-27 11:24:32,485][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-02-27 11:24:32,488][00394] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-02-27 11:24:32,489][00394] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-02-27 11:24:32,492][00394] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-02-27 11:24:32,495][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-27 11:24:32,497][00394] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-02-27 11:24:32,498][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-27 11:24:32,500][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-02-27 11:24:32,505][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2023-02-27 11:24:32,507][00394] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2023-02-27 11:24:32,511][00394] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-02-27 11:24:32,513][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-02-27 11:24:32,514][00394] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-02-27 11:24:32,515][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-02-27 11:24:32,517][00394] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-02-27 11:24:32,560][00394] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-27 11:24:32,563][00394] RunningMeanStd input shape: (1,)
+[2023-02-27 11:24:32,594][00394] ConvEncoder: input_channels=3
+[2023-02-27 11:24:32,747][00394] Conv encoder output size: 512
+[2023-02-27 11:24:32,750][00394] Policy head output size: 512
+[2023-02-27 11:24:32,848][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
+[2023-02-27 11:24:33,902][00394] Num frames 100...
+[2023-02-27 11:24:34,023][00394] Num frames 200...
+[2023-02-27 11:24:34,145][00394] Num frames 300...
+[2023-02-27 11:24:34,271][00394] Num frames 400...
+[2023-02-27 11:24:34,347][00394] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160
+[2023-02-27 11:24:34,349][00394] Avg episode reward: 5.160, avg true_objective: 4.160
+[2023-02-27 11:24:34,462][00394] Num frames 500...
+[2023-02-27 11:24:34,579][00394] Num frames 600...
+[2023-02-27 11:24:34,714][00394] Num frames 700...
+[2023-02-27 11:24:34,838][00394] Num frames 800...
+[2023-02-27 11:24:34,893][00394] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000
+[2023-02-27 11:24:34,896][00394] Avg episode reward: 4.500, avg true_objective: 4.000
+[2023-02-27 11:24:35,008][00394] Num frames 900...
+[2023-02-27 11:24:35,131][00394] Num frames 1000...
+[2023-02-27 11:24:35,254][00394] Num frames 1100...
+[2023-02-27 11:24:35,375][00394] Num frames 1200...
+[2023-02-27 11:24:35,487][00394] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160
+[2023-02-27 11:24:35,489][00394] Avg episode reward: 4.827, avg true_objective: 4.160
+[2023-02-27 11:24:35,559][00394] Num frames 1300...
+[2023-02-27 11:24:35,677][00394] Num frames 1400...
+[2023-02-27 11:24:35,795][00394] Num frames 1500...
+[2023-02-27 11:24:35,911][00394] Num frames 1600...
+[2023-02-27 11:24:36,004][00394] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080
+[2023-02-27 11:24:36,006][00394] Avg episode reward: 4.580, avg true_objective: 4.080
+[2023-02-27 11:24:36,090][00394] Num frames 1700...
+[2023-02-27 11:24:36,208][00394] Num frames 1800...
+[2023-02-27 11:24:36,334][00394] Num frames 1900...
+[2023-02-27 11:24:36,458][00394] Num frames 2000...
+[2023-02-27 11:24:36,577][00394] Avg episode rewards: #0: 4.496, true rewards: #0: 4.096
+[2023-02-27 11:24:36,581][00394] Avg episode reward: 4.496, avg true_objective: 4.096
+[2023-02-27 11:24:36,644][00394] Num frames 2100...
+[2023-02-27 11:24:36,762][00394] Num frames 2200...
+[2023-02-27 11:24:36,875][00394] Num frames 2300...
+[2023-02-27 11:24:36,980][00394] Avg episode rewards: #0: 4.393, true rewards: #0: 3.893
+[2023-02-27 11:24:36,982][00394] Avg episode reward: 4.393, avg true_objective: 3.893
+[2023-02-27 11:24:37,060][00394] Num frames 2400...
+[2023-02-27 11:24:37,179][00394] Num frames 2500...
+[2023-02-27 11:24:37,304][00394] Num frames 2600...
+[2023-02-27 11:24:37,425][00394] Num frames 2700...
+[2023-02-27 11:24:37,561][00394] Num frames 2800...
+[2023-02-27 11:24:37,681][00394] Num frames 2900...
+[2023-02-27 11:24:37,787][00394] Avg episode rewards: #0: 5.063, true rewards: #0: 4.206
+[2023-02-27 11:24:37,789][00394] Avg episode reward: 5.063, avg true_objective: 4.206
+[2023-02-27 11:24:37,860][00394] Num frames 3000...
+[2023-02-27 11:24:37,975][00394] Num frames 3100...
+[2023-02-27 11:24:38,092][00394] Num frames 3200...
+[2023-02-27 11:24:38,209][00394] Num frames 3300...
+[2023-02-27 11:24:38,334][00394] Num frames 3400...
+[2023-02-27 11:24:38,419][00394] Avg episode rewards: #0: 5.280, true rewards: #0: 4.280
+[2023-02-27 11:24:38,420][00394] Avg episode reward: 5.280, avg true_objective: 4.280
+[2023-02-27 11:24:38,519][00394] Num frames 3500...
+[2023-02-27 11:24:38,647][00394] Num frames 3600...
+[2023-02-27 11:24:38,761][00394] Num frames 3700...
+[2023-02-27 11:24:38,882][00394] Num frames 3800...
+[2023-02-27 11:24:38,947][00394] Avg episode rewards: #0: 5.120, true rewards: #0: 4.231
+[2023-02-27 11:24:38,950][00394] Avg episode reward: 5.120, avg true_objective: 4.231
+[2023-02-27 11:24:39,067][00394] Num frames 3900...
+[2023-02-27 11:24:39,181][00394] Num frames 4000...
+[2023-02-27 11:24:39,305][00394] Num frames 4100...
+[2023-02-27 11:24:39,467][00394] Avg episode rewards: #0: 4.992, true rewards: #0: 4.192
+[2023-02-27 11:24:39,469][00394] Avg episode reward: 4.992, avg true_objective: 4.192
+[2023-02-27 11:25:02,582][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2023-02-27 11:26:49,971][00394] Environment doom_basic already registered, overwriting...
+[2023-02-27 11:26:49,974][00394] Environment doom_two_colors_easy already registered, overwriting...
+[2023-02-27 11:26:49,976][00394] Environment doom_two_colors_hard already registered, overwriting...
+[2023-02-27 11:26:49,979][00394] Environment doom_dm already registered, overwriting...
+[2023-02-27 11:26:49,982][00394] Environment doom_dwango5 already registered, overwriting...
+[2023-02-27 11:26:49,983][00394] Environment doom_my_way_home_flat_actions already registered, overwriting...
+[2023-02-27 11:26:49,985][00394] Environment doom_defend_the_center_flat_actions already registered, overwriting...
+[2023-02-27 11:26:49,988][00394] Environment doom_my_way_home already registered, overwriting...
+[2023-02-27 11:26:49,990][00394] Environment doom_deadly_corridor already registered, overwriting...
+[2023-02-27 11:26:49,992][00394] Environment doom_defend_the_center already registered, overwriting...
+[2023-02-27 11:26:49,994][00394] Environment doom_defend_the_line already registered, overwriting...
+[2023-02-27 11:26:49,996][00394] Environment doom_health_gathering already registered, overwriting...
+[2023-02-27 11:26:50,000][00394] Environment doom_health_gathering_supreme already registered, overwriting...
+[2023-02-27 11:26:50,002][00394] Environment doom_battle already registered, overwriting...
+[2023-02-27 11:26:50,003][00394] Environment doom_battle2 already registered, overwriting...
+[2023-02-27 11:26:50,004][00394] Environment doom_duel_bots already registered, overwriting...
+[2023-02-27 11:26:50,006][00394] Environment doom_deathmatch_bots already registered, overwriting...
+[2023-02-27 11:26:50,009][00394] Environment doom_duel already registered, overwriting...
+[2023-02-27 11:26:50,011][00394] Environment doom_deathmatch_full already registered, overwriting...
+[2023-02-27 11:26:50,013][00394] Environment doom_benchmark already registered, overwriting...
+[2023-02-27 11:26:50,016][00394] register_encoder_factory:
+[2023-02-27 11:26:50,057][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-02-27 11:26:50,058][00394] Overriding arg 'train_for_env_steps' with value 8000000 passed from command line
+[2023-02-27 11:26:50,065][00394] Experiment dir /content/train_dir/default_experiment already exists!
+[2023-02-27 11:26:50,070][00394] Resuming existing experiment from /content/train_dir/default_experiment... +[2023-02-27 11:26:50,072][00394] Weights and Biases integration disabled +[2023-02-27 11:26:50,076][00394] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2023-02-27 11:26:53,716][00394] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/content/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=8000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2023-02-27 11:26:53,720][00394] Saving configuration to 
/content/train_dir/default_experiment/config.json... +[2023-02-27 11:26:53,724][00394] Rollout worker 0 uses device cpu +[2023-02-27 11:26:53,727][00394] Rollout worker 1 uses device cpu +[2023-02-27 11:26:53,728][00394] Rollout worker 2 uses device cpu +[2023-02-27 11:26:53,729][00394] Rollout worker 3 uses device cpu +[2023-02-27 11:26:53,731][00394] Rollout worker 4 uses device cpu +[2023-02-27 11:26:53,734][00394] Rollout worker 5 uses device cpu +[2023-02-27 11:26:53,735][00394] Rollout worker 6 uses device cpu +[2023-02-27 11:26:53,737][00394] Rollout worker 7 uses device cpu +[2023-02-27 11:26:53,862][00394] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-27 11:26:53,864][00394] InferenceWorker_p0-w0: min num requests: 2 +[2023-02-27 11:26:53,905][00394] Starting all processes... +[2023-02-27 11:26:53,907][00394] Starting process learner_proc0 +[2023-02-27 11:26:54,060][00394] Starting all processes... +[2023-02-27 11:26:54,072][00394] Starting process inference_proc0-0 +[2023-02-27 11:26:54,078][00394] Starting process rollout_proc0 +[2023-02-27 11:26:54,078][00394] Starting process rollout_proc1 +[2023-02-27 11:26:54,078][00394] Starting process rollout_proc2 +[2023-02-27 11:26:54,079][00394] Starting process rollout_proc3 +[2023-02-27 11:26:54,079][00394] Starting process rollout_proc4 +[2023-02-27 11:26:54,079][00394] Starting process rollout_proc5 +[2023-02-27 11:26:54,079][00394] Starting process rollout_proc6 +[2023-02-27 11:26:54,079][00394] Starting process rollout_proc7 +[2023-02-27 11:27:01,608][28704] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-27 11:27:01,612][28704] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2023-02-27 11:27:01,687][28704] Num visible devices: 1 +[2023-02-27 11:27:01,722][28704] Starting seed is not provided +[2023-02-27 11:27:01,723][28704] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-27 11:27:01,724][28704] Initializing actor-critic model on device cuda:0 +[2023-02-27 11:27:01,725][28704] RunningMeanStd input shape: (3, 72, 128) +[2023-02-27 11:27:01,729][28704] RunningMeanStd input shape: (1,) +[2023-02-27 11:27:01,909][28704] ConvEncoder: input_channels=3 +[2023-02-27 11:27:03,458][28704] Conv encoder output size: 512 +[2023-02-27 11:27:03,464][28704] Policy head output size: 512 +[2023-02-27 11:27:03,656][28704] Created Actor Critic model with architecture: +[2023-02-27 11:27:03,667][28704] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + 
(critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2023-02-27 11:27:04,348][28718] Worker 0 uses CPU cores [0] +[2023-02-27 11:27:05,196][28719] Worker 1 uses CPU cores [1] +[2023-02-27 11:27:05,782][28720] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-27 11:27:05,782][28720] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2023-02-27 11:27:05,886][28720] Num visible devices: 1 +[2023-02-27 11:27:06,144][28728] Worker 2 uses CPU cores [0] +[2023-02-27 11:27:06,284][28723] Worker 3 uses CPU cores [1] +[2023-02-27 11:27:06,567][28731] Worker 5 uses CPU cores [1] +[2023-02-27 11:27:06,881][28735] Worker 7 uses CPU cores [1] +[2023-02-27 11:27:06,981][28741] Worker 6 uses CPU cores [0] +[2023-02-27 11:27:07,024][28733] Worker 4 uses CPU cores [0] +[2023-02-27 11:27:13,278][28704] Using optimizer +[2023-02-27 11:27:13,280][28704] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... +[2023-02-27 11:27:13,318][28704] Loading model from checkpoint +[2023-02-27 11:27:13,323][28704] Loaded experiment state at self.train_step=980, self.env_steps=4014080 +[2023-02-27 11:27:13,324][28704] Initialized policy 0 weights for model version 980 +[2023-02-27 11:27:13,328][28704] LearnerWorker_p0 finished initialization! +[2023-02-27 11:27:13,331][28704] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-27 11:27:13,534][28720] RunningMeanStd input shape: (3, 72, 128) +[2023-02-27 11:27:13,535][28720] RunningMeanStd input shape: (1,) +[2023-02-27 11:27:13,547][28720] ConvEncoder: input_channels=3 +[2023-02-27 11:27:13,657][28720] Conv encoder output size: 512 +[2023-02-27 11:27:13,657][28720] Policy head output size: 512 +[2023-02-27 11:27:13,855][00394] Heartbeat connected on Batcher_0 +[2023-02-27 11:27:13,862][00394] Heartbeat connected on LearnerWorker_p0 +[2023-02-27 11:27:13,877][00394] Heartbeat connected on RolloutWorker_w0 +[2023-02-27 11:27:13,879][00394] Heartbeat connected on RolloutWorker_w1 +[2023-02-27 11:27:13,885][00394] Heartbeat connected on RolloutWorker_w2 +[2023-02-27 11:27:13,889][00394] Heartbeat connected on RolloutWorker_w3 +[2023-02-27 11:27:13,893][00394] Heartbeat connected on RolloutWorker_w4 +[2023-02-27 11:27:13,895][00394] Heartbeat connected on RolloutWorker_w5 +[2023-02-27 11:27:13,899][00394] Heartbeat connected on RolloutWorker_w6 +[2023-02-27 11:27:13,911][00394] Heartbeat connected on RolloutWorker_w7 +[2023-02-27 11:27:15,076][00394] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4014080. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-27 11:27:16,066][00394] Inference worker 0-0 is ready! +[2023-02-27 11:27:16,068][00394] All inference workers are ready! Signal rollout workers to start! 
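The RunningMeanStd lines above come from the normalizers that normalize_input=True and normalize_returns=True enable: one tracks per-pixel statistics of the (3, 72, 128) observations, the other the scalar returns. A minimal sketch of the underlying running-statistics update (the standard Chan et al. parallel mean/variance merge, shown here as a plain NumPy illustration rather than Sample Factory's actual in-place implementation):

import numpy as np


class RunningMeanStd:
    """Online mean/variance over a stream of batches."""

    def __init__(self, shape, epsilon=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon  # tiny prior so the first batch cannot divide by zero

    def update(self, batch):
        # Merge this batch's statistics into the running ones in a single pass.
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + np.square(delta) * self.count * batch_count / total)
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)


# Observation normalizer matching the logged input shape:
obs_rms = RunningMeanStd(shape=(3, 72, 128))
obs_rms.update(np.random.rand(32, 3, 72, 128))  # a batch of 32 observations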
+[2023-02-27 11:27:16,070][00394] Heartbeat connected on InferenceWorker_p0-w0 +[2023-02-27 11:27:16,204][28719] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 11:27:16,207][28735] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 11:27:16,213][28723] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 11:27:16,209][28731] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 11:27:16,237][28733] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 11:27:16,243][28741] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 11:27:16,295][28728] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 11:27:16,297][28718] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 11:27:17,599][28735] Decorrelating experience for 0 frames... +[2023-02-27 11:27:17,601][28723] Decorrelating experience for 0 frames... +[2023-02-27 11:27:17,790][28741] Decorrelating experience for 0 frames... +[2023-02-27 11:27:17,796][28718] Decorrelating experience for 0 frames... +[2023-02-27 11:27:17,794][28733] Decorrelating experience for 0 frames... +[2023-02-27 11:27:18,259][28719] Decorrelating experience for 0 frames... +[2023-02-27 11:27:18,924][28719] Decorrelating experience for 32 frames... +[2023-02-27 11:27:19,435][28733] Decorrelating experience for 32 frames... +[2023-02-27 11:27:19,458][28718] Decorrelating experience for 32 frames... +[2023-02-27 11:27:19,479][28728] Decorrelating experience for 0 frames... +[2023-02-27 11:27:19,910][28741] Decorrelating experience for 32 frames... +[2023-02-27 11:27:20,076][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-27 11:27:20,534][28719] Decorrelating experience for 64 frames... +[2023-02-27 11:27:20,898][28723] Decorrelating experience for 32 frames... +[2023-02-27 11:27:21,236][28728] Decorrelating experience for 32 frames... +[2023-02-27 11:27:21,497][28733] Decorrelating experience for 64 frames... +[2023-02-27 11:27:22,039][28719] Decorrelating experience for 96 frames... +[2023-02-27 11:27:22,052][28741] Decorrelating experience for 64 frames... +[2023-02-27 11:27:22,447][28731] Decorrelating experience for 0 frames... +[2023-02-27 11:27:22,566][28723] Decorrelating experience for 64 frames... +[2023-02-27 11:27:23,219][28731] Decorrelating experience for 32 frames... +[2023-02-27 11:27:23,415][28723] Decorrelating experience for 96 frames... +[2023-02-27 11:27:23,634][28718] Decorrelating experience for 64 frames... +[2023-02-27 11:27:23,963][28731] Decorrelating experience for 64 frames... +[2023-02-27 11:27:24,524][28741] Decorrelating experience for 96 frames... +[2023-02-27 11:27:24,736][28733] Decorrelating experience for 96 frames... +[2023-02-27 11:27:25,031][28728] Decorrelating experience for 64 frames... +[2023-02-27 11:27:25,077][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-27 11:27:25,515][28735] Decorrelating experience for 32 frames... +[2023-02-27 11:27:25,797][28731] Decorrelating experience for 96 frames... +[2023-02-27 11:27:26,284][28735] Decorrelating experience for 64 frames... +[2023-02-27 11:27:26,719][28735] Decorrelating experience for 96 frames... +[2023-02-27 11:27:26,826][28728] Decorrelating experience for 96 frames... 
+[2023-02-27 11:27:27,020][28718] Decorrelating experience for 96 frames... +[2023-02-27 11:27:30,078][00394] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 19.5. Samples: 292. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-27 11:27:30,085][00394] Avg episode reward: [(0, '1.541')] +[2023-02-27 11:27:30,543][28704] Signal inference workers to stop experience collection... +[2023-02-27 11:27:30,552][28720] InferenceWorker_p0-w0: stopping experience collection +[2023-02-27 11:27:34,354][28704] Signal inference workers to resume experience collection... +[2023-02-27 11:27:34,378][28720] InferenceWorker_p0-w0: resuming experience collection +[2023-02-27 11:27:35,089][00394] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4018176. Throughput: 0: 115.1. Samples: 2302. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2023-02-27 11:27:35,159][00394] Avg episode reward: [(0, '2.274')] +[2023-02-27 11:27:40,077][00394] Fps is (10 sec: 1638.4, 60 sec: 655.4, 300 sec: 655.4). Total num frames: 4030464. Throughput: 0: 135.7. Samples: 3392. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2023-02-27 11:27:40,106][00394] Avg episode reward: [(0, '3.792')] +[2023-02-27 11:27:45,077][00394] Fps is (10 sec: 2457.4, 60 sec: 955.7, 300 sec: 955.7). Total num frames: 4042752. Throughput: 0: 233.6. Samples: 7008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:27:45,080][00394] Avg episode reward: [(0, '4.209')] +[2023-02-27 11:27:48,189][28720] Updated weights for policy 0, policy_version 990 (0.0041) +[2023-02-27 11:27:50,076][00394] Fps is (10 sec: 2867.2, 60 sec: 1287.3, 300 sec: 1287.3). Total num frames: 4059136. Throughput: 0: 323.3. Samples: 11316. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-27 11:27:50,084][00394] Avg episode reward: [(0, '4.616')] +[2023-02-27 11:27:55,076][00394] Fps is (10 sec: 3686.7, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 4079616. Throughput: 0: 441.3. Samples: 17652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:27:55,084][00394] Avg episode reward: [(0, '4.680')] +[2023-02-27 11:27:58,256][28720] Updated weights for policy 0, policy_version 1000 (0.0024) +[2023-02-27 11:28:00,077][00394] Fps is (10 sec: 4096.0, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 4100096. Throughput: 0: 463.9. Samples: 20876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:28:00,082][00394] Avg episode reward: [(0, '4.666')] +[2023-02-27 11:28:05,076][00394] Fps is (10 sec: 3276.8, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 4112384. Throughput: 0: 553.2. Samples: 24896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-27 11:28:05,081][00394] Avg episode reward: [(0, '4.756')] +[2023-02-27 11:28:10,076][00394] Fps is (10 sec: 2457.6, 60 sec: 2010.8, 300 sec: 2010.8). Total num frames: 4124672. Throughput: 0: 598.0. Samples: 26908. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-27 11:28:10,085][00394] Avg episode reward: [(0, '4.509')] +[2023-02-27 11:28:12,058][28720] Updated weights for policy 0, policy_version 1010 (0.0015) +[2023-02-27 11:28:15,076][00394] Fps is (10 sec: 3686.4, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 4149248. Throughput: 0: 710.5. Samples: 32266. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 11:28:15,082][00394] Avg episode reward: [(0, '4.530')] +[2023-02-27 11:28:20,077][00394] Fps is (10 sec: 4096.0, 60 sec: 2525.9, 300 sec: 2331.6). Total num frames: 4165632. 
Throughput: 0: 806.1. Samples: 38578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:28:20,079][00394] Avg episode reward: [(0, '4.674')] +[2023-02-27 11:28:22,958][28720] Updated weights for policy 0, policy_version 1020 (0.0017) +[2023-02-27 11:28:25,079][00394] Fps is (10 sec: 3275.9, 60 sec: 2798.8, 300 sec: 2399.0). Total num frames: 4182016. Throughput: 0: 831.4. Samples: 40806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:28:25,084][00394] Avg episode reward: [(0, '4.655')] +[2023-02-27 11:28:30,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2403.0). Total num frames: 4194304. Throughput: 0: 836.5. Samples: 44652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:28:30,080][00394] Avg episode reward: [(0, '4.734')] +[2023-02-27 11:28:35,076][00394] Fps is (10 sec: 3277.7, 60 sec: 3276.8, 300 sec: 2508.8). Total num frames: 4214784. Throughput: 0: 856.4. Samples: 49852. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:28:35,085][00394] Avg episode reward: [(0, '4.729')] +[2023-02-27 11:28:35,862][28720] Updated weights for policy 0, policy_version 1030 (0.0042) +[2023-02-27 11:28:40,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 2602.2). Total num frames: 4235264. Throughput: 0: 787.0. Samples: 53068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 11:28:40,086][00394] Avg episode reward: [(0, '4.711')] +[2023-02-27 11:28:45,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2639.6). Total num frames: 4251648. Throughput: 0: 840.8. Samples: 58712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:28:45,080][00394] Avg episode reward: [(0, '4.692')] +[2023-02-27 11:28:47,994][28720] Updated weights for policy 0, policy_version 1040 (0.0017) +[2023-02-27 11:28:50,077][00394] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 2630.1). Total num frames: 4263936. Throughput: 0: 842.2. Samples: 62796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 11:28:50,084][00394] Avg episode reward: [(0, '4.736')] +[2023-02-27 11:28:50,096][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001041_4263936.pth... +[2023-02-27 11:28:50,497][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth +[2023-02-27 11:28:55,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2662.4). Total num frames: 4280320. Throughput: 0: 840.6. Samples: 64736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:28:55,087][00394] Avg episode reward: [(0, '4.644')] +[2023-02-27 11:28:59,598][28720] Updated weights for policy 0, policy_version 1050 (0.0035) +[2023-02-27 11:29:00,076][00394] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 2730.7). Total num frames: 4300800. Throughput: 0: 851.4. Samples: 70580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:29:00,079][00394] Avg episode reward: [(0, '4.745')] +[2023-02-27 11:29:05,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2755.5). Total num frames: 4317184. Throughput: 0: 842.9. Samples: 76510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:29:05,080][00394] Avg episode reward: [(0, '4.610')] +[2023-02-27 11:29:10,077][00394] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 2778.1). Total num frames: 4333568. Throughput: 0: 838.6. Samples: 78542. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:29:10,082][00394] Avg episode reward: [(0, '4.632')] +[2023-02-27 11:29:12,996][28720] Updated weights for policy 0, policy_version 1060 (0.0017) +[2023-02-27 11:29:15,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2764.8). Total num frames: 4345856. Throughput: 0: 841.2. Samples: 82504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:29:15,080][00394] Avg episode reward: [(0, '4.668')] +[2023-02-27 11:29:20,077][00394] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 2818.0). Total num frames: 4366336. Throughput: 0: 856.7. Samples: 88402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:29:20,087][00394] Avg episode reward: [(0, '4.717')] +[2023-02-27 11:29:23,231][28720] Updated weights for policy 0, policy_version 1070 (0.0013) +[2023-02-27 11:29:25,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 2867.2). Total num frames: 4386816. Throughput: 0: 851.3. Samples: 91376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:29:25,080][00394] Avg episode reward: [(0, '4.644')] +[2023-02-27 11:29:30,076][00394] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 2852.0). Total num frames: 4399104. Throughput: 0: 833.1. Samples: 96202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:29:30,081][00394] Avg episode reward: [(0, '4.565')] +[2023-02-27 11:29:35,077][00394] Fps is (10 sec: 2457.4, 60 sec: 3276.8, 300 sec: 2837.9). Total num frames: 4411392. Throughput: 0: 830.0. Samples: 100146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:29:35,085][00394] Avg episode reward: [(0, '4.619')] +[2023-02-27 11:29:37,494][28720] Updated weights for policy 0, policy_version 1080 (0.0017) +[2023-02-27 11:29:40,080][00394] Fps is (10 sec: 3275.6, 60 sec: 3276.6, 300 sec: 2881.3). Total num frames: 4431872. Throughput: 0: 910.9. Samples: 105730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 11:29:40,087][00394] Avg episode reward: [(0, '4.631')] +[2023-02-27 11:29:45,076][00394] Fps is (10 sec: 4096.3, 60 sec: 3345.1, 300 sec: 2921.8). Total num frames: 4452352. Throughput: 0: 852.9. Samples: 108962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:29:45,081][00394] Avg episode reward: [(0, '4.656')] +[2023-02-27 11:29:47,803][28720] Updated weights for policy 0, policy_version 1090 (0.0012) +[2023-02-27 11:29:50,076][00394] Fps is (10 sec: 3687.7, 60 sec: 3413.4, 300 sec: 2933.3). Total num frames: 4468736. Throughput: 0: 834.9. Samples: 114082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:29:50,083][00394] Avg episode reward: [(0, '4.699')] +[2023-02-27 11:29:55,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2918.4). Total num frames: 4481024. Throughput: 0: 835.8. Samples: 116152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 11:29:55,087][00394] Avg episode reward: [(0, '4.580')] +[2023-02-27 11:30:00,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2929.3). Total num frames: 4497408. Throughput: 0: 838.5. Samples: 120238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:30:00,087][00394] Avg episode reward: [(0, '4.658')] +[2023-02-27 11:30:01,370][28720] Updated weights for policy 0, policy_version 1100 (0.0012) +[2023-02-27 11:30:05,077][00394] Fps is (10 sec: 3686.3, 60 sec: 3345.0, 300 sec: 2963.6). Total num frames: 4517888. Throughput: 0: 846.9. Samples: 126512. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:30:05,082][00394] Avg episode reward: [(0, '4.944')] +[2023-02-27 11:30:10,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 2995.9). Total num frames: 4538368. Throughput: 0: 904.4. Samples: 132074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:30:10,084][00394] Avg episode reward: [(0, '4.777')] +[2023-02-27 11:30:12,704][28720] Updated weights for policy 0, policy_version 1110 (0.0014) +[2023-02-27 11:30:15,076][00394] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 2981.0). Total num frames: 4550656. Throughput: 0: 839.5. Samples: 133980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:30:15,080][00394] Avg episode reward: [(0, '4.716')] +[2023-02-27 11:30:20,077][00394] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 2966.8). Total num frames: 4562944. Throughput: 0: 836.7. Samples: 137798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:30:20,085][00394] Avg episode reward: [(0, '4.711')] +[2023-02-27 11:30:25,077][00394] Fps is (10 sec: 3276.5, 60 sec: 3276.8, 300 sec: 2996.5). Total num frames: 4583424. Throughput: 0: 846.1. Samples: 143804. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:30:25,079][00394] Avg episode reward: [(0, '4.960')] +[2023-02-27 11:30:25,518][28720] Updated weights for policy 0, policy_version 1120 (0.0019) +[2023-02-27 11:30:30,077][00394] Fps is (10 sec: 4095.7, 60 sec: 3413.3, 300 sec: 3024.7). Total num frames: 4603904. Throughput: 0: 842.4. Samples: 146872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:30:30,081][00394] Avg episode reward: [(0, '4.738')] +[2023-02-27 11:30:35,076][00394] Fps is (10 sec: 3686.7, 60 sec: 3481.7, 300 sec: 3031.0). Total num frames: 4620288. Throughput: 0: 832.3. Samples: 151536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 11:30:35,079][00394] Avg episode reward: [(0, '4.826')] +[2023-02-27 11:30:37,951][28720] Updated weights for policy 0, policy_version 1130 (0.0016) +[2023-02-27 11:30:40,076][00394] Fps is (10 sec: 2867.4, 60 sec: 3345.3, 300 sec: 3017.1). Total num frames: 4632576. Throughput: 0: 877.5. Samples: 155638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:30:40,099][00394] Avg episode reward: [(0, '4.553')] +[2023-02-27 11:30:45,077][00394] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3042.7). Total num frames: 4653056. Throughput: 0: 845.4. Samples: 158280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 11:30:45,080][00394] Avg episode reward: [(0, '4.617')] +[2023-02-27 11:30:49,067][28720] Updated weights for policy 0, policy_version 1140 (0.0028) +[2023-02-27 11:30:50,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3067.2). Total num frames: 4673536. Throughput: 0: 845.1. Samples: 164540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:30:50,079][00394] Avg episode reward: [(0, '4.763')] +[2023-02-27 11:30:50,090][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001141_4673536.pth... +[2023-02-27 11:30:50,356][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth +[2023-02-27 11:30:55,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3053.4). Total num frames: 4685824. Throughput: 0: 781.7. Samples: 167252. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:30:55,088][00394] Avg episode reward: [(0, '4.723')] +[2023-02-27 11:31:00,076][00394] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3040.1). Total num frames: 4698112. Throughput: 0: 826.7. Samples: 171182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:31:00,087][00394] Avg episode reward: [(0, '4.513')] +[2023-02-27 11:31:03,436][28720] Updated weights for policy 0, policy_version 1150 (0.0017) +[2023-02-27 11:31:05,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3045.3). Total num frames: 4714496. Throughput: 0: 836.1. Samples: 175424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:31:05,078][00394] Avg episode reward: [(0, '4.453')] +[2023-02-27 11:31:10,077][00394] Fps is (10 sec: 4095.9, 60 sec: 3345.1, 300 sec: 3085.1). Total num frames: 4739072. Throughput: 0: 774.7. Samples: 178664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 11:31:10,080][00394] Avg episode reward: [(0, '4.617')] +[2023-02-27 11:31:12,965][28720] Updated weights for policy 0, policy_version 1160 (0.0012) +[2023-02-27 11:31:15,081][00394] Fps is (10 sec: 4094.3, 60 sec: 3413.1, 300 sec: 3089.0). Total num frames: 4755456. Throughput: 0: 850.1. Samples: 185128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:31:15,089][00394] Avg episode reward: [(0, '4.722')] +[2023-02-27 11:31:20,076][00394] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3076.2). Total num frames: 4767744. Throughput: 0: 834.4. Samples: 189084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-27 11:31:20,083][00394] Avg episode reward: [(0, '4.642')] +[2023-02-27 11:31:25,076][00394] Fps is (10 sec: 2868.4, 60 sec: 3345.1, 300 sec: 3080.2). Total num frames: 4784128. Throughput: 0: 832.2. Samples: 193088. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) +[2023-02-27 11:31:25,079][00394] Avg episode reward: [(0, '4.671')] +[2023-02-27 11:31:26,956][28720] Updated weights for policy 0, policy_version 1170 (0.0024) +[2023-02-27 11:31:30,077][00394] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 3100.1). Total num frames: 4804608. Throughput: 0: 844.0. Samples: 196258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:31:30,082][00394] Avg episode reward: [(0, '4.659')] +[2023-02-27 11:31:35,077][00394] Fps is (10 sec: 4095.5, 60 sec: 3413.3, 300 sec: 3119.2). Total num frames: 4825088. Throughput: 0: 850.8. Samples: 202826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:31:35,085][00394] Avg episode reward: [(0, '4.703')] +[2023-02-27 11:31:37,747][28720] Updated weights for policy 0, policy_version 1180 (0.0020) +[2023-02-27 11:31:40,076][00394] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3106.8). Total num frames: 4837376. Throughput: 0: 840.4. Samples: 205068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:31:40,083][00394] Avg episode reward: [(0, '4.487')] +[2023-02-27 11:31:45,078][00394] Fps is (10 sec: 2867.1, 60 sec: 3345.0, 300 sec: 3109.9). Total num frames: 4853760. Throughput: 0: 844.1. Samples: 209168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 11:31:45,085][00394] Avg episode reward: [(0, '4.583')] +[2023-02-27 11:31:50,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3113.0). Total num frames: 4870144. Throughput: 0: 867.4. Samples: 214456. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:31:50,085][00394] Avg episode reward: [(0, '4.684')] +[2023-02-27 11:31:50,154][28720] Updated weights for policy 0, policy_version 1190 (0.0033) +[2023-02-27 11:31:55,081][00394] Fps is (10 sec: 4094.5, 60 sec: 3481.3, 300 sec: 3145.1). Total num frames: 4894720. Throughput: 0: 936.7. Samples: 220822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:31:55,089][00394] Avg episode reward: [(0, '4.705')] +[2023-02-27 11:32:00,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3133.1). Total num frames: 4907008. Throughput: 0: 845.7. Samples: 223182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:32:00,082][00394] Avg episode reward: [(0, '4.726')] +[2023-02-27 11:32:02,021][28720] Updated weights for policy 0, policy_version 1200 (0.0032) +[2023-02-27 11:32:05,077][00394] Fps is (10 sec: 2868.5, 60 sec: 3481.6, 300 sec: 3135.6). Total num frames: 4923392. Throughput: 0: 848.7. Samples: 227274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:32:05,080][00394] Avg episode reward: [(0, '4.733')] +[2023-02-27 11:32:10,077][00394] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3138.0). Total num frames: 4939776. Throughput: 0: 803.6. Samples: 229248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:32:10,083][00394] Avg episode reward: [(0, '4.694')] +[2023-02-27 11:32:13,417][28720] Updated weights for policy 0, policy_version 1210 (0.0022) +[2023-02-27 11:32:15,076][00394] Fps is (10 sec: 3686.5, 60 sec: 3413.6, 300 sec: 3207.4). Total num frames: 4960256. Throughput: 0: 879.2. Samples: 235822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 11:32:15,078][00394] Avg episode reward: [(0, '4.683')] +[2023-02-27 11:32:20,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3262.9). Total num frames: 4976640. Throughput: 0: 859.5. Samples: 241502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:32:20,081][00394] Avg episode reward: [(0, '4.701')] +[2023-02-27 11:32:25,095][00394] Fps is (10 sec: 3270.6, 60 sec: 3480.5, 300 sec: 3318.2). Total num frames: 4993024. Throughput: 0: 895.9. Samples: 245400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:32:25,098][00394] Avg episode reward: [(0, '4.740')] +[2023-02-27 11:32:26,727][28720] Updated weights for policy 0, policy_version 1220 (0.0012) +[2023-02-27 11:32:30,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 5005312. Throughput: 0: 850.9. Samples: 247458. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:32:30,079][00394] Avg episode reward: [(0, '4.675')] +[2023-02-27 11:32:35,077][00394] Fps is (10 sec: 3693.4, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 5029888. Throughput: 0: 870.1. Samples: 253610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:32:35,080][00394] Avg episode reward: [(0, '4.383')] +[2023-02-27 11:32:36,903][28720] Updated weights for policy 0, policy_version 1230 (0.0021) +[2023-02-27 11:32:40,079][00394] Fps is (10 sec: 4094.9, 60 sec: 3481.5, 300 sec: 3401.7). Total num frames: 5046272. Throughput: 0: 798.8. Samples: 256764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:32:40,081][00394] Avg episode reward: [(0, '4.505')] +[2023-02-27 11:32:45,077][00394] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 5062656. Throughput: 0: 852.2. Samples: 261530. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:32:45,080][00394] Avg episode reward: [(0, '4.537')] +[2023-02-27 11:32:50,078][00394] Fps is (10 sec: 2867.4, 60 sec: 3413.2, 300 sec: 3374.0). Total num frames: 5074944. Throughput: 0: 849.7. Samples: 265514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:32:50,090][00394] Avg episode reward: [(0, '4.521')] +[2023-02-27 11:32:50,104][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001239_5074944.pth... +[2023-02-27 11:32:50,409][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001041_4263936.pth +[2023-02-27 11:32:50,835][28720] Updated weights for policy 0, policy_version 1240 (0.0022) +[2023-02-27 11:32:55,077][00394] Fps is (10 sec: 3277.1, 60 sec: 3345.3, 300 sec: 3374.0). Total num frames: 5095424. Throughput: 0: 865.6. Samples: 268202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:32:55,083][00394] Avg episode reward: [(0, '4.492')] +[2023-02-27 11:33:00,076][00394] Fps is (10 sec: 4096.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 5115904. Throughput: 0: 855.4. Samples: 274316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 11:33:00,078][00394] Avg episode reward: [(0, '4.708')] +[2023-02-27 11:33:00,795][28720] Updated weights for policy 0, policy_version 1250 (0.0028) +[2023-02-27 11:33:05,079][00394] Fps is (10 sec: 3275.8, 60 sec: 3413.2, 300 sec: 3401.7). Total num frames: 5128192. Throughput: 0: 840.1. Samples: 279308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:33:05,088][00394] Avg episode reward: [(0, '4.976')] +[2023-02-27 11:33:10,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 5144576. Throughput: 0: 799.4. Samples: 281360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:33:10,084][00394] Avg episode reward: [(0, '4.947')] +[2023-02-27 11:33:14,284][28720] Updated weights for policy 0, policy_version 1260 (0.0019) +[2023-02-27 11:33:15,076][00394] Fps is (10 sec: 3277.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5160960. Throughput: 0: 856.1. Samples: 285984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:33:15,084][00394] Avg episode reward: [(0, '4.783')] +[2023-02-27 11:33:20,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 5185536. Throughput: 0: 863.6. Samples: 292474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:33:20,085][00394] Avg episode reward: [(0, '4.546')] +[2023-02-27 11:33:25,077][00394] Fps is (10 sec: 3686.0, 60 sec: 3414.4, 300 sec: 3401.8). Total num frames: 5197824. Throughput: 0: 858.3. Samples: 295386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:33:25,083][00394] Avg episode reward: [(0, '4.616')] +[2023-02-27 11:33:25,510][28720] Updated weights for policy 0, policy_version 1270 (0.0023) +[2023-02-27 11:33:30,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 5214208. Throughput: 0: 840.6. Samples: 299358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:33:30,084][00394] Avg episode reward: [(0, '4.631')] +[2023-02-27 11:33:35,076][00394] Fps is (10 sec: 3277.1, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5230592. Throughput: 0: 851.0. Samples: 303808. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:33:35,086][00394] Avg episode reward: [(0, '4.687')] +[2023-02-27 11:33:37,723][28720] Updated weights for policy 0, policy_version 1280 (0.0017) +[2023-02-27 11:33:40,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3387.9). Total num frames: 5251072. Throughput: 0: 937.1. Samples: 310372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:33:40,079][00394] Avg episode reward: [(0, '4.622')] +[2023-02-27 11:33:45,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3415.7). Total num frames: 5271552. Throughput: 0: 873.0. Samples: 313600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:33:45,084][00394] Avg episode reward: [(0, '4.765')] +[2023-02-27 11:33:49,544][28720] Updated weights for policy 0, policy_version 1290 (0.0018) +[2023-02-27 11:33:50,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3401.8). Total num frames: 5283840. Throughput: 0: 851.2. Samples: 317608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 11:33:50,083][00394] Avg episode reward: [(0, '4.712')] +[2023-02-27 11:33:55,077][00394] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5296128. Throughput: 0: 850.7. Samples: 319640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:33:55,085][00394] Avg episode reward: [(0, '4.666')] +[2023-02-27 11:34:00,077][00394] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 5316608. Throughput: 0: 867.6. Samples: 325026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:34:00,084][00394] Avg episode reward: [(0, '4.550')] +[2023-02-27 11:34:01,218][28720] Updated weights for policy 0, policy_version 1300 (0.0012) +[2023-02-27 11:34:05,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.8, 300 sec: 3401.8). Total num frames: 5337088. Throughput: 0: 867.8. Samples: 331524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:34:05,082][00394] Avg episode reward: [(0, '4.653')] +[2023-02-27 11:34:10,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 5353472. Throughput: 0: 847.5. Samples: 333522. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:34:10,079][00394] Avg episode reward: [(0, '4.766')] +[2023-02-27 11:34:13,902][28720] Updated weights for policy 0, policy_version 1310 (0.0013) +[2023-02-27 11:34:15,077][00394] Fps is (10 sec: 2867.0, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 5365760. Throughput: 0: 852.6. Samples: 337724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 11:34:15,087][00394] Avg episode reward: [(0, '4.831')] +[2023-02-27 11:34:20,077][00394] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 5386240. Throughput: 0: 878.4. Samples: 343334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:34:20,087][00394] Avg episode reward: [(0, '4.664')] +[2023-02-27 11:34:24,267][28720] Updated weights for policy 0, policy_version 1320 (0.0024) +[2023-02-27 11:34:25,078][00394] Fps is (10 sec: 4095.7, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 5406720. Throughput: 0: 874.5. Samples: 349724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:34:25,084][00394] Avg episode reward: [(0, '4.430')] +[2023-02-27 11:34:30,079][00394] Fps is (10 sec: 3685.6, 60 sec: 3481.5, 300 sec: 3429.5). Total num frames: 5423104. Throughput: 0: 848.0. Samples: 351760. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:34:30,083][00394] Avg episode reward: [(0, '4.522')] +[2023-02-27 11:34:35,077][00394] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 5435392. Throughput: 0: 849.6. Samples: 355842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 11:34:35,079][00394] Avg episode reward: [(0, '4.574')] +[2023-02-27 11:34:38,069][28720] Updated weights for policy 0, policy_version 1330 (0.0017) +[2023-02-27 11:34:40,076][00394] Fps is (10 sec: 3277.5, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 5455872. Throughput: 0: 925.1. Samples: 361270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:34:40,079][00394] Avg episode reward: [(0, '4.510')] +[2023-02-27 11:34:45,077][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 5476352. Throughput: 0: 876.4. Samples: 364462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:34:45,084][00394] Avg episode reward: [(0, '4.484')] +[2023-02-27 11:34:48,202][28720] Updated weights for policy 0, policy_version 1340 (0.0014) +[2023-02-27 11:34:50,077][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 5492736. Throughput: 0: 852.7. Samples: 369896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 11:34:50,081][00394] Avg episode reward: [(0, '4.566')] +[2023-02-27 11:34:50,097][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001341_5492736.pth... +[2023-02-27 11:34:50,439][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001141_4673536.pth +[2023-02-27 11:34:55,083][00394] Fps is (10 sec: 2865.4, 60 sec: 3481.2, 300 sec: 3415.6). Total num frames: 5505024. Throughput: 0: 852.6. Samples: 371894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:34:55,088][00394] Avg episode reward: [(0, '4.740')] +[2023-02-27 11:35:00,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 5521408. Throughput: 0: 846.3. Samples: 375806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:35:00,079][00394] Avg episode reward: [(0, '4.383')] +[2023-02-27 11:35:01,519][28720] Updated weights for policy 0, policy_version 1350 (0.0019) +[2023-02-27 11:35:05,076][00394] Fps is (10 sec: 3688.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 5541888. Throughput: 0: 864.9. Samples: 382254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:35:05,084][00394] Avg episode reward: [(0, '4.639')] +[2023-02-27 11:35:10,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 5562368. Throughput: 0: 795.3. Samples: 385512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:35:10,079][00394] Avg episode reward: [(0, '4.791')] +[2023-02-27 11:35:12,630][28720] Updated weights for policy 0, policy_version 1360 (0.0014) +[2023-02-27 11:35:15,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 5574656. Throughput: 0: 848.6. Samples: 389946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:35:15,092][00394] Avg episode reward: [(0, '4.585')] +[2023-02-27 11:35:20,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 5591040. Throughput: 0: 844.5. Samples: 393844. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:35:20,087][00394] Avg episode reward: [(0, '4.543')] +[2023-02-27 11:35:24,910][28720] Updated weights for policy 0, policy_version 1370 (0.0029) +[2023-02-27 11:35:25,077][00394] Fps is (10 sec: 3686.2, 60 sec: 3413.4, 300 sec: 3415.7). Total num frames: 5611520. Throughput: 0: 862.8. Samples: 400096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:35:25,085][00394] Avg episode reward: [(0, '4.536')] +[2023-02-27 11:35:30,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3429.5). Total num frames: 5632000. Throughput: 0: 862.0. Samples: 403254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:35:30,085][00394] Avg episode reward: [(0, '4.557')] +[2023-02-27 11:35:35,076][00394] Fps is (10 sec: 3277.0, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 5644288. Throughput: 0: 841.2. Samples: 407748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:35:35,082][00394] Avg episode reward: [(0, '4.731')] +[2023-02-27 11:35:37,573][28720] Updated weights for policy 0, policy_version 1380 (0.0021) +[2023-02-27 11:35:40,077][00394] Fps is (10 sec: 2457.5, 60 sec: 3345.0, 300 sec: 3401.8). Total num frames: 5656576. Throughput: 0: 843.4. Samples: 409844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:35:40,090][00394] Avg episode reward: [(0, '4.603')] +[2023-02-27 11:35:45,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 5677056. Throughput: 0: 871.5. Samples: 415022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:35:45,080][00394] Avg episode reward: [(0, '4.480')] +[2023-02-27 11:35:48,196][28720] Updated weights for policy 0, policy_version 1390 (0.0018) +[2023-02-27 11:35:50,076][00394] Fps is (10 sec: 4505.9, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 5701632. Throughput: 0: 870.8. Samples: 421442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:35:50,079][00394] Avg episode reward: [(0, '4.531')] +[2023-02-27 11:35:55,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3482.0, 300 sec: 3443.4). Total num frames: 5713920. Throughput: 0: 852.7. Samples: 423884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:35:55,086][00394] Avg episode reward: [(0, '4.624')] +[2023-02-27 11:36:00,077][00394] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 5726208. Throughput: 0: 846.2. Samples: 428024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 11:36:00,083][00394] Avg episode reward: [(0, '4.870')] +[2023-02-27 11:36:01,747][28720] Updated weights for policy 0, policy_version 1400 (0.0038) +[2023-02-27 11:36:05,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 5746688. Throughput: 0: 871.1. Samples: 433044. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:36:05,082][00394] Avg episode reward: [(0, '5.034')] +[2023-02-27 11:36:10,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 5767168. Throughput: 0: 802.9. Samples: 436228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:36:10,079][00394] Avg episode reward: [(0, '4.836')] +[2023-02-27 11:36:11,278][28720] Updated weights for policy 0, policy_version 1410 (0.0012) +[2023-02-27 11:36:15,079][00394] Fps is (10 sec: 3685.4, 60 sec: 3481.4, 300 sec: 3443.4). Total num frames: 5783552. Throughput: 0: 865.5. Samples: 442204. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:36:15,082][00394] Avg episode reward: [(0, '4.974')] +[2023-02-27 11:36:20,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 5799936. Throughput: 0: 857.8. Samples: 446348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:36:20,083][00394] Avg episode reward: [(0, '4.792')] +[2023-02-27 11:36:25,040][28720] Updated weights for policy 0, policy_version 1420 (0.0013) +[2023-02-27 11:36:25,076][00394] Fps is (10 sec: 3277.7, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 5816320. Throughput: 0: 918.5. Samples: 451178. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 11:36:25,082][00394] Avg episode reward: [(0, '4.773')] +[2023-02-27 11:36:30,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 5836800. Throughput: 0: 873.7. Samples: 454340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:36:30,079][00394] Avg episode reward: [(0, '4.719')] +[2023-02-27 11:36:35,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 5853184. Throughput: 0: 867.1. Samples: 460462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:36:35,080][00394] Avg episode reward: [(0, '4.703')] +[2023-02-27 11:36:35,416][28720] Updated weights for policy 0, policy_version 1430 (0.0016) +[2023-02-27 11:36:40,095][00394] Fps is (10 sec: 3270.7, 60 sec: 3548.8, 300 sec: 3443.2). Total num frames: 5869568. Throughput: 0: 856.8. Samples: 462456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:36:40,098][00394] Avg episode reward: [(0, '4.611')] +[2023-02-27 11:36:45,086][00394] Fps is (10 sec: 2864.6, 60 sec: 3412.8, 300 sec: 3429.4). Total num frames: 5881856. Throughput: 0: 855.2. Samples: 466516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:36:45,088][00394] Avg episode reward: [(0, '4.491')] +[2023-02-27 11:36:48,421][28720] Updated weights for policy 0, policy_version 1440 (0.0012) +[2023-02-27 11:36:50,077][00394] Fps is (10 sec: 3282.8, 60 sec: 3345.1, 300 sec: 3415.7). Total num frames: 5902336. Throughput: 0: 872.8. Samples: 472318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:36:50,080][00394] Avg episode reward: [(0, '4.489')] +[2023-02-27 11:36:50,097][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001441_5902336.pth... +[2023-02-27 11:36:50,394][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001239_5074944.pth +[2023-02-27 11:36:55,076][00394] Fps is (10 sec: 4099.7, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 5922816. Throughput: 0: 933.0. Samples: 478212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 11:36:55,079][00394] Avg episode reward: [(0, '4.634')] +[2023-02-27 11:37:00,079][00394] Fps is (10 sec: 3276.0, 60 sec: 3481.5, 300 sec: 3429.5). Total num frames: 5935104. Throughput: 0: 844.2. Samples: 480192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 11:37:00,082][00394] Avg episode reward: [(0, '4.715')] +[2023-02-27 11:37:00,515][28720] Updated weights for policy 0, policy_version 1450 (0.0042) +[2023-02-27 11:37:05,078][00394] Fps is (10 sec: 2457.2, 60 sec: 3345.0, 300 sec: 3415.6). Total num frames: 5947392. Throughput: 0: 842.7. Samples: 484272. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:37:05,081][00394] Avg episode reward: [(0, '4.802')]
+[2023-02-27 11:37:10,076][00394] Fps is (10 sec: 3687.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 5971968. Throughput: 0: 791.8. Samples: 486808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:37:10,079][00394] Avg episode reward: [(0, '4.684')]
+[2023-02-27 11:37:11,963][28720] Updated weights for policy 0, policy_version 1460 (0.0020)
+[2023-02-27 11:37:15,076][00394] Fps is (10 sec: 4506.3, 60 sec: 3481.8, 300 sec: 3443.4). Total num frames: 5992448. Throughput: 0: 865.8. Samples: 493302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:37:15,084][00394] Avg episode reward: [(0, '4.776')]
+[2023-02-27 11:37:20,085][00394] Fps is (10 sec: 3274.0, 60 sec: 3412.8, 300 sec: 3429.7). Total num frames: 6004736. Throughput: 0: 843.6. Samples: 498430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 11:37:20,090][00394] Avg episode reward: [(0, '4.756')]
+[2023-02-27 11:37:24,817][28720] Updated weights for policy 0, policy_version 1470 (0.0019)
+[2023-02-27 11:37:25,077][00394] Fps is (10 sec: 2867.0, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6021120. Throughput: 0: 844.1. Samples: 500426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:37:25,084][00394] Avg episode reward: [(0, '4.636')]
+[2023-02-27 11:37:30,077][00394] Fps is (10 sec: 3279.5, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 6037504. Throughput: 0: 852.7. Samples: 504882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:37:30,079][00394] Avg episode reward: [(0, '4.713')]
+[2023-02-27 11:37:35,076][00394] Fps is (10 sec: 3686.7, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 6057984. Throughput: 0: 870.4. Samples: 511488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:37:35,079][00394] Avg episode reward: [(0, '4.634')]
+[2023-02-27 11:37:35,245][28720] Updated weights for policy 0, policy_version 1480 (0.0015)
+[2023-02-27 11:37:40,077][00394] Fps is (10 sec: 4096.1, 60 sec: 3482.7, 300 sec: 3443.4). Total num frames: 6078464. Throughput: 0: 811.6. Samples: 514732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:37:40,079][00394] Avg episode reward: [(0, '4.674')]
+[2023-02-27 11:37:45,080][00394] Fps is (10 sec: 3275.6, 60 sec: 3481.9, 300 sec: 3443.4). Total num frames: 6090752. Throughput: 0: 857.5. Samples: 518782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 11:37:45,083][00394] Avg episode reward: [(0, '4.676')]
+[2023-02-27 11:37:48,818][28720] Updated weights for policy 0, policy_version 1490 (0.0019)
+[2023-02-27 11:37:50,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 6107136. Throughput: 0: 865.1. Samples: 523200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:37:50,079][00394] Avg episode reward: [(0, '4.782')]
+[2023-02-27 11:37:55,076][00394] Fps is (10 sec: 3687.7, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 6127616. Throughput: 0: 881.2. Samples: 526462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:37:55,079][00394] Avg episode reward: [(0, '4.692')]
+[2023-02-27 11:37:58,232][28720] Updated weights for policy 0, policy_version 1500 (0.0026)
+[2023-02-27 11:38:00,081][00394] Fps is (10 sec: 4094.1, 60 sec: 3549.7, 300 sec: 3457.3). Total num frames: 6148096. Throughput: 0: 880.1. Samples: 532910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:38:00,083][00394] Avg episode reward: [(0, '4.593')]
+[2023-02-27 11:38:05,083][00394] Fps is (10 sec: 3274.6, 60 sec: 3549.6, 300 sec: 3443.3). Total num frames: 6160384. Throughput: 0: 859.8. Samples: 537120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:38:05,093][00394] Avg episode reward: [(0, '4.639')]
+[2023-02-27 11:38:10,076][00394] Fps is (10 sec: 2868.5, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6176768. Throughput: 0: 861.7. Samples: 539200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:38:10,083][00394] Avg episode reward: [(0, '4.565')]
+[2023-02-27 11:38:11,852][28720] Updated weights for policy 0, policy_version 1510 (0.0022)
+[2023-02-27 11:38:15,076][00394] Fps is (10 sec: 3688.9, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 6197248. Throughput: 0: 885.2. Samples: 544716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:38:15,079][00394] Avg episode reward: [(0, '4.348')]
+[2023-02-27 11:38:20,078][00394] Fps is (10 sec: 4095.9, 60 sec: 3550.4, 300 sec: 3457.3). Total num frames: 6217728. Throughput: 0: 883.1. Samples: 551230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:38:20,085][00394] Avg episode reward: [(0, '4.572')]
+[2023-02-27 11:38:22,040][28720] Updated weights for policy 0, policy_version 1520 (0.0020)
+[2023-02-27 11:38:25,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 6230016. Throughput: 0: 858.8. Samples: 553376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:38:25,086][00394] Avg episode reward: [(0, '4.750')]
+[2023-02-27 11:38:30,077][00394] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 6246400. Throughput: 0: 855.6. Samples: 557280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:38:30,083][00394] Avg episode reward: [(0, '4.910')]
+[2023-02-27 11:38:35,068][28720] Updated weights for policy 0, policy_version 1530 (0.0042)
+[2023-02-27 11:38:35,082][00394] Fps is (10 sec: 3684.3, 60 sec: 3481.3, 300 sec: 3443.4). Total num frames: 6266880. Throughput: 0: 876.9. Samples: 562666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:38:35,089][00394] Avg episode reward: [(0, '4.823')]
+[2023-02-27 11:38:40,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 6287360. Throughput: 0: 876.5. Samples: 565906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:38:40,083][00394] Avg episode reward: [(0, '4.687')]
+[2023-02-27 11:38:45,077][00394] Fps is (10 sec: 3688.2, 60 sec: 3550.0, 300 sec: 3457.3). Total num frames: 6303744. Throughput: 0: 861.5. Samples: 571676. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:38:45,083][00394] Avg episode reward: [(0, '4.761')]
+[2023-02-27 11:38:46,402][28720] Updated weights for policy 0, policy_version 1540 (0.0023)
+[2023-02-27 11:38:50,077][00394] Fps is (10 sec: 2867.0, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 6316032. Throughput: 0: 859.2. Samples: 575780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 11:38:50,086][00394] Avg episode reward: [(0, '4.670')]
+[2023-02-27 11:38:50,102][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001542_6316032.pth...
+[2023-02-27 11:38:50,446][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001341_5492736.pth
+[2023-02-27 11:38:55,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6332416. Throughput: 0: 927.2. Samples: 580924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:38:55,081][00394] Avg episode reward: [(0, '4.694')]
+[2023-02-27 11:38:58,255][28720] Updated weights for policy 0, policy_version 1550 (0.0030)
+[2023-02-27 11:39:00,076][00394] Fps is (10 sec: 3686.7, 60 sec: 3413.6, 300 sec: 3443.4). Total num frames: 6352896. Throughput: 0: 874.1. Samples: 584052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 11:39:00,080][00394] Avg episode reward: [(0, '4.717')]
+[2023-02-27 11:39:05,076][00394] Fps is (10 sec: 4096.4, 60 sec: 3550.3, 300 sec: 3457.3). Total num frames: 6373376. Throughput: 0: 859.6. Samples: 589910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:39:05,084][00394] Avg episode reward: [(0, '4.546')]
+[2023-02-27 11:39:10,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6385664. Throughput: 0: 858.8. Samples: 592022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:39:10,084][00394] Avg episode reward: [(0, '4.630')]
+[2023-02-27 11:39:10,434][28720] Updated weights for policy 0, policy_version 1560 (0.0012)
+[2023-02-27 11:39:15,077][00394] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6402048. Throughput: 0: 863.9. Samples: 596158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 11:39:15,091][00394] Avg episode reward: [(0, '4.526')]
+[2023-02-27 11:39:20,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6422528. Throughput: 0: 884.3. Samples: 602454. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:39:20,081][00394] Avg episode reward: [(0, '4.591')]
+[2023-02-27 11:39:21,284][28720] Updated weights for policy 0, policy_version 1570 (0.0012)
+[2023-02-27 11:39:25,076][00394] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 6443008. Throughput: 0: 883.4. Samples: 605658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:39:25,079][00394] Avg episode reward: [(0, '4.671')]
+[2023-02-27 11:39:30,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6455296. Throughput: 0: 857.8. Samples: 610278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:39:30,088][00394] Avg episode reward: [(0, '4.647')]
+[2023-02-27 11:39:35,026][28720] Updated weights for policy 0, policy_version 1580 (0.0029)
+[2023-02-27 11:39:35,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.7, 300 sec: 3443.4). Total num frames: 6471680. Throughput: 0: 855.3. Samples: 614270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:39:35,084][00394] Avg episode reward: [(0, '4.691')]
+[2023-02-27 11:39:40,077][00394] Fps is (10 sec: 3686.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6492160. Throughput: 0: 804.6. Samples: 617130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:39:40,079][00394] Avg episode reward: [(0, '4.726')]
+[2023-02-27 11:39:44,571][28720] Updated weights for policy 0, policy_version 1590 (0.0013)
+[2023-02-27 11:39:45,079][00394] Fps is (10 sec: 4094.9, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 6512640. Throughput: 0: 882.3. Samples: 623756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:39:45,095][00394] Avg episode reward: [(0, '4.731')]
+[2023-02-27 11:39:50,077][00394] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.4). Total num frames: 6524928. Throughput: 0: 856.7. Samples: 628460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:39:50,086][00394] Avg episode reward: [(0, '4.881')]
+[2023-02-27 11:39:55,087][00394] Fps is (10 sec: 2865.1, 60 sec: 3481.1, 300 sec: 3457.2). Total num frames: 6541312. Throughput: 0: 854.8. Samples: 630496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:39:55,095][00394] Avg episode reward: [(0, '4.934')]
+[2023-02-27 11:39:58,254][28720] Updated weights for policy 0, policy_version 1600 (0.0028)
+[2023-02-27 11:40:00,077][00394] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6557696. Throughput: 0: 868.9. Samples: 635258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:40:00,086][00394] Avg episode reward: [(0, '4.927')]
+[2023-02-27 11:40:05,076][00394] Fps is (10 sec: 4100.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6582272. Throughput: 0: 872.3. Samples: 641708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:40:05,078][00394] Avg episode reward: [(0, '4.567')]
+[2023-02-27 11:40:08,505][28720] Updated weights for policy 0, policy_version 1610 (0.0015)
+[2023-02-27 11:40:10,081][00394] Fps is (10 sec: 4094.0, 60 sec: 3549.6, 300 sec: 3471.1). Total num frames: 6598656. Throughput: 0: 912.7. Samples: 646732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:40:10,089][00394] Avg episode reward: [(0, '4.613')]
+[2023-02-27 11:40:15,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6610944. Throughput: 0: 858.0. Samples: 648888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:40:15,086][00394] Avg episode reward: [(0, '4.720')]
+[2023-02-27 11:40:20,076][00394] Fps is (10 sec: 2868.6, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6627328. Throughput: 0: 875.3. Samples: 653658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 11:40:20,083][00394] Avg episode reward: [(0, '4.892')]
+[2023-02-27 11:40:21,184][28720] Updated weights for policy 0, policy_version 1620 (0.0017)
+[2023-02-27 11:40:25,076][00394] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6651904. Throughput: 0: 957.2. Samples: 660202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:40:25,084][00394] Avg episode reward: [(0, '4.741')]
+[2023-02-27 11:40:30,077][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 6668288. Throughput: 0: 873.2. Samples: 663046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 11:40:30,085][00394] Avg episode reward: [(0, '4.600')]
+[2023-02-27 11:40:32,646][28720] Updated weights for policy 0, policy_version 1630 (0.0023)
+[2023-02-27 11:40:35,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 6680576. Throughput: 0: 860.0. Samples: 667158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:40:35,082][00394] Avg episode reward: [(0, '4.566')]
+[2023-02-27 11:40:40,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3457.3). Total num frames: 6696960. Throughput: 0: 861.0. Samples: 669234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:40:40,079][00394] Avg episode reward: [(0, '4.445')]
+[2023-02-27 11:40:44,037][28720] Updated weights for policy 0, policy_version 1640 (0.0017)
+[2023-02-27 11:40:45,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3481.8, 300 sec: 3457.3). Total num frames: 6721536. Throughput: 0: 889.3. Samples: 675276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:40:45,078][00394] Avg episode reward: [(0, '4.677')]
+[2023-02-27 11:40:50,077][00394] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 6737920. Throughput: 0: 884.9. Samples: 681528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:40:50,083][00394] Avg episode reward: [(0, '4.826')]
+[2023-02-27 11:40:50,093][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001645_6737920.pth...
+[2023-02-27 11:40:50,410][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001441_5902336.pth
+[2023-02-27 11:40:55,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3482.2, 300 sec: 3471.2). Total num frames: 6750208. Throughput: 0: 863.3. Samples: 685574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:40:55,084][00394] Avg episode reward: [(0, '4.844')]
+[2023-02-27 11:40:56,720][28720] Updated weights for policy 0, policy_version 1650 (0.0027)
+[2023-02-27 11:41:00,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6766592. Throughput: 0: 861.1. Samples: 687636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:41:00,080][00394] Avg episode reward: [(0, '4.546')]
+[2023-02-27 11:41:05,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 6787072. Throughput: 0: 886.9. Samples: 693568. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 11:41:05,089][00394] Avg episode reward: [(0, '4.725')]
+[2023-02-27 11:41:07,052][28720] Updated weights for policy 0, policy_version 1660 (0.0013)
+[2023-02-27 11:41:10,077][00394] Fps is (10 sec: 4095.8, 60 sec: 3481.9, 300 sec: 3471.2). Total num frames: 6807552. Throughput: 0: 812.8. Samples: 696778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 11:41:10,086][00394] Avg episode reward: [(0, '4.788')]
+[2023-02-27 11:41:15,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 6823936. Throughput: 0: 866.4. Samples: 702032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:41:15,090][00394] Avg episode reward: [(0, '4.888')]
+[2023-02-27 11:41:20,077][00394] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6836224. Throughput: 0: 865.6. Samples: 706112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 11:41:20,084][00394] Avg episode reward: [(0, '4.878')]
+[2023-02-27 11:41:20,761][28720] Updated weights for policy 0, policy_version 1670 (0.0033)
+[2023-02-27 11:41:25,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 6856704. Throughput: 0: 875.2. Samples: 708616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:41:25,098][00394] Avg episode reward: [(0, '4.636')]
+[2023-02-27 11:41:30,079][00394] Fps is (10 sec: 4095.0, 60 sec: 3481.5, 300 sec: 3471.2). Total num frames: 6877184. Throughput: 0: 882.6. Samples: 714994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:41:30,082][00394] Avg episode reward: [(0, '4.630')]
+[2023-02-27 11:41:30,204][28720] Updated weights for policy 0, policy_version 1680 (0.0014)
+[2023-02-27 11:41:35,077][00394] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.4). Total num frames: 6893568. Throughput: 0: 857.5. Samples: 720116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:41:35,082][00394] Avg episode reward: [(0, '4.545')]
+[2023-02-27 11:41:40,077][00394] Fps is (10 sec: 2867.9, 60 sec: 3481.6, 300 sec: 3471.3). Total num frames: 6905856. Throughput: 0: 811.9. Samples: 722108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 11:41:40,083][00394] Avg episode reward: [(0, '4.724')]
+[2023-02-27 11:41:43,760][28720] Updated weights for policy 0, policy_version 1690 (0.0014)
+[2023-02-27 11:41:45,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 6926336. Throughput: 0: 870.0. Samples: 726786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:41:45,081][00394] Avg episode reward: [(0, '4.553')]
+[2023-02-27 11:41:50,076][00394] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 6946816. Throughput: 0: 886.4. Samples: 733456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:41:50,078][00394] Avg episode reward: [(0, '4.512')]
+[2023-02-27 11:41:53,787][28720] Updated weights for policy 0, policy_version 1700 (0.0013)
+[2023-02-27 11:41:55,078][00394] Fps is (10 sec: 3686.0, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 6963200. Throughput: 0: 885.9. Samples: 736646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:41:55,081][00394] Avg episode reward: [(0, '4.643')]
+[2023-02-27 11:42:00,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 6979584. Throughput: 0: 861.9. Samples: 740818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:42:00,083][00394] Avg episode reward: [(0, '4.595')]
+[2023-02-27 11:42:05,076][00394] Fps is (10 sec: 3277.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 6995968. Throughput: 0: 869.2. Samples: 745228. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:42:05,085][00394] Avg episode reward: [(0, '4.417')]
+[2023-02-27 11:42:06,766][28720] Updated weights for policy 0, policy_version 1710 (0.0016)
+[2023-02-27 11:42:10,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 7016448. Throughput: 0: 885.6. Samples: 748470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:42:10,079][00394] Avg episode reward: [(0, '4.591')]
+[2023-02-27 11:42:15,082][00394] Fps is (10 sec: 4093.8, 60 sec: 3549.5, 300 sec: 3499.0). Total num frames: 7036928. Throughput: 0: 891.1. Samples: 755096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2023-02-27 11:42:15,084][00394] Avg episode reward: [(0, '4.621')]
+[2023-02-27 11:42:18,072][28720] Updated weights for policy 0, policy_version 1720 (0.0027)
+[2023-02-27 11:42:20,079][00394] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3485.0). Total num frames: 7049216. Throughput: 0: 866.2. Samples: 759096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:42:20,082][00394] Avg episode reward: [(0, '4.649')]
+[2023-02-27 11:42:25,076][00394] Fps is (10 sec: 2458.9, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7061504. Throughput: 0: 918.7. Samples: 763448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 11:42:25,079][00394] Avg episode reward: [(0, '4.756')]
+[2023-02-27 11:42:29,928][28720] Updated weights for policy 0, policy_version 1730 (0.0030)
+[2023-02-27 11:42:30,076][00394] Fps is (10 sec: 3687.4, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 7086080. Throughput: 0: 884.5. Samples: 766590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:42:30,079][00394] Avg episode reward: [(0, '4.815')]
+[2023-02-27 11:42:35,076][00394] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7106560. Throughput: 0: 883.9. Samples: 773232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:42:35,079][00394] Avg episode reward: [(0, '4.763')]
+[2023-02-27 11:42:40,078][00394] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 7118848. Throughput: 0: 858.0. Samples: 775254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:42:40,081][00394] Avg episode reward: [(0, '4.646')]
+[2023-02-27 11:42:42,429][28720] Updated weights for policy 0, policy_version 1740 (0.0019)
+[2023-02-27 11:42:45,076][00394] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7131136. Throughput: 0: 857.5. Samples: 779404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:42:45,085][00394] Avg episode reward: [(0, '4.664')]
+[2023-02-27 11:42:50,076][00394] Fps is (10 sec: 3277.2, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7151616. Throughput: 0: 883.5. Samples: 784984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:42:50,082][00394] Avg episode reward: [(0, '4.687')]
+[2023-02-27 11:42:50,152][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001747_7155712.pth...
+[2023-02-27 11:42:50,392][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001542_6316032.pth
+[2023-02-27 11:42:53,004][28720] Updated weights for policy 0, policy_version 1750 (0.0015)
+[2023-02-27 11:42:55,077][00394] Fps is (10 sec: 4505.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7176192. Throughput: 0: 950.4. Samples: 791240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:42:55,081][00394] Avg episode reward: [(0, '4.506')]
+[2023-02-27 11:43:00,079][00394] Fps is (10 sec: 3685.5, 60 sec: 3481.4, 300 sec: 3485.1). Total num frames: 7188480. Throughput: 0: 851.4. Samples: 793406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:43:00,082][00394] Avg episode reward: [(0, '4.650')]
+[2023-02-27 11:43:05,077][00394] Fps is (10 sec: 2867.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7204864. Throughput: 0: 854.6. Samples: 797550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:43:05,092][00394] Avg episode reward: [(0, '4.636')]
+[2023-02-27 11:43:06,682][28720] Updated weights for policy 0, policy_version 1760 (0.0025)
+[2023-02-27 11:43:10,076][00394] Fps is (10 sec: 3277.7, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7221248. Throughput: 0: 881.2. Samples: 803104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:43:10,083][00394] Avg episode reward: [(0, '4.596')]
+[2023-02-27 11:43:15,076][00394] Fps is (10 sec: 3686.5, 60 sec: 3413.6, 300 sec: 3471.2). Total num frames: 7241728. Throughput: 0: 884.7. Samples: 806402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:43:15,079][00394] Avg episode reward: [(0, '4.406')]
+[2023-02-27 11:43:15,909][28720] Updated weights for policy 0, policy_version 1770 (0.0017)
+[2023-02-27 11:43:20,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3499.0). Total num frames: 7262208. Throughput: 0: 862.8. Samples: 812058. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:43:20,087][00394] Avg episode reward: [(0, '4.423')]
+[2023-02-27 11:43:25,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7274496. Throughput: 0: 906.7. Samples: 816052. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:43:25,080][00394] Avg episode reward: [(0, '4.571')]
+[2023-02-27 11:43:29,656][28720] Updated weights for policy 0, policy_version 1780 (0.0021)
+[2023-02-27 11:43:30,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3471.3). Total num frames: 7290880. Throughput: 0: 857.3. Samples: 817984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:43:30,082][00394] Avg episode reward: [(0, '4.623')]
+[2023-02-27 11:43:35,077][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7311360. Throughput: 0: 878.4. Samples: 824514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:43:35,084][00394] Avg episode reward: [(0, '4.758')]
+[2023-02-27 11:43:39,808][28720] Updated weights for policy 0, policy_version 1790 (0.0031)
+[2023-02-27 11:43:40,077][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7331840. Throughput: 0: 812.4. Samples: 827796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:43:40,086][00394] Avg episode reward: [(0, '4.808')]
+[2023-02-27 11:43:45,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7344128. Throughput: 0: 866.0. Samples: 832374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:43:45,079][00394] Avg episode reward: [(0, '4.810')]
+[2023-02-27 11:43:50,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7360512. Throughput: 0: 867.7. Samples: 836596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:43:50,080][00394] Avg episode reward: [(0, '4.702')]
+[2023-02-27 11:43:52,502][28720] Updated weights for policy 0, policy_version 1800 (0.0025)
+[2023-02-27 11:43:55,081][00394] Fps is (10 sec: 3684.7, 60 sec: 3413.1, 300 sec: 3485.0). Total num frames: 7380992. Throughput: 0: 887.0. Samples: 843022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:43:55,087][00394] Avg episode reward: [(0, '4.738')]
+[2023-02-27 11:44:00,078][00394] Fps is (10 sec: 4095.2, 60 sec: 3549.9, 300 sec: 3485.0). Total num frames: 7401472. Throughput: 0: 885.2. Samples: 846236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:44:00,081][00394] Avg episode reward: [(0, '4.637')]
+[2023-02-27 11:44:03,866][28720] Updated weights for policy 0, policy_version 1810 (0.0020)
+[2023-02-27 11:44:05,077][00394] Fps is (10 sec: 3278.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7413760. Throughput: 0: 861.9. Samples: 850844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:44:05,087][00394] Avg episode reward: [(0, '4.650')]
+[2023-02-27 11:44:10,077][00394] Fps is (10 sec: 2867.7, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7430144. Throughput: 0: 820.9. Samples: 852994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:44:10,081][00394] Avg episode reward: [(0, '4.627')]
+[2023-02-27 11:44:15,076][00394] Fps is (10 sec: 3686.7, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7450624. Throughput: 0: 889.0. Samples: 857990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:44:15,078][00394] Avg episode reward: [(0, '4.553')]
+[2023-02-27 11:44:15,543][28720] Updated weights for policy 0, policy_version 1820 (0.0017)
+[2023-02-27 11:44:20,076][00394] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7471104. Throughput: 0: 892.2. Samples: 864662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:44:20,081][00394] Avg episode reward: [(0, '4.733')]
+[2023-02-27 11:44:25,077][00394] Fps is (10 sec: 3686.2, 60 sec: 3549.8, 300 sec: 3499.0). Total num frames: 7487488. Throughput: 0: 926.1. Samples: 869470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 11:44:25,083][00394] Avg episode reward: [(0, '4.828')]
+[2023-02-27 11:44:27,349][28720] Updated weights for policy 0, policy_version 1830 (0.0013)
+[2023-02-27 11:44:30,077][00394] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7499776. Throughput: 0: 869.4. Samples: 871498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 11:44:30,090][00394] Avg episode reward: [(0, '4.551')]
+[2023-02-27 11:44:35,080][00394] Fps is (10 sec: 3275.7, 60 sec: 3481.4, 300 sec: 3485.0). Total num frames: 7520256. Throughput: 0: 879.7. Samples: 876186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 11:44:35,090][00394] Avg episode reward: [(0, '4.766')]
+[2023-02-27 11:44:38,583][28720] Updated weights for policy 0, policy_version 1840 (0.0013)
+[2023-02-27 11:44:40,079][00394] Fps is (10 sec: 4095.0, 60 sec: 3481.5, 300 sec: 3485.1). Total num frames: 7540736. Throughput: 0: 885.7. Samples: 882878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:44:40,082][00394] Avg episode reward: [(0, '4.626')]
+[2023-02-27 11:44:45,076][00394] Fps is (10 sec: 3687.9, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 7557120. Throughput: 0: 878.3. Samples: 885760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:44:45,079][00394] Avg episode reward: [(0, '4.383')]
+[2023-02-27 11:44:50,076][00394] Fps is (10 sec: 3277.6, 60 sec: 3549.9, 300 sec: 3499.1). Total num frames: 7573504. Throughput: 0: 869.4. Samples: 889966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:44:50,084][00394] Avg episode reward: [(0, '4.530')]
+[2023-02-27 11:44:50,105][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001849_7573504.pth...
+[2023-02-27 11:44:50,532][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001645_6737920.pth
+[2023-02-27 11:44:51,507][28720] Updated weights for policy 0, policy_version 1850 (0.0035)
+[2023-02-27 11:44:55,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.6, 300 sec: 3485.1). Total num frames: 7585792. Throughput: 0: 865.1. Samples: 891924. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 11:44:55,079][00394] Avg episode reward: [(0, '4.588')]
+[2023-02-27 11:45:00,077][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 7610368. Throughput: 0: 882.9. Samples: 897720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:45:00,081][00394] Avg episode reward: [(0, '4.450')]
+[2023-02-27 11:45:01,792][28720] Updated weights for policy 0, policy_version 1860 (0.0020)
+[2023-02-27 11:45:05,076][00394] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 7626752. Throughput: 0: 874.5. Samples: 904014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:45:05,081][00394] Avg episode reward: [(0, '4.608')]
+[2023-02-27 11:45:10,077][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 7643136. Throughput: 0: 813.3. Samples: 906066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:45:10,086][00394] Avg episode reward: [(0, '4.638')]
+[2023-02-27 11:45:15,076][00394] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 7655424. Throughput: 0: 860.6. Samples: 910226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:45:15,079][00394] Avg episode reward: [(0, '4.559')]
+[2023-02-27 11:45:15,547][28720] Updated weights for policy 0, policy_version 1870 (0.0037)
+[2023-02-27 11:45:20,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 7675904. Throughput: 0: 881.9. Samples: 915870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:45:20,085][00394] Avg episode reward: [(0, '4.603')]
+[2023-02-27 11:45:24,815][28720] Updated weights for policy 0, policy_version 1880 (0.0015)
+[2023-02-27 11:45:25,077][00394] Fps is (10 sec: 4505.5, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 7700480. Throughput: 0: 806.7. Samples: 919178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:45:25,081][00394] Avg episode reward: [(0, '4.621')]
+[2023-02-27 11:45:30,079][00394] Fps is (10 sec: 3685.4, 60 sec: 3549.7, 300 sec: 3498.9). Total num frames: 7712768. Throughput: 0: 857.3. Samples: 924342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:45:30,090][00394] Avg episode reward: [(0, '4.505')]
+[2023-02-27 11:45:35,077][00394] Fps is (10 sec: 2457.5, 60 sec: 3413.5, 300 sec: 3485.1). Total num frames: 7725056. Throughput: 0: 854.2. Samples: 928406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:45:35,081][00394] Avg episode reward: [(0, '4.492')]
+[2023-02-27 11:45:38,676][28720] Updated weights for policy 0, policy_version 1890 (0.0012)
+[2023-02-27 11:45:40,076][00394] Fps is (10 sec: 3277.6, 60 sec: 3413.5, 300 sec: 3471.2). Total num frames: 7745536. Throughput: 0: 935.1. Samples: 934002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:45:40,078][00394] Avg episode reward: [(0, '4.602')]
+[2023-02-27 11:45:45,076][00394] Fps is (10 sec: 4096.3, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7766016. Throughput: 0: 880.2. Samples: 937328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:45:45,079][00394] Avg episode reward: [(0, '4.650')]
+[2023-02-27 11:45:48,908][28720] Updated weights for policy 0, policy_version 1900 (0.0012)
+[2023-02-27 11:45:50,081][00394] Fps is (10 sec: 3684.7, 60 sec: 3481.3, 300 sec: 3498.9). Total num frames: 7782400. Throughput: 0: 860.3. Samples: 942732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 11:45:50,083][00394] Avg episode reward: [(0, '4.519')]
+[2023-02-27 11:45:55,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 7798784. Throughput: 0: 861.7. Samples: 944844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:45:55,080][00394] Avg episode reward: [(0, '4.551')]
+[2023-02-27 11:46:00,076][00394] Fps is (10 sec: 3278.3, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 7815168. Throughput: 0: 864.0. Samples: 949104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:46:00,084][00394] Avg episode reward: [(0, '4.654')]
+[2023-02-27 11:46:01,613][28720] Updated weights for policy 0, policy_version 1910 (0.0038)
+[2023-02-27 11:46:05,076][00394] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7835648. Throughput: 0: 886.8. Samples: 955776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:46:05,079][00394] Avg episode reward: [(0, '4.678')]
+[2023-02-27 11:46:10,082][00394] Fps is (10 sec: 4093.6, 60 sec: 3549.5, 300 sec: 3498.9). Total num frames: 7856128. Throughput: 0: 937.0. Samples: 961346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:46:10,086][00394] Avg episode reward: [(0, '4.685')]
+[2023-02-27 11:46:12,528][28720] Updated weights for policy 0, policy_version 1920 (0.0022)
+[2023-02-27 11:46:15,076][00394] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 7868416. Throughput: 0: 869.1. Samples: 963450. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 11:46:15,081][00394] Avg episode reward: [(0, '4.930')]
+[2023-02-27 11:46:20,076][00394] Fps is (10 sec: 2868.9, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7884800. Throughput: 0: 870.1. Samples: 967558. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 11:46:20,083][00394] Avg episode reward: [(0, '5.040')]
+[2023-02-27 11:46:24,585][28720] Updated weights for policy 0, policy_version 1930 (0.0019)
+[2023-02-27 11:46:25,077][00394] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 7905280. Throughput: 0: 818.7. Samples: 970844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:46:25,082][00394] Avg episode reward: [(0, '4.886')]
+[2023-02-27 11:46:30,082][00394] Fps is (10 sec: 4093.9, 60 sec: 3549.7, 300 sec: 3498.9). Total num frames: 7925760. Throughput: 0: 888.8. Samples: 977330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 11:46:30,086][00394] Avg episode reward: [(0, '4.829')]
+[2023-02-27 11:46:35,078][00394] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 7938048. Throughput: 0: 867.2. Samples: 981752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:46:35,081][00394] Avg episode reward: [(0, '4.901')]
+[2023-02-27 11:46:36,906][28720] Updated weights for policy 0, policy_version 1940 (0.0017)
+[2023-02-27 11:46:40,077][00394] Fps is (10 sec: 2868.4, 60 sec: 3481.5, 300 sec: 3485.1). Total num frames: 7954432. Throughput: 0: 913.0. Samples: 985932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 11:46:40,089][00394] Avg episode reward: [(0, '5.057')]
+[2023-02-27 11:46:45,076][00394] Fps is (10 sec: 3686.9, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 7974912. Throughput: 0: 888.9. Samples: 989106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 11:46:45,086][00394] Avg episode reward: [(0, '5.047')]
+[2023-02-27 11:46:47,432][28720] Updated weights for policy 0, policy_version 1950 (0.0015)
+[2023-02-27 11:46:50,076][00394] Fps is (10 sec: 4096.4, 60 sec: 3550.1, 300 sec: 3499.0). Total num frames: 7995392. Throughput: 0: 887.5. Samples: 995714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 11:46:50,081][00394] Avg episode reward: [(0, '4.649')]
+[2023-02-27 11:46:50,158][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001953_7999488.pth...
+[2023-02-27 11:46:50,460][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001747_7155712.pth
+[2023-02-27 11:46:53,432][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+[2023-02-27 11:46:53,443][00394] Component Batcher_0 stopped!
+[2023-02-27 11:46:53,442][28704] Stopping Batcher_0...
+[2023-02-27 11:46:53,454][28704] Loop batcher_evt_loop terminating...
+[2023-02-27 11:46:53,557][28720] Weights refcount: 2 0
+[2023-02-27 11:46:53,611][00394] Component InferenceWorker_p0-w0 stopped!
+[2023-02-27 11:46:53,617][28720] Stopping InferenceWorker_p0-w0...
+[2023-02-27 11:46:53,618][28720] Loop inference_proc0-0_evt_loop terminating...
+[2023-02-27 11:46:53,639][28704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001849_7573504.pth
+[2023-02-27 11:46:53,648][28704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+[2023-02-27 11:46:53,860][00394] Component LearnerWorker_p0 stopped!
+[2023-02-27 11:46:53,869][00394] Component RolloutWorker_w4 stopped!
+[2023-02-27 11:46:53,888][00394] Component RolloutWorker_w0 stopped!
+[2023-02-27 11:46:53,868][28733] Stopping RolloutWorker_w4...
+[2023-02-27 11:46:53,887][28718] Stopping RolloutWorker_w0...
+[2023-02-27 11:46:53,892][28728] Stopping RolloutWorker_w2...
+[2023-02-27 11:46:53,895][28728] Loop rollout_proc2_evt_loop terminating...
+[2023-02-27 11:46:53,891][28733] Loop rollout_proc4_evt_loop terminating...
+[2023-02-27 11:46:53,891][28704] Stopping LearnerWorker_p0...
+[2023-02-27 11:46:53,896][28704] Loop learner_proc0_evt_loop terminating...
+[2023-02-27 11:46:53,894][28718] Loop rollout_proc0_evt_loop terminating...
+[2023-02-27 11:46:53,898][28741] Stopping RolloutWorker_w6...
+[2023-02-27 11:46:53,899][28741] Loop rollout_proc6_evt_loop terminating...
+[2023-02-27 11:46:53,893][00394] Component RolloutWorker_w2 stopped!
+[2023-02-27 11:46:53,900][00394] Component RolloutWorker_w6 stopped!
+[2023-02-27 11:46:53,923][00394] Component RolloutWorker_w7 stopped!
+[2023-02-27 11:46:53,927][28735] Stopping RolloutWorker_w7...
+[2023-02-27 11:46:53,928][28735] Loop rollout_proc7_evt_loop terminating...
+[2023-02-27 11:46:53,943][00394] Component RolloutWorker_w5 stopped!
+[2023-02-27 11:46:53,950][28731] Stopping RolloutWorker_w5...
+[2023-02-27 11:46:53,950][28731] Loop rollout_proc5_evt_loop terminating...
+[2023-02-27 11:46:53,963][00394] Component RolloutWorker_w1 stopped!
+[2023-02-27 11:46:53,964][28719] Stopping RolloutWorker_w1...
+[2023-02-27 11:46:53,965][28719] Loop rollout_proc1_evt_loop terminating...
+[2023-02-27 11:46:53,993][00394] Component RolloutWorker_w3 stopped!
+[2023-02-27 11:46:53,996][00394] Waiting for process learner_proc0 to stop...
+[2023-02-27 11:46:54,000][28723] Stopping RolloutWorker_w3...
+[2023-02-27 11:46:54,027][28723] Loop rollout_proc3_evt_loop terminating...
+[2023-02-27 11:46:58,284][00394] Waiting for process inference_proc0-0 to join...
+[2023-02-27 11:46:58,286][00394] Waiting for process rollout_proc0 to join...
+[2023-02-27 11:46:58,288][00394] Waiting for process rollout_proc1 to join...
+[2023-02-27 11:46:58,293][00394] Waiting for process rollout_proc2 to join...
+[2023-02-27 11:46:58,294][00394] Waiting for process rollout_proc3 to join...
+[2023-02-27 11:46:58,295][00394] Waiting for process rollout_proc4 to join...
+[2023-02-27 11:46:58,296][00394] Waiting for process rollout_proc5 to join...
+[2023-02-27 11:46:58,298][00394] Waiting for process rollout_proc6 to join...
+[2023-02-27 11:46:58,299][00394] Waiting for process rollout_proc7 to join...
+[2023-02-27 11:46:58,302][00394] Batcher 0 profile tree view:
+batching: 29.2540, releasing_batches: 0.0524
+[2023-02-27 11:46:58,312][00394] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0001
+ wait_policy_total: 547.5281
+update_model: 8.4902
+ weight_update: 0.0014
+one_step: 0.0234
+ handle_policy_step: 570.4642
+ deserialize: 17.2066, stack: 3.6083, obs_to_device_normalize: 126.4224, forward: 275.2417, send_messages: 29.7232
+ prepare_outputs: 88.1017
+ to_cpu: 54.0219
+[2023-02-27 11:46:58,315][00394] Learner 0 profile tree view:
+misc: 0.0055, prepare_batch: 19.1094
+train: 85.8167
+ epoch_init: 0.0118, minibatch_init: 0.0064, losses_postprocess: 0.5940, kl_divergence: 0.6677, after_optimizer: 3.9929
+ calculate_losses: 29.3184
+ losses_init: 0.0035, forward_head: 2.1917, bptt_initial: 18.8199, tail: 1.3381, advantages_returns: 0.3063, losses: 3.6391
+ bptt: 2.6715
+ bptt_forward_core: 2.5638
+ update: 50.4395
+ clip: 1.6576
+[2023-02-27 11:46:58,317][00394] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.4058, enqueue_policy_requests: 143.4978, env_step: 871.1819, overhead: 29.5023, complete_rollouts: 7.7036
+save_policy_outputs: 25.6963
+ split_output_tensors: 12.4900
+[2023-02-27 11:46:58,319][00394] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.4221, enqueue_policy_requests: 166.1125, env_step: 847.2251, overhead: 29.6019, complete_rollouts: 6.4290
+save_policy_outputs: 25.4015
+ split_output_tensors: 12.0902
+[2023-02-27 11:46:58,326][00394] Loop Runner_EvtLoop terminating...
+[2023-02-27 11:46:58,328][00394] Runner profile tree view:
+main_loop: 1204.4236
+[2023-02-27 11:46:58,331][00394] Collected {0: 8007680}, FPS: 3315.8
+[2023-02-27 11:46:58,539][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-02-27 11:46:58,541][00394] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-02-27 11:46:58,544][00394] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-02-27 11:46:58,546][00394] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-02-27 11:46:58,548][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-27 11:46:58,550][00394] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-02-27 11:46:58,552][00394] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-27 11:46:58,554][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-02-27 11:46:58,555][00394] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2023-02-27 11:46:58,557][00394] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2023-02-27 11:46:58,558][00394] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-02-27 11:46:58,560][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-02-27 11:46:58,563][00394] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-02-27 11:46:58,564][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-02-27 11:46:58,566][00394] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-02-27 11:46:58,611][00394] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-27 11:46:58,622][00394] RunningMeanStd input shape: (1,)
+[2023-02-27 11:46:58,645][00394] ConvEncoder: input_channels=3
+[2023-02-27 11:46:58,843][00394] Conv encoder output size: 512
+[2023-02-27 11:46:58,845][00394] Policy head output size: 512
+[2023-02-27 11:46:58,974][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+[2023-02-27 11:47:00,352][00394] Num frames 100...
+[2023-02-27 11:47:00,468][00394] Num frames 200...
+[2023-02-27 11:47:00,601][00394] Num frames 300...
+[2023-02-27 11:47:00,759][00394] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2023-02-27 11:47:00,762][00394] Avg episode reward: 3.840, avg true_objective: 3.840
+[2023-02-27 11:47:00,787][00394] Num frames 400...
+[2023-02-27 11:47:00,901][00394] Num frames 500...
+[2023-02-27 11:47:01,019][00394] Num frames 600...
+[2023-02-27 11:47:01,130][00394] Num frames 700...
+[2023-02-27 11:47:01,251][00394] Avg episode rewards: #0: 4.295, true rewards: #0: 3.795
+[2023-02-27 11:47:01,252][00394] Avg episode reward: 4.295, avg true_objective: 3.795
+[2023-02-27 11:47:01,303][00394] Num frames 800...
+[2023-02-27 11:47:01,419][00394] Num frames 900...
+[2023-02-27 11:47:01,535][00394] Num frames 1000...
+[2023-02-27 11:47:01,647][00394] Num frames 1100...
+[2023-02-27 11:47:01,768][00394] Num frames 1200...
+[2023-02-27 11:47:01,834][00394] Avg episode rewards: #0: 4.690, true rewards: #0: 4.023
+[2023-02-27 11:47:01,835][00394] Avg episode reward: 4.690, avg true_objective: 4.023
+[2023-02-27 11:47:01,943][00394] Num frames 1300...
+[2023-02-27 11:47:02,058][00394] Num frames 1400...
+[2023-02-27 11:47:02,175][00394] Num frames 1500...
+[2023-02-27 11:47:02,291][00394] Num frames 1600...
+[2023-02-27 11:47:02,419][00394] Avg episode rewards: #0: 4.888, true rewards: #0: 4.137
+[2023-02-27 11:47:02,421][00394] Avg episode reward: 4.888, avg true_objective: 4.137
+[2023-02-27 11:47:02,474][00394] Num frames 1700...
+[2023-02-27 11:47:02,593][00394] Num frames 1800...
+[2023-02-27 11:47:02,705][00394] Num frames 1900...
+[2023-02-27 11:47:02,831][00394] Num frames 2000...
+[2023-02-27 11:47:02,930][00394] Avg episode rewards: #0: 4.678, true rewards: #0: 4.078
+[2023-02-27 11:47:02,932][00394] Avg episode reward: 4.678, avg true_objective: 4.078
+[2023-02-27 11:47:03,014][00394] Num frames 2100...
+[2023-02-27 11:47:03,128][00394] Num frames 2200...
+[2023-02-27 11:47:03,251][00394] Num frames 2300...
+[2023-02-27 11:47:03,371][00394] Num frames 2400...
+[2023-02-27 11:47:03,492][00394] Num frames 2500...
+[2023-02-27 11:47:03,642][00394] Avg episode rewards: #0: 5.138, true rewards: #0: 4.305
+[2023-02-27 11:47:03,644][00394] Avg episode reward: 5.138, avg true_objective: 4.305
+[2023-02-27 11:47:03,675][00394] Num frames 2600...
+[2023-02-27 11:47:03,797][00394] Num frames 2700...
+[2023-02-27 11:47:03,924][00394] Num frames 2800...
+[2023-02-27 11:47:04,043][00394] Num frames 2900...
+[2023-02-27 11:47:04,162][00394] Num frames 3000...
+[2023-02-27 11:47:04,292][00394] Num frames 3100...
+[2023-02-27 11:47:04,381][00394] Avg episode rewards: #0: 5.467, true rewards: #0: 4.467
+[2023-02-27 11:47:04,383][00394] Avg episode reward: 5.467, avg true_objective: 4.467
+[2023-02-27 11:47:04,471][00394] Num frames 3200...
+[2023-02-27 11:47:04,586][00394] Num frames 3300...
+[2023-02-27 11:47:04,702][00394] Num frames 3400...
+[2023-02-27 11:47:04,820][00394] Num frames 3500...
+[2023-02-27 11:47:04,960][00394] Avg episode rewards: #0: 5.469, true rewards: #0: 4.469
+[2023-02-27 11:47:04,962][00394] Avg episode reward: 5.469, avg true_objective: 4.469
+[2023-02-27 11:47:04,995][00394] Num frames 3600...
+[2023-02-27 11:47:05,108][00394] Num frames 3700...
+[2023-02-27 11:47:05,228][00394] Num frames 3800...
+[2023-02-27 11:47:05,339][00394] Num frames 3900...
+[2023-02-27 11:47:05,462][00394] Num frames 4000...
+[2023-02-27 11:47:05,614][00394] Avg episode rewards: #0: 5.652, true rewards: #0: 4.541
+[2023-02-27 11:47:05,615][00394] Avg episode reward: 5.652, avg true_objective: 4.541
+[2023-02-27 11:47:05,635][00394] Num frames 4100...
+[2023-02-27 11:47:05,752][00394] Num frames 4200...
+[2023-02-27 11:47:05,882][00394] Num frames 4300...
+[2023-02-27 11:47:05,996][00394] Num frames 4400...
+[2023-02-27 11:47:06,122][00394] Num frames 4500...
+[2023-02-27 11:47:06,220][00394] Avg episode rewards: #0: 5.635, true rewards: #0: 4.535
+[2023-02-27 11:47:06,221][00394] Avg episode reward: 5.635, avg true_objective: 4.535
+[2023-02-27 11:47:31,462][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2023-02-27 11:47:31,608][00394] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-02-27 11:47:31,610][00394] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-02-27 11:47:31,613][00394] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-02-27 11:47:31,615][00394] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-02-27 11:47:31,617][00394] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-27 11:47:31,618][00394] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-02-27 11:47:31,620][00394] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2023-02-27 11:47:31,621][00394] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-02-27 11:47:31,622][00394] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2023-02-27 11:47:31,624][00394] Adding new argument 'hf_repository'='Clawoo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2023-02-27 11:47:31,625][00394] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-02-27 11:47:31,626][00394] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-02-27 11:47:31,628][00394] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-02-27 11:47:31,629][00394] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-02-27 11:47:31,630][00394] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-02-27 11:47:31,655][00394] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-27 11:47:31,657][00394] RunningMeanStd input shape: (1,)
+[2023-02-27 11:47:31,674][00394] ConvEncoder: input_channels=3
+[2023-02-27 11:47:31,735][00394] Conv encoder output size: 512
+[2023-02-27 11:47:31,739][00394] Policy head output size: 512
+[2023-02-27 11:47:31,766][00394] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+[2023-02-27 11:47:32,495][00394] Num frames 100...
+[2023-02-27 11:47:32,668][00394] Num frames 200...
+[2023-02-27 11:47:32,812][00394] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560
+[2023-02-27 11:47:32,815][00394] Avg episode reward: 2.560, avg true_objective: 2.560
+[2023-02-27 11:47:32,891][00394] Num frames 300...
+[2023-02-27 11:47:33,042][00394] Num frames 400...
+[2023-02-27 11:47:33,206][00394] Num frames 500...
+[2023-02-27 11:47:33,364][00394] Num frames 600...
+[2023-02-27 11:47:33,502][00394] Avg episode rewards: #0: 3.860, true rewards: #0: 3.360
+[2023-02-27 11:47:33,504][00394] Avg episode reward: 3.860, avg true_objective: 3.360
+[2023-02-27 11:47:33,540][00394] Num frames 700...
+[2023-02-27 11:47:33,669][00394] Num frames 800...
+[2023-02-27 11:47:33,793][00394] Num frames 900...
+[2023-02-27 11:47:33,907][00394] Num frames 1000...
+[2023-02-27 11:47:34,027][00394] Avg episode rewards: #0: 4.520, true rewards: #0: 3.520
+[2023-02-27 11:47:34,034][00394] Avg episode reward: 4.520, avg true_objective: 3.520
+[2023-02-27 11:47:34,085][00394] Num frames 1100...
+[2023-02-27 11:47:34,204][00394] Num frames 1200...
+[2023-02-27 11:47:34,327][00394] Num frames 1300...
+[2023-02-27 11:47:34,443][00394] Num frames 1400...
+[2023-02-27 11:47:34,512][00394] Avg episode rewards: #0: 4.275, true rewards: #0: 3.525
+[2023-02-27 11:47:34,513][00394] Avg episode reward: 4.275, avg true_objective: 3.525
+[2023-02-27 11:47:34,624][00394] Num frames 1500...
+[2023-02-27 11:47:34,739][00394] Num frames 1600...
+[2023-02-27 11:47:34,868][00394] Num frames 1700...
+[2023-02-27 11:47:35,036][00394] Avg episode rewards: #0: 4.188, true rewards: #0: 3.588
+[2023-02-27 11:47:35,037][00394] Avg episode reward: 4.188, avg true_objective: 3.588
+[2023-02-27 11:47:35,049][00394] Num frames 1800...
+[2023-02-27 11:47:35,168][00394] Num frames 1900...
+[2023-02-27 11:47:35,283][00394] Num frames 2000...
+[2023-02-27 11:47:35,398][00394] Num frames 2100...
+[2023-02-27 11:47:35,541][00394] Avg episode rewards: #0: 4.130, true rewards: #0: 3.630
+[2023-02-27 11:47:35,543][00394] Avg episode reward: 4.130, avg true_objective: 3.630
+[2023-02-27 11:47:35,572][00394] Num frames 2200...
+[2023-02-27 11:47:35,696][00394] Num frames 2300...
+[2023-02-27 11:47:35,815][00394] Num frames 2400...
+[2023-02-27 11:47:35,929][00394] Num frames 2500...
+[2023-02-27 11:47:36,062][00394] Avg episode rewards: #0: 4.089, true rewards: #0: 3.660
+[2023-02-27 11:47:36,065][00394] Avg episode reward: 4.089, avg true_objective: 3.660
+[2023-02-27 11:47:36,115][00394] Num frames 2600...
+[2023-02-27 11:47:36,230][00394] Num frames 2700...
+[2023-02-27 11:47:36,347][00394] Num frames 2800...
+[2023-02-27 11:47:36,462][00394] Num frames 2900...
+[2023-02-27 11:47:36,570][00394] Avg episode rewards: #0: 4.058, true rewards: #0: 3.682
+[2023-02-27 11:47:36,576][00394] Avg episode reward: 4.058, avg true_objective: 3.682
+[2023-02-27 11:47:36,641][00394] Num frames 3000...
+[2023-02-27 11:47:36,765][00394] Num frames 3100...
+[2023-02-27 11:47:36,885][00394] Num frames 3200...
+[2023-02-27 11:47:36,999][00394] Num frames 3300...
+[2023-02-27 11:47:37,090][00394] Avg episode rewards: #0: 4.033, true rewards: #0: 3.700
+[2023-02-27 11:47:37,093][00394] Avg episode reward: 4.033, avg true_objective: 3.700
+[2023-02-27 11:47:37,177][00394] Num frames 3400...
+[2023-02-27 11:47:37,294][00394] Num frames 3500...
+[2023-02-27 11:47:37,416][00394] Num frames 3600...
+[2023-02-27 11:47:37,539][00394] Num frames 3700...
+[2023-02-27 11:47:37,612][00394] Avg episode rewards: #0: 4.014, true rewards: #0: 3.714
+[2023-02-27 11:47:37,613][00394] Avg episode reward: 4.014, avg true_objective: 3.714
+[2023-02-27 11:47:58,118][00394] Replay video saved to /content/train_dir/default_experiment/replay.mp4!