[2023-06-19 14:05:40,113][00753] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-06-19 14:05:40,117][00753] Rollout worker 0 uses device cpu
[2023-06-19 14:05:40,118][00753] Rollout worker 1 uses device cpu
[2023-06-19 14:05:40,119][00753] Rollout worker 2 uses device cpu
[2023-06-19 14:05:40,120][00753] Rollout worker 3 uses device cpu
[2023-06-19 14:05:40,122][00753] Rollout worker 4 uses device cpu
[2023-06-19 14:05:40,123][00753] Rollout worker 5 uses device cpu
[2023-06-19 14:05:40,124][00753] Rollout worker 6 uses device cpu
[2023-06-19 14:05:40,125][00753] Rollout worker 7 uses device cpu
[2023-06-19 14:05:40,277][00753] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-06-19 14:05:40,278][00753] InferenceWorker_p0-w0: min num requests: 2
[2023-06-19 14:05:40,309][00753] Starting all processes...
[2023-06-19 14:05:40,310][00753] Starting process learner_proc0
[2023-06-19 14:05:40,361][00753] Starting all processes...
[2023-06-19 14:05:40,370][00753] Starting process inference_proc0-0
[2023-06-19 14:05:40,370][00753] Starting process rollout_proc0
[2023-06-19 14:05:40,374][00753] Starting process rollout_proc1
[2023-06-19 14:05:40,374][00753] Starting process rollout_proc2
[2023-06-19 14:05:40,374][00753] Starting process rollout_proc3
[2023-06-19 14:05:40,374][00753] Starting process rollout_proc4
[2023-06-19 14:05:40,374][00753] Starting process rollout_proc5
[2023-06-19 14:05:40,376][00753] Starting process rollout_proc6
[2023-06-19 14:05:40,376][00753] Starting process rollout_proc7
[2023-06-19 14:05:55,850][11471] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-06-19 14:05:55,852][11471] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-06-19 14:05:55,902][11471] Num visible devices: 1
[2023-06-19 14:05:55,945][11471] Starting seed is not provided
[2023-06-19 14:05:55,945][11471] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-06-19 14:05:55,946][11471] Initializing actor-critic model on device cuda:0
[2023-06-19 14:05:55,947][11471] RunningMeanStd input shape: (3, 72, 128)
[2023-06-19 14:05:55,949][11471] RunningMeanStd input shape: (1,)
[2023-06-19 14:05:56,018][11471] ConvEncoder: input_channels=3
[2023-06-19 14:05:56,443][11492] Worker 7 uses CPU cores [1]
[2023-06-19 14:05:56,483][11487] Worker 2 uses CPU cores [0]
[2023-06-19 14:05:56,501][11489] Worker 4 uses CPU cores [0]
[2023-06-19 14:05:56,575][11485] Worker 0 uses CPU cores [0]
[2023-06-19 14:05:56,628][11491] Worker 6 uses CPU cores [0]
[2023-06-19 14:05:56,642][11486] Worker 1 uses CPU cores [1]
[2023-06-19 14:05:56,642][11488] Worker 3 uses CPU cores [1]
[2023-06-19 14:05:56,668][11484] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-06-19 14:05:56,668][11484] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-06-19 14:05:56,683][11490] Worker 5 uses CPU cores [1]
[2023-06-19 14:05:56,692][11484] Num visible devices: 1
[2023-06-19 14:05:56,718][11471] Conv encoder output size: 512
[2023-06-19 14:05:56,719][11471] Policy head output size: 512
[2023-06-19 14:05:56,767][11471] Created Actor Critic model with architecture:
[2023-06-19 14:05:56,767][11471] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-06-19 14:06:00,270][00753] Heartbeat connected on Batcher_0
[2023-06-19 14:06:00,278][00753] Heartbeat connected on InferenceWorker_p0-w0
[2023-06-19 14:06:00,288][00753] Heartbeat connected on RolloutWorker_w0
[2023-06-19 14:06:00,290][00753] Heartbeat connected on RolloutWorker_w1
[2023-06-19 14:06:00,296][00753] Heartbeat connected on RolloutWorker_w2
[2023-06-19 14:06:00,297][00753] Heartbeat connected on RolloutWorker_w3
[2023-06-19 14:06:00,300][00753] Heartbeat connected on RolloutWorker_w4
[2023-06-19 14:06:00,303][00753] Heartbeat connected on RolloutWorker_w5
[2023-06-19 14:06:00,309][00753] Heartbeat connected on RolloutWorker_w6
[2023-06-19 14:06:00,310][00753] Heartbeat connected on RolloutWorker_w7
[2023-06-19 14:06:04,826][11471] Using optimizer
[2023-06-19 14:06:04,827][11471] No checkpoints found
[2023-06-19 14:06:04,827][11471] Did not load from checkpoint, starting from scratch!
[2023-06-19 14:06:04,827][11471] Initialized policy 0 weights for model version 0
[2023-06-19 14:06:04,830][11471] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-06-19 14:06:04,837][11471] LearnerWorker_p0 finished initialization!
[2023-06-19 14:06:04,837][00753] Heartbeat connected on LearnerWorker_p0
[2023-06-19 14:06:05,020][11484] RunningMeanStd input shape: (3, 72, 128)
[2023-06-19 14:06:05,021][11484] RunningMeanStd input shape: (1,)
[2023-06-19 14:06:05,034][11484] ConvEncoder: input_channels=3
[2023-06-19 14:06:05,138][11484] Conv encoder output size: 512
[2023-06-19 14:06:05,139][11484] Policy head output size: 512
[2023-06-19 14:06:05,247][00753] Inference worker 0-0 is ready!
[2023-06-19 14:06:05,250][00753] All inference workers are ready! Signal rollout workers to start!
[2023-06-19 14:06:05,344][11487] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:06:05,349][11489] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:06:05,353][11491] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:06:05,347][11485] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:06:05,409][11488] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:06:05,426][11492] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:06:05,428][11486] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:06:05,413][11490] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:06:07,489][11485] Decorrelating experience for 0 frames...
[2023-06-19 14:06:07,493][11489] Decorrelating experience for 0 frames...
[2023-06-19 14:06:07,495][11491] Decorrelating experience for 0 frames...
[2023-06-19 14:06:07,488][11487] Decorrelating experience for 0 frames...
[2023-06-19 14:06:07,812][11492] Decorrelating experience for 0 frames...
[2023-06-19 14:06:07,813][11486] Decorrelating experience for 0 frames...
[2023-06-19 14:06:07,821][11490] Decorrelating experience for 0 frames...
[2023-06-19 14:06:08,860][11488] Decorrelating experience for 0 frames...
[2023-06-19 14:06:09,153][11486] Decorrelating experience for 32 frames...
[2023-06-19 14:06:09,474][11489] Decorrelating experience for 32 frames...
[2023-06-19 14:06:09,478][11485] Decorrelating experience for 32 frames...
[2023-06-19 14:06:09,532][00753] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-06-19 14:06:09,734][11487] Decorrelating experience for 32 frames...
[2023-06-19 14:06:09,741][11491] Decorrelating experience for 32 frames...
[2023-06-19 14:06:10,838][11492] Decorrelating experience for 32 frames...
[2023-06-19 14:06:10,975][11490] Decorrelating experience for 32 frames...
[2023-06-19 14:06:11,104][11488] Decorrelating experience for 32 frames...
[2023-06-19 14:06:11,104][11487] Decorrelating experience for 64 frames...
[2023-06-19 14:06:12,281][11489] Decorrelating experience for 64 frames...
[2023-06-19 14:06:12,438][11485] Decorrelating experience for 64 frames...
[2023-06-19 14:06:12,448][11486] Decorrelating experience for 64 frames...
[2023-06-19 14:06:12,587][11488] Decorrelating experience for 64 frames...
[2023-06-19 14:06:12,708][11487] Decorrelating experience for 96 frames...
[2023-06-19 14:06:13,469][11490] Decorrelating experience for 64 frames...
[2023-06-19 14:06:13,659][00753] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 753], exiting...
[2023-06-19 14:06:13,666][11471] Stopping Batcher_0...
[2023-06-19 14:06:13,667][11471] Loop batcher_evt_loop terminating...
[2023-06-19 14:06:13,668][11471] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
[2023-06-19 14:06:13,667][00753] Runner profile tree view:
main_loop: 33.3578
[2023-06-19 14:06:13,670][00753] Collected {0: 0}, FPS: 0.0
[2023-06-19 14:06:13,684][11489] VizDoom game.init() threw an exception SignalException('Signal SIGINT received. ViZDoom instance has been closed.'). Terminate process...
[2023-06-19 14:06:13,687][11485] VizDoom game.init() threw an exception SignalException('Signal SIGINT received. ViZDoom instance has been closed.'). Terminate process...
[2023-06-19 14:06:13,690][11491] VizDoom game.init() threw an exception SignalException('Signal SIGINT received. ViZDoom instance has been closed.'). Terminate process...
[2023-06-19 14:06:13,688][11485] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 414, in reset
    return self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 462, in reset
    obs, info = self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 82, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 414, in reset
    return self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2023-06-19 14:06:13,695][11485] Unhandled exception in evt loop rollout_proc0_evt_loop
[2023-06-19 14:06:13,685][11489] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 414, in reset
    return self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 462, in reset
    obs, info = self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 82, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 414, in reset
    return self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2023-06-19 14:06:13,698][11489] Unhandled exception in evt loop rollout_proc4_evt_loop
[2023-06-19 14:06:13,691][11491] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 414, in reset
    return self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 462, in reset
    obs, info = self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 82, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 414, in reset
    return self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2023-06-19 14:06:13,700][11491] Unhandled exception in evt loop rollout_proc6_evt_loop
[2023-06-19 14:06:13,775][11488] VizDoom game.init() threw an exception SignalException('Signal SIGINT received. ViZDoom instance has been closed.'). Terminate process...
[2023-06-19 14:06:13,755][11490] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 439, in _reset
    observations, rew, terminated, truncated, info = e.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 469, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2023-06-19 14:06:13,776][11490] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop
[2023-06-19 14:06:13,785][11471] Stopping LearnerWorker_p0...
[2023-06-19 14:06:13,786][11471] Loop learner_proc0_evt_loop terminating...
[2023-06-19 14:06:13,780][11488] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 414, in reset
    return self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 462, in reset
    obs, info = self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 82, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 414, in reset
    return self.env.reset(seed=seed, options=options)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2023-06-19 14:06:13,815][11488] Unhandled exception in evt loop rollout_proc3_evt_loop
[2023-06-19 14:06:13,879][11487] Stopping RolloutWorker_w2...
[2023-06-19 14:06:13,880][11487] Loop rollout_proc2_evt_loop terminating...
[2023-06-19 14:06:14,308][11484] Weights refcount: 2 0
[2023-06-19 14:06:14,312][11484] Stopping InferenceWorker_p0-w0...
[2023-06-19 14:06:14,312][11484] Loop inference_proc0-0_evt_loop terminating...
[2023-06-19 14:06:15,958][11486] Decorrelating experience for 96 frames...
[2023-06-19 14:06:15,961][11492] Decorrelating experience for 64 frames...
[2023-06-19 14:06:16,215][11486] Stopping RolloutWorker_w1...
[2023-06-19 14:06:16,218][11486] Loop rollout_proc1_evt_loop terminating...
[2023-06-19 14:06:17,257][11492] Decorrelating experience for 96 frames...
[2023-06-19 14:06:17,355][11492] Stopping RolloutWorker_w7...
[2023-06-19 14:06:17,356][11492] Loop rollout_proc7_evt_loop terminating...
[2023-06-19 14:11:33,477][00753] Environment doom_basic already registered, overwriting...
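Note: the repeated traceback chains above all record the same failure mode. The keyboard interrupt's SIGINT lands in each rollout worker while reset() or step() is being delegated down the Gymnasium wrapper stack (reward shaping, multiplayer stats, then the raw ViZDoom env), so the innermost game.init() raises and the exception unwinds through every wrapper frame. A minimal sketch of that delegation pattern, using an illustrative wrapper and placeholder env rather than Sample Factory's own classes:

import gymnasium as gym

class ShapingWrapper(gym.Wrapper):
    """Illustrative stand-in for wrappers like gathering_reward_shaping."""

    def reset(self, **kwargs):
        # Delegates to the wrapped env; an exception raised during the
        # innermost env's initialization propagates up through this call,
        # which is why every wrapper appears as a frame in the tracebacks.
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward, terminated, truncated, info

env = ShapingWrapper(gym.make("CartPole-v1"))  # placeholder env
obs, info = env.reset(seed=42)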
[2023-06-19 14:11:33,479][00753] Environment doom_two_colors_easy already registered, overwriting...
[2023-06-19 14:11:33,481][00753] Environment doom_two_colors_hard already registered, overwriting...
[2023-06-19 14:11:33,482][00753] Environment doom_dm already registered, overwriting...
[2023-06-19 14:11:33,483][00753] Environment doom_dwango5 already registered, overwriting...
[2023-06-19 14:11:33,485][00753] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-06-19 14:11:33,486][00753] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-06-19 14:11:33,488][00753] Environment doom_my_way_home already registered, overwriting...
[2023-06-19 14:11:33,489][00753] Environment doom_deadly_corridor already registered, overwriting...
[2023-06-19 14:11:33,490][00753] Environment doom_defend_the_center already registered, overwriting...
[2023-06-19 14:11:33,492][00753] Environment doom_defend_the_line already registered, overwriting...
[2023-06-19 14:11:33,493][00753] Environment doom_health_gathering already registered, overwriting...
[2023-06-19 14:11:33,494][00753] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-06-19 14:11:33,495][00753] Environment doom_battle already registered, overwriting...
[2023-06-19 14:11:33,496][00753] Environment doom_battle2 already registered, overwriting...
[2023-06-19 14:11:33,498][00753] Environment doom_duel_bots already registered, overwriting...
[2023-06-19 14:11:33,499][00753] Environment doom_deathmatch_bots already registered, overwriting...
[2023-06-19 14:11:33,500][00753] Environment doom_duel already registered, overwriting...
[2023-06-19 14:11:33,501][00753] Environment doom_deathmatch_full already registered, overwriting...
[2023-06-19 14:11:33,503][00753] Environment doom_benchmark already registered, overwriting...
[2023-06-19 14:11:33,504][00753] register_encoder_factory:
[2023-06-19 14:11:33,527][00753] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-06-19 14:11:33,541][00753] Experiment dir /content/train_dir/default_experiment already exists!
[2023-06-19 14:11:33,542][00753] Resuming existing experiment from /content/train_dir/default_experiment...
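Note: the "already registered, overwriting" messages and the register_encoder_factory line appear because the setup code ran a second time in the same process, re-registering every ViZDoom scenario. A sketch of how such registration is typically done with Sample Factory's customization hooks (the factory function and env name here are hypothetical placeholders; the import path assumes Sample Factory 2.x):

import gymnasium as gym
from sample_factory.envs.env_utils import register_env

def make_my_env(full_env_name, cfg=None, env_config=None, render_mode=None):
    # Hypothetical factory; a real ViZDoom env would be constructed here.
    return gym.make("CartPole-v1", render_mode=render_mode)

register_env("my_env", make_my_env)
# Registering the same name again on a re-run is what produces the
# "already registered, overwriting" warnings seen above.
register_env("my_env", make_my_env)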
[2023-06-19 14:11:33,544][00753] Weights and Biases integration disabled
[2023-06-19 14:11:33,548][00753] Environment var CUDA_VISIBLE_DEVICES is 0
[2023-06-19 14:11:35,485][00753] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2023-06-19 14:11:35,488][00753] Saving configuration to /content/train_dir/default_experiment/config.json...
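Note: the command_line and cli_args fields in the dump above record exactly how this run was launched. A sketch of an equivalent launch from Python, assuming the sf_examples ViZDoom entry point seen in the tracebacks exposes the usual main():

import sys
from sf_examples.vizdoom.train_vizdoom import main  # assumed entry point

if __name__ == "__main__":
    # Arguments copied from the command_line recorded in config.json.
    sys.argv[1:] = [
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=4000000",
    ]
    sys.exit(main())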
[2023-06-19 14:11:35,495][00753] Rollout worker 0 uses device cpu
[2023-06-19 14:11:35,497][00753] Rollout worker 1 uses device cpu
[2023-06-19 14:11:35,498][00753] Rollout worker 2 uses device cpu
[2023-06-19 14:11:35,499][00753] Rollout worker 3 uses device cpu
[2023-06-19 14:11:35,501][00753] Rollout worker 4 uses device cpu
[2023-06-19 14:11:35,502][00753] Rollout worker 5 uses device cpu
[2023-06-19 14:11:35,505][00753] Rollout worker 6 uses device cpu
[2023-06-19 14:11:35,507][00753] Rollout worker 7 uses device cpu
[2023-06-19 14:11:35,600][00753] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-06-19 14:11:35,602][00753] InferenceWorker_p0-w0: min num requests: 2
[2023-06-19 14:11:35,631][00753] Starting all processes...
[2023-06-19 14:11:35,634][00753] Starting process learner_proc0
[2023-06-19 14:11:35,681][00753] Starting all processes...
[2023-06-19 14:11:35,686][00753] Starting process inference_proc0-0
[2023-06-19 14:11:35,688][00753] Starting process rollout_proc0
[2023-06-19 14:11:35,704][00753] Starting process rollout_proc1
[2023-06-19 14:11:35,705][00753] Starting process rollout_proc2
[2023-06-19 14:11:35,705][00753] Starting process rollout_proc3
[2023-06-19 14:11:35,705][00753] Starting process rollout_proc4
[2023-06-19 14:11:35,705][00753] Starting process rollout_proc5
[2023-06-19 14:11:35,705][00753] Starting process rollout_proc6
[2023-06-19 14:11:35,705][00753] Starting process rollout_proc7
[2023-06-19 14:11:50,934][15729] Worker 3 uses CPU cores [1]
[2023-06-19 14:11:51,020][15712] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-06-19 14:11:51,020][15712] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-06-19 14:11:51,025][15733] Worker 7 uses CPU cores [1]
[2023-06-19 14:11:51,048][15727] Worker 1 uses CPU cores [1]
[2023-06-19 14:11:51,063][15712] Num visible devices: 1
[2023-06-19 14:11:51,088][15731] Worker 4 uses CPU cores [0]
[2023-06-19 14:11:51,088][15730] Worker 5 uses CPU cores [1]
[2023-06-19 14:11:51,104][15726] Worker 0 uses CPU cores [0]
[2023-06-19 14:11:51,107][15712] Starting seed is not provided
[2023-06-19 14:11:51,107][15712] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-06-19 14:11:51,108][15712] Initializing actor-critic model on device cuda:0
[2023-06-19 14:11:51,109][15712] RunningMeanStd input shape: (3, 72, 128)
[2023-06-19 14:11:51,111][15712] RunningMeanStd input shape: (1,)
[2023-06-19 14:11:51,121][15728] Worker 2 uses CPU cores [0]
[2023-06-19 14:11:51,144][15712] ConvEncoder: input_channels=3
[2023-06-19 14:11:51,177][15725] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-06-19 14:11:51,178][15725] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-06-19 14:11:51,195][15732] Worker 6 uses CPU cores [0]
[2023-06-19 14:11:51,209][15725] Num visible devices: 1
[2023-06-19 14:11:51,306][15712] Conv encoder output size: 512
[2023-06-19 14:11:51,306][15712] Policy head output size: 512
[2023-06-19 14:11:51,320][15712] Created Actor Critic model with architecture:
[2023-06-19 14:11:51,320][15712] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-06-19 14:11:53,852][15712] Using optimizer
[2023-06-19 14:11:53,854][15712] No checkpoints found
[2023-06-19 14:11:53,854][15712] Did not load from checkpoint, starting from scratch!
[2023-06-19 14:11:53,855][15712] Initialized policy 0 weights for model version 0
[2023-06-19 14:11:53,863][15712] LearnerWorker_p0 finished initialization!
[2023-06-19 14:11:53,863][15712] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-06-19 14:11:54,093][15725] RunningMeanStd input shape: (3, 72, 128)
[2023-06-19 14:11:54,094][15725] RunningMeanStd input shape: (1,)
[2023-06-19 14:11:54,113][15725] ConvEncoder: input_channels=3
[2023-06-19 14:11:54,292][15725] Conv encoder output size: 512
[2023-06-19 14:11:54,293][15725] Policy head output size: 512
[2023-06-19 14:11:54,380][00753] Inference worker 0-0 is ready!
[2023-06-19 14:11:54,382][00753] All inference workers are ready! Signal rollout workers to start!
[2023-06-19 14:11:54,493][15728] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:11:54,500][15732] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:11:54,502][15731] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:11:54,504][15726] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:11:54,571][15729] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:11:54,573][15733] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:11:54,575][15727] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:11:54,564][15730] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-06-19 14:11:55,596][00753] Heartbeat connected on LearnerWorker_p0
[2023-06-19 14:11:55,601][00753] Heartbeat connected on Batcher_0
[2023-06-19 14:11:55,630][00753] Heartbeat connected on InferenceWorker_p0-w0
[2023-06-19 14:11:56,353][15729] Decorrelating experience for 0 frames...
[2023-06-19 14:11:56,364][15733] Decorrelating experience for 0 frames...
[2023-06-19 14:11:56,773][15728] Decorrelating experience for 0 frames...
[2023-06-19 14:11:56,775][15731] Decorrelating experience for 0 frames...
[2023-06-19 14:11:56,778][15726] Decorrelating experience for 0 frames...
[2023-06-19 14:11:56,785][15732] Decorrelating experience for 0 frames...
[2023-06-19 14:11:57,456][15733] Decorrelating experience for 32 frames...
[2023-06-19 14:11:58,310][15731] Decorrelating experience for 32 frames...
[2023-06-19 14:11:58,375][15732] Decorrelating experience for 32 frames...
[2023-06-19 14:11:58,522][15729] Decorrelating experience for 32 frames...
[2023-06-19 14:11:58,542][15727] Decorrelating experience for 0 frames...
[2023-06-19 14:11:58,547][15730] Decorrelating experience for 0 frames...
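Note: the ActorCriticSharedWeights dump above fully determines the layer types and the 512-unit sizes, but not the conv kernel/stride parameters. A PyTorch sketch of an equivalent module, with the conv shapes assumed from Sample Factory's convnet_simple defaults (32@8x8/4, 64@4x4/2, 128@3x3/2), follows; the 5-way action head matches the distribution_linear printed above:

import torch
from torch import nn

class DoomActorCritic(nn.Module):
    def __init__(self, obs_shape=(3, 72, 128), hidden=512, num_actions=5):
        super().__init__()
        self.conv_head = nn.Sequential(  # assumed convnet_simple shapes
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer flattened conv output size
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).numel()
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)            # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)     # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq len 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

model = DoomActorCritic()
logits, value, h = model(torch.zeros(4, 3, 72, 128))  # batch of 4 observations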
[2023-06-19 14:11:58,549][00753] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-06-19 14:11:59,421][15728] Decorrelating experience for 32 frames...
[2023-06-19 14:11:59,831][15733] Decorrelating experience for 64 frames...
[2023-06-19 14:11:59,840][15730] Decorrelating experience for 32 frames...
[2023-06-19 14:12:00,246][15726] Decorrelating experience for 32 frames...
[2023-06-19 14:12:00,436][15731] Decorrelating experience for 64 frames...
[2023-06-19 14:12:01,005][15728] Decorrelating experience for 64 frames...
[2023-06-19 14:12:01,145][15732] Decorrelating experience for 64 frames...
[2023-06-19 14:12:01,148][15727] Decorrelating experience for 32 frames...
[2023-06-19 14:12:01,361][15729] Decorrelating experience for 64 frames...
[2023-06-19 14:12:01,535][15730] Decorrelating experience for 64 frames...
[2023-06-19 14:12:02,188][15728] Decorrelating experience for 96 frames...
[2023-06-19 14:12:02,238][15726] Decorrelating experience for 64 frames...
[2023-06-19 14:12:02,287][15727] Decorrelating experience for 64 frames...
[2023-06-19 14:12:02,355][15729] Decorrelating experience for 96 frames...
[2023-06-19 14:12:02,424][00753] Heartbeat connected on RolloutWorker_w2
[2023-06-19 14:12:02,577][00753] Heartbeat connected on RolloutWorker_w3
[2023-06-19 14:12:03,164][15731] Decorrelating experience for 96 frames...
[2023-06-19 14:12:03,436][00753] Heartbeat connected on RolloutWorker_w4
[2023-06-19 14:12:03,549][00753] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 51.2. Samples: 256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-06-19 14:12:03,554][00753] Avg episode reward: [(0, '0.853')]
[2023-06-19 14:12:04,017][15730] Decorrelating experience for 96 frames...
[2023-06-19 14:12:04,337][15727] Decorrelating experience for 96 frames...
[2023-06-19 14:12:04,383][15732] Decorrelating experience for 96 frames...
[2023-06-19 14:12:04,469][00753] Heartbeat connected on RolloutWorker_w5
[2023-06-19 14:12:04,512][15726] Decorrelating experience for 96 frames...
[2023-06-19 14:12:04,731][00753] Heartbeat connected on RolloutWorker_w6
[2023-06-19 14:12:04,799][00753] Heartbeat connected on RolloutWorker_w1
[2023-06-19 14:12:04,819][00753] Heartbeat connected on RolloutWorker_w0
[2023-06-19 14:12:05,781][15733] Decorrelating experience for 96 frames...
[2023-06-19 14:12:06,301][00753] Heartbeat connected on RolloutWorker_w7
[2023-06-19 14:12:06,935][15712] Signal inference workers to stop experience collection...
[2023-06-19 14:12:06,991][15725] InferenceWorker_p0-w0: stopping experience collection
[2023-06-19 14:12:08,549][00753] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 220.6. Samples: 2206. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-06-19 14:12:08,558][00753] Avg episode reward: [(0, '2.382')]
[2023-06-19 14:12:11,552][15712] Signal inference workers to resume experience collection...
[2023-06-19 14:12:11,552][15725] InferenceWorker_p0-w0: resuming experience collection
[2023-06-19 14:12:13,555][00753] Fps is (10 sec: 409.3, 60 sec: 273.0, 300 sec: 273.0). Total num frames: 4096. Throughput: 0: 197.8. Samples: 2968. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-06-19 14:12:13,558][00753] Avg episode reward: [(0, '2.520')]
[2023-06-19 14:12:18,549][00753] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 317.7. Samples: 6354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0)
[2023-06-19 14:12:18,551][00753] Avg episode reward: [(0, '3.413')]
[2023-06-19 14:12:23,173][15725] Updated weights for policy 0, policy_version 10 (0.0012)
[2023-06-19 14:12:23,549][00753] Fps is (10 sec: 3688.5, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 40960. Throughput: 0: 350.8. Samples: 8770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:12:23,552][00753] Avg episode reward: [(0, '4.084')]
[2023-06-19 14:12:28,549][00753] Fps is (10 sec: 4096.1, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 494.1. Samples: 14822. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:12:28,551][00753] Avg episode reward: [(0, '4.548')]
[2023-06-19 14:12:33,554][00753] Fps is (10 sec: 3684.6, 60 sec: 2223.2, 300 sec: 2223.2). Total num frames: 77824. Throughput: 0: 586.4. Samples: 20526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:12:33,558][00753] Avg episode reward: [(0, '4.517')]
[2023-06-19 14:12:34,205][15725] Updated weights for policy 0, policy_version 20 (0.0017)
[2023-06-19 14:12:38,549][00753] Fps is (10 sec: 3276.8, 60 sec: 2355.2, 300 sec: 2355.2). Total num frames: 94208. Throughput: 0: 565.2. Samples: 22610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:12:38,551][00753] Avg episode reward: [(0, '4.478')]
[2023-06-19 14:12:43,549][00753] Fps is (10 sec: 3278.5, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 110592. Throughput: 0: 612.8. Samples: 27578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:12:43,551][00753] Avg episode reward: [(0, '4.381')]
[2023-06-19 14:12:43,559][15712] Saving new best policy, reward=4.381!
[2023-06-19 14:12:45,633][15725] Updated weights for policy 0, policy_version 30 (0.0021)
[2023-06-19 14:12:48,549][00753] Fps is (10 sec: 4095.9, 60 sec: 2703.3, 300 sec: 2703.3). Total num frames: 135168. Throughput: 0: 758.1. Samples: 34370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:12:48,552][00753] Avg episode reward: [(0, '4.541')]
[2023-06-19 14:12:48,556][15712] Saving new best policy, reward=4.541!
[2023-06-19 14:12:53,550][00753] Fps is (10 sec: 4095.3, 60 sec: 2755.4, 300 sec: 2755.4). Total num frames: 151552. Throughput: 0: 781.9. Samples: 37392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:12:53,557][00753] Avg episode reward: [(0, '4.502')]
[2023-06-19 14:12:57,335][15725] Updated weights for policy 0, policy_version 40 (0.0021)
[2023-06-19 14:12:58,549][00753] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 163840. Throughput: 0: 857.1. Samples: 41534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:12:58,556][00753] Avg episode reward: [(0, '4.432')]
[2023-06-19 14:13:03,549][00753] Fps is (10 sec: 3277.3, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 184320. Throughput: 0: 901.1. Samples: 46902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:13:03,554][00753] Avg episode reward: [(0, '4.480')]
[2023-06-19 14:13:07,653][15725] Updated weights for policy 0, policy_version 50 (0.0016)
[2023-06-19 14:13:08,549][00753] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 2925.7). Total num frames: 204800. Throughput: 0: 921.5. Samples: 50238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:13:08,551][00753] Avg episode reward: [(0, '4.520')]
[2023-06-19 14:13:13,554][00753] Fps is (10 sec: 4093.9, 60 sec: 3686.5, 300 sec: 3003.5). Total num frames: 225280. Throughput: 0: 923.6. Samples: 56388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:13:13,557][00753] Avg episode reward: [(0, '4.386')]
[2023-06-19 14:13:18,552][00753] Fps is (10 sec: 3275.8, 60 sec: 3618.0, 300 sec: 2969.5). Total num frames: 237568. Throughput: 0: 893.6. Samples: 60738. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:13:18,554][00753] Avg episode reward: [(0, '4.350')]
[2023-06-19 14:13:20,124][15725] Updated weights for policy 0, policy_version 60 (0.0023)
[2023-06-19 14:13:23,549][00753] Fps is (10 sec: 3278.5, 60 sec: 3618.1, 300 sec: 3035.9). Total num frames: 258048. Throughput: 0: 898.3. Samples: 63032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:13:23,551][00753] Avg episode reward: [(0, '4.358')]
[2023-06-19 14:13:28,549][00753] Fps is (10 sec: 4507.0, 60 sec: 3686.4, 300 sec: 3140.3). Total num frames: 282624. Throughput: 0: 940.1. Samples: 69884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:13:28,551][00753] Avg episode reward: [(0, '4.463')]
[2023-06-19 14:13:29,476][15725] Updated weights for policy 0, policy_version 70 (0.0020)
[2023-06-19 14:13:33,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.7, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 922.1. Samples: 75864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:13:33,552][00753] Avg episode reward: [(0, '4.633')]
[2023-06-19 14:13:33,566][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2023-06-19 14:13:33,703][15712] Saving new best policy, reward=4.633!
[2023-06-19 14:13:38,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3113.0). Total num frames: 311296. Throughput: 0: 898.8. Samples: 77836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-06-19 14:13:38,555][00753] Avg episode reward: [(0, '4.622')]
[2023-06-19 14:13:42,437][15725] Updated weights for policy 0, policy_version 80 (0.0028)
[2023-06-19 14:13:43,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3159.8). Total num frames: 331776. Throughput: 0: 910.3. Samples: 82498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:13:43,556][00753] Avg episode reward: [(0, '4.430')]
[2023-06-19 14:13:48,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3202.3). Total num frames: 352256. Throughput: 0: 944.7. Samples: 89412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:13:48,551][00753] Avg episode reward: [(0, '4.335')]
[2023-06-19 14:13:51,112][15725] Updated weights for policy 0, policy_version 90 (0.0023)
[2023-06-19 14:13:53,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3241.2). Total num frames: 372736. Throughput: 0: 946.4. Samples: 92826. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:13:53,551][00753] Avg episode reward: [(0, '4.322')]
[2023-06-19 14:13:58,553][00753] Fps is (10 sec: 3684.9, 60 sec: 3754.4, 300 sec: 3242.6). Total num frames: 389120. Throughput: 0: 905.0. Samples: 97110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:13:58,556][00753] Avg episode reward: [(0, '4.420')]
[2023-06-19 14:14:03,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3244.0). Total num frames: 405504. Throughput: 0: 919.6. Samples: 102116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:14:03,553][00753] Avg episode reward: [(0, '4.471')]
[2023-06-19 14:14:03,967][15725] Updated weights for policy 0, policy_version 100 (0.0015)
[2023-06-19 14:14:08,549][00753] Fps is (10 sec: 4097.7, 60 sec: 3754.7, 300 sec: 3308.3). Total num frames: 430080. Throughput: 0: 945.2. Samples: 105566. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:14:08,556][00753] Avg episode reward: [(0, '4.427')]
[2023-06-19 14:14:13,551][00753] Fps is (10 sec: 4095.2, 60 sec: 3686.6, 300 sec: 3307.1). Total num frames: 446464. Throughput: 0: 939.4. Samples: 112158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:14:13,562][00753] Avg episode reward: [(0, '4.576')]
[2023-06-19 14:14:13,795][15725] Updated weights for policy 0, policy_version 110 (0.0016)
[2023-06-19 14:14:18,556][00753] Fps is (10 sec: 3274.5, 60 sec: 3754.4, 300 sec: 3305.9). Total num frames: 462848. Throughput: 0: 903.6. Samples: 116534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:14:18,558][00753] Avg episode reward: [(0, '4.581')]
[2023-06-19 14:14:23,549][00753] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3305.0). Total num frames: 479232. Throughput: 0: 907.4. Samples: 118670. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-06-19 14:14:23,558][00753] Avg episode reward: [(0, '4.547')]
[2023-06-19 14:14:25,764][15725] Updated weights for policy 0, policy_version 120 (0.0036)
[2023-06-19 14:14:28,549][00753] Fps is (10 sec: 4098.9, 60 sec: 3686.4, 300 sec: 3358.7). Total num frames: 503808. Throughput: 0: 952.0. Samples: 125338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:14:28,551][00753] Avg episode reward: [(0, '4.435')]
[2023-06-19 14:14:33,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3382.5). Total num frames: 524288. Throughput: 0: 936.6. Samples: 131560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:14:33,552][00753] Avg episode reward: [(0, '4.460')]
[2023-06-19 14:14:36,528][15725] Updated weights for policy 0, policy_version 130 (0.0014)
[2023-06-19 14:14:38,549][00753] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3353.6). Total num frames: 536576. Throughput: 0: 907.5. Samples: 133666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:14:38,555][00753] Avg episode reward: [(0, '4.480')]
[2023-06-19 14:14:43,549][00753] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3351.3). Total num frames: 552960. Throughput: 0: 910.7. Samples: 138088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:14:43,556][00753] Avg episode reward: [(0, '4.674')]
[2023-06-19 14:14:43,567][15712] Saving new best policy, reward=4.674!
[2023-06-19 14:14:47,558][15725] Updated weights for policy 0, policy_version 140 (0.0031)
[2023-06-19 14:14:48,549][00753] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3397.3). Total num frames: 577536. Throughput: 0: 951.2. Samples: 144920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:14:48,551][00753] Avg episode reward: [(0, '4.762')]
[2023-06-19 14:14:48,556][15712] Saving new best policy, reward=4.762!
[2023-06-19 14:14:53,549][00753] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3417.2). Total num frames: 598016. Throughput: 0: 948.8. Samples: 148264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:14:53,552][00753] Avg episode reward: [(0, '4.571')]
[2023-06-19 14:14:58,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3390.6). Total num frames: 610304. Throughput: 0: 903.7. Samples: 152822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:14:58,551][00753] Avg episode reward: [(0, '4.736')]
[2023-06-19 14:14:59,439][15725] Updated weights for policy 0, policy_version 150 (0.0018)
[2023-06-19 14:15:03,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3387.5). Total num frames: 626688. Throughput: 0: 910.6. Samples: 157506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:15:03,551][00753] Avg episode reward: [(0, '4.615')]
[2023-06-19 14:15:08,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3427.7). Total num frames: 651264. Throughput: 0: 940.6. Samples: 160996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:15:08,556][00753] Avg episode reward: [(0, '4.748')]
[2023-06-19 14:15:09,308][15725] Updated weights for policy 0, policy_version 160 (0.0012)
[2023-06-19 14:15:13,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3754.8, 300 sec: 3444.8). Total num frames: 671744. Throughput: 0: 943.2. Samples: 167780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:15:13,553][00753] Avg episode reward: [(0, '4.610')]
[2023-06-19 14:15:18,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.8, 300 sec: 3420.2). Total num frames: 684032. Throughput: 0: 901.4. Samples: 172124. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:15:18,551][00753] Avg episode reward: [(0, '4.361')]
[2023-06-19 14:15:21,959][15725] Updated weights for policy 0, policy_version 170 (0.0021)
[2023-06-19 14:15:23,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3416.7). Total num frames: 700416. Throughput: 0: 901.8. Samples: 174248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:15:23,555][00753] Avg episode reward: [(0, '4.512')]
[2023-06-19 14:15:28,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3452.3). Total num frames: 724992. Throughput: 0: 945.1. Samples: 180618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:15:28,556][00753] Avg episode reward: [(0, '4.620')]
[2023-06-19 14:15:31,122][15725] Updated weights for policy 0, policy_version 180 (0.0019)
[2023-06-19 14:15:33,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3467.3). Total num frames: 745472. Throughput: 0: 935.0. Samples: 186994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:15:33,558][00753] Avg episode reward: [(0, '4.729')]
[2023-06-19 14:15:33,567][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000182_745472.pth...
[2023-06-19 14:15:38,549][00753] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3444.4). Total num frames: 757760. Throughput: 0: 907.2. Samples: 189088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:15:38,552][00753] Avg episode reward: [(0, '4.905')]
[2023-06-19 14:15:38,556][15712] Saving new best policy, reward=4.905!
[2023-06-19 14:15:43,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3440.6). Total num frames: 774144. Throughput: 0: 900.0. Samples: 193322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:15:43,554][00753] Avg episode reward: [(0, '4.924')]
[2023-06-19 14:15:43,565][15712] Saving new best policy, reward=4.924!
[2023-06-19 14:15:44,200][15725] Updated weights for policy 0, policy_version 190 (0.0025)
[2023-06-19 14:15:48,549][00753] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3454.9). Total num frames: 794624. Throughput: 0: 944.5. Samples: 200008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:15:48,551][00753] Avg episode reward: [(0, '4.913')]
[2023-06-19 14:15:53,549][00753] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3468.5). Total num frames: 815104. Throughput: 0: 941.1. Samples: 203346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-06-19 14:15:53,551][00753] Avg episode reward: [(0, '4.802')]
[2023-06-19 14:15:53,776][15725] Updated weights for policy 0, policy_version 200 (0.0026)
[2023-06-19 14:15:58,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3464.5). Total num frames: 831488. Throughput: 0: 893.8. Samples: 208000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:15:58,551][00753] Avg episode reward: [(0, '4.682')]
[2023-06-19 14:16:03,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3444.0). Total num frames: 843776. Throughput: 0: 897.2. Samples: 212500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-06-19 14:16:03,551][00753] Avg episode reward: [(0, '4.617')]
[2023-06-19 14:16:06,146][15725] Updated weights for policy 0, policy_version 210 (0.0031)
[2023-06-19 14:16:08,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3473.4). Total num frames: 868352. Throughput: 0: 926.1. Samples: 215922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:16:08,552][00753] Avg episode reward: [(0, '4.709')]
[2023-06-19 14:16:13,551][00753] Fps is (10 sec: 4504.7, 60 sec: 3618.0, 300 sec: 3485.6). Total num frames: 888832. Throughput: 0: 933.7. Samples: 222636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:16:13,554][00753] Avg episode reward: [(0, '4.876')]
[2023-06-19 14:16:16,872][15725] Updated weights for policy 0, policy_version 220 (0.0039)
[2023-06-19 14:16:18,549][00753] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3481.6). Total num frames: 905216. Throughput: 0: 887.2. Samples: 226920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:16:18,551][00753] Avg episode reward: [(0, '4.941')]
[2023-06-19 14:16:18,554][15712] Saving new best policy, reward=4.941!
[2023-06-19 14:16:23,549][00753] Fps is (10 sec: 2867.8, 60 sec: 3618.1, 300 sec: 3462.3). Total num frames: 917504. Throughput: 0: 886.1. Samples: 228962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-06-19 14:16:23,551][00753] Avg episode reward: [(0, '4.965')]
[2023-06-19 14:16:23,571][15712] Saving new best policy, reward=4.965!
[2023-06-19 14:16:28,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3474.0). Total num frames: 937984. Throughput: 0: 906.7. Samples: 234122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-06-19 14:16:28,551][00753] Avg episode reward: [(0, '5.020')]
[2023-06-19 14:16:28,554][15712] Saving new best policy, reward=5.020!
[2023-06-19 14:16:29,550][15725] Updated weights for policy 0, policy_version 230 (0.0033)
[2023-06-19 14:16:33,552][00753] Fps is (10 sec: 4094.8, 60 sec: 3549.7, 300 sec: 3485.3). Total num frames: 958464. Throughput: 0: 898.8. Samples: 240456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:16:33,555][00753] Avg episode reward: [(0, '4.905')]
[2023-06-19 14:16:38,549][00753] Fps is (10 sec: 3686.5, 60 sec: 3618.2, 300 sec: 3481.6). Total num frames: 974848. Throughput: 0: 873.1. Samples: 242636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:16:38,553][00753] Avg episode reward: [(0, '4.889')]
[2023-06-19 14:16:41,209][15725] Updated weights for policy 0, policy_version 240 (0.0027)
[2023-06-19 14:16:43,551][00753] Fps is (10 sec: 2867.5, 60 sec: 3549.7, 300 sec: 3463.6). Total num frames: 987136. Throughput: 0: 868.8. Samples: 247098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:16:43,556][00753] Avg episode reward: [(0, '4.894')]
[2023-06-19 14:16:48,549][00753] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3488.7). Total num frames: 1011712. Throughput: 0: 912.6. Samples: 253566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:16:48,554][00753] Avg episode reward: [(0, '5.113')]
[2023-06-19 14:16:48,562][15712] Saving new best policy, reward=5.113!
[2023-06-19 14:16:51,108][15725] Updated weights for policy 0, policy_version 250 (0.0024)
[2023-06-19 14:16:53,549][00753] Fps is (10 sec: 4506.6, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 1032192. Throughput: 0: 910.2. Samples: 256882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:16:53,559][00753] Avg episode reward: [(0, '5.085')]
[2023-06-19 14:16:58,549][00753] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 877.2. Samples: 262106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:16:58,552][00753] Avg episode reward: [(0, '5.168')]
[2023-06-19 14:16:58,558][15712] Saving new best policy, reward=5.168!
[2023-06-19 14:17:03,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 875.8. Samples: 266332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:17:03,554][00753] Avg episode reward: [(0, '5.222')]
[2023-06-19 14:17:03,562][15712] Saving new best policy, reward=5.222!
[2023-06-19 14:17:04,022][15725] Updated weights for policy 0, policy_version 260 (0.0022)
[2023-06-19 14:17:08,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1085440. Throughput: 0: 903.7. Samples: 269628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:17:08,555][00753] Avg episode reward: [(0, '5.602')]
[2023-06-19 14:17:08,561][15712] Saving new best policy, reward=5.602!
[2023-06-19 14:17:12,955][15725] Updated weights for policy 0, policy_version 270 (0.0015)
[2023-06-19 14:17:13,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3618.3, 300 sec: 3679.5). Total num frames: 1105920. Throughput: 0: 942.3. Samples: 276526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:17:13,551][00753] Avg episode reward: [(0, '5.298')]
[2023-06-19 14:17:18,549][00753] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1122304. Throughput: 0: 908.5. Samples: 281338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:17:18,551][00753] Avg episode reward: [(0, '5.483')]
[2023-06-19 14:17:23,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 1134592. Throughput: 0: 908.1. Samples: 283500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:17:23,555][00753] Avg episode reward: [(0, '5.635')]
[2023-06-19 14:17:23,569][15712] Saving new best policy, reward=5.635!
[2023-06-19 14:17:25,750][15725] Updated weights for policy 0, policy_version 280 (0.0024)
[2023-06-19 14:17:28,549][00753] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1159168. Throughput: 0: 938.5. Samples: 289328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:17:28,557][00753] Avg episode reward: [(0, '5.970')]
[2023-06-19 14:17:28,561][15712] Saving new best policy, reward=5.970!
[2023-06-19 14:17:33,550][00753] Fps is (10 sec: 4505.1, 60 sec: 3686.5, 300 sec: 3679.4). Total num frames: 1179648. Throughput: 0: 946.8. Samples: 296174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:17:33,555][00753] Avg episode reward: [(0, '5.947')]
[2023-06-19 14:17:33,568][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000288_1179648.pth...
[2023-06-19 14:17:33,725][15712] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth
[2023-06-19 14:17:35,383][15725] Updated weights for policy 0, policy_version 290 (0.0012)
[2023-06-19 14:17:38,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1196032. Throughput: 0: 921.2. Samples: 298338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:17:38,555][00753] Avg episode reward: [(0, '5.886')]
[2023-06-19 14:17:43,549][00753] Fps is (10 sec: 2867.5, 60 sec: 3686.5, 300 sec: 3637.8). Total num frames: 1208320. Throughput: 0: 898.4. Samples: 302536. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-06-19 14:17:43,551][00753] Avg episode reward: [(0, '5.922')]
[2023-06-19 14:17:47,180][15725] Updated weights for policy 0, policy_version 300 (0.0012)
[2023-06-19 14:17:48,549][00753] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1232896. Throughput: 0: 948.3. Samples: 309006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:17:48,555][00753] Avg episode reward: [(0, '6.287')]
[2023-06-19 14:17:48,560][15712] Saving new best policy, reward=6.287!
[2023-06-19 14:17:53,551][00753] Fps is (10 sec: 4914.2, 60 sec: 3754.5, 300 sec: 3707.2). Total num frames: 1257472. Throughput: 0: 951.4. Samples: 312444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:17:53,556][00753] Avg episode reward: [(0, '6.755')]
[2023-06-19 14:17:53,564][15712] Saving new best policy, reward=6.755!
[2023-06-19 14:17:57,871][15725] Updated weights for policy 0, policy_version 310 (0.0021)
[2023-06-19 14:17:58,549][00753] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1269760. Throughput: 0: 913.1. Samples: 317614. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-06-19 14:17:58,551][00753] Avg episode reward: [(0, '6.949')]
[2023-06-19 14:17:58,556][15712] Saving new best policy, reward=6.949!
[2023-06-19 14:18:03,549][00753] Fps is (10 sec: 2867.8, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1286144. Throughput: 0: 902.0. Samples: 321930. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-06-19 14:18:03,551][00753] Avg episode reward: [(0, '7.182')]
[2023-06-19 14:18:03,565][15712] Saving new best policy, reward=7.182!
[2023-06-19 14:18:08,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1306624. Throughput: 0: 927.5. Samples: 325236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:18:08,551][00753] Avg episode reward: [(0, '7.397')]
[2023-06-19 14:18:08,557][15712] Saving new best policy, reward=7.397!
[2023-06-19 14:18:08,855][15725] Updated weights for policy 0, policy_version 320 (0.0015)
[2023-06-19 14:18:13,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 1331200. Throughput: 0: 952.1. Samples: 332174.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-06-19 14:18:13,551][00753] Avg episode reward: [(0, '7.721')] [2023-06-19 14:18:13,568][15712] Saving new best policy, reward=7.721! [2023-06-19 14:18:18,551][00753] Fps is (10 sec: 3685.6, 60 sec: 3686.3, 300 sec: 3679.4). Total num frames: 1343488. Throughput: 0: 904.4. Samples: 336874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-06-19 14:18:18,556][00753] Avg episode reward: [(0, '7.759')] [2023-06-19 14:18:18,565][15712] Saving new best policy, reward=7.759! [2023-06-19 14:18:20,572][15725] Updated weights for policy 0, policy_version 330 (0.0012) [2023-06-19 14:18:23,549][00753] Fps is (10 sec: 2867.1, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1359872. Throughput: 0: 901.9. Samples: 338922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-06-19 14:18:23,553][00753] Avg episode reward: [(0, '7.919')] [2023-06-19 14:18:23,571][15712] Saving new best policy, reward=7.919! [2023-06-19 14:18:28,549][00753] Fps is (10 sec: 3687.1, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1380352. Throughput: 0: 937.2. Samples: 344708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-06-19 14:18:28,556][00753] Avg episode reward: [(0, '7.680')] [2023-06-19 14:18:30,848][15725] Updated weights for policy 0, policy_version 340 (0.0029) [2023-06-19 14:18:33,549][00753] Fps is (10 sec: 4096.1, 60 sec: 3686.5, 300 sec: 3693.3). Total num frames: 1400832. Throughput: 0: 948.0. Samples: 351666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-06-19 14:18:33,551][00753] Avg episode reward: [(0, '7.957')] [2023-06-19 14:18:33,620][15712] Saving new best policy, reward=7.957! [2023-06-19 14:18:38,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1417216. Throughput: 0: 922.4. Samples: 353950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-06-19 14:18:38,555][00753] Avg episode reward: [(0, '8.508')] [2023-06-19 14:18:38,557][15712] Saving new best policy, reward=8.508! [2023-06-19 14:18:43,506][15725] Updated weights for policy 0, policy_version 350 (0.0011) [2023-06-19 14:18:43,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1433600. Throughput: 0: 901.3. Samples: 358172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-06-19 14:18:43,553][00753] Avg episode reward: [(0, '8.359')] [2023-06-19 14:18:48,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1454080. Throughput: 0: 942.8. Samples: 364354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-06-19 14:18:48,557][00753] Avg episode reward: [(0, '8.789')] [2023-06-19 14:18:48,560][15712] Saving new best policy, reward=8.789! [2023-06-19 14:18:52,602][15725] Updated weights for policy 0, policy_version 360 (0.0012) [2023-06-19 14:18:53,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3686.5, 300 sec: 3693.4). Total num frames: 1478656. Throughput: 0: 944.8. Samples: 367750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-06-19 14:18:53,555][00753] Avg episode reward: [(0, '9.441')] [2023-06-19 14:18:53,565][15712] Saving new best policy, reward=9.441! [2023-06-19 14:18:58,549][00753] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1490944. Throughput: 0: 911.6. Samples: 373196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-06-19 14:18:58,552][00753] Avg episode reward: [(0, '10.522')] [2023-06-19 14:18:58,559][15712] Saving new best policy, reward=10.522! 
[2023-06-19 14:19:03,551][00753] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1507328. Throughput: 0: 902.2. Samples: 377470. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:19:03,559][00753] Avg episode reward: [(0, '10.809')]
[2023-06-19 14:19:03,574][15712] Saving new best policy, reward=10.809!
[2023-06-19 14:19:05,541][15725] Updated weights for policy 0, policy_version 370 (0.0017)
[2023-06-19 14:19:08,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1527808. Throughput: 0: 921.6. Samples: 380394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:19:08,551][00753] Avg episode reward: [(0, '11.409')]
[2023-06-19 14:19:08,554][15712] Saving new best policy, reward=11.409!
[2023-06-19 14:19:13,549][00753] Fps is (10 sec: 4096.2, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 1548288. Throughput: 0: 945.1. Samples: 387238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:19:13,551][00753] Avg episode reward: [(0, '10.759')]
[2023-06-19 14:19:14,763][15725] Updated weights for policy 0, policy_version 380 (0.0015)
[2023-06-19 14:19:18,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3679.5). Total num frames: 1564672. Throughput: 0: 906.5. Samples: 392460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:19:18,553][00753] Avg episode reward: [(0, '10.834')]
[2023-06-19 14:19:23,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1581056. Throughput: 0: 904.3. Samples: 394642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:19:23,551][00753] Avg episode reward: [(0, '11.892')]
[2023-06-19 14:19:23,564][15712] Saving new best policy, reward=11.892!
[2023-06-19 14:19:27,241][15725] Updated weights for policy 0, policy_version 390 (0.0029)
[2023-06-19 14:19:28,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1601536. Throughput: 0: 934.3. Samples: 400214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:19:28,551][00753] Avg episode reward: [(0, '12.358')]
[2023-06-19 14:19:28,558][15712] Saving new best policy, reward=12.358!
[2023-06-19 14:19:33,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1622016. Throughput: 0: 949.3. Samples: 407074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:19:33,552][00753] Avg episode reward: [(0, '12.535')]
[2023-06-19 14:19:33,568][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000397_1626112.pth...
[2023-06-19 14:19:33,673][15712] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000182_745472.pth
[2023-06-19 14:19:33,687][15712] Saving new best policy, reward=12.535!
[2023-06-19 14:19:37,498][15725] Updated weights for policy 0, policy_version 400 (0.0040)
[2023-06-19 14:19:38,550][00753] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3679.4). Total num frames: 1638400. Throughput: 0: 926.7. Samples: 409452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:19:38,553][00753] Avg episode reward: [(0, '12.930')]
[2023-06-19 14:19:38,557][15712] Saving new best policy, reward=12.930!
[2023-06-19 14:19:43,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1654784. Throughput: 0: 901.6. Samples: 413768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:19:43,553][00753] Avg episode reward: [(0, '13.460')]
[2023-06-19 14:19:43,568][15712] Saving new best policy, reward=13.460!
[2023-06-19 14:19:48,549][00753] Fps is (10 sec: 3686.9, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1675264. Throughput: 0: 942.2. Samples: 419870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-06-19 14:19:48,553][00753] Avg episode reward: [(0, '13.915')]
[2023-06-19 14:19:48,561][15712] Saving new best policy, reward=13.915!
[2023-06-19 14:19:49,080][15725] Updated weights for policy 0, policy_version 410 (0.0030)
[2023-06-19 14:19:53,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1699840. Throughput: 0: 953.1. Samples: 423282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:19:53,551][00753] Avg episode reward: [(0, '14.676')]
[2023-06-19 14:19:53,560][15712] Saving new best policy, reward=14.676!
[2023-06-19 14:19:58,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1716224. Throughput: 0: 923.9. Samples: 428812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:19:58,553][00753] Avg episode reward: [(0, '14.714')]
[2023-06-19 14:19:58,557][15712] Saving new best policy, reward=14.714!
[2023-06-19 14:19:59,953][15725] Updated weights for policy 0, policy_version 420 (0.0023)
[2023-06-19 14:20:03,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1728512. Throughput: 0: 906.0. Samples: 433232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:20:03,558][00753] Avg episode reward: [(0, '14.795')]
[2023-06-19 14:20:03,567][15712] Saving new best policy, reward=14.795!
[2023-06-19 14:20:08,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1748992. Throughput: 0: 918.4. Samples: 435972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:20:08,551][00753] Avg episode reward: [(0, '14.436')]
[2023-06-19 14:20:10,658][15725] Updated weights for policy 0, policy_version 430 (0.0012)
[2023-06-19 14:20:13,549][00753] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1773568. Throughput: 0: 949.2. Samples: 442926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:20:13,551][00753] Avg episode reward: [(0, '14.408')]
[2023-06-19 14:20:18,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1789952. Throughput: 0: 915.9. Samples: 448290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:20:18,555][00753] Avg episode reward: [(0, '14.170')]
[2023-06-19 14:20:22,322][15725] Updated weights for policy 0, policy_version 440 (0.0017)
[2023-06-19 14:20:23,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1802240. Throughput: 0: 910.7. Samples: 450434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:20:23,558][00753] Avg episode reward: [(0, '14.235')]
[2023-06-19 14:20:28,549][00753] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1822720. Throughput: 0: 932.1. Samples: 455712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:20:28,554][00753] Avg episode reward: [(0, '15.125')]
[2023-06-19 14:20:28,559][15712] Saving new best policy, reward=15.125!
[2023-06-19 14:20:32,398][15725] Updated weights for policy 0, policy_version 450 (0.0021)
[2023-06-19 14:20:33,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1847296. Throughput: 0: 949.4. Samples: 462594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:20:33,557][00753] Avg episode reward: [(0, '14.947')]
[2023-06-19 14:20:38,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3693.3). Total num frames: 1863680. Throughput: 0: 935.8. Samples: 465392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:20:38,554][00753] Avg episode reward: [(0, '15.479')]
[2023-06-19 14:20:38,563][15712] Saving new best policy, reward=15.479!
[2023-06-19 14:20:43,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1875968. Throughput: 0: 906.8. Samples: 469620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:20:43,553][00753] Avg episode reward: [(0, '14.941')]
[2023-06-19 14:20:45,104][15725] Updated weights for policy 0, policy_version 460 (0.0024)
[2023-06-19 14:20:48,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1900544. Throughput: 0: 940.4. Samples: 475548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:20:48,554][00753] Avg episode reward: [(0, '16.035')]
[2023-06-19 14:20:48,557][15712] Saving new best policy, reward=16.035!
[2023-06-19 14:20:53,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1921024. Throughput: 0: 953.5. Samples: 478878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:20:53,551][00753] Avg episode reward: [(0, '16.908')]
[2023-06-19 14:20:53,560][15712] Saving new best policy, reward=16.908!
[2023-06-19 14:20:53,975][15725] Updated weights for policy 0, policy_version 470 (0.0018)
[2023-06-19 14:20:58,552][00753] Fps is (10 sec: 3685.1, 60 sec: 3686.2, 300 sec: 3707.2). Total num frames: 1937408. Throughput: 0: 930.4. Samples: 484798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:20:58,558][00753] Avg episode reward: [(0, '17.941')]
[2023-06-19 14:20:58,560][15712] Saving new best policy, reward=17.941!
[2023-06-19 14:21:03,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1953792. Throughput: 0: 904.8. Samples: 489004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:21:03,553][00753] Avg episode reward: [(0, '17.809')]
[2023-06-19 14:21:06,862][15725] Updated weights for policy 0, policy_version 480 (0.0031)
[2023-06-19 14:21:08,549][00753] Fps is (10 sec: 3277.9, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1970176. Throughput: 0: 914.3. Samples: 491576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:21:08,556][00753] Avg episode reward: [(0, '17.267')]
[2023-06-19 14:21:13,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1994752. Throughput: 0: 948.5. Samples: 498396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:21:13,554][00753] Avg episode reward: [(0, '16.821')]
[2023-06-19 14:21:15,876][15725] Updated weights for policy 0, policy_version 490 (0.0017)
[2023-06-19 14:21:18,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2011136. Throughput: 0: 923.1. Samples: 504132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:21:18,551][00753] Avg episode reward: [(0, '17.097')]
[2023-06-19 14:21:23,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2027520. Throughput: 0: 908.7. Samples: 506284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:21:23,551][00753] Avg episode reward: [(0, '16.392')]
[2023-06-19 14:21:28,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3693.4). Total num frames: 2048000. Throughput: 0: 925.6. Samples: 511272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:21:28,552][15725] Updated weights for policy 0, policy_version 500 (0.0012)
[2023-06-19 14:21:28,550][00753] Avg episode reward: [(0, '18.095')]
[2023-06-19 14:21:28,562][15712] Saving new best policy, reward=18.095!
[2023-06-19 14:21:33,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2068480. Throughput: 0: 945.6. Samples: 518098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:21:33,554][00753] Avg episode reward: [(0, '18.358')]
[2023-06-19 14:21:33,567][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000505_2068480.pth...
[2023-06-19 14:21:33,678][15712] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000288_1179648.pth
[2023-06-19 14:21:33,685][15712] Saving new best policy, reward=18.358!
[2023-06-19 14:21:38,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2084864. Throughput: 0: 938.0. Samples: 521088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:21:38,551][00753] Avg episode reward: [(0, '20.300')]
[2023-06-19 14:21:38,564][15712] Saving new best policy, reward=20.300!
[2023-06-19 14:21:38,816][15725] Updated weights for policy 0, policy_version 510 (0.0032)
[2023-06-19 14:21:43,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2101248. Throughput: 0: 899.4. Samples: 525266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:21:43,556][00753] Avg episode reward: [(0, '19.890')]
[2023-06-19 14:21:48,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2117632. Throughput: 0: 924.3. Samples: 530596. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:21:48,554][00753] Avg episode reward: [(0, '20.911')]
[2023-06-19 14:21:48,556][15712] Saving new best policy, reward=20.911!
[2023-06-19 14:21:50,633][15725] Updated weights for policy 0, policy_version 520 (0.0020)
[2023-06-19 14:21:53,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2142208. Throughput: 0: 942.9. Samples: 534006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:21:53,551][00753] Avg episode reward: [(0, '20.340')]
[2023-06-19 14:21:58,550][00753] Fps is (10 sec: 4095.6, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 2158592. Throughput: 0: 931.3. Samples: 540306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:21:58,552][00753] Avg episode reward: [(0, '20.661')]
[2023-06-19 14:22:01,536][15725] Updated weights for policy 0, policy_version 530 (0.0022)
[2023-06-19 14:22:03,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2174976. Throughput: 0: 898.7. Samples: 544572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:22:03,555][00753] Avg episode reward: [(0, '20.566')]
[2023-06-19 14:22:08,549][00753] Fps is (10 sec: 3277.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2191360. Throughput: 0: 898.5. Samples: 546718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:22:08,556][00753] Avg episode reward: [(0, '20.566')]
[2023-06-19 14:22:12,406][15725] Updated weights for policy 0, policy_version 540 (0.0014)
[2023-06-19 14:22:13,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2215936. Throughput: 0: 940.4. Samples: 553588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:22:13,557][00753] Avg episode reward: [(0, '20.341')]
[2023-06-19 14:22:18,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2232320. Throughput: 0: 922.7. Samples: 559618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:22:18,554][00753] Avg episode reward: [(0, '20.097')]
[2023-06-19 14:22:23,549][00753] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2248704. Throughput: 0: 901.9. Samples: 561672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:22:23,555][00753] Avg episode reward: [(0, '19.525')]
[2023-06-19 14:22:24,055][15725] Updated weights for policy 0, policy_version 550 (0.0016)
[2023-06-19 14:22:28,550][00753] Fps is (10 sec: 3276.4, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2265088. Throughput: 0: 914.8. Samples: 566434. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-06-19 14:22:28,554][00753] Avg episode reward: [(0, '20.315')]
[2023-06-19 14:22:33,549][00753] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2289664. Throughput: 0: 949.5. Samples: 573322. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:22:33,552][00753] Avg episode reward: [(0, '19.414')]
[2023-06-19 14:22:34,035][15725] Updated weights for policy 0, policy_version 560 (0.0024)
[2023-06-19 14:22:38,549][00753] Fps is (10 sec: 4506.1, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2310144. Throughput: 0: 950.5. Samples: 576778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:22:38,555][00753] Avg episode reward: [(0, '20.310')]
[2023-06-19 14:22:43,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2322432. Throughput: 0: 907.2. Samples: 581128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:22:43,553][00753] Avg episode reward: [(0, '20.931')]
[2023-06-19 14:22:43,567][15712] Saving new best policy, reward=20.931!
[2023-06-19 14:22:46,923][15725] Updated weights for policy 0, policy_version 570 (0.0023)
[2023-06-19 14:22:48,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2338816. Throughput: 0: 924.1. Samples: 586156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:22:48,557][00753] Avg episode reward: [(0, '20.626')]
[2023-06-19 14:22:53,551][00753] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3707.2). Total num frames: 2363392. Throughput: 0: 953.1. Samples: 589610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:22:53,553][00753] Avg episode reward: [(0, '19.179')]
[2023-06-19 14:22:55,696][15725] Updated weights for policy 0, policy_version 580 (0.0040)
[2023-06-19 14:22:58,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2383872. Throughput: 0: 949.6. Samples: 596318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:22:58,554][00753] Avg episode reward: [(0, '18.110')]
[2023-06-19 14:23:03,549][00753] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2400256. Throughput: 0: 909.9. Samples: 600564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:23:03,556][00753] Avg episode reward: [(0, '18.247')]
[2023-06-19 14:23:08,363][15725] Updated weights for policy 0, policy_version 590 (0.0023)
[2023-06-19 14:23:08,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2416640. Throughput: 0: 913.3. Samples: 602772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:23:08,551][00753] Avg episode reward: [(0, '18.553')]
[2023-06-19 14:23:13,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.3). Total num frames: 2437120. Throughput: 0: 956.2. Samples: 609464. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:23:13,551][00753] Avg episode reward: [(0, '20.628')]
[2023-06-19 14:23:17,184][15725] Updated weights for policy 0, policy_version 600 (0.0013)
[2023-06-19 14:23:18,550][00753] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 2457600. Throughput: 0: 945.2. Samples: 615858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:23:18,552][00753] Avg episode reward: [(0, '21.862')]
[2023-06-19 14:23:18,630][15712] Saving new best policy, reward=21.862!
[2023-06-19 14:23:23,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2473984. Throughput: 0: 915.7. Samples: 617986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:23:23,557][00753] Avg episode reward: [(0, '22.522')]
[2023-06-19 14:23:23,575][15712] Saving new best policy, reward=22.522!
[2023-06-19 14:23:28,549][00753] Fps is (10 sec: 3277.2, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2490368. Throughput: 0: 912.5. Samples: 622192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:23:28,555][00753] Avg episode reward: [(0, '24.060')]
[2023-06-19 14:23:28,557][15712] Saving new best policy, reward=24.060!
[2023-06-19 14:23:30,128][15725] Updated weights for policy 0, policy_version 610 (0.0024)
[2023-06-19 14:23:33,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2510848. Throughput: 0: 952.6. Samples: 629022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:23:33,554][00753] Avg episode reward: [(0, '23.716')]
[2023-06-19 14:23:33,626][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000614_2514944.pth...
[2023-06-19 14:23:33,740][15712] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000397_1626112.pth
[2023-06-19 14:23:38,549][00753] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2531328. Throughput: 0: 950.4. Samples: 632374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:23:38,551][00753] Avg episode reward: [(0, '23.141')]
[2023-06-19 14:23:40,137][15725] Updated weights for policy 0, policy_version 620 (0.0024)
[2023-06-19 14:23:43,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2547712. Throughput: 0: 909.4. Samples: 637240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:23:43,553][00753] Avg episode reward: [(0, '23.405')]
[2023-06-19 14:23:48,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2564096. Throughput: 0: 919.2. Samples: 641926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:23:48,551][00753] Avg episode reward: [(0, '23.322')]
[2023-06-19 14:23:51,573][15725] Updated weights for policy 0, policy_version 630 (0.0020)
[2023-06-19 14:23:53,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 2588672. Throughput: 0: 948.3. Samples: 645444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:23:53,552][00753] Avg episode reward: [(0, '22.911')]
[2023-06-19 14:23:58,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2609152. Throughput: 0: 953.2. Samples: 652360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:23:58,553][00753] Avg episode reward: [(0, '23.911')]
[2023-06-19 14:24:02,120][15725] Updated weights for policy 0, policy_version 640 (0.0029)
[2023-06-19 14:24:03,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2625536. Throughput: 0: 909.4. Samples: 656780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:24:03,556][00753] Avg episode reward: [(0, '23.875')]
[2023-06-19 14:24:08,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2637824. Throughput: 0: 910.4. Samples: 658952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:24:08,553][00753] Avg episode reward: [(0, '23.861')]
[2023-06-19 14:24:13,158][15725] Updated weights for policy 0, policy_version 650 (0.0026)
[2023-06-19 14:24:13,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2662400. Throughput: 0: 957.3. Samples: 665270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:24:13,554][00753] Avg episode reward: [(0, '22.964')]
[2023-06-19 14:24:18,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2682880. Throughput: 0: 957.2. Samples: 672098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:24:18,551][00753] Avg episode reward: [(0, '24.281')]
[2023-06-19 14:24:18,554][15712] Saving new best policy, reward=24.281!
[2023-06-19 14:24:23,552][00753] Fps is (10 sec: 3685.2, 60 sec: 3754.5, 300 sec: 3721.1). Total num frames: 2699264. Throughput: 0: 930.1. Samples: 674230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:24:23,555][00753] Avg episode reward: [(0, '24.904')]
[2023-06-19 14:24:23,571][15712] Saving new best policy, reward=24.904!
[2023-06-19 14:24:24,509][15725] Updated weights for policy 0, policy_version 660 (0.0019)
[2023-06-19 14:24:28,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2711552. Throughput: 0: 916.7. Samples: 678492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:24:28,555][00753] Avg episode reward: [(0, '25.365')]
[2023-06-19 14:24:28,624][15712] Saving new best policy, reward=25.365!
[2023-06-19 14:24:33,549][00753] Fps is (10 sec: 3687.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2736128. Throughput: 0: 954.4. Samples: 684872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:24:33,553][00753] Avg episode reward: [(0, '25.712')]
[2023-06-19 14:24:33,567][15712] Saving new best policy, reward=25.712!
[2023-06-19 14:24:34,939][15725] Updated weights for policy 0, policy_version 670 (0.0015)
[2023-06-19 14:24:38,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2756608. Throughput: 0: 951.5. Samples: 688260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:24:38,558][00753] Avg episode reward: [(0, '25.987')]
[2023-06-19 14:24:38,561][15712] Saving new best policy, reward=25.987!
[2023-06-19 14:24:43,550][00753] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 2772992. Throughput: 0: 911.0. Samples: 693356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-06-19 14:24:43,555][00753] Avg episode reward: [(0, '26.816')]
[2023-06-19 14:24:43,567][15712] Saving new best policy, reward=26.816!
[2023-06-19 14:24:47,293][15725] Updated weights for policy 0, policy_version 680 (0.0026)
[2023-06-19 14:24:48,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2785280. Throughput: 0: 911.0. Samples: 697774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:24:48,559][00753] Avg episode reward: [(0, '26.980')]
[2023-06-19 14:24:48,612][15712] Saving new best policy, reward=26.980!
[2023-06-19 14:24:53,549][00753] Fps is (10 sec: 3686.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2809856. Throughput: 0: 933.1. Samples: 700942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:24:53,551][00753] Avg episode reward: [(0, '25.967')]
[2023-06-19 14:24:56,664][15725] Updated weights for policy 0, policy_version 690 (0.0042)
[2023-06-19 14:24:58,549][00753] Fps is (10 sec: 4915.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2834432. Throughput: 0: 948.0. Samples: 707932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:24:58,551][00753] Avg episode reward: [(0, '26.305')]
[2023-06-19 14:25:03,549][00753] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2846720. Throughput: 0: 909.8. Samples: 713040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:25:03,555][00753] Avg episode reward: [(0, '25.463')]
[2023-06-19 14:25:08,549][00753] Fps is (10 sec: 2867.0, 60 sec: 3754.6, 300 sec: 3693.3). Total num frames: 2863104. Throughput: 0: 910.6. Samples: 715204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:25:08,552][00753] Avg episode reward: [(0, '26.561')]
[2023-06-19 14:25:09,222][15725] Updated weights for policy 0, policy_version 700 (0.0020)
[2023-06-19 14:25:13,549][00753] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2883584. Throughput: 0: 944.7. Samples: 721002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:25:13,553][00753] Avg episode reward: [(0, '24.632')]
[2023-06-19 14:25:17,882][15725] Updated weights for policy 0, policy_version 710 (0.0019)
[2023-06-19 14:25:18,554][00753] Fps is (10 sec: 4503.5, 60 sec: 3754.3, 300 sec: 3748.8). Total num frames: 2908160. Throughput: 0: 960.1. Samples: 728082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:25:18,557][00753] Avg episode reward: [(0, '25.128')]
[2023-06-19 14:25:23,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3754.9, 300 sec: 3735.0). Total num frames: 2924544. Throughput: 0: 942.0. Samples: 730652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:25:23,557][00753] Avg episode reward: [(0, '26.490')]
[2023-06-19 14:25:28,549][00753] Fps is (10 sec: 2868.7, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2936832. Throughput: 0: 925.0. Samples: 734980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:25:28,552][00753] Avg episode reward: [(0, '25.926')]
[2023-06-19 14:25:30,767][15725] Updated weights for policy 0, policy_version 720 (0.0015)
[2023-06-19 14:25:33,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2961408. Throughput: 0: 960.0. Samples: 740972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:25:33,557][00753] Avg episode reward: [(0, '26.396')]
[2023-06-19 14:25:33,569][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000723_2961408.pth...
[2023-06-19 14:25:33,682][15712] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000505_2068480.pth
[2023-06-19 14:25:38,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2981888. Throughput: 0: 966.6. Samples: 744438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:25:38,551][00753] Avg episode reward: [(0, '26.917')]
[2023-06-19 14:25:39,966][15725] Updated weights for policy 0, policy_version 730 (0.0018)
[2023-06-19 14:25:43,549][00753] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2998272. Throughput: 0: 934.6. Samples: 749990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:25:43,551][00753] Avg episode reward: [(0, '27.643')]
[2023-06-19 14:25:43,566][15712] Saving new best policy, reward=27.643!
[2023-06-19 14:25:48,549][00753] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 3014656. Throughput: 0: 914.7. Samples: 754202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:25:48,553][00753] Avg episode reward: [(0, '26.404')]
[2023-06-19 14:25:52,564][15725] Updated weights for policy 0, policy_version 740 (0.0031)
[2023-06-19 14:25:53,549][00753] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3721.2). Total num frames: 3035136. Throughput: 0: 931.7. Samples: 757128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:25:53,555][00753] Avg episode reward: [(0, '25.247')]
[2023-06-19 14:25:58,549][00753] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3055616. Throughput: 0: 958.3. Samples: 764124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:25:58,551][00753] Avg episode reward: [(0, '24.583')]
[2023-06-19 14:26:02,437][15725] Updated weights for policy 0, policy_version 750 (0.0018)
[2023-06-19 14:26:03,553][00753] Fps is (10 sec: 3684.9, 60 sec: 3754.4, 300 sec: 3734.9). Total num frames: 3072000. Throughput: 0: 919.1. Samples: 769442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:26:03,560][00753] Avg episode reward: [(0, '23.439')]
[2023-06-19 14:26:08,549][00753] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3088384. Throughput: 0: 909.7. Samples: 771588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:26:08,557][00753] Avg episode reward: [(0, '23.079')]
[2023-06-19 14:26:13,549][00753] Fps is (10 sec: 3687.9, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3108864. Throughput: 0: 933.6. Samples: 776994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:26:13,551][00753] Avg episode reward: [(0, '22.671')]
[2023-06-19 14:26:14,083][15725] Updated weights for policy 0, policy_version 760 (0.0023)
[2023-06-19 14:26:18,549][00753] Fps is (10 sec: 4505.6, 60 sec: 3755.0, 300 sec: 3748.9). Total num frames: 3133440. Throughput: 0: 955.6. Samples: 783974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:26:18,551][00753] Avg episode reward: [(0, '22.104')]
[2023-06-19 14:26:23,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3149824. Throughput: 0: 942.7. Samples: 786860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:26:23,551][00753] Avg episode reward: [(0, '22.932')]
[2023-06-19 14:26:24,534][15725] Updated weights for policy 0, policy_version 770 (0.0028)
[2023-06-19 14:26:28,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3162112. Throughput: 0: 915.8. Samples: 791202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:26:28,554][00753] Avg episode reward: [(0, '23.194')]
[2023-06-19 14:26:33,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3182592. Throughput: 0: 945.4. Samples: 796746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:26:33,556][00753] Avg episode reward: [(0, '24.083')]
[2023-06-19 14:26:35,617][15725] Updated weights for policy 0, policy_version 780 (0.0019)
[2023-06-19 14:26:38,549][00753] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3207168. Throughput: 0: 954.8. Samples: 800094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:26:38,551][00753] Avg episode reward: [(0, '25.015')]
[2023-06-19 14:26:43,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3223552. Throughput: 0: 931.2. Samples: 806028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:26:43,554][00753] Avg episode reward: [(0, '25.565')]
[2023-06-19 14:26:47,089][15725] Updated weights for policy 0, policy_version 790 (0.0024)
[2023-06-19 14:26:48,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3239936. Throughput: 0: 909.1. Samples: 810348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:26:48,554][00753] Avg episode reward: [(0, '26.374')]
[2023-06-19 14:26:53,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3256320. Throughput: 0: 917.5. Samples: 812876. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-06-19 14:26:53,552][00753] Avg episode reward: [(0, '26.270')]
[2023-06-19 14:26:57,599][15725] Updated weights for policy 0, policy_version 800 (0.0023)
[2023-06-19 14:26:58,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3276800. Throughput: 0: 941.8. Samples: 819376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:26:58,551][00753] Avg episode reward: [(0, '26.020')]
[2023-06-19 14:27:03,549][00753] Fps is (10 sec: 4096.1, 60 sec: 3754.9, 300 sec: 3748.9). Total num frames: 3297280. Throughput: 0: 905.3. Samples: 824712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:27:03,551][00753] Avg episode reward: [(0, '26.398')]
[2023-06-19 14:27:08,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3309568. Throughput: 0: 885.5. Samples: 826706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:27:08,551][00753] Avg episode reward: [(0, '25.847')]
[2023-06-19 14:27:10,903][15725] Updated weights for policy 0, policy_version 810 (0.0028)
[2023-06-19 14:27:13,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 3325952. Throughput: 0: 896.4. Samples: 831542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:27:13,551][00753] Avg episode reward: [(0, '25.049')]
[2023-06-19 14:27:18,549][00753] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3350528. Throughput: 0: 921.3. Samples: 838206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:27:18,552][00753] Avg episode reward: [(0, '23.905')]
[2023-06-19 14:27:20,181][15725] Updated weights for policy 0, policy_version 820 (0.0016)
[2023-06-19 14:27:23,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3366912. Throughput: 0: 912.2. Samples: 841144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:27:23,552][00753] Avg episode reward: [(0, '22.080')]
[2023-06-19 14:27:28,549][00753] Fps is (10 sec: 2867.3, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 3379200. Throughput: 0: 870.4. Samples: 845198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:27:28,550][00753] Avg episode reward: [(0, '22.160')]
[2023-06-19 14:27:33,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 3395584. Throughput: 0: 886.9. Samples: 850260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:27:33,556][00753] Avg episode reward: [(0, '23.197')]
[2023-06-19 14:27:33,568][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000829_3395584.pth...
[2023-06-19 14:27:33,688][15712] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000614_2514944.pth
[2023-06-19 14:27:33,811][15725] Updated weights for policy 0, policy_version 830 (0.0038)
[2023-06-19 14:27:38,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 3420160. Throughput: 0: 903.5. Samples: 853532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:27:38,551][00753] Avg episode reward: [(0, '24.463')]
[2023-06-19 14:27:43,549][00753] Fps is (10 sec: 4095.9, 60 sec: 3549.8, 300 sec: 3721.1). Total num frames: 3436544. Throughput: 0: 891.9. Samples: 859512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-06-19 14:27:43,557][00753] Avg episode reward: [(0, '24.208')]
[2023-06-19 14:27:44,183][15725] Updated weights for policy 0, policy_version 840 (0.0018)
[2023-06-19 14:27:48,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 3448832. Throughput: 0: 858.9. Samples: 863362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:27:48,554][00753] Avg episode reward: [(0, '24.357')]
[2023-06-19 14:27:53,549][00753] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3665.6). Total num frames: 3465216. Throughput: 0: 860.0. Samples: 865408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:27:53,551][00753] Avg episode reward: [(0, '24.807')]
[2023-06-19 14:27:56,880][15725] Updated weights for policy 0, policy_version 850 (0.0036)
[2023-06-19 14:27:58,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 3485696. Throughput: 0: 886.2. Samples: 871422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:27:58,551][00753] Avg episode reward: [(0, '26.831')]
[2023-06-19 14:28:03,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3693.3). Total num frames: 3506176. Throughput: 0: 871.5. Samples: 877424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:28:03,556][00753] Avg episode reward: [(0, '24.616')]
[2023-06-19 14:28:08,553][00753] Fps is (10 sec: 3275.4, 60 sec: 3481.4, 300 sec: 3665.5). Total num frames: 3518464. Throughput: 0: 850.9. Samples: 879436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:28:08,558][00753] Avg episode reward: [(0, '23.077')]
[2023-06-19 14:28:09,151][15725] Updated weights for policy 0, policy_version 860 (0.0025)
[2023-06-19 14:28:13,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 3534848. Throughput: 0: 850.8. Samples: 883482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:28:13,551][00753] Avg episode reward: [(0, '23.620')]
[2023-06-19 14:28:18,549][00753] Fps is (10 sec: 3687.9, 60 sec: 3413.3, 300 sec: 3665.6). Total num frames: 3555328. Throughput: 0: 877.3. Samples: 889738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:28:18,559][00753] Avg episode reward: [(0, '24.873')]
[2023-06-19 14:28:19,918][15725] Updated weights for policy 0, policy_version 870 (0.0017)
[2023-06-19 14:28:23,551][00753] Fps is (10 sec: 4095.2, 60 sec: 3481.5, 300 sec: 3679.4). Total num frames: 3575808. Throughput: 0: 879.1. Samples: 893092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:28:23,553][00753] Avg episode reward: [(0, '24.180')]
[2023-06-19 14:28:28,549][00753] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 3588096. Throughput: 0: 850.4. Samples: 897782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:28:28,556][00753] Avg episode reward: [(0, '24.184')]
[2023-06-19 14:28:33,462][15725] Updated weights for policy 0, policy_version 880 (0.0012)
[2023-06-19 14:28:33,549][00753] Fps is (10 sec: 2867.7, 60 sec: 3481.6, 300 sec: 3637.8). Total num frames: 3604480. Throughput: 0: 851.5. Samples: 901678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:28:33,556][00753] Avg episode reward: [(0, '24.410')]
[2023-06-19 14:28:38,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3651.7). Total num frames: 3624960. Throughput: 0: 871.8. Samples: 904640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:28:38,552][00753] Avg episode reward: [(0, '25.314')]
[2023-06-19 14:28:43,296][15725] Updated weights for policy 0, policy_version 890 (0.0020)
[2023-06-19 14:28:43,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3665.6). Total num frames: 3645440. Throughput: 0: 878.7. Samples: 910964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:28:43,556][00753] Avg episode reward: [(0, '24.791')]
[2023-06-19 14:28:48,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 3657728. Throughput: 0: 848.1. Samples: 915590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:28:48,551][00753] Avg episode reward: [(0, '23.805')]
[2023-06-19 14:28:53,549][00753] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3610.0). Total num frames: 3674112. Throughput: 0: 849.8. Samples: 917674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:28:53,552][00753] Avg episode reward: [(0, '24.707')]
[2023-06-19 14:28:56,371][15725] Updated weights for policy 0, policy_version 900 (0.0060)
[2023-06-19 14:28:58,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 3694592. Throughput: 0: 884.7. Samples: 923292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:28:58,551][00753] Avg episode reward: [(0, '24.907')]
[2023-06-19 14:29:03,549][00753] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 3715072. Throughput: 0: 898.5. Samples: 930170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:29:03,553][00753] Avg episode reward: [(0, '23.733')]
[2023-06-19 14:29:06,068][15725] Updated weights for policy 0, policy_version 910 (0.0025)
[2023-06-19 14:29:08,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3623.9). Total num frames: 3731456. Throughput: 0: 876.2. Samples: 932518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:29:08,552][00753] Avg episode reward: [(0, '23.744')]
[2023-06-19 14:29:13,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3747840. Throughput: 0: 867.5. Samples: 936818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:29:13,556][00753] Avg episode reward: [(0, '25.003')]
[2023-06-19 14:29:18,308][15725] Updated weights for policy 0, policy_version 920 (0.0016)
[2023-06-19 14:29:18,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3624.0). Total num frames: 3768320. Throughput: 0: 912.9. Samples: 942760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:29:18,551][00753] Avg episode reward: [(0, '26.885')]
[2023-06-19 14:29:23,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3651.7). Total num frames: 3788800. Throughput: 0: 922.8. Samples: 946164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:29:23,551][00753] Avg episode reward: [(0, '26.821')]
[2023-06-19 14:29:28,551][00753] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3623.9). Total num frames: 3805184. Throughput: 0: 905.7. Samples: 951722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:29:28,558][00753] Avg episode reward: [(0, '26.204')]
[2023-06-19 14:29:29,146][15725] Updated weights for policy 0, policy_version 930 (0.0015)
[2023-06-19 14:29:33,549][00753] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3821568. Throughput: 0: 897.5. Samples: 955980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-06-19 14:29:33,552][00753] Avg episode reward: [(0, '26.791')]
[2023-06-19 14:29:33,567][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000933_3821568.pth...
[2023-06-19 14:29:33,771][15712] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000723_2961408.pth
[2023-06-19 14:29:38,549][00753] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3842048. Throughput: 0: 913.5. Samples: 958782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-06-19 14:29:38,550][00753] Avg episode reward: [(0, '26.335')]
[2023-06-19 14:29:40,204][15725] Updated weights for policy 0, policy_version 940 (0.0014)
[2023-06-19 14:29:43,549][00753] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3862528. Throughput: 0: 940.5. Samples: 965614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:29:43,560][00753] Avg episode reward: [(0, '24.612')]
[2023-06-19 14:29:48,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3878912. Throughput: 0: 904.0. Samples: 970850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:29:48,555][00753] Avg episode reward: [(0, '24.456')]
[2023-06-19 14:29:51,972][15725] Updated weights for policy 0, policy_version 950 (0.0013)
[2023-06-19 14:29:53,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3895296. Throughput: 0: 898.0. Samples: 972926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:29:53,553][00753] Avg episode reward: [(0, '24.337')]
[2023-06-19 14:29:58,549][00753] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3915776. Throughput: 0: 922.4. Samples: 978328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:29:58,554][00753] Avg episode reward: [(0, '25.835')]
[2023-06-19 14:30:02,029][15725] Updated weights for policy 0, policy_version 960 (0.0026)
[2023-06-19 14:30:03,549][00753] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3936256. Throughput: 0: 942.3. Samples: 985164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:30:03,551][00753] Avg episode reward: [(0, '25.912')]
[2023-06-19 14:30:08,554][00753] Fps is (10 sec: 3684.5, 60 sec: 3686.1, 300 sec: 3623.9). Total num frames: 3952640. Throughput: 0: 925.9. Samples: 987834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-06-19 14:30:08,556][00753] Avg episode reward: [(0, '27.207')]
[2023-06-19 14:30:13,549][00753] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 3969024. Throughput: 0: 898.0. Samples: 992132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-06-19 14:30:13,557][00753] Avg episode reward: [(0, '27.661')]
[2023-06-19 14:30:13,567][15712] Saving new best policy, reward=27.661!
[2023-06-19 14:30:14,814][15725] Updated weights for policy 0, policy_version 970 (0.0025)
[2023-06-19 14:30:18,549][00753] Fps is (10 sec: 3688.3, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3989504. Throughput: 0: 932.0. Samples: 997918. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-06-19 14:30:18,557][00753] Avg episode reward: [(0, '29.371')]
[2023-06-19 14:30:18,560][15712] Saving new best policy, reward=29.371!
[2023-06-19 14:30:22,031][00753] Component Batcher_0 stopped!
[2023-06-19 14:30:22,029][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-06-19 14:30:22,056][15729] Stopping RolloutWorker_w3...
[2023-06-19 14:30:22,030][15712] Stopping Batcher_0...
[2023-06-19 14:30:22,057][00753] Component RolloutWorker_w3 stopped!
[2023-06-19 14:30:22,073][15712] Loop batcher_evt_loop terminating...
[2023-06-19 14:30:22,065][15729] Loop rollout_proc3_evt_loop terminating...
[2023-06-19 14:30:22,076][00753] Component RolloutWorker_w4 stopped!
[2023-06-19 14:30:22,080][00753] Component RolloutWorker_w5 stopped!
[2023-06-19 14:30:22,075][15731] Stopping RolloutWorker_w4...
[2023-06-19 14:30:22,085][15731] Loop rollout_proc4_evt_loop terminating...
[2023-06-19 14:30:22,077][15730] Stopping RolloutWorker_w5...
[2023-06-19 14:30:22,092][00753] Component RolloutWorker_w1 stopped!
[2023-06-19 14:30:22,091][15727] Stopping RolloutWorker_w1...
[2023-06-19 14:30:22,104][00753] Component RolloutWorker_w0 stopped!
[2023-06-19 14:30:22,107][15726] Stopping RolloutWorker_w0...
[2023-06-19 14:30:22,108][15726] Loop rollout_proc0_evt_loop terminating...
[2023-06-19 14:30:22,094][15730] Loop rollout_proc5_evt_loop terminating...
[2023-06-19 14:30:22,113][00753] Component RolloutWorker_w6 stopped!
[2023-06-19 14:30:22,118][15732] Stopping RolloutWorker_w6...
[2023-06-19 14:30:22,103][15727] Loop rollout_proc1_evt_loop terminating...
[2023-06-19 14:30:22,120][00753] Component RolloutWorker_w2 stopped!
[2023-06-19 14:30:22,125][15728] Stopping RolloutWorker_w2...
[2023-06-19 14:30:22,126][15728] Loop rollout_proc2_evt_loop terminating...
[2023-06-19 14:30:22,126][15732] Loop rollout_proc6_evt_loop terminating... [2023-06-19 14:30:22,147][15733] Stopping RolloutWorker_w7... [2023-06-19 14:30:22,147][00753] Component RolloutWorker_w7 stopped! [2023-06-19 14:30:22,150][15733] Loop rollout_proc7_evt_loop terminating... [2023-06-19 14:30:22,164][15725] Weights refcount: 2 0 [2023-06-19 14:30:22,169][15725] Stopping InferenceWorker_p0-w0... [2023-06-19 14:30:22,172][15725] Loop inference_proc0-0_evt_loop terminating... [2023-06-19 14:30:22,169][00753] Component InferenceWorker_p0-w0 stopped! [2023-06-19 14:30:22,230][15712] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000829_3395584.pth [2023-06-19 14:30:22,245][15712] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-06-19 14:30:22,371][15712] Stopping LearnerWorker_p0... [2023-06-19 14:30:22,372][15712] Loop learner_proc0_evt_loop terminating... [2023-06-19 14:30:22,370][00753] Component LearnerWorker_p0 stopped! [2023-06-19 14:30:22,378][00753] Waiting for process learner_proc0 to stop... [2023-06-19 14:30:24,169][00753] Waiting for process inference_proc0-0 to join... [2023-06-19 14:30:24,176][00753] Waiting for process rollout_proc0 to join... [2023-06-19 14:30:25,777][00753] Waiting for process rollout_proc1 to join... [2023-06-19 14:30:26,139][00753] Waiting for process rollout_proc2 to join... [2023-06-19 14:30:26,145][00753] Waiting for process rollout_proc3 to join... [2023-06-19 14:30:26,146][00753] Waiting for process rollout_proc4 to join... [2023-06-19 14:30:26,148][00753] Waiting for process rollout_proc5 to join... [2023-06-19 14:30:26,149][00753] Waiting for process rollout_proc6 to join... [2023-06-19 14:30:26,151][00753] Waiting for process rollout_proc7 to join... [2023-06-19 14:30:26,153][00753] Batcher 0 profile tree view: batching: 28.3181, releasing_batches: 0.0196 [2023-06-19 14:30:26,156][00753] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 481.4413 update_model: 8.0245 weight_update: 0.0025 one_step: 0.0180 handle_policy_step: 573.0894 deserialize: 15.2663, stack: 3.0542, obs_to_device_normalize: 112.6684, forward: 311.2599, send_messages: 28.2131 prepare_outputs: 76.1728 to_cpu: 43.4564 [2023-06-19 14:30:26,161][00753] Learner 0 profile tree view: misc: 0.0051, prepare_batch: 19.5693 train: 74.6204 epoch_init: 0.0189, minibatch_init: 0.0090, losses_postprocess: 0.6263, kl_divergence: 0.6649, after_optimizer: 3.7719 calculate_losses: 25.3343 losses_init: 0.0046, forward_head: 1.2514, bptt_initial: 16.9429, tail: 1.0722, advantages_returns: 0.2578, losses: 3.4928 bptt: 1.9604 bptt_forward_core: 1.8777 update: 43.5924 clip: 32.7360 [2023-06-19 14:30:26,162][00753] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3329, enqueue_policy_requests: 130.0667, env_step: 837.8421, overhead: 21.4076, complete_rollouts: 6.9483 save_policy_outputs: 19.2290 split_output_tensors: 9.0097 [2023-06-19 14:30:26,164][00753] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.2819, enqueue_policy_requests: 131.4119, env_step: 834.9022, overhead: 21.4959, complete_rollouts: 6.7752 save_policy_outputs: 19.4500 split_output_tensors: 9.4796 [2023-06-19 14:30:26,165][00753] Loop Runner_EvtLoop terminating... 
[2023-06-19 14:30:26,167][00753] Runner profile tree view: main_loop: 1130.5358 [2023-06-19 14:30:26,168][00753] Collected {0: 4005888}, FPS: 3543.4 [2023-06-19 14:30:38,058][00753] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-06-19 14:30:38,060][00753] Overriding arg 'num_workers' with value 1 passed from command line [2023-06-19 14:30:38,062][00753] Adding new argument 'no_render'=True that is not in the saved config file! [2023-06-19 14:30:38,063][00753] Adding new argument 'save_video'=True that is not in the saved config file! [2023-06-19 14:30:38,066][00753] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-06-19 14:30:38,071][00753] Adding new argument 'video_name'=None that is not in the saved config file! [2023-06-19 14:30:38,074][00753] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-06-19 14:30:38,076][00753] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-06-19 14:30:38,078][00753] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-06-19 14:30:38,080][00753] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-06-19 14:30:38,084][00753] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-06-19 14:30:38,088][00753] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-06-19 14:30:38,089][00753] Adding new argument 'train_script'=None that is not in the saved config file! [2023-06-19 14:30:38,091][00753] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-06-19 14:30:38,094][00753] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-06-19 14:30:38,109][00753] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:30:38,111][00753] RunningMeanStd input shape: (3, 72, 128) [2023-06-19 14:30:38,114][00753] RunningMeanStd input shape: (1,) [2023-06-19 14:30:38,129][00753] ConvEncoder: input_channels=3 [2023-06-19 14:30:38,255][00753] Conv encoder output size: 512 [2023-06-19 14:30:38,257][00753] Policy head output size: 512 [2023-06-19 14:30:41,544][00753] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-06-19 14:30:43,145][00753] Num frames 100... [2023-06-19 14:30:43,365][00753] Num frames 200... [2023-06-19 14:30:43,617][00753] Num frames 300... [2023-06-19 14:30:43,826][00753] Num frames 400... [2023-06-19 14:30:44,044][00753] Num frames 500... [2023-06-19 14:30:44,352][00753] Num frames 600... [2023-06-19 14:30:44,695][00753] Num frames 700... [2023-06-19 14:30:44,966][00753] Num frames 800... [2023-06-19 14:30:45,217][00753] Num frames 900... [2023-06-19 14:30:45,479][00753] Num frames 1000... [2023-06-19 14:30:45,616][00753] Num frames 1100... [2023-06-19 14:30:45,737][00753] Num frames 1200... [2023-06-19 14:30:45,862][00753] Num frames 1300... [2023-06-19 14:30:45,989][00753] Num frames 1400... [2023-06-19 14:30:46,114][00753] Num frames 1500... [2023-06-19 14:30:46,236][00753] Num frames 1600... [2023-06-19 14:30:46,357][00753] Num frames 1700... [2023-06-19 14:30:46,481][00753] Num frames 1800... [2023-06-19 14:30:46,610][00753] Num frames 1900... [2023-06-19 14:30:46,731][00753] Num frames 2000... [2023-06-19 14:30:46,853][00753] Num frames 2100... 
[2023-06-19 14:30:46,905][00753] Avg episode rewards: #0: 50.999, true rewards: #0: 21.000 [2023-06-19 14:30:46,908][00753] Avg episode reward: 50.999, avg true_objective: 21.000 [2023-06-19 14:30:47,025][00753] Num frames 2200... [2023-06-19 14:30:47,149][00753] Num frames 2300... [2023-06-19 14:30:47,271][00753] Num frames 2400... [2023-06-19 14:30:47,395][00753] Num frames 2500... [2023-06-19 14:30:47,517][00753] Num frames 2600... [2023-06-19 14:30:47,646][00753] Num frames 2700... [2023-06-19 14:30:47,772][00753] Num frames 2800... [2023-06-19 14:30:47,897][00753] Num frames 2900... [2023-06-19 14:30:48,025][00753] Num frames 3000... [2023-06-19 14:30:48,149][00753] Num frames 3100... [2023-06-19 14:30:48,274][00753] Num frames 3200... [2023-06-19 14:30:48,395][00753] Num frames 3300... [2023-06-19 14:30:48,517][00753] Num frames 3400... [2023-06-19 14:30:48,654][00753] Num frames 3500... [2023-06-19 14:30:48,778][00753] Num frames 3600... [2023-06-19 14:30:48,904][00753] Num frames 3700... [2023-06-19 14:30:49,035][00753] Num frames 3800... [2023-06-19 14:30:49,172][00753] Num frames 3900... [2023-06-19 14:30:49,308][00753] Num frames 4000... [2023-06-19 14:30:49,437][00753] Num frames 4100... [2023-06-19 14:30:49,574][00753] Num frames 4200... [2023-06-19 14:30:49,626][00753] Avg episode rewards: #0: 55.999, true rewards: #0: 21.000 [2023-06-19 14:30:49,628][00753] Avg episode reward: 55.999, avg true_objective: 21.000 [2023-06-19 14:30:49,762][00753] Num frames 4300... [2023-06-19 14:30:49,891][00753] Num frames 4400... [2023-06-19 14:30:50,027][00753] Num frames 4500... [2023-06-19 14:30:50,150][00753] Num frames 4600... [2023-06-19 14:30:50,280][00753] Num frames 4700... [2023-06-19 14:30:50,352][00753] Avg episode rewards: #0: 40.039, true rewards: #0: 15.707 [2023-06-19 14:30:50,354][00753] Avg episode reward: 40.039, avg true_objective: 15.707 [2023-06-19 14:30:50,466][00753] Num frames 4800... [2023-06-19 14:30:50,605][00753] Num frames 4900... [2023-06-19 14:30:50,735][00753] Num frames 5000... [2023-06-19 14:30:50,863][00753] Num frames 5100... [2023-06-19 14:30:50,995][00753] Num frames 5200... [2023-06-19 14:30:51,126][00753] Num frames 5300... [2023-06-19 14:30:51,252][00753] Num frames 5400... [2023-06-19 14:30:51,374][00753] Avg episode rewards: #0: 33.369, true rewards: #0: 13.620 [2023-06-19 14:30:51,381][00753] Avg episode reward: 33.369, avg true_objective: 13.620 [2023-06-19 14:30:51,452][00753] Num frames 5500... [2023-06-19 14:30:51,578][00753] Num frames 5600... [2023-06-19 14:30:51,712][00753] Num frames 5700... [2023-06-19 14:30:51,845][00753] Num frames 5800... [2023-06-19 14:30:51,977][00753] Num frames 5900... [2023-06-19 14:30:52,101][00753] Num frames 6000... [2023-06-19 14:30:52,239][00753] Num frames 6100... [2023-06-19 14:30:52,420][00753] Num frames 6200... [2023-06-19 14:30:52,612][00753] Num frames 6300... [2023-06-19 14:30:52,802][00753] Num frames 6400... [2023-06-19 14:30:52,989][00753] Num frames 6500... [2023-06-19 14:30:53,174][00753] Num frames 6600... [2023-06-19 14:30:53,354][00753] Num frames 6700... [2023-06-19 14:30:53,536][00753] Num frames 6800... [2023-06-19 14:30:53,717][00753] Num frames 6900... [2023-06-19 14:30:53,902][00753] Num frames 7000... [2023-06-19 14:30:54,079][00753] Num frames 7100... [2023-06-19 14:30:54,313][00753] Avg episode rewards: #0: 35.797, true rewards: #0: 14.398 [2023-06-19 14:30:54,315][00753] Avg episode reward: 35.797, avg true_objective: 14.398 [2023-06-19 14:30:54,319][00753] Num frames 7200... 
[2023-06-19 14:30:54,491][00753] Num frames 7300... [2023-06-19 14:30:54,669][00753] Num frames 7400... [2023-06-19 14:30:54,849][00753] Num frames 7500... [2023-06-19 14:30:55,027][00753] Num frames 7600... [2023-06-19 14:30:55,203][00753] Num frames 7700... [2023-06-19 14:30:55,381][00753] Num frames 7800... [2023-06-19 14:30:55,556][00753] Num frames 7900... [2023-06-19 14:30:55,737][00753] Num frames 8000... [2023-06-19 14:30:55,923][00753] Num frames 8100... [2023-06-19 14:30:56,110][00753] Num frames 8200... [2023-06-19 14:30:56,259][00753] Num frames 8300... [2023-06-19 14:30:56,380][00753] Num frames 8400... [2023-06-19 14:30:56,503][00753] Num frames 8500... [2023-06-19 14:30:56,628][00753] Num frames 8600... [2023-06-19 14:30:56,791][00753] Avg episode rewards: #0: 35.645, true rewards: #0: 14.478 [2023-06-19 14:30:56,793][00753] Avg episode reward: 35.645, avg true_objective: 14.478 [2023-06-19 14:30:56,814][00753] Num frames 8700... [2023-06-19 14:30:56,940][00753] Num frames 8800... [2023-06-19 14:30:57,064][00753] Num frames 8900... [2023-06-19 14:30:57,190][00753] Num frames 9000... [2023-06-19 14:30:57,310][00753] Num frames 9100... [2023-06-19 14:30:57,428][00753] Num frames 9200... [2023-06-19 14:30:57,559][00753] Num frames 9300... [2023-06-19 14:30:57,682][00753] Num frames 9400... [2023-06-19 14:30:57,813][00753] Num frames 9500... [2023-06-19 14:30:57,942][00753] Num frames 9600... [2023-06-19 14:30:58,022][00753] Avg episode rewards: #0: 33.885, true rewards: #0: 13.743 [2023-06-19 14:30:58,024][00753] Avg episode reward: 33.885, avg true_objective: 13.743 [2023-06-19 14:30:58,123][00753] Num frames 9700... [2023-06-19 14:30:58,249][00753] Num frames 9800... [2023-06-19 14:30:58,370][00753] Num frames 9900... [2023-06-19 14:30:58,494][00753] Num frames 10000... [2023-06-19 14:30:58,615][00753] Num frames 10100... [2023-06-19 14:30:58,785][00753] Avg episode rewards: #0: 30.997, true rewards: #0: 12.748 [2023-06-19 14:30:58,786][00753] Avg episode reward: 30.997, avg true_objective: 12.748 [2023-06-19 14:30:58,793][00753] Num frames 10200... [2023-06-19 14:30:58,931][00753] Num frames 10300... [2023-06-19 14:30:59,066][00753] Num frames 10400... [2023-06-19 14:30:59,196][00753] Num frames 10500... [2023-06-19 14:30:59,315][00753] Num frames 10600... [2023-06-19 14:30:59,444][00753] Num frames 10700... [2023-06-19 14:30:59,565][00753] Num frames 10800... [2023-06-19 14:30:59,690][00753] Num frames 10900... [2023-06-19 14:30:59,814][00753] Num frames 11000... [2023-06-19 14:30:59,945][00753] Num frames 11100... [2023-06-19 14:31:00,068][00753] Num frames 11200... [2023-06-19 14:31:00,193][00753] Num frames 11300... [2023-06-19 14:31:00,321][00753] Num frames 11400... [2023-06-19 14:31:00,441][00753] Num frames 11500... [2023-06-19 14:31:00,563][00753] Num frames 11600... [2023-06-19 14:31:00,691][00753] Num frames 11700... [2023-06-19 14:31:00,817][00753] Num frames 11800... [2023-06-19 14:31:00,953][00753] Num frames 11900... [2023-06-19 14:31:01,031][00753] Avg episode rewards: #0: 32.462, true rewards: #0: 13.240 [2023-06-19 14:31:01,032][00753] Avg episode reward: 32.462, avg true_objective: 13.240 [2023-06-19 14:31:01,134][00753] Num frames 12000... [2023-06-19 14:31:01,258][00753] Num frames 12100... [2023-06-19 14:31:01,378][00753] Num frames 12200... [2023-06-19 14:31:01,521][00753] Num frames 12300... [2023-06-19 14:31:01,649][00753] Num frames 12400... [2023-06-19 14:31:01,787][00753] Num frames 12500... [2023-06-19 14:31:01,918][00753] Num frames 12600... 
[2023-06-19 14:31:01,998][00753] Avg episode rewards: #0: 30.620, true rewards: #0: 12.620 [2023-06-19 14:31:02,000][00753] Avg episode reward: 30.620, avg true_objective: 12.620 [2023-06-19 14:32:18,953][00753] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-06-19 14:33:47,781][00753] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-06-19 14:33:47,783][00753] Overriding arg 'num_workers' with value 1 passed from command line [2023-06-19 14:33:47,785][00753] Adding new argument 'no_render'=True that is not in the saved config file! [2023-06-19 14:33:47,786][00753] Adding new argument 'save_video'=True that is not in the saved config file! [2023-06-19 14:33:47,788][00753] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-06-19 14:33:47,790][00753] Adding new argument 'video_name'=None that is not in the saved config file! [2023-06-19 14:33:47,794][00753] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-06-19 14:33:47,796][00753] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-06-19 14:33:47,798][00753] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-06-19 14:33:47,799][00753] Adding new argument 'hf_repository'='Ditrip/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-06-19 14:33:47,800][00753] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-06-19 14:33:47,801][00753] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-06-19 14:33:47,802][00753] Adding new argument 'train_script'=None that is not in the saved config file! [2023-06-19 14:33:47,804][00753] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-06-19 14:33:47,806][00753] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-06-19 14:33:47,833][00753] RunningMeanStd input shape: (3, 72, 128) [2023-06-19 14:33:47,839][00753] RunningMeanStd input shape: (1,) [2023-06-19 14:33:47,856][00753] ConvEncoder: input_channels=3 [2023-06-19 14:33:47,911][00753] Conv encoder output size: 512 [2023-06-19 14:33:47,913][00753] Policy head output size: 512 [2023-06-19 14:33:47,941][00753] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-06-19 14:33:48,658][00753] Num frames 100... [2023-06-19 14:33:48,835][00753] Num frames 200... [2023-06-19 14:33:49,014][00753] Num frames 300... [2023-06-19 14:33:49,203][00753] Num frames 400... [2023-06-19 14:33:49,388][00753] Num frames 500... [2023-06-19 14:33:49,573][00753] Num frames 600... [2023-06-19 14:33:49,761][00753] Num frames 700... [2023-06-19 14:33:49,949][00753] Num frames 800... [2023-06-19 14:33:50,002][00753] Avg episode rewards: #0: 17.000, true rewards: #0: 8.000 [2023-06-19 14:33:50,004][00753] Avg episode reward: 17.000, avg true_objective: 8.000 [2023-06-19 14:33:50,193][00753] Num frames 900... [2023-06-19 14:33:50,379][00753] Num frames 1000... [2023-06-19 14:33:50,565][00753] Num frames 1100... [2023-06-19 14:33:50,740][00753] Num frames 1200... [2023-06-19 14:33:50,869][00753] Num frames 1300... [2023-06-19 14:33:50,996][00753] Num frames 1400... [2023-06-19 14:33:51,122][00753] Num frames 1500... [2023-06-19 14:33:51,254][00753] Num frames 1600... [2023-06-19 14:33:51,380][00753] Num frames 1700... 
[2023-06-19 14:33:51,512][00753] Num frames 1800... [2023-06-19 14:33:51,645][00753] Num frames 1900... [2023-06-19 14:33:51,774][00753] Num frames 2000... [2023-06-19 14:33:51,907][00753] Num frames 2100... [2023-06-19 14:33:52,031][00753] Num frames 2200... [2023-06-19 14:33:52,155][00753] Num frames 2300... [2023-06-19 14:33:52,291][00753] Num frames 2400... [2023-06-19 14:33:52,418][00753] Num frames 2500... [2023-06-19 14:33:52,564][00753] Num frames 2600... [2023-06-19 14:33:52,693][00753] Num frames 2700... [2023-06-19 14:33:52,816][00753] Num frames 2800... [2023-06-19 14:33:52,951][00753] Num frames 2900... [2023-06-19 14:33:53,003][00753] Avg episode rewards: #0: 37.500, true rewards: #0: 14.500 [2023-06-19 14:33:53,004][00753] Avg episode reward: 37.500, avg true_objective: 14.500 [2023-06-19 14:33:53,134][00753] Num frames 3000... [2023-06-19 14:33:53,264][00753] Num frames 3100... [2023-06-19 14:33:53,394][00753] Num frames 3200... [2023-06-19 14:33:53,520][00753] Num frames 3300... [2023-06-19 14:33:53,640][00753] Num frames 3400... [2023-06-19 14:33:53,777][00753] Num frames 3500... [2023-06-19 14:33:53,902][00753] Num frames 3600... [2023-06-19 14:33:54,040][00753] Num frames 3700... [2023-06-19 14:33:54,166][00753] Num frames 3800... [2023-06-19 14:33:54,295][00753] Num frames 3900... [2023-06-19 14:33:54,423][00753] Num frames 4000... [2023-06-19 14:33:54,554][00753] Num frames 4100... [2023-06-19 14:33:54,677][00753] Num frames 4200... [2023-06-19 14:33:54,813][00753] Num frames 4300... [2023-06-19 14:33:54,937][00753] Num frames 4400... [2023-06-19 14:33:55,059][00753] Num frames 4500... [2023-06-19 14:33:55,192][00753] Num frames 4600... [2023-06-19 14:33:55,316][00753] Num frames 4700... [2023-06-19 14:33:55,449][00753] Avg episode rewards: #0: 40.519, true rewards: #0: 15.853 [2023-06-19 14:33:55,451][00753] Avg episode reward: 40.519, avg true_objective: 15.853 [2023-06-19 14:33:55,520][00753] Num frames 4800... [2023-06-19 14:33:55,645][00753] Num frames 4900... [2023-06-19 14:33:55,777][00753] Num frames 5000... [2023-06-19 14:33:55,905][00753] Num frames 5100... [2023-06-19 14:33:56,043][00753] Num frames 5200... [2023-06-19 14:33:56,173][00753] Num frames 5300... [2023-06-19 14:33:56,300][00753] Num frames 5400... [2023-06-19 14:33:56,431][00753] Num frames 5500... [2023-06-19 14:33:56,565][00753] Num frames 5600... [2023-06-19 14:33:56,695][00753] Num frames 5700... [2023-06-19 14:33:56,819][00753] Num frames 5800... [2023-06-19 14:33:56,956][00753] Num frames 5900... [2023-06-19 14:33:57,081][00753] Num frames 6000... [2023-06-19 14:33:57,219][00753] Num frames 6100... [2023-06-19 14:33:57,344][00753] Num frames 6200... [2023-06-19 14:33:57,474][00753] Num frames 6300... [2023-06-19 14:33:57,605][00753] Num frames 6400... [2023-06-19 14:33:57,732][00753] Num frames 6500... [2023-06-19 14:33:57,856][00753] Avg episode rewards: #0: 41.597, true rewards: #0: 16.348 [2023-06-19 14:33:57,858][00753] Avg episode reward: 41.597, avg true_objective: 16.348 [2023-06-19 14:33:57,938][00753] Num frames 6600... [2023-06-19 14:33:58,059][00753] Num frames 6700... [2023-06-19 14:33:58,194][00753] Num frames 6800... [2023-06-19 14:33:58,320][00753] Num frames 6900... [2023-06-19 14:33:58,445][00753] Avg episode rewards: #0: 34.308, true rewards: #0: 13.908 [2023-06-19 14:33:58,448][00753] Avg episode reward: 34.308, avg true_objective: 13.908 [2023-06-19 14:33:58,511][00753] Num frames 7000... [2023-06-19 14:33:58,647][00753] Num frames 7100... 
[2023-06-19 14:33:58,771][00753] Num frames 7200... [2023-06-19 14:33:58,895][00753] Num frames 7300... [2023-06-19 14:33:59,023][00753] Num frames 7400... [2023-06-19 14:33:59,157][00753] Num frames 7500... [2023-06-19 14:33:59,281][00753] Num frames 7600... [2023-06-19 14:33:59,446][00753] Avg episode rewards: #0: 30.983, true rewards: #0: 12.817 [2023-06-19 14:33:59,448][00753] Avg episode reward: 30.983, avg true_objective: 12.817 [2023-06-19 14:33:59,463][00753] Num frames 7700... [2023-06-19 14:33:59,601][00753] Num frames 7800... [2023-06-19 14:33:59,724][00753] Num frames 7900... [2023-06-19 14:33:59,855][00753] Num frames 8000... [2023-06-19 14:33:59,983][00753] Num frames 8100... [2023-06-19 14:34:00,115][00753] Num frames 8200... [2023-06-19 14:34:00,245][00753] Num frames 8300... [2023-06-19 14:34:00,378][00753] Num frames 8400... [2023-06-19 14:34:00,466][00753] Avg episode rewards: #0: 29.323, true rewards: #0: 12.037 [2023-06-19 14:34:00,468][00753] Avg episode reward: 29.323, avg true_objective: 12.037 [2023-06-19 14:34:00,563][00753] Num frames 8500... [2023-06-19 14:34:00,694][00753] Num frames 8600... [2023-06-19 14:34:00,867][00753] Num frames 8700... [2023-06-19 14:34:01,050][00753] Num frames 8800... [2023-06-19 14:34:01,236][00753] Num frames 8900... [2023-06-19 14:34:01,420][00753] Num frames 9000... [2023-06-19 14:34:01,607][00753] Num frames 9100... [2023-06-19 14:34:01,812][00753] Num frames 9200... [2023-06-19 14:34:02,016][00753] Num frames 9300... [2023-06-19 14:34:02,202][00753] Num frames 9400... [2023-06-19 14:34:02,381][00753] Num frames 9500... [2023-06-19 14:34:02,560][00753] Num frames 9600... [2023-06-19 14:34:02,743][00753] Num frames 9700... [2023-06-19 14:34:02,930][00753] Num frames 9800... [2023-06-19 14:34:03,053][00753] Avg episode rewards: #0: 29.792, true rewards: #0: 12.292 [2023-06-19 14:34:03,055][00753] Avg episode reward: 29.792, avg true_objective: 12.292 [2023-06-19 14:34:03,177][00753] Num frames 9900... [2023-06-19 14:34:03,363][00753] Num frames 10000... [2023-06-19 14:34:03,547][00753] Num frames 10100... [2023-06-19 14:34:03,737][00753] Num frames 10200... [2023-06-19 14:34:03,930][00753] Num frames 10300... [2023-06-19 14:34:04,129][00753] Avg episode rewards: #0: 27.419, true rewards: #0: 11.530 [2023-06-19 14:34:04,132][00753] Avg episode reward: 27.419, avg true_objective: 11.530 [2023-06-19 14:34:04,179][00753] Num frames 10400... [2023-06-19 14:34:04,359][00753] Num frames 10500... [2023-06-19 14:34:04,535][00753] Num frames 10600... [2023-06-19 14:34:04,716][00753] Num frames 10700... [2023-06-19 14:34:04,916][00753] Avg episode rewards: #0: 25.293, true rewards: #0: 10.793 [2023-06-19 14:34:04,918][00753] Avg episode reward: 25.293, avg true_objective: 10.793 [2023-06-19 14:35:11,109][00753] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-06-19 14:35:14,170][00753] The model has been pushed to https://huggingface.co/Ditrip/rl_course_vizdoom_health_gathering_supreme [2023-06-19 14:36:13,295][00753] Environment doom_basic already registered, overwriting... [2023-06-19 14:36:13,298][00753] Environment doom_two_colors_easy already registered, overwriting... [2023-06-19 14:36:13,299][00753] Environment doom_two_colors_hard already registered, overwriting... [2023-06-19 14:36:13,300][00753] Environment doom_dm already registered, overwriting... [2023-06-19 14:36:13,302][00753] Environment doom_dwango5 already registered, overwriting... 
[2023-06-19 14:36:13,303][00753] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-06-19 14:36:13,304][00753] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-06-19 14:36:13,306][00753] Environment doom_my_way_home already registered, overwriting... [2023-06-19 14:36:13,307][00753] Environment doom_deadly_corridor already registered, overwriting... [2023-06-19 14:36:13,308][00753] Environment doom_defend_the_center already registered, overwriting... [2023-06-19 14:36:13,310][00753] Environment doom_defend_the_line already registered, overwriting... [2023-06-19 14:36:13,311][00753] Environment doom_health_gathering already registered, overwriting... [2023-06-19 14:36:13,312][00753] Environment doom_health_gathering_supreme already registered, overwriting... [2023-06-19 14:36:13,314][00753] Environment doom_battle already registered, overwriting... [2023-06-19 14:36:13,315][00753] Environment doom_battle2 already registered, overwriting... [2023-06-19 14:36:13,316][00753] Environment doom_duel_bots already registered, overwriting... [2023-06-19 14:36:13,318][00753] Environment doom_deathmatch_bots already registered, overwriting... [2023-06-19 14:36:13,319][00753] Environment doom_duel already registered, overwriting... [2023-06-19 14:36:13,320][00753] Environment doom_deathmatch_full already registered, overwriting... [2023-06-19 14:36:13,322][00753] Environment doom_benchmark already registered, overwriting... [2023-06-19 14:36:13,323][00753] register_encoder_factory: [2023-06-19 14:36:13,349][00753] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-06-19 14:36:13,351][00753] Overriding arg 'num_workers' with value 12 passed from command line [2023-06-19 14:36:13,352][00753] Overriding arg 'train_for_env_steps' with value 6000000 passed from command line [2023-06-19 14:36:13,357][00753] Experiment dir /content/train_dir/default_experiment already exists! [2023-06-19 14:36:13,362][00753] Resuming existing experiment from /content/train_dir/default_experiment... 
[2023-06-19 14:36:13,363][00753] Weights and Biases integration disabled [2023-06-19 14:36:13,366][00753] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-06-19 14:36:15,416][00753] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=12 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=6000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-06-19 14:36:15,418][00753] Saving configuration to /content/train_dir/default_experiment/config.json... 
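The configuration dump above records a resumed run: the saved config.json is reloaded, num_workers (8 → 12) and train_for_env_steps (4000000 → 6000000) are overridden from the command line, and restart_behavior=resume picks up the latest checkpoint in the existing experiment directory. A minimal sketch of an equivalent launch follows; it assumes the sf_examples.vizdoom.train_vizdoom entry point bundled with Sample Factory 2.x, which is not itself recorded in this log.

```python
# Minimal sketch of an equivalent resumed launch. Assumes Sample Factory 2.x
# with its bundled VizDoom example (sf_examples.vizdoom.train_vizdoom); the
# entry-point name is an assumption, not something recorded in this log.
import sys

from sf_examples.vizdoom.train_vizdoom import main

sys.argv = [
    "train_vizdoom",
    "--env=doom_health_gathering_supreme",
    "--num_workers=12",               # overridden from the saved config (was 8)
    "--train_for_env_steps=6000000",  # overridden from the saved config (was 4000000)
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",
    "--restart_behavior=resume",      # reuse the existing experiment dir and checkpoint
]
main()
```

The config's own command_line field shows the original 8-worker, 4M-step invocation that this resumed run extends.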
[2023-06-19 14:36:15,423][00753] Rollout worker 0 uses device cpu [2023-06-19 14:36:15,425][00753] Rollout worker 1 uses device cpu [2023-06-19 14:36:15,427][00753] Rollout worker 2 uses device cpu [2023-06-19 14:36:15,429][00753] Rollout worker 3 uses device cpu [2023-06-19 14:36:15,430][00753] Rollout worker 4 uses device cpu [2023-06-19 14:36:15,431][00753] Rollout worker 5 uses device cpu [2023-06-19 14:36:15,432][00753] Rollout worker 6 uses device cpu [2023-06-19 14:36:15,434][00753] Rollout worker 7 uses device cpu [2023-06-19 14:36:15,435][00753] Rollout worker 8 uses device cpu [2023-06-19 14:36:15,436][00753] Rollout worker 9 uses device cpu [2023-06-19 14:36:15,437][00753] Rollout worker 10 uses device cpu [2023-06-19 14:36:15,439][00753] Rollout worker 11 uses device cpu [2023-06-19 14:36:15,557][00753] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-06-19 14:36:15,560][00753] InferenceWorker_p0-w0: min num requests: 4 [2023-06-19 14:36:15,607][00753] Starting all processes... [2023-06-19 14:36:15,608][00753] Starting process learner_proc0 [2023-06-19 14:36:15,657][00753] Starting all processes... [2023-06-19 14:36:15,663][00753] Starting process inference_proc0-0 [2023-06-19 14:36:15,665][00753] Starting process rollout_proc0 [2023-06-19 14:36:15,679][00753] Starting process rollout_proc1 [2023-06-19 14:36:15,680][00753] Starting process rollout_proc2 [2023-06-19 14:36:15,680][00753] Starting process rollout_proc3 [2023-06-19 14:36:15,680][00753] Starting process rollout_proc4 [2023-06-19 14:36:15,680][00753] Starting process rollout_proc5 [2023-06-19 14:36:15,680][00753] Starting process rollout_proc6 [2023-06-19 14:36:15,680][00753] Starting process rollout_proc7 [2023-06-19 14:36:15,680][00753] Starting process rollout_proc8 [2023-06-19 14:36:15,680][00753] Starting process rollout_proc9 [2023-06-19 14:36:15,680][00753] Starting process rollout_proc10 [2023-06-19 14:36:15,680][00753] Starting process rollout_proc11 [2023-06-19 14:36:39,488][22299] Worker 3 uses CPU cores [1] [2023-06-19 14:36:39,604][00753] Heartbeat connected on RolloutWorker_w3 [2023-06-19 14:36:39,792][22310] Worker 9 uses CPU cores [1] [2023-06-19 14:36:39,977][22295] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-06-19 14:36:39,981][22295] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-06-19 14:36:40,008][22303] Worker 8 uses CPU cores [0] [2023-06-19 14:36:40,015][22311] Worker 11 uses CPU cores [1] [2023-06-19 14:36:40,044][00753] Heartbeat connected on RolloutWorker_w9 [2023-06-19 14:36:40,062][22295] Num visible devices: 1 [2023-06-19 14:36:40,068][22300] Worker 4 uses CPU cores [0] [2023-06-19 14:36:40,079][22297] Worker 1 uses CPU cores [1] [2023-06-19 14:36:40,084][00753] Heartbeat connected on InferenceWorker_p0-w0 [2023-06-19 14:36:40,083][22278] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-06-19 14:36:40,087][22278] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-06-19 14:36:40,088][22296] Worker 0 uses CPU cores [0] [2023-06-19 14:36:40,116][22309] Worker 10 uses CPU cores [0] [2023-06-19 14:36:40,119][22304] Worker 7 uses CPU cores [1] [2023-06-19 14:36:40,127][00753] Heartbeat connected on RolloutWorker_w8 [2023-06-19 14:36:40,132][22298] Worker 2 uses CPU cores [0] [2023-06-19 14:36:40,135][22278] Num visible devices: 1 [2023-06-19 14:36:40,151][22302] Worker 6 uses CPU cores [0] [2023-06-19 14:36:40,156][00753] Heartbeat connected on 
RolloutWorker_w0 [2023-06-19 14:36:40,157][00753] Heartbeat connected on RolloutWorker_w4 [2023-06-19 14:36:40,161][00753] Heartbeat connected on RolloutWorker_w10 [2023-06-19 14:36:40,169][00753] Heartbeat connected on RolloutWorker_w11 [2023-06-19 14:36:40,176][22278] Starting seed is not provided [2023-06-19 14:36:40,177][22278] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-06-19 14:36:40,178][22278] Initializing actor-critic model on device cuda:0 [2023-06-19 14:36:40,176][00753] Heartbeat connected on RolloutWorker_w2 [2023-06-19 14:36:40,178][22278] RunningMeanStd input shape: (3, 72, 128) [2023-06-19 14:36:40,182][22278] RunningMeanStd input shape: (1,) [2023-06-19 14:36:40,181][00753] Heartbeat connected on RolloutWorker_w6 [2023-06-19 14:36:40,183][00753] Heartbeat connected on RolloutWorker_w7 [2023-06-19 14:36:40,187][00753] Heartbeat connected on RolloutWorker_w1 [2023-06-19 14:36:40,194][00753] Heartbeat connected on Batcher_0 [2023-06-19 14:36:40,213][22278] ConvEncoder: input_channels=3 [2023-06-19 14:36:40,248][22301] Worker 5 uses CPU cores [1] [2023-06-19 14:36:40,259][00753] Heartbeat connected on RolloutWorker_w5 [2023-06-19 14:36:40,355][22278] Conv encoder output size: 512 [2023-06-19 14:36:40,356][22278] Policy head output size: 512 [2023-06-19 14:36:40,375][22278] Created Actor Critic model with architecture: [2023-06-19 14:36:40,376][22278] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-06-19 14:36:40,600][22278] Using optimizer [2023-06-19 14:36:40,601][22278] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-06-19 14:36:40,636][22278] Loading model from checkpoint [2023-06-19 14:36:40,641][22278] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2023-06-19 14:36:40,641][22278] Initialized policy 0 weights for model version 978 [2023-06-19 14:36:40,648][22278] LearnerWorker_p0 finished initialization! 
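Before training resumes, the learner restores self.train_step=978 and self.env_steps=4005888 from checkpoint_000000978_4005888.pth; the two numbers embedded in the filename are exactly these counters. A minimal sketch for inspecting such a checkpoint offline with plain PyTorch is below; the exact key names stored in the file are an assumption and may vary between Sample Factory versions.

```python
# Minimal sketch: peek inside a Sample Factory checkpoint with plain PyTorch.
# The path comes from the log above; the key names are an assumption and may
# differ between Sample Factory versions.
import torch

ckpt = torch.load(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth",
    map_location="cpu",  # no GPU needed just to inspect the file
)
print(sorted(ckpt.keys()))                            # e.g. model / optimizer / train_step / env_steps
print(ckpt.get("train_step"), ckpt.get("env_steps"))  # expect 978 and 4005888 for this file
```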
[2023-06-19 14:36:40,649][22278] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-06-19 14:36:40,650][00753] Heartbeat connected on LearnerWorker_p0 [2023-06-19 14:36:40,856][22295] RunningMeanStd input shape: (3, 72, 128) [2023-06-19 14:36:40,857][22295] RunningMeanStd input shape: (1,) [2023-06-19 14:36:40,870][22295] ConvEncoder: input_channels=3 [2023-06-19 14:36:40,978][22295] Conv encoder output size: 512 [2023-06-19 14:36:40,979][22295] Policy head output size: 512 [2023-06-19 14:36:41,041][00753] Inference worker 0-0 is ready! [2023-06-19 14:36:41,042][00753] All inference workers are ready! Signal rollout workers to start! [2023-06-19 14:36:41,214][22309] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,233][22297] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,241][22301] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,243][22304] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,246][22311] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,243][22296] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,247][22298] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,259][22300] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,264][22310] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,258][22302] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,266][22299] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:41,263][22303] Doom resolution: 160x120, resize resolution: (128, 72) [2023-06-19 14:36:43,083][22298] Decorrelating experience for 0 frames... [2023-06-19 14:36:43,087][22300] Decorrelating experience for 0 frames... [2023-06-19 14:36:43,088][22309] Decorrelating experience for 0 frames... [2023-06-19 14:36:43,247][22310] Decorrelating experience for 0 frames... [2023-06-19 14:36:43,249][22304] Decorrelating experience for 0 frames... [2023-06-19 14:36:43,254][22301] Decorrelating experience for 0 frames... [2023-06-19 14:36:43,257][22297] Decorrelating experience for 0 frames... [2023-06-19 14:36:43,259][22299] Decorrelating experience for 0 frames... [2023-06-19 14:36:43,367][00753] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-06-19 14:36:44,486][22299] Decorrelating experience for 32 frames... [2023-06-19 14:36:44,488][22297] Decorrelating experience for 32 frames... [2023-06-19 14:36:44,495][22310] Decorrelating experience for 32 frames... [2023-06-19 14:36:44,539][22300] Decorrelating experience for 32 frames... [2023-06-19 14:36:44,537][22309] Decorrelating experience for 32 frames... [2023-06-19 14:36:44,992][22302] Decorrelating experience for 0 frames... [2023-06-19 14:36:44,995][22303] Decorrelating experience for 0 frames... [2023-06-19 14:36:46,139][22304] Decorrelating experience for 32 frames... [2023-06-19 14:36:46,160][22301] Decorrelating experience for 32 frames... [2023-06-19 14:36:46,208][22296] Decorrelating experience for 0 frames... [2023-06-19 14:36:46,406][22297] Decorrelating experience for 64 frames... [2023-06-19 14:36:46,417][22310] Decorrelating experience for 64 frames... [2023-06-19 14:36:46,457][22300] Decorrelating experience for 64 frames... [2023-06-19 14:36:46,646][22303] Decorrelating experience for 32 frames... 
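The "Decorrelating experience for N frames" records above and below show each environment being warmed up by a different number of frames (multiples of the rollout length, 32 per the config) so that parallel episodes start out of phase rather than in lockstep. A toy illustration of the idea follows, using Gymnasium's CartPole as a stand-in for the Doom envs; this is a sketch of the concept, not Sample Factory's actual implementation.

```python
# Toy sketch of experience decorrelation: stagger otherwise-identical envs by
# stepping each one a different multiple of the rollout length (32 here) with
# random actions before real collection starts. Gymnasium-style API;
# illustrative only, not Sample Factory's actual implementation.
import gymnasium as gym

envs = [gym.make("CartPole-v1") for _ in range(4)]  # stand-in for the Doom envs
rollout = 32

for i, env in enumerate(envs):
    env.reset(seed=i)
    for _ in range(i * rollout):  # 0, 32, 64, 96 warmup frames, as in the log
        _, _, terminated, truncated, _ = env.step(env.action_space.sample())
        if terminated or truncated:
            env.reset()
```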
[2023-06-19 14:36:46,648][22298] Decorrelating experience for 32 frames... [2023-06-19 14:36:47,782][22311] Decorrelating experience for 0 frames... [2023-06-19 14:36:47,868][22296] Decorrelating experience for 32 frames... [2023-06-19 14:36:48,045][22310] Decorrelating experience for 96 frames... [2023-06-19 14:36:48,108][22302] Decorrelating experience for 32 frames... [2023-06-19 14:36:48,207][22304] Decorrelating experience for 64 frames... [2023-06-19 14:36:48,270][22299] Decorrelating experience for 64 frames... [2023-06-19 14:36:48,367][00753] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-06-19 14:36:48,555][22298] Decorrelating experience for 64 frames... [2023-06-19 14:36:48,556][22303] Decorrelating experience for 64 frames... [2023-06-19 14:36:49,352][22311] Decorrelating experience for 32 frames... [2023-06-19 14:36:49,587][22304] Decorrelating experience for 96 frames... [2023-06-19 14:36:49,707][22300] Decorrelating experience for 96 frames... [2023-06-19 14:36:49,714][22296] Decorrelating experience for 64 frames... [2023-06-19 14:36:50,839][22298] Decorrelating experience for 96 frames... [2023-06-19 14:36:52,493][22302] Decorrelating experience for 64 frames... [2023-06-19 14:36:53,370][00753] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 125.2. Samples: 1252. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-06-19 14:36:53,372][00753] Avg episode reward: [(0, '4.357')] [2023-06-19 14:36:53,446][22299] Decorrelating experience for 96 frames... [2023-06-19 14:36:54,089][22311] Decorrelating experience for 64 frames... [2023-06-19 14:36:54,305][22303] Decorrelating experience for 96 frames... [2023-06-19 14:36:54,934][22296] Decorrelating experience for 96 frames... [2023-06-19 14:36:58,366][00753] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4009984. Throughput: 0: 131.5. Samples: 1972. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-06-19 14:36:58,373][00753] Avg episode reward: [(0, '6.364')] [2023-06-19 14:36:58,720][22301] Decorrelating experience for 64 frames... [2023-06-19 14:37:00,102][22278] Signal inference workers to stop experience collection... [2023-06-19 14:37:00,136][22295] InferenceWorker_p0-w0: stopping experience collection [2023-06-19 14:37:00,246][22278] Signal inference workers to resume experience collection... [2023-06-19 14:37:00,248][22295] InferenceWorker_p0-w0: resuming experience collection [2023-06-19 14:37:00,881][22311] Decorrelating experience for 96 frames... [2023-06-19 14:37:01,859][22309] Decorrelating experience for 64 frames... [2023-06-19 14:37:02,338][22302] Decorrelating experience for 96 frames... [2023-06-19 14:37:03,369][00753] Fps is (10 sec: 2048.2, 60 sec: 1023.9, 300 sec: 1023.9). Total num frames: 4026368. Throughput: 0: 228.6. Samples: 4572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:37:03,372][00753] Avg episode reward: [(0, '7.372')] [2023-06-19 14:37:06,350][22309] Decorrelating experience for 96 frames... [2023-06-19 14:37:06,391][22297] Decorrelating experience for 96 frames... [2023-06-19 14:37:07,239][22301] Decorrelating experience for 96 frames... [2023-06-19 14:37:08,366][00753] Fps is (10 sec: 3276.8, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 4042752. Throughput: 0: 388.1. Samples: 9702. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-06-19 14:37:08,371][00753] Avg episode reward: [(0, '10.755')] [2023-06-19 14:37:09,179][22295] Updated weights for policy 0, policy_version 988 (0.0024) [2023-06-19 14:37:13,366][00753] Fps is (10 sec: 3687.3, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 4063232. Throughput: 0: 442.4. Samples: 13272. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:37:13,373][00753] Avg episode reward: [(0, '12.563')] [2023-06-19 14:37:18,366][00753] Fps is (10 sec: 3686.4, 60 sec: 2106.5, 300 sec: 2106.5). Total num frames: 4079616. Throughput: 0: 521.0. Samples: 18236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:37:18,369][00753] Avg episode reward: [(0, '14.803')] [2023-06-19 14:37:20,866][22295] Updated weights for policy 0, policy_version 998 (0.0012) [2023-06-19 14:37:23,367][00753] Fps is (10 sec: 2867.1, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 4091904. Throughput: 0: 576.4. Samples: 23056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:37:23,374][00753] Avg episode reward: [(0, '18.143')] [2023-06-19 14:37:28,366][00753] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 4116480. Throughput: 0: 577.7. Samples: 25996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:37:28,369][00753] Avg episode reward: [(0, '19.802')] [2023-06-19 14:37:30,628][22295] Updated weights for policy 0, policy_version 1008 (0.0013) [2023-06-19 14:37:33,367][00753] Fps is (10 sec: 4915.3, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 4141056. Throughput: 0: 738.5. Samples: 33232. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:37:33,373][00753] Avg episode reward: [(0, '22.458')] [2023-06-19 14:37:38,366][00753] Fps is (10 sec: 4096.0, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 4157440. Throughput: 0: 837.6. Samples: 38942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:37:38,376][00753] Avg episode reward: [(0, '23.440')] [2023-06-19 14:37:41,628][22295] Updated weights for policy 0, policy_version 1018 (0.0027) [2023-06-19 14:37:43,366][00753] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 4173824. Throughput: 0: 875.1. Samples: 41350. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:37:43,374][00753] Avg episode reward: [(0, '24.698')] [2023-06-19 14:37:48,366][00753] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 4190208. Throughput: 0: 922.5. Samples: 46084. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:37:48,376][00753] Avg episode reward: [(0, '27.588')] [2023-06-19 14:37:51,962][22295] Updated weights for policy 0, policy_version 1028 (0.0016) [2023-06-19 14:37:53,367][00753] Fps is (10 sec: 4096.0, 60 sec: 3481.8, 300 sec: 2984.2). Total num frames: 4214784. Throughput: 0: 967.2. Samples: 53226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:37:53,380][00753] Avg episode reward: [(0, '25.768')] [2023-06-19 14:37:58,369][00753] Fps is (10 sec: 4504.5, 60 sec: 3754.5, 300 sec: 3058.2). Total num frames: 4235264. Throughput: 0: 967.6. Samples: 56816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:37:58,371][00753] Avg episode reward: [(0, '25.108')] [2023-06-19 14:38:02,515][22295] Updated weights for policy 0, policy_version 1038 (0.0011) [2023-06-19 14:38:03,367][00753] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3072.0). Total num frames: 4251648. Throughput: 0: 971.7. Samples: 61964. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:38:03,373][00753] Avg episode reward: [(0, '25.022')] [2023-06-19 14:38:08,366][00753] Fps is (10 sec: 3277.6, 60 sec: 3754.7, 300 sec: 3084.0). Total num frames: 4268032. Throughput: 0: 969.5. Samples: 66682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:38:08,371][00753] Avg episode reward: [(0, '25.362')] [2023-06-19 14:38:13,367][00753] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3140.3). Total num frames: 4288512. Throughput: 0: 965.4. Samples: 69440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:38:13,374][00753] Avg episode reward: [(0, '24.075')] [2023-06-19 14:38:13,384][22278] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001047_4288512.pth... [2023-06-19 14:38:13,516][22278] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000933_3821568.pth [2023-06-19 14:38:13,731][22295] Updated weights for policy 0, policy_version 1048 (0.0035) [2023-06-19 14:38:18,366][00753] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3233.7). Total num frames: 4313088. Throughput: 0: 962.4. Samples: 76540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:38:18,368][00753] Avg episode reward: [(0, '25.057')] [2023-06-19 14:38:23,350][22295] Updated weights for policy 0, policy_version 1058 (0.0020) [2023-06-19 14:38:23,367][00753] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3276.8). Total num frames: 4333568. Throughput: 0: 968.5. Samples: 82526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:38:23,369][00753] Avg episode reward: [(0, '25.397')] [2023-06-19 14:38:28,366][00753] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3237.8). Total num frames: 4345856. Throughput: 0: 967.2. Samples: 84874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:38:28,372][00753] Avg episode reward: [(0, '25.798')] [2023-06-19 14:38:33,366][00753] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3239.6). Total num frames: 4362240. Throughput: 0: 968.5. Samples: 89666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:38:33,369][00753] Avg episode reward: [(0, '26.800')] [2023-06-19 14:38:34,887][22295] Updated weights for policy 0, policy_version 1068 (0.0021) [2023-06-19 14:38:38,366][00753] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3312.4). Total num frames: 4386816. Throughput: 0: 966.7. Samples: 96728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:38:38,373][00753] Avg episode reward: [(0, '26.468')] [2023-06-19 14:38:43,367][00753] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3379.2). Total num frames: 4411392. Throughput: 0: 967.6. Samples: 100356. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:38:43,373][00753] Avg episode reward: [(0, '28.813')] [2023-06-19 14:38:44,016][22295] Updated weights for policy 0, policy_version 1078 (0.0012) [2023-06-19 14:38:48,366][00753] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3375.1). Total num frames: 4427776. Throughput: 0: 972.0. Samples: 105706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:38:48,373][00753] Avg episode reward: [(0, '29.018')] [2023-06-19 14:38:53,369][00753] Fps is (10 sec: 3275.9, 60 sec: 3822.8, 300 sec: 3371.3). Total num frames: 4444160. Throughput: 0: 973.0. Samples: 110472. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:38:53,374][00753] Avg episode reward: [(0, '27.659')] [2023-06-19 14:38:56,289][22295] Updated weights for policy 0, policy_version 1088 (0.0022) [2023-06-19 14:38:58,367][00753] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3398.2). Total num frames: 4464640. Throughput: 0: 970.4. Samples: 113108. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:38:58,372][00753] Avg episode reward: [(0, '28.410')] [2023-06-19 14:39:03,367][00753] Fps is (10 sec: 4506.9, 60 sec: 3959.5, 300 sec: 3452.3). Total num frames: 4489216. Throughput: 0: 971.8. Samples: 120270. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:39:03,373][00753] Avg episode reward: [(0, '27.645')] [2023-06-19 14:39:04,933][22295] Updated weights for policy 0, policy_version 1098 (0.0017) [2023-06-19 14:39:08,367][00753] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3446.3). Total num frames: 4505600. Throughput: 0: 973.9. Samples: 126350. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:39:08,373][00753] Avg episode reward: [(0, '24.769')] [2023-06-19 14:39:13,368][00753] Fps is (10 sec: 3276.5, 60 sec: 3891.1, 300 sec: 3440.6). Total num frames: 4521984. Throughput: 0: 973.1. Samples: 128664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:39:13,371][00753] Avg episode reward: [(0, '24.195')] [2023-06-19 14:39:17,415][22295] Updated weights for policy 0, policy_version 1108 (0.0025) [2023-06-19 14:39:18,367][00753] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3435.4). Total num frames: 4538368. Throughput: 0: 973.0. Samples: 133452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:39:18,369][00753] Avg episode reward: [(0, '23.530')] [2023-06-19 14:39:23,367][00753] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3481.6). Total num frames: 4562944. Throughput: 0: 974.4. Samples: 140578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:39:23,373][00753] Avg episode reward: [(0, '22.835')] [2023-06-19 14:39:26,090][22295] Updated weights for policy 0, policy_version 1118 (0.0020) [2023-06-19 14:39:28,367][00753] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3525.0). Total num frames: 4587520. Throughput: 0: 974.7. Samples: 144218. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:39:28,375][00753] Avg episode reward: [(0, '23.982')] [2023-06-19 14:39:33,367][00753] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3517.7). Total num frames: 4603904. Throughput: 0: 978.8. Samples: 149754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:39:33,369][00753] Avg episode reward: [(0, '24.738')] [2023-06-19 14:39:38,103][22295] Updated weights for policy 0, policy_version 1128 (0.0012) [2023-06-19 14:39:38,366][00753] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3510.9). Total num frames: 4620288. Throughput: 0: 979.4. Samples: 154540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:39:38,369][00753] Avg episode reward: [(0, '24.929')] [2023-06-19 14:39:43,367][00753] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3527.1). Total num frames: 4640768. Throughput: 0: 981.4. Samples: 157270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:39:43,373][00753] Avg episode reward: [(0, '24.468')] [2023-06-19 14:39:47,183][22295] Updated weights for policy 0, policy_version 1138 (0.0033) [2023-06-19 14:39:48,366][00753] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3564.6). Total num frames: 4665344. Throughput: 0: 983.2. Samples: 164512. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:39:48,369][00753] Avg episode reward: [(0, '26.204')] [2023-06-19 14:39:53,367][00753] Fps is (10 sec: 4505.6, 60 sec: 4027.9, 300 sec: 3578.6). Total num frames: 4685824. Throughput: 0: 987.2. Samples: 170774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:39:53,373][00753] Avg episode reward: [(0, '26.814')] [2023-06-19 14:39:57,949][22295] Updated weights for policy 0, policy_version 1148 (0.0058) [2023-06-19 14:39:58,366][00753] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3570.9). Total num frames: 4702208. Throughput: 0: 987.9. Samples: 173120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:39:58,372][00753] Avg episode reward: [(0, '25.868')] [2023-06-19 14:40:03,367][00753] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3563.5). Total num frames: 4718592. Throughput: 0: 989.6. Samples: 177984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-06-19 14:40:03,369][00753] Avg episode reward: [(0, '26.029')] [2023-06-19 14:40:08,366][00753] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3576.5). Total num frames: 4739072. Throughput: 0: 983.1. Samples: 184818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:40:08,370][00753] Avg episode reward: [(0, '27.939')] [2023-06-19 14:40:08,387][22295] Updated weights for policy 0, policy_version 1158 (0.0032) [2023-06-19 14:40:13,367][00753] Fps is (10 sec: 4505.5, 60 sec: 4027.8, 300 sec: 3608.4). Total num frames: 4763648. Throughput: 0: 981.5. Samples: 188384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-06-19 14:40:13,376][00753] Avg episode reward: [(0, '27.435')] [2023-06-19 14:40:13,386][22278] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001163_4763648.pth... [2023-06-19 14:40:13,564][22278] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2023-06-19 14:40:18,366][00753] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3600.7). Total num frames: 4780032. Throughput: 0: 980.7. Samples: 193886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:40:18,372][00753] Avg episode reward: [(0, '26.868')] [2023-06-19 14:40:19,029][22295] Updated weights for policy 0, policy_version 1168 (0.0032) [2023-06-19 14:40:23,367][00753] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3593.3). Total num frames: 4796416. Throughput: 0: 979.8. Samples: 198632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:40:23,368][00753] Avg episode reward: [(0, '27.007')] [2023-06-19 14:40:28,367][00753] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3604.5). Total num frames: 4816896. Throughput: 0: 978.0. Samples: 201278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:40:28,370][00753] Avg episode reward: [(0, '28.959')] [2023-06-19 14:40:29,866][22295] Updated weights for policy 0, policy_version 1178 (0.0017) [2023-06-19 14:40:33,366][00753] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3633.0). Total num frames: 4841472. Throughput: 0: 979.7. Samples: 208600. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:40:33,369][00753] Avg episode reward: [(0, '27.656')] [2023-06-19 14:40:38,366][00753] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3625.4). Total num frames: 4857856. Throughput: 0: 977.2. Samples: 214746. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-06-19 14:40:38,369][00753] Avg episode reward: [(0, '27.204')] [2023-06-19 14:40:39,109][22295] Updated weights for policy 0, policy_version 1188 (0.0026) [2023-06-19 14:40:43,370][00753] Fps is (10 sec: 3685.1, 60 sec: 3959.2, 300 sec: 3635.1). Total num frames: 4878336. Throughput: 0: 979.5. Samples: 217200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-06-19 14:40:43,372][00753] Avg episode reward: [(0, '27.556')] [2023-06-19 14:40:48,366][00753] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3627.9). Total num frames: 4894720. Throughput: 0: 978.4. Samples: 222010. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:40:48,371][00753] Avg episode reward: [(0, '27.730')] [2023-06-19 14:40:50,756][22295] Updated weights for policy 0, policy_version 1198 (0.0012) [2023-06-19 14:40:53,366][00753] Fps is (10 sec: 4097.4, 60 sec: 3891.2, 300 sec: 3653.6). Total num frames: 4919296. Throughput: 0: 979.2. Samples: 228884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2023-06-19 14:40:53,374][00753] Avg episode reward: [(0, '26.206')] [2023-06-19 14:40:58,366][00753] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3678.4). Total num frames: 4943872. Throughput: 0: 980.9. Samples: 232526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-06-19 14:40:58,374][00753] Avg episode reward: [(0, '25.497')] [2023-06-19 14:40:59,940][22295] Updated weights for policy 0, policy_version 1208 (0.0022) [2023-06-19 14:41:03,366][00753] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3654.9). Total num frames: 4956160. Throughput: 0: 985.7. Samples: 238242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-06-19 14:41:03,369][00753] Avg episode reward: [(0, '25.035')] [2023-06-19 14:41:08,367][00753] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3647.8). Total num frames: 4972544. Throughput: 0: 988.7. Samples: 243124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-06-19 14:41:08,373][00753] Avg episode reward: [(0, '25.931')] [2023-06-19 14:41:11,788][22295] Updated weights for policy 0, policy_version 1218 (0.0042) [2023-06-19 14:41:13,366][00753] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3656.1). Total num frames: 4993024. Throughput: 0: 985.4. Samples: 245620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:41:13,374][00753] Avg episode reward: [(0, '24.143')] [2023-06-19 14:41:18,367][00753] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3678.9). Total num frames: 5017600. Throughput: 0: 983.4. Samples: 252852. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:41:18,369][00753] Avg episode reward: [(0, '25.066')] [2023-06-19 14:41:20,436][22295] Updated weights for policy 0, policy_version 1228 (0.0012) [2023-06-19 14:41:23,367][00753] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 5038080. Throughput: 0: 990.6. Samples: 259322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:41:23,369][00753] Avg episode reward: [(0, '26.740')] [2023-06-19 14:41:28,367][00753] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3679.2). Total num frames: 5054464. Throughput: 0: 990.0. Samples: 261748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:41:28,369][00753] Avg episode reward: [(0, '26.941')] [2023-06-19 14:41:32,587][22295] Updated weights for policy 0, policy_version 1238 (0.0012) [2023-06-19 14:41:33,367][00753] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3672.3). Total num frames: 5070848. Throughput: 0: 990.0. Samples: 266560. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:41:33,369][00753] Avg episode reward: [(0, '28.274')] [2023-06-19 14:41:38,367][00753] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3693.3). Total num frames: 5095424. Throughput: 0: 983.9. Samples: 273160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:41:38,369][00753] Avg episode reward: [(0, '27.551')] [2023-06-19 14:41:41,510][22295] Updated weights for policy 0, policy_version 1248 (0.0012) [2023-06-19 14:41:43,367][00753] Fps is (10 sec: 4915.2, 60 sec: 4028.0, 300 sec: 3776.7). Total num frames: 5120000. Throughput: 0: 983.6. Samples: 276786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-06-19 14:41:43,371][00753] Avg episode reward: [(0, '28.989')] [2023-06-19 14:41:48,366][00753] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 5132288. Throughput: 0: 982.9. Samples: 282472. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:41:48,369][00753] Avg episode reward: [(0, '27.900')] [2023-06-19 14:41:53,367][00753] Fps is (10 sec: 2867.1, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5148672. Throughput: 0: 981.1. Samples: 287272. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:41:53,374][00753] Avg episode reward: [(0, '27.637')] [2023-06-19 14:41:53,413][22295] Updated weights for policy 0, policy_version 1258 (0.0012) [2023-06-19 14:41:58,367][00753] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3873.9). Total num frames: 5169152. Throughput: 0: 979.7. Samples: 289706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-06-19 14:41:58,371][00753] Avg episode reward: [(0, '27.081')] [2023-06-19 14:42:02,707][22295] Updated weights for policy 0, policy_version 1268 (0.0021) [2023-06-19 14:42:03,366][00753] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 5193728. Throughput: 0: 978.9. Samples: 296904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:42:03,371][00753] Avg episode reward: [(0, '27.350')] [2023-06-19 14:42:08,366][00753] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 5214208. Throughput: 0: 974.8. Samples: 303186. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:42:08,370][00753] Avg episode reward: [(0, '27.132')] [2023-06-19 14:42:13,367][00753] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 5230592. Throughput: 0: 974.4. Samples: 305596. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:42:13,369][00753] Avg episode reward: [(0, '28.582')] [2023-06-19 14:42:13,387][22278] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001277_5230592.pth... [2023-06-19 14:42:13,551][22278] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001047_4288512.pth [2023-06-19 14:42:14,268][22295] Updated weights for policy 0, policy_version 1278 (0.0012) [2023-06-19 14:42:18,367][00753] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3915.5). Total num frames: 5246976. Throughput: 0: 972.5. Samples: 310324. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:42:18,372][00753] Avg episode reward: [(0, '28.962')] [2023-06-19 14:42:23,367][00753] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5271552. Throughput: 0: 976.2. Samples: 317088. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:42:23,368][00753] Avg episode reward: [(0, '28.250')] [2023-06-19 14:42:24,003][22295] Updated weights for policy 0, policy_version 1288 (0.0019) [2023-06-19 14:42:28,366][00753] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 5296128. Throughput: 0: 976.3. Samples: 320718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:42:28,371][00753] Avg episode reward: [(0, '29.213')] [2023-06-19 14:42:33,366][00753] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 5312512. Throughput: 0: 978.7. Samples: 326514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:42:33,373][00753] Avg episode reward: [(0, '28.213')] [2023-06-19 14:42:34,383][22295] Updated weights for policy 0, policy_version 1298 (0.0031) [2023-06-19 14:42:38,367][00753] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5328896. Throughput: 0: 980.9. Samples: 331412. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:42:38,375][00753] Avg episode reward: [(0, '28.573')] [2023-06-19 14:42:43,367][00753] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3915.5). Total num frames: 5345280. Throughput: 0: 981.5. Samples: 333876. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:42:43,369][00753] Avg episode reward: [(0, '27.184')] [2023-06-19 14:42:45,049][22295] Updated weights for policy 0, policy_version 1308 (0.0027) [2023-06-19 14:42:48,366][00753] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5369856. Throughput: 0: 983.1. Samples: 341142. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:42:48,368][00753] Avg episode reward: [(0, '28.363')] [2023-06-19 14:42:53,369][00753] Fps is (10 sec: 4504.6, 60 sec: 4027.6, 300 sec: 3915.5). Total num frames: 5390336. Throughput: 0: 988.2. Samples: 347658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:42:53,373][00753] Avg episode reward: [(0, '27.651')] [2023-06-19 14:42:54,792][22295] Updated weights for policy 0, policy_version 1318 (0.0014) [2023-06-19 14:42:58,366][00753] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5406720. Throughput: 0: 987.3. Samples: 350024. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:42:58,372][00753] Avg episode reward: [(0, '27.618')] [2023-06-19 14:43:03,366][00753] Fps is (10 sec: 3277.6, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 5423104. Throughput: 0: 992.0. Samples: 354964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:43:03,374][00753] Avg episode reward: [(0, '27.952')] [2023-06-19 14:43:06,170][22295] Updated weights for policy 0, policy_version 1328 (0.0032) [2023-06-19 14:43:08,366][00753] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 5447680. Throughput: 0: 989.9. Samples: 361634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:43:08,372][00753] Avg episode reward: [(0, '28.384')] [2023-06-19 14:43:13,367][00753] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 5472256. Throughput: 0: 989.8. Samples: 365260. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:43:13,372][00753] Avg episode reward: [(0, '27.141')] [2023-06-19 14:43:15,087][22295] Updated weights for policy 0, policy_version 1338 (0.0014) [2023-06-19 14:43:18,366][00753] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 5488640. Throughput: 0: 990.6. Samples: 371090. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:43:18,369][00753] Avg episode reward: [(0, '26.356')] [2023-06-19 14:43:23,370][00753] Fps is (10 sec: 3275.6, 60 sec: 3891.0, 300 sec: 3929.3). Total num frames: 5505024. Throughput: 0: 987.7. Samples: 375862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:43:23,377][00753] Avg episode reward: [(0, '26.087')] [2023-06-19 14:43:27,190][22295] Updated weights for policy 0, policy_version 1348 (0.0021) [2023-06-19 14:43:28,366][00753] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 5525504. Throughput: 0: 987.3. Samples: 378306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:43:28,374][00753] Avg episode reward: [(0, '24.648')] [2023-06-19 14:43:33,370][00753] Fps is (10 sec: 4505.8, 60 sec: 3959.3, 300 sec: 3943.2). Total num frames: 5550080. Throughput: 0: 982.6. Samples: 385362. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:43:33,372][00753] Avg episode reward: [(0, '24.712')] [2023-06-19 14:43:35,812][22295] Updated weights for policy 0, policy_version 1358 (0.0012) [2023-06-19 14:43:38,367][00753] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5566464. Throughput: 0: 979.1. Samples: 391716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:43:38,378][00753] Avg episode reward: [(0, '26.031')] [2023-06-19 14:43:43,367][00753] Fps is (10 sec: 3277.9, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5582848. Throughput: 0: 979.2. Samples: 394090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:43:43,369][00753] Avg episode reward: [(0, '26.289')] [2023-06-19 14:43:48,366][00753] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 5599232. Throughput: 0: 975.5. Samples: 398862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:43:48,376][00753] Avg episode reward: [(0, '25.812')] [2023-06-19 14:43:48,815][22295] Updated weights for policy 0, policy_version 1368 (0.0032) [2023-06-19 14:43:53,367][00753] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3929.4). Total num frames: 5623808. Throughput: 0: 971.7. Samples: 405362. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:43:53,369][00753] Avg episode reward: [(0, '27.435')] [2023-06-19 14:43:57,357][22295] Updated weights for policy 0, policy_version 1378 (0.0022) [2023-06-19 14:43:58,366][00753] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 5648384. Throughput: 0: 971.3. Samples: 408968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:43:58,375][00753] Avg episode reward: [(0, '29.401')] [2023-06-19 14:43:58,379][22278] Saving new best policy, reward=29.401! [2023-06-19 14:44:03,366][00753] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 5664768. Throughput: 0: 969.6. Samples: 414722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:44:03,371][00753] Avg episode reward: [(0, '28.680')] [2023-06-19 14:44:08,369][00753] Fps is (10 sec: 3275.9, 60 sec: 3891.0, 300 sec: 3929.4). Total num frames: 5681152. Throughput: 0: 969.4. Samples: 419484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:44:08,376][00753] Avg episode reward: [(0, '29.090')] [2023-06-19 14:44:09,491][22295] Updated weights for policy 0, policy_version 1388 (0.0011) [2023-06-19 14:44:13,367][00753] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 5697536. Throughput: 0: 967.4. Samples: 421838. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:44:13,373][00753] Avg episode reward: [(0, '28.578')] [2023-06-19 14:44:13,384][22278] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001391_5697536.pth... [2023-06-19 14:44:13,573][22278] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001163_4763648.pth [2023-06-19 14:44:18,367][00753] Fps is (10 sec: 4097.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 5722112. Throughput: 0: 965.4. Samples: 428802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-06-19 14:44:18,370][00753] Avg episode reward: [(0, '30.459')] [2023-06-19 14:44:18,382][22278] Saving new best policy, reward=30.459! [2023-06-19 14:44:19,022][22295] Updated weights for policy 0, policy_version 1398 (0.0016) [2023-06-19 14:44:23,368][00753] Fps is (10 sec: 4505.1, 60 sec: 3959.6, 300 sec: 3915.5). Total num frames: 5742592. Throughput: 0: 969.9. Samples: 435362. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:44:23,373][00753] Avg episode reward: [(0, '28.421')] [2023-06-19 14:44:28,367][00753] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5758976. Throughput: 0: 970.5. Samples: 437764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:44:28,369][00753] Avg episode reward: [(0, '27.132')] [2023-06-19 14:44:30,114][22295] Updated weights for policy 0, policy_version 1408 (0.0033) [2023-06-19 14:44:33,367][00753] Fps is (10 sec: 3277.2, 60 sec: 3754.9, 300 sec: 3915.5). Total num frames: 5775360. Throughput: 0: 970.1. Samples: 442518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-06-19 14:44:33,371][00753] Avg episode reward: [(0, '28.250')] [2023-06-19 14:44:38,367][00753] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 5795840. Throughput: 0: 963.1. Samples: 448702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:44:38,368][00753] Avg episode reward: [(0, '29.758')] [2023-06-19 14:44:40,242][22295] Updated weights for policy 0, policy_version 1418 (0.0012) [2023-06-19 14:44:43,367][00753] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5820416. Throughput: 0: 962.8. Samples: 452296. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:44:43,369][00753] Avg episode reward: [(0, '29.920')] [2023-06-19 14:44:48,368][00753] Fps is (10 sec: 4504.9, 60 sec: 4027.6, 300 sec: 3915.5). Total num frames: 5840896. Throughput: 0: 971.1. Samples: 458424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:44:48,378][00753] Avg episode reward: [(0, '30.304')] [2023-06-19 14:44:51,347][22295] Updated weights for policy 0, policy_version 1428 (0.0017) [2023-06-19 14:44:53,366][00753] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 5853184. Throughput: 0: 970.2. Samples: 463140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-06-19 14:44:53,372][00753] Avg episode reward: [(0, '30.202')] [2023-06-19 14:44:58,367][00753] Fps is (10 sec: 2867.6, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 5869568. Throughput: 0: 970.2. Samples: 465498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-06-19 14:44:58,369][00753] Avg episode reward: [(0, '32.907')] [2023-06-19 14:44:58,385][22278] Saving new best policy, reward=32.907! [2023-06-19 14:45:01,667][22295] Updated weights for policy 0, policy_version 1438 (0.0031) [2023-06-19 14:45:03,369][00753] Fps is (10 sec: 4504.3, 60 sec: 3891.0, 300 sec: 3929.3). Total num frames: 5898240. 
Throughput: 0: 965.5. Samples: 472250. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:45:03,372][00753] Avg episode reward: [(0, '30.852')] [2023-06-19 14:45:08,366][00753] Fps is (10 sec: 4915.2, 60 sec: 3959.7, 300 sec: 3915.5). Total num frames: 5918720. Throughput: 0: 972.4. Samples: 479118. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:45:08,370][00753] Avg episode reward: [(0, '27.310')] [2023-06-19 14:45:11,909][22295] Updated weights for policy 0, policy_version 1448 (0.0016) [2023-06-19 14:45:13,367][00753] Fps is (10 sec: 3687.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5935104. Throughput: 0: 972.5. Samples: 481526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-06-19 14:45:13,370][00753] Avg episode reward: [(0, '27.353')] [2023-06-19 14:45:18,374][00753] Fps is (10 sec: 3274.4, 60 sec: 3822.5, 300 sec: 3915.4). Total num frames: 5951488. Throughput: 0: 973.3. Samples: 486322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-06-19 14:45:18,377][00753] Avg episode reward: [(0, '25.210')] [2023-06-19 14:45:22,942][22295] Updated weights for policy 0, policy_version 1458 (0.0021) [2023-06-19 14:45:23,366][00753] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3915.5). Total num frames: 5971968. Throughput: 0: 975.9. Samples: 492618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:45:23,369][00753] Avg episode reward: [(0, '22.058')] [2023-06-19 14:45:28,367][00753] Fps is (10 sec: 4508.7, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 5996544. Throughput: 0: 977.9. Samples: 496300. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-06-19 14:45:28,369][00753] Avg episode reward: [(0, '22.318')] [2023-06-19 14:45:29,568][22278] Stopping Batcher_0... [2023-06-19 14:45:29,569][22278] Loop batcher_evt_loop terminating... [2023-06-19 14:45:29,570][00753] Component Batcher_0 stopped! [2023-06-19 14:45:29,572][22278] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... [2023-06-19 14:45:29,671][22301] Stopping RolloutWorker_w5... [2023-06-19 14:45:29,671][22301] Loop rollout_proc5_evt_loop terminating... [2023-06-19 14:45:29,668][00753] Component RolloutWorker_w5 stopped! [2023-06-19 14:45:29,684][22302] Stopping RolloutWorker_w6... [2023-06-19 14:45:29,687][22296] Stopping RolloutWorker_w0... [2023-06-19 14:45:29,688][00753] Component RolloutWorker_w6 stopped! [2023-06-19 14:45:29,692][00753] Component RolloutWorker_w0 stopped! [2023-06-19 14:45:29,699][22309] Stopping RolloutWorker_w10... [2023-06-19 14:45:29,700][00753] Component RolloutWorker_w10 stopped! [2023-06-19 14:45:29,714][22296] Loop rollout_proc0_evt_loop terminating... [2023-06-19 14:45:29,718][22303] Stopping RolloutWorker_w8... [2023-06-19 14:45:29,719][00753] Component RolloutWorker_w8 stopped! [2023-06-19 14:45:29,722][22299] Stopping RolloutWorker_w3... [2023-06-19 14:45:29,720][22309] Loop rollout_proc10_evt_loop terminating... [2023-06-19 14:45:29,723][22299] Loop rollout_proc3_evt_loop terminating... [2023-06-19 14:45:29,722][00753] Component RolloutWorker_w3 stopped! [2023-06-19 14:45:29,689][22302] Loop rollout_proc6_evt_loop terminating... [2023-06-19 14:45:29,730][22310] Stopping RolloutWorker_w9... [2023-06-19 14:45:29,731][22310] Loop rollout_proc9_evt_loop terminating... [2023-06-19 14:45:29,730][00753] Component RolloutWorker_w9 stopped! [2023-06-19 14:45:29,718][22303] Loop rollout_proc8_evt_loop terminating... [2023-06-19 14:45:29,740][22297] Stopping RolloutWorker_w1... 
[2023-06-19 14:45:29,740][22297] Loop rollout_proc1_evt_loop terminating... [2023-06-19 14:45:29,738][00753] Component RolloutWorker_w1 stopped! [2023-06-19 14:45:29,744][22304] Stopping RolloutWorker_w7... [2023-06-19 14:45:29,743][00753] Component RolloutWorker_w7 stopped! [2023-06-19 14:45:29,751][22300] Stopping RolloutWorker_w4... [2023-06-19 14:45:29,752][22300] Loop rollout_proc4_evt_loop terminating... [2023-06-19 14:45:29,753][22298] Stopping RolloutWorker_w2... [2023-06-19 14:45:29,754][22298] Loop rollout_proc2_evt_loop terminating... [2023-06-19 14:45:29,753][00753] Component RolloutWorker_w4 stopped! [2023-06-19 14:45:29,762][00753] Component RolloutWorker_w2 stopped! [2023-06-19 14:45:29,748][22304] Loop rollout_proc7_evt_loop terminating... [2023-06-19 14:45:29,781][22295] Weights refcount: 2 0 [2023-06-19 14:45:29,782][22295] Stopping InferenceWorker_p0-w0... [2023-06-19 14:45:29,782][22295] Loop inference_proc0-0_evt_loop terminating... [2023-06-19 14:45:29,783][00753] Component InferenceWorker_p0-w0 stopped! [2023-06-19 14:45:29,795][00753] Component RolloutWorker_w11 stopped! [2023-06-19 14:45:29,796][22311] Stopping RolloutWorker_w11... [2023-06-19 14:45:29,797][22311] Loop rollout_proc11_evt_loop terminating... [2023-06-19 14:45:29,823][22278] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001277_5230592.pth [2023-06-19 14:45:29,840][22278] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... [2023-06-19 14:45:30,054][00753] Component LearnerWorker_p0 stopped! [2023-06-19 14:45:30,056][00753] Waiting for process learner_proc0 to stop... [2023-06-19 14:45:30,059][22278] Stopping LearnerWorker_p0... [2023-06-19 14:45:30,060][22278] Loop learner_proc0_evt_loop terminating... [2023-06-19 14:45:31,937][00753] Waiting for process inference_proc0-0 to join... [2023-06-19 14:45:32,377][00753] Waiting for process rollout_proc0 to join... [2023-06-19 14:45:36,938][00753] Waiting for process rollout_proc1 to join... [2023-06-19 14:45:36,968][00753] Waiting for process rollout_proc2 to join... [2023-06-19 14:45:36,970][00753] Waiting for process rollout_proc3 to join... [2023-06-19 14:45:36,971][00753] Waiting for process rollout_proc4 to join... [2023-06-19 14:45:36,973][00753] Waiting for process rollout_proc5 to join... [2023-06-19 14:45:36,977][00753] Waiting for process rollout_proc6 to join... [2023-06-19 14:45:36,982][00753] Waiting for process rollout_proc7 to join... [2023-06-19 14:45:36,983][00753] Waiting for process rollout_proc8 to join... [2023-06-19 14:45:36,985][00753] Waiting for process rollout_proc9 to join... [2023-06-19 14:45:36,987][00753] Waiting for process rollout_proc10 to join... [2023-06-19 14:45:36,989][00753] Waiting for process rollout_proc11 to join... 
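The checkpoint records above (saving checkpoint_000001466_6004736.pth while removing checkpoint_000001277_5230592.pth, and the earlier save/remove pairs at 14:40:13, 14:42:13, and 14:44:13) follow a save-then-prune rotation: each new `checkpoint_<policy_version>_<env_steps>.pth` is written, then the oldest rolling checkpoint is deleted so only the most recent few remain. Below is a minimal sketch of that bookkeeping, assuming a keep-last-N policy and the filename scheme visible in the log; `save_with_rotation` and its parameters are illustrative, not Sample Factory's actual API.

```python
import re
from pathlib import Path

import torch  # assumed available, matching the .pth checkpoints in the log


def save_with_rotation(state, checkpoint_dir, policy_version, env_steps, keep=2):
    """Write checkpoint_<version>_<env_steps>.pth, then delete the oldest
    rolling checkpoints so at most `keep` remain (illustrative sketch only)."""
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)

    # Filename scheme observed in the log: 9-digit policy version, raw env steps.
    name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    torch.save(state, checkpoint_dir / name)

    # Sort rolling checkpoints by the policy version encoded in the filename.
    pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth")
    checkpoints = sorted(
        (p for p in checkpoint_dir.iterdir() if pattern.fullmatch(p.name)),
        key=lambda p: int(pattern.fullmatch(p.name).group(1)),
    )
    for stale in checkpoints[:-keep]:
        stale.unlink()  # mirrors the "Removing .../checkpoint_....pth" records
```

With keep=2 this reproduces the pairing seen in the log, where each save is followed by the removal of the checkpoint from two saves back (e.g. saving version 1277 removes version 1047).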
[2023-06-19 14:45:36,991][00753] Batcher 0 profile tree view:
batching: 15.1503, releasing_batches: 0.0120
[2023-06-19 14:45:36,992][00753] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0108
  wait_policy_total: 324.6077
update_model: 2.7738
  weight_update: 0.0024
one_step: 0.0033
  handle_policy_step: 187.4206
    deserialize: 6.1076, stack: 0.9300, obs_to_device_normalize: 37.4810, forward: 98.0533, send_messages: 13.2084
    prepare_outputs: 23.5951
      to_cpu: 13.3490
[2023-06-19 14:45:36,994][00753] Learner 0 profile tree view:
misc: 0.0026, prepare_batch: 11.4592
train: 39.3034
  epoch_init: 0.0029, minibatch_init: 0.0054, losses_postprocess: 0.2337, kl_divergence: 0.3921, after_optimizer: 1.6729
  calculate_losses: 14.1472
    losses_init: 0.0015, forward_head: 0.9955, bptt_initial: 8.6592, tail: 0.6937, advantages_returns: 0.1729, losses: 2.3929
    bptt: 1.0778
      bptt_forward_core: 1.0180
  update: 22.4473
    clip: 16.4902
[2023-06-19 14:45:36,995][00753] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2178, enqueue_policy_requests: 80.0690, env_step: 372.0584, overhead: 8.8974, complete_rollouts: 2.6271
save_policy_outputs: 6.8942
  split_output_tensors: 3.4991
[2023-06-19 14:45:36,997][00753] RolloutWorker_w11 profile tree view:
wait_for_trajectories: 0.1147, enqueue_policy_requests: 81.5770, env_step: 366.0075, overhead: 8.1784, complete_rollouts: 2.7526
save_policy_outputs: 6.5695
  split_output_tensors: 3.0774
[2023-06-19 14:45:36,998][00753] Loop Runner_EvtLoop terminating...
[2023-06-19 14:45:36,999][00753] Runner profile tree view:
main_loop: 561.3929
[2023-06-19 14:45:37,001][00753] Collected {0: 6004736}, FPS: 3560.5
[2023-06-19 14:45:47,199][00753] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-06-19 14:45:47,201][00753] Overriding arg 'num_workers' with value 1 passed from command line
[2023-06-19 14:45:47,202][00753] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-06-19 14:45:47,204][00753] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-06-19 14:45:47,205][00753] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-06-19 14:45:47,206][00753] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-06-19 14:45:47,207][00753] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-06-19 14:45:47,208][00753] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-06-19 14:45:47,209][00753] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-06-19 14:45:47,210][00753] Adding new argument 'hf_repository'='Ditrip/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-06-19 14:45:47,211][00753] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-06-19 14:45:47,212][00753] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-06-19 14:45:47,213][00753] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-06-19 14:45:47,214][00753] Adding new argument 'enjoy_script'=None that is not in the saved config file!
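The override records above correspond to an evaluation run launched on top of the saved training config. A sketch of how such a run is typically started, assuming the `parse_vizdoom_cfg` / `register_vizdoom_components` helpers from Sample Factory's `sf_examples.vizdoom.train_vizdoom` (as used in the Deep RL course notebook); the flag values mirror the log, but the helper imports are an assumption here:

```python
from sample_factory.enjoy import enjoy
# Assumed helpers: wrap parse_sf_args/parse_full_cfg and register the VizDoom envs.
from sf_examples.vizdoom.train_vizdoom import parse_vizdoom_cfg, register_vizdoom_components

register_vizdoom_components()

cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
        "--num_workers=1",   # logged: "Overriding arg 'num_workers' with value 1"
        "--no_render",
        "--save_video",
        "--max_num_frames=100000",
        "--max_num_episodes=10",
        "--push_to_hub",
        "--hf_repository=Ditrip/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,  # adds the enjoy-only arguments seen as overrides in the log
)
status = enjoy(cfg)   # loads the latest checkpoint, evaluates, records replay.mp4
```

Run this way, the ten evaluation episodes and the final replay-video upload below follow automatically.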
[2023-06-19 14:45:47,215][00753] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-06-19 14:45:47,241][00753] RunningMeanStd input shape: (3, 72, 128) [2023-06-19 14:45:47,246][00753] RunningMeanStd input shape: (1,) [2023-06-19 14:45:47,266][00753] ConvEncoder: input_channels=3 [2023-06-19 14:45:47,322][00753] Conv encoder output size: 512 [2023-06-19 14:45:47,324][00753] Policy head output size: 512 [2023-06-19 14:45:47,351][00753] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... [2023-06-19 14:45:48,057][00753] Num frames 100... [2023-06-19 14:45:48,243][00753] Num frames 200... [2023-06-19 14:45:48,422][00753] Num frames 300... [2023-06-19 14:45:48,615][00753] Num frames 400... [2023-06-19 14:45:48,793][00753] Num frames 500... [2023-06-19 14:45:48,991][00753] Num frames 600... [2023-06-19 14:45:49,171][00753] Num frames 700... [2023-06-19 14:45:49,355][00753] Num frames 800... [2023-06-19 14:45:49,545][00753] Num frames 900... [2023-06-19 14:45:49,730][00753] Num frames 1000... [2023-06-19 14:45:49,918][00753] Num frames 1100... [2023-06-19 14:45:50,114][00753] Num frames 1200... [2023-06-19 14:45:50,301][00753] Num frames 1300... [2023-06-19 14:45:50,486][00753] Num frames 1400... [2023-06-19 14:45:50,672][00753] Num frames 1500... [2023-06-19 14:45:50,856][00753] Num frames 1600... [2023-06-19 14:45:50,986][00753] Num frames 1700... [2023-06-19 14:45:51,114][00753] Num frames 1800... [2023-06-19 14:45:51,243][00753] Num frames 1900... [2023-06-19 14:45:51,367][00753] Num frames 2000... [2023-06-19 14:45:51,496][00753] Num frames 2100... [2023-06-19 14:45:51,548][00753] Avg episode rewards: #0: 59.999, true rewards: #0: 21.000 [2023-06-19 14:45:51,549][00753] Avg episode reward: 59.999, avg true_objective: 21.000 [2023-06-19 14:45:51,672][00753] Num frames 2200... [2023-06-19 14:45:51,789][00753] Num frames 2300... [2023-06-19 14:45:51,918][00753] Num frames 2400... [2023-06-19 14:45:52,040][00753] Num frames 2500... [2023-06-19 14:45:52,181][00753] Num frames 2600... [2023-06-19 14:45:52,305][00753] Num frames 2700... [2023-06-19 14:45:52,425][00753] Num frames 2800... [2023-06-19 14:45:52,550][00753] Num frames 2900... [2023-06-19 14:45:52,670][00753] Num frames 3000... [2023-06-19 14:45:52,796][00753] Num frames 3100... [2023-06-19 14:45:52,936][00753] Avg episode rewards: #0: 42.854, true rewards: #0: 15.855 [2023-06-19 14:45:52,937][00753] Avg episode reward: 42.854, avg true_objective: 15.855 [2023-06-19 14:45:52,978][00753] Num frames 3200... [2023-06-19 14:45:53,113][00753] Num frames 3300... [2023-06-19 14:45:53,236][00753] Num frames 3400... [2023-06-19 14:45:53,362][00753] Num frames 3500... [2023-06-19 14:45:53,483][00753] Num frames 3600... [2023-06-19 14:45:53,608][00753] Num frames 3700... [2023-06-19 14:45:53,731][00753] Num frames 3800... [2023-06-19 14:45:53,859][00753] Num frames 3900... [2023-06-19 14:45:53,982][00753] Num frames 4000... [2023-06-19 14:45:54,115][00753] Num frames 4100... [2023-06-19 14:45:54,238][00753] Num frames 4200... [2023-06-19 14:45:54,366][00753] Num frames 4300... [2023-06-19 14:45:54,490][00753] Num frames 4400... [2023-06-19 14:45:54,616][00753] Num frames 4500... [2023-06-19 14:45:54,737][00753] Num frames 4600... [2023-06-19 14:45:54,861][00753] Num frames 4700... 
[2023-06-19 14:45:55,009][00753] Avg episode rewards: #0: 40.913, true rewards: #0: 15.913 [2023-06-19 14:45:55,010][00753] Avg episode reward: 40.913, avg true_objective: 15.913 [2023-06-19 14:45:55,043][00753] Num frames 4800... [2023-06-19 14:45:55,176][00753] Num frames 4900... [2023-06-19 14:45:55,295][00753] Num frames 5000... [2023-06-19 14:45:55,417][00753] Num frames 5100... [2023-06-19 14:45:55,539][00753] Num frames 5200... [2023-06-19 14:45:55,620][00753] Avg episode rewards: #0: 32.055, true rewards: #0: 13.055 [2023-06-19 14:45:55,622][00753] Avg episode reward: 32.055, avg true_objective: 13.055 [2023-06-19 14:45:55,716][00753] Num frames 5300... [2023-06-19 14:45:55,842][00753] Num frames 5400... [2023-06-19 14:45:55,963][00753] Num frames 5500... [2023-06-19 14:45:56,086][00753] Num frames 5600... [2023-06-19 14:45:56,221][00753] Num frames 5700... [2023-06-19 14:45:56,344][00753] Num frames 5800... [2023-06-19 14:45:56,468][00753] Num frames 5900... [2023-06-19 14:45:56,629][00753] Avg episode rewards: #0: 28.780, true rewards: #0: 11.980 [2023-06-19 14:45:56,630][00753] Avg episode reward: 28.780, avg true_objective: 11.980 [2023-06-19 14:45:56,648][00753] Num frames 6000... [2023-06-19 14:45:56,770][00753] Num frames 6100... [2023-06-19 14:45:56,889][00753] Num frames 6200... [2023-06-19 14:45:57,012][00753] Num frames 6300... [2023-06-19 14:45:57,139][00753] Num frames 6400... [2023-06-19 14:45:57,271][00753] Num frames 6500... [2023-06-19 14:45:57,398][00753] Num frames 6600... [2023-06-19 14:45:57,520][00753] Num frames 6700... [2023-06-19 14:45:57,642][00753] Num frames 6800... [2023-06-19 14:45:57,774][00753] Num frames 6900... [2023-06-19 14:45:57,852][00753] Avg episode rewards: #0: 28.030, true rewards: #0: 11.530 [2023-06-19 14:45:57,853][00753] Avg episode reward: 28.030, avg true_objective: 11.530 [2023-06-19 14:45:57,957][00753] Num frames 7000... [2023-06-19 14:45:58,086][00753] Num frames 7100... [2023-06-19 14:45:58,217][00753] Num frames 7200... [2023-06-19 14:45:58,344][00753] Num frames 7300... [2023-06-19 14:45:58,469][00753] Num frames 7400... [2023-06-19 14:45:58,591][00753] Num frames 7500... [2023-06-19 14:45:58,714][00753] Num frames 7600... [2023-06-19 14:45:58,838][00753] Num frames 7700... [2023-06-19 14:45:58,965][00753] Num frames 7800... [2023-06-19 14:45:59,091][00753] Num frames 7900... [2023-06-19 14:45:59,247][00753] Avg episode rewards: #0: 28.248, true rewards: #0: 11.391 [2023-06-19 14:45:59,248][00753] Avg episode reward: 28.248, avg true_objective: 11.391 [2023-06-19 14:45:59,284][00753] Num frames 8000... [2023-06-19 14:45:59,406][00753] Num frames 8100... [2023-06-19 14:45:59,537][00753] Num frames 8200... [2023-06-19 14:45:59,660][00753] Num frames 8300... [2023-06-19 14:45:59,783][00753] Num frames 8400... [2023-06-19 14:45:59,909][00753] Num frames 8500... [2023-06-19 14:46:00,033][00753] Num frames 8600... [2023-06-19 14:46:00,169][00753] Num frames 8700... [2023-06-19 14:46:00,297][00753] Num frames 8800... [2023-06-19 14:46:00,419][00753] Num frames 8900... [2023-06-19 14:46:00,545][00753] Num frames 9000... [2023-06-19 14:46:00,667][00753] Num frames 9100... [2023-06-19 14:46:00,791][00753] Num frames 9200... [2023-06-19 14:46:00,946][00753] Num frames 9300... [2023-06-19 14:46:01,140][00753] Num frames 9400... [2023-06-19 14:46:01,330][00753] Num frames 9500... [2023-06-19 14:46:01,505][00753] Num frames 9600... [2023-06-19 14:46:01,690][00753] Num frames 9700... 
[2023-06-19 14:46:01,817][00753] Avg episode rewards: #0: 30.667, true rewards: #0: 12.167 [2023-06-19 14:46:01,823][00753] Avg episode reward: 30.667, avg true_objective: 12.167 [2023-06-19 14:46:01,959][00753] Num frames 9800... [2023-06-19 14:46:02,157][00753] Num frames 9900... [2023-06-19 14:46:02,354][00753] Num frames 10000... [2023-06-19 14:46:02,541][00753] Num frames 10100... [2023-06-19 14:46:02,721][00753] Num frames 10200... [2023-06-19 14:46:02,905][00753] Num frames 10300... [2023-06-19 14:46:03,086][00753] Num frames 10400... [2023-06-19 14:46:03,268][00753] Num frames 10500... [2023-06-19 14:46:03,454][00753] Num frames 10600... [2023-06-19 14:46:03,636][00753] Num frames 10700... [2023-06-19 14:46:03,822][00753] Num frames 10800... [2023-06-19 14:46:04,017][00753] Num frames 10900... [2023-06-19 14:46:04,199][00753] Num frames 11000... [2023-06-19 14:46:04,390][00753] Num frames 11100... [2023-06-19 14:46:04,573][00753] Num frames 11200... [2023-06-19 14:46:04,757][00753] Num frames 11300... [2023-06-19 14:46:04,923][00753] Num frames 11400... [2023-06-19 14:46:05,054][00753] Num frames 11500... [2023-06-19 14:46:05,120][00753] Avg episode rewards: #0: 32.230, true rewards: #0: 12.786 [2023-06-19 14:46:05,121][00753] Avg episode reward: 32.230, avg true_objective: 12.786 [2023-06-19 14:46:05,246][00753] Num frames 11600... [2023-06-19 14:46:05,372][00753] Num frames 11700... [2023-06-19 14:46:05,505][00753] Num frames 11800... [2023-06-19 14:46:05,627][00753] Num frames 11900... [2023-06-19 14:46:05,750][00753] Num frames 12000... [2023-06-19 14:46:05,872][00753] Num frames 12100... [2023-06-19 14:46:06,003][00753] Num frames 12200... [2023-06-19 14:46:06,127][00753] Num frames 12300... [2023-06-19 14:46:06,187][00753] Avg episode rewards: #0: 30.603, true rewards: #0: 12.303 [2023-06-19 14:46:06,189][00753] Avg episode reward: 30.603, avg true_objective: 12.303 [2023-06-19 14:47:20,286][00753] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
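For reference, the "Avg episode rewards" / "true rewards" values above are running means over the episodes completed so far. The sketch below rebuilds that bookkeeping from per-episode true rewards back-solved from those means (so they carry up-to-rounding error); the loop is illustrative arithmetic, not the evaluation code itself:

```python
# Per-episode true rewards implied by the running means in the log
# (r_n = n * avg_n - (n-1) * avg_{n-1}); accurate only up to log rounding.
true_rewards = [21.000, 10.710, 16.029, 4.481, 7.680, 9.280, 10.557, 17.599, 17.738, 7.956]

total = 0.0
for n, r in enumerate(true_rewards, start=1):
    total += r
    print(f"Avg true reward after {n} episodes: {total / n:.3f}")
# Prints 21.000, 15.855, ..., 12.303 -- matching the running values logged above,
# ending with "true rewards: #0: 12.303" just before the replay video is saved.
```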