diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1758 @@
+[2024-09-01 14:19:32,046][11658] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json...
+[2024-09-01 14:19:32,081][11658] Rollout worker 0 uses device cpu
+[2024-09-01 14:19:32,082][11658] Rollout worker 1 uses device cpu
+[2024-09-01 14:19:32,083][11658] Rollout worker 2 uses device cpu
+[2024-09-01 14:19:32,084][11658] Rollout worker 3 uses device cpu
+[2024-09-01 14:19:32,085][11658] Rollout worker 4 uses device cpu
+[2024-09-01 14:19:32,087][11658] Rollout worker 5 uses device cpu
+[2024-09-01 14:19:32,089][11658] Rollout worker 6 uses device cpu
+[2024-09-01 14:19:32,090][11658] Rollout worker 7 uses device cpu
+[2024-09-01 14:19:32,146][11658] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-01 14:19:32,147][11658] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-01 14:19:32,173][11658] Starting all processes...
+[2024-09-01 14:19:32,175][11658] Starting process learner_proc0
+[2024-09-01 14:19:32,264][11658] Starting all processes...
+[2024-09-01 14:19:32,275][11658] Starting process inference_proc0-0
+[2024-09-01 14:19:32,276][11658] Starting process rollout_proc0
+[2024-09-01 14:19:32,276][11658] Starting process rollout_proc1
+[2024-09-01 14:19:32,276][11658] Starting process rollout_proc2
+[2024-09-01 14:19:32,277][11658] Starting process rollout_proc3
+[2024-09-01 14:19:32,277][11658] Starting process rollout_proc4
+[2024-09-01 14:19:32,278][11658] Starting process rollout_proc5
+[2024-09-01 14:19:32,278][11658] Starting process rollout_proc6
+[2024-09-01 14:19:32,278][11658] Starting process rollout_proc7
+[2024-09-01 14:19:37,152][12736] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-01 14:19:37,153][12736] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-01 14:19:37,248][12736] Num visible devices: 1
+[2024-09-01 14:19:37,329][12736] Starting seed is not provided
+[2024-09-01 14:19:37,330][12736] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-01 14:19:37,330][12736] Initializing actor-critic model on device cuda:0
+[2024-09-01 14:19:37,330][12736] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-01 14:19:37,332][12736] RunningMeanStd input shape: (1,)
+[2024-09-01 14:19:37,358][12736] ConvEncoder: input_channels=3
+[2024-09-01 14:19:37,519][12754] Worker 4 uses CPU cores [4]
+[2024-09-01 14:19:37,679][12756] Worker 5 uses CPU cores [5]
+[2024-09-01 14:19:37,897][12751] Worker 1 uses CPU cores [1]
+[2024-09-01 14:19:37,929][12753] Worker 2 uses CPU cores [2]
+[2024-09-01 14:19:37,934][12752] Worker 3 uses CPU cores [3]
+[2024-09-01 14:19:37,952][12749] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-01 14:19:37,953][12749] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-01 14:19:38,000][12750] Worker 0 uses CPU cores [0]
+[2024-09-01 14:19:38,002][12757] Worker 7 uses CPU cores [7]
+[2024-09-01 14:19:38,008][12749] Num visible devices: 1
+[2024-09-01 14:19:38,106][12755] Worker 6 uses CPU cores [6]
+[2024-09-01 14:19:38,201][12736] Conv encoder output size: 512
+[2024-09-01 14:19:38,201][12736] Policy head output size: 512
+[2024-09-01 14:19:38,215][12736] Created Actor Critic model with architecture:
+[2024-09-01 14:19:38,215][12736] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-01 14:19:52,141][11658] Heartbeat connected on Batcher_0
+[2024-09-01 14:19:52,146][11658] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-01 14:19:52,152][11658] Heartbeat connected on RolloutWorker_w0
+[2024-09-01 14:19:52,157][11658] Heartbeat connected on RolloutWorker_w2
+[2024-09-01 14:19:52,159][11658] Heartbeat connected on RolloutWorker_w3
+[2024-09-01 14:19:52,161][11658] Heartbeat connected on RolloutWorker_w1
+[2024-09-01 14:19:52,162][11658] Heartbeat connected on RolloutWorker_w4
+[2024-09-01 14:19:52,166][11658] Heartbeat connected on RolloutWorker_w5
+[2024-09-01 14:19:52,170][11658] Heartbeat connected on RolloutWorker_w6
+[2024-09-01 14:19:52,190][11658] Heartbeat connected on RolloutWorker_w7
+[2024-09-01 14:20:07,736][12736] Using optimizer
+[2024-09-01 14:20:07,737][12736] No checkpoints found
+[2024-09-01 14:20:07,737][12736] Did not load from checkpoint, starting from scratch!
+[2024-09-01 14:20:07,738][12736] Initialized policy 0 weights for model version 0
+[2024-09-01 14:20:07,756][12736] LearnerWorker_p0 finished initialization!
+[2024-09-01 14:20:07,756][12736] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-01 14:20:07,759][11658] Heartbeat connected on LearnerWorker_p0
+[2024-09-01 14:20:11,913][11658] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:20:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:20:18,288][12749] Unhandled exception CUDA error: unknown error
+CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
+For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
+Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
+ in evt loop inference_proc0-0_evt_loop
+[2024-09-01 14:20:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:21:26,915][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[... the all-zero Fps status line above repeats every 5 s through 14:33:11, and the same checkpoint_000000000_0.pth is re-saved every 2 min (14:23:26, 14:25:26, 14:27:26, 14:29:26, 14:31:26); no frames are ever collected after the inference worker crash ...]
+[2024-09-01 14:33:11,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0).
Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:26,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:26,915][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-01 14:33:31,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:41,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:46,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:56,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:01,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:06,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:11,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:26,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:31,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:41,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:46,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:56,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:01,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:06,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:11,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:21,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:26,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:26,916][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-01 14:35:31,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:41,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:46,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:56,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:01,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:06,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:11,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:16,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:26,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:31,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:41,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:46,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:56,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:01,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:06,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:11,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:26,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:26,915][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-01 14:37:31,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:41,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:46,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:56,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:01,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:06,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:11,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. 
Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:16,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:26,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:31,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:41,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:46,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:56,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:39:01,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:39:06,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:39:11,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:39:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:39:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:39:26,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:39:26,915][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-01 14:39:26,917][11658] No heartbeat for components: InferenceWorker_p0-w0 (1174 seconds)
+[2024-09-01 14:39:26,918][11658] Stopping training due to lack of heartbeats from
+[2024-09-01 14:39:26,922][11658] Component InferenceWorker_p0-w0 process died already! Don't wait for it.
+[2024-09-01 14:39:26,923][11658] Component RolloutWorker_w6 stopped!
+[2024-09-01 14:39:26,925][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w2', 'RolloutWorker_w3', 'RolloutWorker_w4', 'RolloutWorker_w5', 'RolloutWorker_w7'] to stop...
+[2024-09-01 14:39:26,926][11658] Component RolloutWorker_w2 stopped!
+[2024-09-01 14:39:26,927][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w3', 'RolloutWorker_w4', 'RolloutWorker_w5', 'RolloutWorker_w7'] to stop...
+[2024-09-01 14:39:26,928][11658] Component RolloutWorker_w4 stopped!
+[2024-09-01 14:39:26,929][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w3', 'RolloutWorker_w5', 'RolloutWorker_w7'] to stop...
+[2024-09-01 14:39:26,930][11658] Component RolloutWorker_w5 stopped!
+[2024-09-01 14:39:26,932][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w3', 'RolloutWorker_w7'] to stop...
+[2024-09-01 14:39:26,924][12755] Stopping RolloutWorker_w6...
+[2024-09-01 14:39:26,935][12755] Loop rollout_proc6_evt_loop terminating...
+[2024-09-01 14:39:26,924][12753] Stopping RolloutWorker_w2...
+[2024-09-01 14:39:26,925][12754] Stopping RolloutWorker_w4...
+[2024-09-01 14:39:26,937][12753] Loop rollout_proc2_evt_loop terminating...
+[2024-09-01 14:39:26,938][12754] Loop rollout_proc4_evt_loop terminating...
+[2024-09-01 14:39:26,942][12736] Stopping Batcher_0...
+[2024-09-01 14:39:26,943][12736] Loop batcher_evt_loop terminating...
+[2024-09-01 14:39:26,943][11658] Component RolloutWorker_w7 stopped!
+[2024-09-01 14:39:26,934][12750] Stopping RolloutWorker_w0...
+[2024-09-01 14:39:26,947][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w3'] to stop...
+[2024-09-01 14:39:26,949][11658] Component RolloutWorker_w0 stopped!
+[2024-09-01 14:39:26,936][12757] Stopping RolloutWorker_w7...
+[2024-09-01 14:39:26,951][12750] Loop rollout_proc0_evt_loop terminating...
+[2024-09-01 14:39:26,951][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w1', 'RolloutWorker_w3'] to stop...
+[2024-09-01 14:39:26,953][12757] Loop rollout_proc7_evt_loop terminating...
+[2024-09-01 14:39:26,928][12756] Stopping RolloutWorker_w5...
+[2024-09-01 14:39:26,952][11658] Component RolloutWorker_w1 stopped!
+[2024-09-01 14:39:26,956][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w3'] to stop...
+[2024-09-01 14:39:26,958][12756] Loop rollout_proc5_evt_loop terminating...
+[2024-09-01 14:39:26,957][11658] Component RolloutWorker_w3 stopped!
+[2024-09-01 14:39:26,961][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0'] to stop...
+[2024-09-01 14:39:26,965][11658] Component Batcher_0 stopped!
+[2024-09-01 14:39:26,971][11658] Waiting for ['LearnerWorker_p0'] to stop...
+[2024-09-01 14:39:27,002][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-01 14:39:26,943][12752] Stopping RolloutWorker_w3...
+[2024-09-01 14:39:27,017][12752] Loop rollout_proc3_evt_loop terminating...
+[2024-09-01 14:39:26,934][12751] Stopping RolloutWorker_w1...
+[2024-09-01 14:39:27,055][12751] Loop rollout_proc1_evt_loop terminating...
+[2024-09-01 14:39:27,049][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-01 14:39:27,110][12736] Stopping LearnerWorker_p0...
+[2024-09-01 14:39:27,111][12736] Loop learner_proc0_evt_loop terminating...
+[2024-09-01 14:39:27,110][11658] Component LearnerWorker_p0 stopped!
+[2024-09-01 14:39:27,112][11658] Waiting for process learner_proc0 to stop...
+[2024-09-01 14:39:31,369][11658] Waiting for process inference_proc0-0 to join...
+[2024-09-01 14:39:31,370][11658] Waiting for process rollout_proc0 to join...
+[2024-09-01 14:39:31,372][11658] Waiting for process rollout_proc1 to join...
+[2024-09-01 14:39:31,373][11658] Waiting for process rollout_proc2 to join...
+[2024-09-01 14:39:31,374][11658] Waiting for process rollout_proc3 to join...
+[2024-09-01 14:39:31,375][11658] Waiting for process rollout_proc4 to join...
+[2024-09-01 14:39:31,376][11658] Waiting for process rollout_proc5 to join...
+[2024-09-01 14:39:31,378][11658] Waiting for process rollout_proc6 to join...
+[2024-09-01 14:39:31,379][11658] Waiting for process rollout_proc7 to join...
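+# Editor's note: the shutdown above is driven by Sample Factory's heartbeat watchdog —
+# InferenceWorker_p0-w0 had not reported for 1174 seconds, so the runner stopped every
+# remaining component and wrote a final checkpoint. A minimal sketch of the watchdog
+# pattern follows; names and structure are illustrative, not Sample Factory's actual
+# implementation (the real one also tracks process liveness and reporting intervals).

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat time per component and flag stale ones.

    Illustrative sketch only: a runner would call beat() whenever a component
    reports in, and poll stale() to decide whether to shut everything down.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable for testing
        self.last_beat = {}

    def beat(self, component):
        # Record that `component` is alive right now.
        self.last_beat[component] = self.clock()

    def stale(self):
        # Components whose last heartbeat is older than the timeout.
        now = self.clock()
        return [c for c, t in self.last_beat.items() if now - t > self.timeout_s]
```

+# In this log the stale check fired long after the inference process had already
+# died, which is why the watchdog (rather than an exception) ended the run.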
+[2024-09-01 14:39:31,380][11658] Batcher 0 profile tree view:
+[2024-09-01 14:39:31,380][11658] Learner 0 profile tree view:
+[2024-09-01 14:39:31,381][11658] RolloutWorker_w0 profile tree view:
+[2024-09-01 14:39:31,383][11658] RolloutWorker_w7 profile tree view:
+[2024-09-01 14:39:31,384][11658] Loop Runner_EvtLoop terminating...
+[2024-09-01 14:39:31,385][11658] Runner profile tree view:
+main_loop: 1199.2124
+[2024-09-01 14:39:31,386][11658] Collected {0: 0}, FPS: 0.0
+[2024-09-01 14:47:24,536][11658] Loading existing experiment configuration from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json
+[2024-09-01 14:47:24,538][11658] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-01 14:47:24,540][11658] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-01 14:47:24,541][11658] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-01 14:47:24,541][11658] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-01 14:47:24,542][11658] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-01 14:47:24,543][11658] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-01 14:47:24,543][11658] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-01 14:47:24,544][11658] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-09-01 14:47:24,545][11658] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-09-01 14:47:24,546][11658] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-01 14:47:24,547][11658] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-01 14:47:24,548][11658] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-01 14:47:24,549][11658] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-01 14:47:24,549][11658] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-01 14:47:24,610][11658] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-01 14:47:24,618][11658] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-01 14:47:24,630][11658] RunningMeanStd input shape: (1,)
+[2024-09-01 14:47:24,757][11658] ConvEncoder: input_channels=3
+[2024-09-01 14:47:25,427][11658] Conv encoder output size: 512
+[2024-09-01 14:47:25,429][11658] Policy head output size: 512
+[2024-09-01 14:47:36,923][11658] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-01 14:47:45,808][11658] Num frames 100...
+[2024-09-01 14:47:46,145][11658] Num frames 200...
+[2024-09-01 14:47:46,429][11658] Num frames 300...
+[2024-09-01 14:47:46,688][11658] Num frames 400...
+[2024-09-01 14:47:46,855][11658] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160
+[2024-09-01 14:47:46,857][11658] Avg episode reward: 5.160, avg true_objective: 4.160
+[2024-09-01 14:47:47,056][11658] Num frames 500...
+[2024-09-01 14:47:47,292][11658] Num frames 600...
+[2024-09-01 14:47:47,523][11658] Num frames 700...
+[2024-09-01 14:47:47,759][11658] Num frames 800...
+[2024-09-01 14:47:47,811][11658] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000
+[2024-09-01 14:47:47,813][11658] Avg episode reward: 4.500, avg true_objective: 4.000
+[2024-09-01 14:47:48,097][11658] Num frames 900...
+[2024-09-01 14:47:48,311][11658] Num frames 1000...
+[2024-09-01 14:47:48,515][11658] Num frames 1100...
+[2024-09-01 14:47:48,753][11658] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947
+[2024-09-01 14:47:48,755][11658] Avg episode reward: 4.280, avg true_objective: 3.947
+[2024-09-01 14:47:48,794][11658] Num frames 1200...
+[2024-09-01 14:47:49,014][11658] Num frames 1300...
+[2024-09-01 14:47:49,231][11658] Num frames 1400...
+[2024-09-01 14:47:49,452][11658] Num frames 1500...
+[2024-09-01 14:47:49,667][11658] Num frames 1600...
+[2024-09-01 14:47:49,879][11658] Num frames 1700...
+[2024-09-01 14:47:50,003][11658] Avg episode rewards: #0: 5.070, true rewards: #0: 4.320
+[2024-09-01 14:47:50,004][11658] Avg episode reward: 5.070, avg true_objective: 4.320
+[2024-09-01 14:47:50,161][11658] Num frames 1800...
+[2024-09-01 14:47:50,400][11658] Num frames 1900...
+[2024-09-01 14:47:50,651][11658] Num frames 2000...
+[2024-09-01 14:47:50,921][11658] Num frames 2100...
+[2024-09-01 14:47:51,007][11658] Avg episode rewards: #0: 4.824, true rewards: #0: 4.224
+[2024-09-01 14:47:51,009][11658] Avg episode reward: 4.824, avg true_objective: 4.224
+[2024-09-01 14:47:51,287][11658] Num frames 2200...
+[2024-09-01 14:47:51,552][11658] Num frames 2300...
+[2024-09-01 14:47:51,784][11658] Num frames 2400...
+[2024-09-01 14:47:52,096][11658] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
+[2024-09-01 14:47:52,099][11658] Avg episode reward: 4.660, avg true_objective: 4.160
+[2024-09-01 14:47:52,120][11658] Num frames 2500...
+[2024-09-01 14:47:52,397][11658] Num frames 2600...
+[2024-09-01 14:47:52,684][11658] Num frames 2700...
+[2024-09-01 14:47:52,963][11658] Num frames 2800...
+[2024-09-01 14:47:53,271][11658] Avg episode rewards: #0: 4.543, true rewards: #0: 4.114
+[2024-09-01 14:47:53,273][11658] Avg episode reward: 4.543, avg true_objective: 4.114
+[2024-09-01 14:47:53,379][11658] Num frames 2900...
+[2024-09-01 14:47:53,659][11658] Num frames 3000...
+[2024-09-01 14:47:53,916][11658] Num frames 3100...
+[2024-09-01 14:47:54,160][11658] Num frames 3200...
+[2024-09-01 14:47:54,389][11658] Num frames 3300...
+[2024-09-01 14:47:54,511][11658] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
+[2024-09-01 14:47:54,512][11658] Avg episode reward: 4.660, avg true_objective: 4.160
+[2024-09-01 14:47:54,681][11658] Num frames 3400...
+[2024-09-01 14:47:54,907][11658] Num frames 3500...
+[2024-09-01 14:47:55,118][11658] Num frames 3600...
+[2024-09-01 14:47:55,342][11658] Num frames 3700...
+[2024-09-01 14:47:55,420][11658] Avg episode rewards: #0: 4.569, true rewards: #0: 4.124
+[2024-09-01 14:47:55,421][11658] Avg episode reward: 4.569, avg true_objective: 4.124
+[2024-09-01 14:47:55,629][11658] Num frames 3800...
+[2024-09-01 14:47:55,807][11658] Num frames 3900...
+[2024-09-01 14:47:55,961][11658] Num frames 4000...
+[2024-09-01 14:47:56,113][11658] Num frames 4100...
+[2024-09-01 14:47:56,257][11658] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
+[2024-09-01 14:47:56,258][11658] Avg episode reward: 4.660, avg true_objective: 4.160
+[2024-09-01 14:48:04,794][11658] Replay video saved to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/replay.mp4!
+[2024-09-02 00:15:20,339][11658] Environment doom_basic already registered, overwriting...
+[2024-09-02 00:15:20,352][11658] Environment doom_two_colors_easy already registered, overwriting...
+[2024-09-02 00:15:20,354][11658] Environment doom_two_colors_hard already registered, overwriting...
+[2024-09-02 00:15:20,355][11658] Environment doom_dm already registered, overwriting...
+[2024-09-02 00:15:20,356][11658] Environment doom_dwango5 already registered, overwriting...
+[2024-09-02 00:15:20,357][11658] Environment doom_my_way_home_flat_actions already registered, overwriting...
+[2024-09-02 00:15:20,357][11658] Environment doom_defend_the_center_flat_actions already registered, overwriting...
+[2024-09-02 00:15:20,358][11658] Environment doom_my_way_home already registered, overwriting...
+[2024-09-02 00:15:20,359][11658] Environment doom_deadly_corridor already registered, overwriting...
+[2024-09-02 00:15:20,361][11658] Environment doom_defend_the_center already registered, overwriting...
+[2024-09-02 00:15:20,361][11658] Environment doom_defend_the_line already registered, overwriting...
+[2024-09-02 00:15:20,362][11658] Environment doom_health_gathering already registered, overwriting...
+[2024-09-02 00:15:20,363][11658] Environment doom_health_gathering_supreme already registered, overwriting...
+[2024-09-02 00:15:20,364][11658] Environment doom_battle already registered, overwriting...
+[2024-09-02 00:15:20,365][11658] Environment doom_battle2 already registered, overwriting...
+[2024-09-02 00:15:20,366][11658] Environment doom_duel_bots already registered, overwriting...
+[2024-09-02 00:15:20,367][11658] Environment doom_deathmatch_bots already registered, overwriting...
+[2024-09-02 00:15:20,368][11658] Environment doom_duel already registered, overwriting...
+[2024-09-02 00:15:20,370][11658] Environment doom_deathmatch_full already registered, overwriting...
+[2024-09-02 00:15:20,373][11658] Environment doom_benchmark already registered, overwriting...
+[2024-09-02 00:15:20,374][11658] register_encoder_factory:
+[2024-09-02 00:15:20,462][11658] Loading existing experiment configuration from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json
+[2024-09-02 00:15:20,490][11658] Experiment dir /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment already exists!
+[2024-09-02 00:15:20,491][11658] Resuming existing experiment from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment...
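+# Editor's note: the "Avg episode rewards" values printed during the evaluation run
+# above are cumulative means over the episodes finished so far — 5.160 after one
+# episode, 4.500 after two, 4.280 after three, and so on. A small sketch of that
+# bookkeeping; the example per-episode rewards are back-computed from the first
+# three logged averages and are illustrative only.

```python
def running_means(episode_rewards):
    """Cumulative mean after each finished episode, rounded like the log (3 decimals)."""
    means, total = [], 0.0
    for i, r in enumerate(episode_rewards, start=1):
        total += r
        means.append(round(total / i, 3))
    return means
```

+# With per-episode rewards of roughly 5.16, 3.84 and 3.84, this reproduces the
+# first three averages in the log (5.16, 4.5, 4.28).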
+[2024-09-02 00:15:20,492][11658] Weights and Biases integration disabled +[2024-09-02 00:15:20,518][11658] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-09-02 00:15:25,589][11658] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/home/montana/repos/deep-rl/unit8-ppo/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] 
+encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=e923ca6811d177eb3a7a4b268a75d06335cade44 +git_repo_name=https://github.com/monti-python/deep-rl.git +[2024-09-02 00:15:25,592][11658] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json... 
+[2024-09-02 00:15:25,601][11658] Rollout worker 0 uses device cpu +[2024-09-02 00:15:25,603][11658] Rollout worker 1 uses device cpu +[2024-09-02 00:15:25,604][11658] Rollout worker 2 uses device cpu +[2024-09-02 00:15:25,605][11658] Rollout worker 3 uses device cpu +[2024-09-02 00:15:25,606][11658] Rollout worker 4 uses device cpu +[2024-09-02 00:15:25,607][11658] Rollout worker 5 uses device cpu +[2024-09-02 00:15:25,607][11658] Rollout worker 6 uses device cpu +[2024-09-02 00:15:25,608][11658] Rollout worker 7 uses device cpu +[2024-09-02 00:15:25,817][11658] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:15:25,818][11658] InferenceWorker_p0-w0: min num requests: 2 +[2024-09-02 00:15:25,848][11658] Starting all processes... +[2024-09-02 00:15:25,849][11658] Starting process learner_proc0 +[2024-09-02 00:15:25,889][11658] Starting all processes... +[2024-09-02 00:15:25,901][11658] Starting process inference_proc0-0 +[2024-09-02 00:15:25,907][11658] Starting process rollout_proc0 +[2024-09-02 00:15:25,909][11658] Starting process rollout_proc1 +[2024-09-02 00:15:25,910][11658] Starting process rollout_proc2 +[2024-09-02 00:15:25,911][11658] Starting process rollout_proc3 +[2024-09-02 00:15:25,913][11658] Starting process rollout_proc4 +[2024-09-02 00:15:25,914][11658] Starting process rollout_proc5 +[2024-09-02 00:15:25,924][11658] Starting process rollout_proc6 +[2024-09-02 00:15:25,930][11658] Starting process rollout_proc7 +[2024-09-02 00:15:30,110][00805] Worker 5 uses CPU cores [5] +[2024-09-02 00:15:30,114][00808] Worker 6 uses CPU cores [6] +[2024-09-02 00:15:30,156][00795] Worker 1 uses CPU cores [1] +[2024-09-02 00:15:30,156][00794] Worker 0 uses CPU cores [0] +[2024-09-02 00:15:30,157][00793] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:15:30,157][00793] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-09-02 00:15:30,164][00780] Using GPUs [0] for 
process 0 (actually maps to GPUs [0]) +[2024-09-02 00:15:30,165][00780] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-09-02 00:15:30,217][00807] Worker 7 uses CPU cores [7] +[2024-09-02 00:15:30,256][00796] Worker 3 uses CPU cores [3] +[2024-09-02 00:15:30,306][00780] Num visible devices: 1 +[2024-09-02 00:15:30,306][00793] Num visible devices: 1 +[2024-09-02 00:15:30,381][00806] Worker 4 uses CPU cores [4] +[2024-09-02 00:15:30,401][00780] Starting seed is not provided +[2024-09-02 00:15:30,402][00780] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:15:30,402][00780] Initializing actor-critic model on device cuda:0 +[2024-09-02 00:15:30,402][00780] RunningMeanStd input shape: (3, 72, 128) +[2024-09-02 00:15:30,407][00780] RunningMeanStd input shape: (1,) +[2024-09-02 00:15:30,430][00780] ConvEncoder: input_channels=3 +[2024-09-02 00:15:30,499][00797] Worker 2 uses CPU cores [2] +[2024-09-02 00:15:30,804][00780] Conv encoder output size: 512 +[2024-09-02 00:15:30,804][00780] Policy head output size: 512 +[2024-09-02 00:15:30,851][00780] Created Actor Critic model with architecture: +[2024-09-02 00:15:30,851][00780] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): 
RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-09-02 00:15:37,309][00780] Using optimizer +[2024-09-02 00:15:37,310][00780] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-02 00:15:37,336][00780] Loading model from checkpoint +[2024-09-02 00:15:37,338][00780] Loaded experiment state at self.train_step=0, self.env_steps=0 +[2024-09-02 00:15:37,340][00780] Initialized policy 0 weights for model version 0 +[2024-09-02 00:15:37,348][00780] LearnerWorker_p0 finished initialization! +[2024-09-02 00:15:37,349][00780] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:15:38,194][00793] Unhandled exception CUDA error: unknown error +CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. +For debugging consider passing CUDA_LAUNCH_BLOCKING=1. +Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. + in evt loop inference_proc0-0_evt_loop +[2024-09-02 00:15:40,519][11658] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:15:45,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:15:45,810][11658] Heartbeat connected on Batcher_0 +[2024-09-02 00:15:45,812][11658] Heartbeat connected on LearnerWorker_p0 +[2024-09-02 00:15:45,824][11658] Heartbeat connected on RolloutWorker_w0 +[2024-09-02 00:15:45,827][11658] Heartbeat connected on RolloutWorker_w1 +[2024-09-02 00:15:45,829][11658] Heartbeat connected on RolloutWorker_w2 +[2024-09-02 00:15:45,831][11658] Heartbeat connected on RolloutWorker_w3 +[2024-09-02 00:15:45,834][11658] Heartbeat connected on RolloutWorker_w4 +[2024-09-02 00:15:45,838][11658] Heartbeat connected on RolloutWorker_w5 +[2024-09-02 00:15:45,841][11658] Heartbeat connected on RolloutWorker_w6 +[2024-09-02 00:15:45,850][11658] Heartbeat connected on RolloutWorker_w7 +[2024-09-02 00:15:50,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:15:55,519][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:00,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:05,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:10,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:15,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:20,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. 
Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:25,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:29,395][11658] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 11658], exiting... +[2024-09-02 00:16:29,397][00806] Stopping RolloutWorker_w4... +[2024-09-02 00:16:29,397][00807] Stopping RolloutWorker_w7... +[2024-09-02 00:16:29,398][00806] Loop rollout_proc4_evt_loop terminating... +[2024-09-02 00:16:29,397][00805] Stopping RolloutWorker_w5... +[2024-09-02 00:16:29,398][00780] Stopping Batcher_0... +[2024-09-02 00:16:29,398][00807] Loop rollout_proc7_evt_loop terminating... +[2024-09-02 00:16:29,398][00780] Loop batcher_evt_loop terminating... +[2024-09-02 00:16:29,398][00805] Loop rollout_proc5_evt_loop terminating... +[2024-09-02 00:16:29,397][00796] Stopping RolloutWorker_w3... +[2024-09-02 00:16:29,398][00796] Loop rollout_proc3_evt_loop terminating... +[2024-09-02 00:16:29,399][00808] Stopping RolloutWorker_w6... +[2024-09-02 00:16:29,397][11658] Runner profile tree view: +main_loop: 63.5517 +[2024-09-02 00:16:29,400][00780] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-02 00:16:29,400][00794] Stopping RolloutWorker_w0... +[2024-09-02 00:16:29,400][11658] Collected {0: 0}, FPS: 0.0 +[2024-09-02 00:16:29,401][00794] Loop rollout_proc0_evt_loop terminating... +[2024-09-02 00:16:29,398][00797] Stopping RolloutWorker_w2... +[2024-09-02 00:16:29,402][00797] Loop rollout_proc2_evt_loop terminating... +[2024-09-02 00:16:29,407][00808] Loop rollout_proc6_evt_loop terminating... +[2024-09-02 00:16:29,410][00795] Stopping RolloutWorker_w1... +[2024-09-02 00:16:29,410][00795] Loop rollout_proc1_evt_loop terminating... 
+[2024-09-02 00:16:29,482][00780] Stopping LearnerWorker_p0... +[2024-09-02 00:16:29,483][00780] Loop learner_proc0_evt_loop terminating... +[2024-09-02 00:21:05,147][11658] Environment doom_basic already registered, overwriting... +[2024-09-02 00:21:05,149][11658] Environment doom_two_colors_easy already registered, overwriting... +[2024-09-02 00:21:05,151][11658] Environment doom_two_colors_hard already registered, overwriting... +[2024-09-02 00:21:05,152][11658] Environment doom_dm already registered, overwriting... +[2024-09-02 00:21:05,153][11658] Environment doom_dwango5 already registered, overwriting... +[2024-09-02 00:21:05,154][11658] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-09-02 00:21:05,154][11658] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-09-02 00:21:05,155][11658] Environment doom_my_way_home already registered, overwriting... +[2024-09-02 00:21:05,156][11658] Environment doom_deadly_corridor already registered, overwriting... +[2024-09-02 00:21:05,157][11658] Environment doom_defend_the_center already registered, overwriting... +[2024-09-02 00:21:05,158][11658] Environment doom_defend_the_line already registered, overwriting... +[2024-09-02 00:21:05,159][11658] Environment doom_health_gathering already registered, overwriting... +[2024-09-02 00:21:05,159][11658] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-09-02 00:21:05,160][11658] Environment doom_battle already registered, overwriting... +[2024-09-02 00:21:05,167][11658] Environment doom_battle2 already registered, overwriting... +[2024-09-02 00:21:05,168][11658] Environment doom_duel_bots already registered, overwriting... +[2024-09-02 00:21:05,170][11658] Environment doom_deathmatch_bots already registered, overwriting... +[2024-09-02 00:21:05,171][11658] Environment doom_duel already registered, overwriting... 
+[2024-09-02 00:21:05,172][11658] Environment doom_deathmatch_full already registered, overwriting... +[2024-09-02 00:21:05,173][11658] Environment doom_benchmark already registered, overwriting... +[2024-09-02 00:21:05,174][11658] register_encoder_factory: +[2024-09-02 00:21:05,196][11658] Loading existing experiment configuration from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json +[2024-09-02 00:21:05,202][11658] Experiment dir /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment already exists! +[2024-09-02 00:21:05,203][11658] Resuming existing experiment from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment... +[2024-09-02 00:21:05,204][11658] Weights and Biases integration disabled +[2024-09-02 00:21:05,208][11658] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-09-02 00:21:07,147][11658] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/home/montana/repos/deep-rl/unit8-ppo/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 
+normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 
'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=e923ca6811d177eb3a7a4b268a75d06335cade44 +git_repo_name=https://github.com/monti-python/deep-rl.git +[2024-09-02 00:21:07,149][11658] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json... +[2024-09-02 00:21:07,152][11658] Rollout worker 0 uses device cpu +[2024-09-02 00:21:07,153][11658] Rollout worker 1 uses device cpu +[2024-09-02 00:21:07,154][11658] Rollout worker 2 uses device cpu +[2024-09-02 00:21:07,155][11658] Rollout worker 3 uses device cpu +[2024-09-02 00:21:07,155][11658] Rollout worker 4 uses device cpu +[2024-09-02 00:21:07,156][11658] Rollout worker 5 uses device cpu +[2024-09-02 00:21:07,157][11658] Rollout worker 6 uses device cpu +[2024-09-02 00:21:07,158][11658] Rollout worker 7 uses device cpu +[2024-09-02 00:21:07,210][11658] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:21:07,212][11658] InferenceWorker_p0-w0: min num requests: 2 +[2024-09-02 00:21:07,262][11658] Starting all processes... +[2024-09-02 00:21:07,263][11658] Starting process learner_proc0 +[2024-09-02 00:21:07,312][11658] Starting all processes... 
+[2024-09-02 00:21:07,316][11658] Starting process inference_proc0-0 +[2024-09-02 00:21:07,317][11658] Starting process rollout_proc0 +[2024-09-02 00:21:07,317][11658] Starting process rollout_proc1 +[2024-09-02 00:21:07,318][11658] Starting process rollout_proc2 +[2024-09-02 00:21:07,318][11658] Starting process rollout_proc3 +[2024-09-02 00:21:07,319][11658] Starting process rollout_proc4 +[2024-09-02 00:21:07,321][11658] Starting process rollout_proc5 +[2024-09-02 00:21:07,325][11658] Starting process rollout_proc6 +[2024-09-02 00:21:07,326][11658] Starting process rollout_proc7 +[2024-09-02 00:21:10,400][02982] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:21:10,400][02982] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-09-02 00:21:10,412][02995] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:21:10,412][02995] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-09-02 00:21:10,442][02982] Num visible devices: 1 +[2024-09-02 00:21:10,454][02995] Num visible devices: 1 +[2024-09-02 00:21:10,531][02982] Starting seed is not provided +[2024-09-02 00:21:10,531][02982] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:21:10,531][02982] Initializing actor-critic model on device cuda:0 +[2024-09-02 00:21:10,531][02982] RunningMeanStd input shape: (3, 72, 128) +[2024-09-02 00:21:10,532][02982] RunningMeanStd input shape: (1,) +[2024-09-02 00:21:10,581][02982] ConvEncoder: input_channels=3 +[2024-09-02 00:21:10,594][02996] Worker 0 uses CPU cores [0] +[2024-09-02 00:21:10,651][03000] Worker 4 uses CPU cores [4] +[2024-09-02 00:21:10,683][02999] Worker 3 uses CPU cores [3] +[2024-09-02 00:21:10,759][03010] Worker 7 uses CPU cores [7] +[2024-09-02 00:21:10,829][03002] Worker 5 uses CPU cores [5] +[2024-09-02 00:21:10,835][02982] Conv encoder output size: 512 +[2024-09-02 00:21:10,836][02982] Policy head 
output size: 512 +[2024-09-02 00:21:10,856][02982] Created Actor Critic model with architecture: +[2024-09-02 00:21:10,857][02982] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-09-02 00:21:10,889][02998] Worker 2 uses CPU cores [2] +[2024-09-02 00:21:10,910][02997] Worker 1 uses CPU cores [1] +[2024-09-02 00:21:11,001][03001] Worker 6 uses CPU cores [6] +[2024-09-02 00:21:12,348][02982] Using optimizer +[2024-09-02 00:21:12,349][02982] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... 
+[2024-09-02 00:21:12,369][02982] Loading model from checkpoint +[2024-09-02 00:21:12,371][02982] Loaded experiment state at self.train_step=0, self.env_steps=0 +[2024-09-02 00:21:12,372][02982] Initialized policy 0 weights for model version 0 +[2024-09-02 00:21:12,378][02982] LearnerWorker_p0 finished initialization! +[2024-09-02 00:21:12,379][02982] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:21:13,106][02995] Unhandled exception CUDA error: unknown error +Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. + in evt loop inference_proc0-0_evt_loop +[2024-09-02 00:21:15,209][11658] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:21:20,209][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:21:25,209][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:21:27,204][11658] Heartbeat connected on Batcher_0 +[2024-09-02 00:21:27,208][11658] Heartbeat connected on LearnerWorker_p0 +[2024-09-02 00:21:27,216][11658] Heartbeat connected on RolloutWorker_w0 +[2024-09-02 00:21:27,218][11658] Heartbeat connected on RolloutWorker_w1 +[2024-09-02 00:21:27,224][11658] Heartbeat connected on RolloutWorker_w3 +[2024-09-02 00:21:27,226][11658] Heartbeat connected on RolloutWorker_w4 +[2024-09-02 00:21:27,229][11658] Heartbeat connected on RolloutWorker_w5 +[2024-09-02 00:21:27,232][11658] Heartbeat connected on RolloutWorker_w6 +[2024-09-02 00:21:27,240][11658] Heartbeat connected on RolloutWorker_w2 +[2024-09-02 00:21:27,262][11658] Heartbeat connected on RolloutWorker_w7 +[2024-09-02 00:21:30,209][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. 
Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:21:35,209][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:21:39,062][11658] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 11658], exiting... +[2024-09-02 00:21:39,064][02999] Stopping RolloutWorker_w3... +[2024-09-02 00:21:39,064][03001] Stopping RolloutWorker_w6... +[2024-09-02 00:21:39,064][03002] Stopping RolloutWorker_w5... +[2024-09-02 00:21:39,064][02999] Loop rollout_proc3_evt_loop terminating... +[2024-09-02 00:21:39,064][03001] Loop rollout_proc6_evt_loop terminating... +[2024-09-02 00:21:39,064][03002] Loop rollout_proc5_evt_loop terminating... +[2024-09-02 00:21:39,064][03010] Stopping RolloutWorker_w7... +[2024-09-02 00:21:39,064][03000] Stopping RolloutWorker_w4... +[2024-09-02 00:21:39,065][03010] Loop rollout_proc7_evt_loop terminating... +[2024-09-02 00:21:39,065][02997] Stopping RolloutWorker_w1... +[2024-09-02 00:21:39,065][02982] Stopping Batcher_0... +[2024-09-02 00:21:39,065][02997] Loop rollout_proc1_evt_loop terminating... +[2024-09-02 00:21:39,065][03000] Loop rollout_proc4_evt_loop terminating... +[2024-09-02 00:21:39,065][02982] Loop batcher_evt_loop terminating... +[2024-09-02 00:21:39,067][02982] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-02 00:21:39,064][11658] Runner profile tree view: +main_loop: 31.8019 +[2024-09-02 00:21:39,075][11658] Collected {0: 0}, FPS: 0.0 +[2024-09-02 00:21:39,079][02998] Stopping RolloutWorker_w2... +[2024-09-02 00:21:39,080][02998] Loop rollout_proc2_evt_loop terminating... +[2024-09-02 00:21:39,089][02996] Stopping RolloutWorker_w0... +[2024-09-02 00:21:39,110][02996] Loop rollout_proc0_evt_loop terminating... 
+[2024-09-02 00:21:39,177][02982] Stopping LearnerWorker_p0... +[2024-09-02 00:21:39,178][02982] Loop learner_proc0_evt_loop terminating... +[2024-09-02 00:22:45,206][11658] Environment doom_basic already registered, overwriting... +[2024-09-02 00:22:45,208][11658] Environment doom_two_colors_easy already registered, overwriting... +[2024-09-02 00:22:45,209][11658] Environment doom_two_colors_hard already registered, overwriting... +[2024-09-02 00:22:45,210][11658] Environment doom_dm already registered, overwriting... +[2024-09-02 00:22:45,211][11658] Environment doom_dwango5 already registered, overwriting... +[2024-09-02 00:22:45,213][11658] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-09-02 00:22:45,213][11658] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-09-02 00:22:45,214][11658] Environment doom_my_way_home already registered, overwriting... +[2024-09-02 00:22:45,216][11658] Environment doom_deadly_corridor already registered, overwriting... +[2024-09-02 00:22:45,217][11658] Environment doom_defend_the_center already registered, overwriting... +[2024-09-02 00:22:45,218][11658] Environment doom_defend_the_line already registered, overwriting... +[2024-09-02 00:22:45,219][11658] Environment doom_health_gathering already registered, overwriting... +[2024-09-02 00:22:45,222][11658] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-09-02 00:22:45,223][11658] Environment doom_battle already registered, overwriting... +[2024-09-02 00:22:45,223][11658] Environment doom_battle2 already registered, overwriting... +[2024-09-02 00:22:45,226][11658] Environment doom_duel_bots already registered, overwriting... +[2024-09-02 00:22:45,227][11658] Environment doom_deathmatch_bots already registered, overwriting... +[2024-09-02 00:22:45,228][11658] Environment doom_duel already registered, overwriting... 
+[2024-09-02 00:22:45,229][11658] Environment doom_deathmatch_full already registered, overwriting... +[2024-09-02 00:22:45,230][11658] Environment doom_benchmark already registered, overwriting... +[2024-09-02 00:22:45,231][11658] register_encoder_factory: +[2024-09-02 00:22:45,247][11658] Loading existing experiment configuration from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json +[2024-09-02 00:22:45,253][11658] Experiment dir /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment already exists! +[2024-09-02 00:22:45,254][11658] Resuming existing experiment from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment... +[2024-09-02 00:22:45,255][11658] Weights and Biases integration disabled +[2024-09-02 00:22:45,257][11658] Environment var CUDA_VISIBLE_DEVICES is 1 +[2024-09-02 00:22:46,924][11658] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/home/montana/repos/deep-rl/unit8-ppo/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 
+normalize_input=True
+normalize_input_keys=None
+decorrelate_experience_max_seconds=0
+decorrelate_envs_on_one_worker=True
+actor_worker_gpus=[]
+set_workers_cpu_affinity=True
+force_envs_single_thread=False
+default_niceness=0
+log_to_file=True
+experiment_summaries_interval=10
+flush_summaries_interval=30
+stats_avg=100
+summaries_use_frameskip=True
+heartbeat_interval=20
+heartbeat_reporting_interval=600
+train_for_env_steps=4000000
+train_for_seconds=10000000000
+save_every_sec=120
+keep_checkpoints=2
+load_checkpoint_kind=latest
+save_milestones_sec=-1
+save_best_every_sec=5
+save_best_metric=reward
+save_best_after=100000
+benchmark=False
+encoder_mlp_layers=[512, 512]
+encoder_conv_architecture=convnet_simple
+encoder_conv_mlp_layers=[512]
+use_rnn=True
+rnn_size=512
+rnn_type=gru
+rnn_num_layers=1
+decoder_mlp_layers=[]
+nonlinearity=elu
+policy_initialization=orthogonal
+policy_init_gain=1.0
+actor_critic_share_weights=True
+adaptive_stddev=True
+continuous_tanh_scale=0.0
+initial_stddev=1.0
+use_env_info_cache=False
+env_gpu_actions=False
+env_gpu_observations=True
+env_frameskip=4
+env_framestack=1
+pixel_format=CHW
+use_record_episode_statistics=False
+with_wandb=False
+wandb_user=None
+wandb_project=sample_factory
+wandb_group=None
+wandb_job_type=SF
+wandb_tags=[]
+with_pbt=False
+pbt_mix_policies_in_one_env=True
+pbt_period_env_steps=5000000
+pbt_start_mutation=20000000
+pbt_replace_fraction=0.3
+pbt_mutation_rate=0.15
+pbt_replace_reward_gap=0.1
+pbt_replace_reward_gap_absolute=1e-06
+pbt_optimize_gamma=False
+pbt_target_objective=true_objective
+pbt_perturb_min=1.1
+pbt_perturb_max=1.5
+num_agents=-1
+num_humans=0
+num_bots=-1
+start_bot_difficulty=None
+timelimit=None
+res_w=128
+res_h=72
+wide_aspect_ratio=False
+eval_env_frameskip=1
+fps=35
+command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
+cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
+git_hash=e923ca6811d177eb3a7a4b268a75d06335cade44
+git_repo_name=https://github.com/monti-python/deep-rl.git
+[2024-09-02 00:22:46,925][11658] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json...
+[2024-09-02 00:22:46,929][11658] Rollout worker 0 uses device cpu
+[2024-09-02 00:22:46,930][11658] Rollout worker 1 uses device cpu
+[2024-09-02 00:22:46,930][11658] Rollout worker 2 uses device cpu
+[2024-09-02 00:22:46,931][11658] Rollout worker 3 uses device cpu
+[2024-09-02 00:22:46,932][11658] Rollout worker 4 uses device cpu
+[2024-09-02 00:22:46,932][11658] Rollout worker 5 uses device cpu
+[2024-09-02 00:22:46,933][11658] Rollout worker 6 uses device cpu
+[2024-09-02 00:22:46,934][11658] Rollout worker 7 uses device cpu
+[2024-09-02 00:22:46,977][11658] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2024-09-02 00:22:46,978][11658] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-02 00:22:47,002][11658] Starting all processes...
+[2024-09-02 00:22:47,003][11658] Starting process learner_proc0
+[2024-09-02 00:22:47,052][11658] Starting all processes...
+[2024-09-02 00:22:47,058][11658] Starting process inference_proc0-0
+[2024-09-02 00:22:47,059][11658] Starting process rollout_proc0
+[2024-09-02 00:22:47,059][11658] Starting process rollout_proc1
+[2024-09-02 00:22:47,060][11658] Starting process rollout_proc2
+[2024-09-02 00:22:47,060][11658] Starting process rollout_proc3
+[2024-09-02 00:22:47,061][11658] Starting process rollout_proc4
+[2024-09-02 00:22:47,064][11658] Starting process rollout_proc5
+[2024-09-02 00:22:47,064][11658] Starting process rollout_proc6
+[2024-09-02 00:22:47,067][11658] Starting process rollout_proc7
+[2024-09-02 00:22:49,676][03756] Worker 3 uses CPU cores [3]
+[2024-09-02 00:22:49,779][03754] Worker 1 uses CPU cores [1]
+[2024-09-02 00:22:49,779][03755] Worker 2 uses CPU cores [2]
+[2024-09-02 00:22:50,035][03752] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2024-09-02 00:22:50,035][03752] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for inference process 0
+[2024-09-02 00:22:50,058][03752] Num visible devices: 0
+[2024-09-02 00:22:50,062][03757] Worker 4 uses CPU cores [4]
+[2024-09-02 00:22:50,102][03739] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2024-09-02 00:22:50,103][03739] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for learning process 0
+[2024-09-02 00:22:50,119][03766] Worker 5 uses CPU cores [5]
+[2024-09-02 00:22:50,124][03739] Num visible devices: 0
+[2024-09-02 00:22:50,160][03739] Starting seed is not provided
+[2024-09-02 00:22:50,161][03739] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2024-09-02 00:22:50,161][03739] Initializing actor-critic model on device cuda:0
+[2024-09-02 00:22:50,161][03739] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-02 00:22:50,163][03739] RunningMeanStd input shape: (1,)
+[2024-09-02 00:22:50,179][03739] ConvEncoder: input_channels=3
+[2024-09-02 00:22:50,233][03758] Worker 6 uses CPU cores [6]
+[2024-09-02 00:22:50,247][03767] Worker 7 uses CPU cores [7]
+[2024-09-02 00:22:50,247][03753] Worker 0 uses CPU cores [0]
+[2024-09-02 00:22:50,335][03739] Conv encoder output size: 512
+[2024-09-02 00:22:50,336][03739] Policy head output size: 512
+[2024-09-02 00:22:50,348][03739] Created Actor Critic model with architecture:
+[2024-09-02 00:22:50,349][03739] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-02 00:22:50,357][03739] EvtLoop [learner_proc0_evt_loop, process=learner_proc0] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=()
+Traceback (most recent call last):
+  File "/home/montana/.local/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
+    slot_callable(*args)
+  File "/home/montana/.local/lib/python3.10/site-packages/sample_factory/algo/learning/learner_worker.py", line 139, in init
+    init_model_data = self.learner.init()
+  File "/home/montana/.local/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 215, in init
+    self.actor_critic.model_to_device(self.device)
+  File "/home/montana/.local/lib/python3.10/site-packages/sample_factory/model/actor_critic.py", line 60, in model_to_device
+    module.to(device)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
+    return self._apply(convert)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
+    module._apply(fn)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
+    module._apply(fn)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
+    module._apply(fn)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 844, in _apply
+    self._buffers[key] = fn(buf)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
+    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
+    torch._C._cuda_init()
+RuntimeError: No CUDA GPUs are available
+[2024-09-02 00:22:50,360][03739] Unhandled exception No CUDA GPUs are available in evt loop learner_proc0_evt_loop
+[2024-09-02 00:23:06,971][11658] Heartbeat connected on Batcher_0
+[2024-09-02 00:23:06,978][11658] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-02 00:23:06,983][11658] Heartbeat connected on RolloutWorker_w0
+[2024-09-02 00:23:06,985][11658] Heartbeat connected on RolloutWorker_w1
+[2024-09-02 00:23:06,987][11658] Heartbeat connected on RolloutWorker_w2
+[2024-09-02 00:23:06,989][11658] Heartbeat connected on RolloutWorker_w3
+[2024-09-02 00:23:06,992][11658] Heartbeat connected on RolloutWorker_w4
+[2024-09-02 00:23:06,995][11658] Heartbeat connected on RolloutWorker_w5
+[2024-09-02 00:23:07,000][11658] Heartbeat connected on RolloutWorker_w6
+[2024-09-02 00:23:07,030][11658] Heartbeat connected on RolloutWorker_w7
+[2024-09-02 00:23:09,252][11658] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 11658], exiting...
+[2024-09-02 00:23:09,254][03754] Stopping RolloutWorker_w1...
+[2024-09-02 00:23:09,254][03757] Stopping RolloutWorker_w4...
+[2024-09-02 00:23:09,254][03752] Stopping InferenceWorker_p0-w0...
+[2024-09-02 00:23:09,254][03754] Loop rollout_proc1_evt_loop terminating...
+[2024-09-02 00:23:09,254][03739] Stopping Batcher_0...
+[2024-09-02 00:23:09,254][03752] Loop inference_proc0-0_evt_loop terminating...
+[2024-09-02 00:23:09,254][03757] Loop rollout_proc4_evt_loop terminating...
+[2024-09-02 00:23:09,254][03753] Stopping RolloutWorker_w0...
+[2024-09-02 00:23:09,254][03739] Loop batcher_evt_loop terminating...
+[2024-09-02 00:23:09,254][03756] Stopping RolloutWorker_w3...
+[2024-09-02 00:23:09,253][11658] Runner profile tree view:
+main_loop: 22.2519
+[2024-09-02 00:23:09,254][03767] Stopping RolloutWorker_w7...
+[2024-09-02 00:23:09,255][03767] Loop rollout_proc7_evt_loop terminating...
+[2024-09-02 00:23:09,255][03753] Loop rollout_proc0_evt_loop terminating...
+[2024-09-02 00:23:09,255][03758] Stopping RolloutWorker_w6...
+[2024-09-02 00:23:09,255][03758] Loop rollout_proc6_evt_loop terminating...
+[2024-09-02 00:23:09,254][11658] Collected {}, FPS: 0.0
+[2024-09-02 00:23:09,255][03756] Loop rollout_proc3_evt_loop terminating...
+[2024-09-02 00:23:09,265][03766] Stopping RolloutWorker_w5...
+[2024-09-02 00:23:09,260][03755] Stopping RolloutWorker_w2...
+[2024-09-02 00:23:09,265][03766] Loop rollout_proc5_evt_loop terminating...
+[2024-09-02 00:23:09,266][03755] Loop rollout_proc2_evt_loop terminating...
+[2024-09-02 00:44:05,383][13975] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json...
+[2024-09-02 00:44:05,390][13975] Rollout worker 0 uses device cpu
+[2024-09-02 00:44:05,391][13975] Rollout worker 1 uses device cpu
+[2024-09-02 00:44:05,392][13975] Rollout worker 2 uses device cpu
+[2024-09-02 00:44:05,393][13975] Rollout worker 3 uses device cpu
+[2024-09-02 00:44:05,394][13975] Rollout worker 4 uses device cpu
+[2024-09-02 00:44:05,395][13975] Rollout worker 5 uses device cpu
+[2024-09-02 00:44:05,395][13975] Rollout worker 6 uses device cpu
+[2024-09-02 00:44:05,396][13975] Rollout worker 7 uses device cpu
+[2024-09-02 00:44:05,467][13975] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:44:05,468][13975] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-02 00:44:05,491][13975] Starting all processes...
+[2024-09-02 00:44:05,492][13975] Starting process learner_proc0
+[2024-09-02 00:44:05,601][13975] Starting all processes...
+[2024-09-02 00:44:05,610][13975] Starting process inference_proc0-0
+[2024-09-02 00:44:05,610][13975] Starting process rollout_proc0
+[2024-09-02 00:44:05,611][13975] Starting process rollout_proc1
+[2024-09-02 00:44:05,612][13975] Starting process rollout_proc2
+[2024-09-02 00:44:05,613][13975] Starting process rollout_proc3
+[2024-09-02 00:44:05,614][13975] Starting process rollout_proc4
+[2024-09-02 00:44:05,614][13975] Starting process rollout_proc5
+[2024-09-02 00:44:05,615][13975] Starting process rollout_proc6
+[2024-09-02 00:44:05,616][13975] Starting process rollout_proc7
+[2024-09-02 00:44:16,282][14226] Worker 2 uses CPU cores [2]
+[2024-09-02 00:44:16,391][14237] Worker 5 uses CPU cores [5]
+[2024-09-02 00:44:16,400][14210] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:44:16,401][14210] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-02 00:44:16,412][14225] Worker 1 uses CPU cores [1]
+[2024-09-02 00:44:16,534][14227] Worker 3 uses CPU cores [3]
+[2024-09-02 00:44:16,599][14228] Worker 4 uses CPU cores [4]
+[2024-09-02 00:44:16,638][14238] Worker 7 uses CPU cores [7]
+[2024-09-02 00:44:16,719][14210] Num visible devices: 1
+[2024-09-02 00:44:16,759][14224] Worker 0 uses CPU cores [0]
+[2024-09-02 00:44:16,771][14210] Starting seed is not provided
+[2024-09-02 00:44:16,771][14210] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:44:16,771][14210] Initializing actor-critic model on device cuda:0
+[2024-09-02 00:44:16,772][14210] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-02 00:44:16,774][14210] RunningMeanStd input shape: (1,)
+[2024-09-02 00:44:16,789][14229] Worker 6 uses CPU cores [6]
+[2024-09-02 00:44:16,801][14210] ConvEncoder: input_channels=3
+[2024-09-02 00:44:16,828][14223] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:44:16,828][14223] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-02 00:44:16,907][14223] Num visible devices: 1
+[2024-09-02 00:44:17,064][14210] Conv encoder output size: 512
+[2024-09-02 00:44:17,065][14210] Policy head output size: 512
+[2024-09-02 00:44:17,097][14210] Created Actor Critic model with architecture:
+[2024-09-02 00:44:17,097][14210] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-02 00:44:25,462][13975] Heartbeat connected on Batcher_0
+[2024-09-02 00:44:25,468][13975] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-02 00:44:25,473][13975] Heartbeat connected on RolloutWorker_w0
+[2024-09-02 00:44:25,476][13975] Heartbeat connected on RolloutWorker_w1
+[2024-09-02 00:44:25,478][13975] Heartbeat connected on RolloutWorker_w2
+[2024-09-02 00:44:25,485][13975] Heartbeat connected on RolloutWorker_w5
+[2024-09-02 00:44:25,488][13975] Heartbeat connected on RolloutWorker_w6
+[2024-09-02 00:44:25,491][13975] Heartbeat connected on RolloutWorker_w7
+[2024-09-02 00:44:25,500][13975] Heartbeat connected on RolloutWorker_w4
+[2024-09-02 00:44:25,519][13975] Heartbeat connected on RolloutWorker_w3
+[2024-09-02 00:44:40,951][14210] Using optimizer
+[2024-09-02 00:44:40,953][14210] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-02 00:44:41,217][14210] Loading model from checkpoint
+[2024-09-02 00:44:41,218][14210] Loaded experiment state at self.train_step=0, self.env_steps=0
+[2024-09-02 00:44:41,224][14210] Initialized policy 0 weights for model version 0
+[2024-09-02 00:44:41,233][14210] LearnerWorker_p0 finished initialization!
+[2024-09-02 00:44:41,234][14210] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:44:41,235][13975] Heartbeat connected on LearnerWorker_p0
+[2024-09-02 00:44:42,424][13975] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:44:47,424][13975] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:44:50,417][14223] Unhandled exception CUDA error: unknown error
+CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
+For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
+Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
+ in evt loop inference_proc0-0_evt_loop
+[2024-09-02 00:44:52,424][13975] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:44:55,291][13975] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 13975], exiting...
+[2024-09-02 00:44:55,293][14226] Stopping RolloutWorker_w2...
+[2024-09-02 00:44:55,293][14228] Stopping RolloutWorker_w4...
+[2024-09-02 00:44:55,293][14228] Loop rollout_proc4_evt_loop terminating...
+[2024-09-02 00:44:55,293][14238] Stopping RolloutWorker_w7...
+[2024-09-02 00:44:55,293][14226] Loop rollout_proc2_evt_loop terminating...
+[2024-09-02 00:44:55,293][14210] Stopping Batcher_0...
+[2024-09-02 00:44:55,293][14229] Stopping RolloutWorker_w6...
+[2024-09-02 00:44:55,293][14237] Stopping RolloutWorker_w5...
+[2024-09-02 00:44:55,293][13975] Runner profile tree view:
+main_loop: 49.8020
+[2024-09-02 00:44:55,294][14229] Loop rollout_proc6_evt_loop terminating...
+[2024-09-02 00:44:55,294][14210] Loop batcher_evt_loop terminating...
+[2024-09-02 00:44:55,294][14238] Loop rollout_proc7_evt_loop terminating...
+[2024-09-02 00:44:55,294][14224] Stopping RolloutWorker_w0...
+[2024-09-02 00:44:55,294][14225] Stopping RolloutWorker_w1...
+[2024-09-02 00:44:55,294][13975] Collected {0: 0}, FPS: 0.0
+[2024-09-02 00:44:55,295][14225] Loop rollout_proc1_evt_loop terminating...
+[2024-09-02 00:44:55,294][14237] Loop rollout_proc5_evt_loop terminating...
+[2024-09-02 00:44:55,295][14210] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-02 00:44:55,295][14224] Loop rollout_proc0_evt_loop terminating...
+[2024-09-02 00:44:55,299][14227] Stopping RolloutWorker_w3...
+[2024-09-02 00:44:55,300][14227] Loop rollout_proc3_evt_loop terminating...
+[2024-09-02 00:44:55,410][14210] Stopping LearnerWorker_p0...
+[2024-09-02 00:44:55,411][14210] Loop learner_proc0_evt_loop terminating...
+[2024-09-02 00:48:10,686][15596] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json...
+[2024-09-02 00:48:10,692][15596] Rollout worker 0 uses device cpu
+[2024-09-02 00:48:10,693][15596] Rollout worker 1 uses device cpu
+[2024-09-02 00:48:10,694][15596] Rollout worker 2 uses device cpu
+[2024-09-02 00:48:10,695][15596] Rollout worker 3 uses device cpu
+[2024-09-02 00:48:10,696][15596] Rollout worker 4 uses device cpu
+[2024-09-02 00:48:10,697][15596] Rollout worker 5 uses device cpu
+[2024-09-02 00:48:10,697][15596] Rollout worker 6 uses device cpu
+[2024-09-02 00:48:10,698][15596] Rollout worker 7 uses device cpu
+[2024-09-02 00:48:10,741][15596] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:48:10,742][15596] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-02 00:48:10,764][15596] Starting all processes...
+[2024-09-02 00:48:10,765][15596] Starting process learner_proc0
+[2024-09-02 00:48:10,837][15596] Starting all processes...
+[2024-09-02 00:48:10,844][15596] Starting process inference_proc0-0
+[2024-09-02 00:48:10,846][15596] Starting process rollout_proc0
+[2024-09-02 00:48:10,846][15596] Starting process rollout_proc1
+[2024-09-02 00:48:10,847][15596] Starting process rollout_proc2
+[2024-09-02 00:48:10,848][15596] Starting process rollout_proc3
+[2024-09-02 00:48:10,849][15596] Starting process rollout_proc4
+[2024-09-02 00:48:10,849][15596] Starting process rollout_proc5
+[2024-09-02 00:48:10,849][15596] Starting process rollout_proc6
+[2024-09-02 00:48:10,849][15596] Starting process rollout_proc7
+[2024-09-02 00:48:13,462][15849] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:48:13,462][15849] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-02 00:48:13,539][15849] Num visible devices: 1
+[2024-09-02 00:48:13,613][15849] Starting seed is not provided
+[2024-09-02 00:48:13,614][15849] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:48:13,614][15849] Initializing actor-critic model on device cuda:0
+[2024-09-02 00:48:13,614][15849] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-02 00:48:13,615][15849] RunningMeanStd input shape: (1,)
+[2024-09-02 00:48:13,619][15863] Worker 0 uses CPU cores [0]
+[2024-09-02 00:48:13,652][15849] ConvEncoder: input_channels=3
+[2024-09-02 00:48:13,709][15866] Worker 3 uses CPU cores [3]
+[2024-09-02 00:48:13,897][15849] Conv encoder output size: 512
+[2024-09-02 00:48:13,898][15849] Policy head output size: 512
+[2024-09-02 00:48:13,919][15849] Created Actor Critic model with architecture:
+[2024-09-02 00:48:13,919][15849] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-02 00:48:14,019][15864] Worker 1 uses CPU cores [1]
+[2024-09-02 00:48:14,071][15869] Worker 6 uses CPU cores [6]
+[2024-09-02 00:48:14,139][15865] Worker 2 uses CPU cores [2]
+[2024-09-02 00:48:14,149][15868] Worker 5 uses CPU cores [5]
+[2024-09-02 00:48:14,164][15867] Worker 4 uses CPU cores [4]
+[2024-09-02 00:48:14,229][15870] Worker 7 uses CPU cores [7]
+[2024-09-02 00:48:14,291][15862] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:48:14,291][15862] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-02 00:48:14,310][15862] Num visible devices: 1
+[2024-09-02 00:48:17,232][15849] Using optimizer
+[2024-09-02 00:48:17,233][15849] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-02 00:48:17,259][15849] Loading model from checkpoint
+[2024-09-02 00:48:17,261][15849] Loaded experiment state at self.train_step=0, self.env_steps=0
+[2024-09-02 00:48:17,261][15849] Initialized policy 0 weights for model version 0
+[2024-09-02 00:48:17,267][15849] LearnerWorker_p0 finished initialization!
+[2024-09-02 00:48:17,267][15849] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:48:17,296][15596] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:48:20,485][15862] Unhandled exception CUDA error: unknown error
+CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
+For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
+Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
+ in evt loop inference_proc0-0_evt_loop
+[2024-09-02 00:48:22,296][15596] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:48:27,295][15596] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:48:30,736][15596] Heartbeat connected on Batcher_0
+[2024-09-02 00:48:30,738][15596] Heartbeat connected on LearnerWorker_p0
+[2024-09-02 00:48:30,747][15596] Heartbeat connected on RolloutWorker_w0
+[2024-09-02 00:48:30,750][15596] Heartbeat connected on RolloutWorker_w1
+[2024-09-02 00:48:30,754][15596] Heartbeat connected on RolloutWorker_w3
+[2024-09-02 00:48:30,756][15596] Heartbeat connected on RolloutWorker_w4
+[2024-09-02 00:48:30,759][15596] Heartbeat connected on RolloutWorker_w5
+[2024-09-02 00:48:30,760][15596] Heartbeat connected on RolloutWorker_w2
+[2024-09-02 00:48:30,762][15596] Heartbeat connected on RolloutWorker_w6
+[2024-09-02 00:48:30,764][15596] Heartbeat connected on RolloutWorker_w7
+[2024-09-02 00:48:32,295][15596] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:48:34,014][15596] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 15596], exiting...
+[2024-09-02 00:48:34,015][15869] Stopping RolloutWorker_w6...
+[2024-09-02 00:48:34,015][15865] Stopping RolloutWorker_w2...
+[2024-09-02 00:48:34,015][15864] Stopping RolloutWorker_w1...
+[2024-09-02 00:48:34,015][15866] Stopping RolloutWorker_w3...
+[2024-09-02 00:48:34,016][15869] Loop rollout_proc6_evt_loop terminating...
+[2024-09-02 00:48:34,016][15865] Loop rollout_proc2_evt_loop terminating...
+[2024-09-02 00:48:34,016][15868] Stopping RolloutWorker_w5...
+[2024-09-02 00:48:34,016][15864] Loop rollout_proc1_evt_loop terminating...
+[2024-09-02 00:48:34,016][15866] Loop rollout_proc3_evt_loop terminating...
+[2024-09-02 00:48:34,016][15849] Stopping Batcher_0...
+[2024-09-02 00:48:34,016][15868] Loop rollout_proc5_evt_loop terminating...
+[2024-09-02 00:48:34,016][15849] Loop batcher_evt_loop terminating...
+[2024-09-02 00:48:34,016][15870] Stopping RolloutWorker_w7...
+[2024-09-02 00:48:34,016][15870] Loop rollout_proc7_evt_loop terminating...
+[2024-09-02 00:48:34,016][15867] Stopping RolloutWorker_w4...
+[2024-09-02 00:48:34,017][15867] Loop rollout_proc4_evt_loop terminating...
+[2024-09-02 00:48:34,018][15849] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-02 00:48:34,015][15596] Runner profile tree view:
+main_loop: 23.2514
+[2024-09-02 00:48:34,020][15596] Collected {0: 0}, FPS: 0.0
+[2024-09-02 00:48:34,030][15863] Stopping RolloutWorker_w0...
+[2024-09-02 00:48:34,030][15863] Loop rollout_proc0_evt_loop terminating...
+[2024-09-02 00:48:34,100][15849] Stopping LearnerWorker_p0...
+[2024-09-02 00:48:34,101][15849] Loop learner_proc0_evt_loop terminating...
+[2024-09-02 01:04:25,924][15596] Loading existing experiment configuration from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json
+[2024-09-02 01:04:25,926][15596] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-02 01:04:25,926][15596] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-02 01:04:25,927][15596] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-02 01:04:25,928][15596] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-02 01:04:25,930][15596] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-02 01:04:25,930][15596] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-09-02 01:04:25,931][15596] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-02 01:04:25,932][15596] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-09-02 01:04:25,933][15596] Adding new argument 'hf_repository'='monti-python/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-09-02 01:04:25,933][15596] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-02 01:04:25,934][15596] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-02 01:04:25,935][15596] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-02 01:04:25,935][15596] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-02 01:04:25,936][15596] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-02 01:04:25,962][15596] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-02 01:04:25,966][15596] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-02 01:04:25,971][15596] RunningMeanStd input shape: (1,)
+[2024-09-02 01:04:26,012][15596] ConvEncoder: input_channels=3
+[2024-09-02 01:04:26,159][15596] Conv encoder output size: 512
+[2024-09-02 01:04:26,160][15596] Policy head output size: 512
+[2024-09-02 01:04:34,377][15596] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-02 01:04:41,601][15596] Num frames 100...
+[2024-09-02 01:04:41,846][15596] Num frames 200...
+[2024-09-02 01:04:42,022][15596] Num frames 300...
+[2024-09-02 01:04:42,234][15596] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2024-09-02 01:04:42,236][15596] Avg episode reward: 3.840, avg true_objective: 3.840
+[2024-09-02 01:04:42,270][15596] Num frames 400...
+[2024-09-02 01:04:42,440][15596] Num frames 500...
+[2024-09-02 01:04:42,609][15596] Num frames 600...
+[2024-09-02 01:04:42,762][15596] Num frames 700...
+[2024-09-02 01:04:42,951][15596] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2024-09-02 01:04:42,953][15596] Avg episode reward: 3.840, avg true_objective: 3.840
+[2024-09-02 01:04:43,018][15596] Num frames 800...
+[2024-09-02 01:04:43,189][15596] Num frames 900...
+[2024-09-02 01:04:43,357][15596] Num frames 1000...
+[2024-09-02 01:04:43,516][15596] Num frames 1100...
+[2024-09-02 01:04:43,648][15596] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2024-09-02 01:04:43,650][15596] Avg episode reward: 3.840, avg true_objective: 3.840
+[2024-09-02 01:04:43,733][15596] Num frames 1200...
+[2024-09-02 01:04:43,915][15596] Num frames 1300...
+[2024-09-02 01:04:44,067][15596] Num frames 1400...
+[2024-09-02 01:04:44,229][15596] Num frames 1500...
+[2024-09-02 01:04:44,397][15596] Num frames 1600...
+[2024-09-02 01:04:44,511][15596] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080
+[2024-09-02 01:04:44,513][15596] Avg episode reward: 4.580, avg true_objective: 4.080
+[2024-09-02 01:04:44,633][15596] Num frames 1700...
+[2024-09-02 01:04:44,778][15596] Num frames 1800...
+[2024-09-02 01:04:44,947][15596] Num frames 1900...
+[2024-09-02 01:04:45,117][15596] Num frames 2000...
+[2024-09-02 01:04:45,307][15596] Avg episode rewards: #0: 4.760, true rewards: #0: 4.160
+[2024-09-02 01:04:45,308][15596] Avg episode reward: 4.760, avg true_objective: 4.160
+[2024-09-02 01:04:45,347][15596] Num frames 2100...
+[2024-09-02 01:04:45,520][15596] Num frames 2200...
+[2024-09-02 01:04:45,687][15596] Num frames 2300...
+[2024-09-02 01:04:45,859][15596] Num frames 2400...
+[2024-09-02 01:04:46,030][15596] Num frames 2500...
+[2024-09-02 01:04:46,134][15596] Avg episode rewards: #0: 4.880, true rewards: #0: 4.213
+[2024-09-02 01:04:46,135][15596] Avg episode reward: 4.880, avg true_objective: 4.213
+[2024-09-02 01:04:46,268][15596] Num frames 2600...
+[2024-09-02 01:04:46,459][15596] Num frames 2700...
+[2024-09-02 01:04:46,644][15596] Num frames 2800...
+[2024-09-02 01:04:46,826][15596] Num frames 2900...
+[2024-09-02 01:04:47,022][15596] Avg episode rewards: #0: 4.966, true rewards: #0: 4.251
+[2024-09-02 01:04:47,024][15596] Avg episode reward: 4.966, avg true_objective: 4.251
+[2024-09-02 01:04:47,072][15596] Num frames 3000...
+[2024-09-02 01:04:47,278][15596] Num frames 3100...
+[2024-09-02 01:04:47,512][15596] Num frames 3200...
+[2024-09-02 01:04:47,746][15596] Num frames 3300...
+[2024-09-02 01:04:47,946][15596] Avg episode rewards: #0: 4.825, true rewards: #0: 4.200
+[2024-09-02 01:04:47,947][15596] Avg episode reward: 4.825, avg true_objective: 4.200
+[2024-09-02 01:04:48,042][15596] Num frames 3400...
+[2024-09-02 01:04:48,273][15596] Num frames 3500...
+[2024-09-02 01:04:48,524][15596] Num frames 3600...
+[2024-09-02 01:04:48,750][15596] Num frames 3700...
+[2024-09-02 01:04:48,932][15596] Avg episode rewards: #0: 4.716, true rewards: #0: 4.160
+[2024-09-02 01:04:48,933][15596] Avg episode reward: 4.716, avg true_objective: 4.160
+[2024-09-02 01:04:49,061][15596] Num frames 3800...
+[2024-09-02 01:04:49,288][15596] Num frames 3900...
+[2024-09-02 01:04:49,509][15596] Num frames 4000...
+[2024-09-02 01:04:49,672][15596] Num frames 4100...
+[2024-09-02 01:04:49,776][15596] Avg episode rewards: #0: 4.628, true rewards: #0: 4.128
+[2024-09-02 01:04:49,777][15596] Avg episode reward: 4.628, avg true_objective: 4.128
+[2024-09-02 01:04:57,844][15596] Replay video saved to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/replay.mp4!