diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1758 @@
+[2024-09-01 14:19:32,046][11658] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json...
+[2024-09-01 14:19:32,081][11658] Rollout worker 0 uses device cpu
+[2024-09-01 14:19:32,082][11658] Rollout worker 1 uses device cpu
+[2024-09-01 14:19:32,083][11658] Rollout worker 2 uses device cpu
+[2024-09-01 14:19:32,084][11658] Rollout worker 3 uses device cpu
+[2024-09-01 14:19:32,085][11658] Rollout worker 4 uses device cpu
+[2024-09-01 14:19:32,087][11658] Rollout worker 5 uses device cpu
+[2024-09-01 14:19:32,089][11658] Rollout worker 6 uses device cpu
+[2024-09-01 14:19:32,090][11658] Rollout worker 7 uses device cpu
+[2024-09-01 14:19:32,146][11658] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-01 14:19:32,147][11658] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-01 14:19:32,173][11658] Starting all processes...
+[2024-09-01 14:19:32,175][11658] Starting process learner_proc0
+[2024-09-01 14:19:32,264][11658] Starting all processes...
+[2024-09-01 14:19:32,275][11658] Starting process inference_proc0-0
+[2024-09-01 14:19:32,276][11658] Starting process rollout_proc0
+[2024-09-01 14:19:32,276][11658] Starting process rollout_proc1
+[2024-09-01 14:19:32,276][11658] Starting process rollout_proc2
+[2024-09-01 14:19:32,277][11658] Starting process rollout_proc3
+[2024-09-01 14:19:32,277][11658] Starting process rollout_proc4
+[2024-09-01 14:19:32,278][11658] Starting process rollout_proc5
+[2024-09-01 14:19:32,278][11658] Starting process rollout_proc6
+[2024-09-01 14:19:32,278][11658] Starting process rollout_proc7
+[2024-09-01 14:19:37,152][12736] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-01 14:19:37,153][12736] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-01 14:19:37,248][12736] Num visible devices: 1
+[2024-09-01 14:19:37,329][12736] Starting seed is not provided
+[2024-09-01 14:19:37,330][12736] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-01 14:19:37,330][12736] Initializing actor-critic model on device cuda:0
+[2024-09-01 14:19:37,330][12736] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-01 14:19:37,332][12736] RunningMeanStd input shape: (1,)
+[2024-09-01 14:19:37,358][12736] ConvEncoder: input_channels=3
+[2024-09-01 14:19:37,519][12754] Worker 4 uses CPU cores [4]
+[2024-09-01 14:19:37,679][12756] Worker 5 uses CPU cores [5]
+[2024-09-01 14:19:37,897][12751] Worker 1 uses CPU cores [1]
+[2024-09-01 14:19:37,929][12753] Worker 2 uses CPU cores [2]
+[2024-09-01 14:19:37,934][12752] Worker 3 uses CPU cores [3]
+[2024-09-01 14:19:37,952][12749] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-01 14:19:37,953][12749] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-01 14:19:38,000][12750] Worker 0 uses CPU cores [0]
+[2024-09-01 14:19:38,002][12757] Worker 7 uses CPU cores [7]
+[2024-09-01 14:19:38,008][12749] Num visible devices: 1
+[2024-09-01 14:19:38,106][12755] Worker 6 uses CPU cores [6]
+[2024-09-01 14:19:38,201][12736] Conv encoder output size: 512
+[2024-09-01 14:19:38,201][12736] Policy head output size: 512
+[2024-09-01 14:19:38,215][12736] Created Actor Critic model with architecture:
+[2024-09-01 14:19:38,215][12736] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-01 14:19:52,141][11658] Heartbeat connected on Batcher_0
+[2024-09-01 14:19:52,146][11658] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-01 14:19:52,152][11658] Heartbeat connected on RolloutWorker_w0
+[2024-09-01 14:19:52,157][11658] Heartbeat connected on RolloutWorker_w2
+[2024-09-01 14:19:52,159][11658] Heartbeat connected on RolloutWorker_w3
+[2024-09-01 14:19:52,161][11658] Heartbeat connected on RolloutWorker_w1
+[2024-09-01 14:19:52,162][11658] Heartbeat connected on RolloutWorker_w4
+[2024-09-01 14:19:52,166][11658] Heartbeat connected on RolloutWorker_w5
+[2024-09-01 14:19:52,170][11658] Heartbeat connected on RolloutWorker_w6
+[2024-09-01 14:19:52,190][11658] Heartbeat connected on RolloutWorker_w7
+[2024-09-01 14:20:07,736][12736] Using optimizer
+[2024-09-01 14:20:07,737][12736] No checkpoints found
+[2024-09-01 14:20:07,737][12736] Did not load from checkpoint, starting from scratch!
+[2024-09-01 14:20:07,738][12736] Initialized policy 0 weights for model version 0
+[2024-09-01 14:20:07,756][12736] LearnerWorker_p0 finished initialization!
+[2024-09-01 14:20:07,756][12736] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-01 14:20:07,759][11658] Heartbeat connected on LearnerWorker_p0
+[2024-09-01 14:20:11,913][11658] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:20:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:20:18,288][12749] Unhandled exception CUDA error: unknown error
+CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
+For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
+Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
+ in evt loop inference_proc0-0_evt_loop
+[2024-09-01 14:20:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:21:26,915][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[... the all-zero Fps status line above repeats every 5 s through 14:33:11, and the same checkpoint_000000000_0.pth is re-saved every 2 min (14:23:26, 14:25:26, 14:27:26, 14:29:26, 14:31:26); no frames are ever collected after the inference worker crash ...]
+[2024-09-01 14:33:11,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0).
Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:26,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:26,915][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-01 14:33:31,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:41,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:46,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:33:56,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:01,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:06,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:11,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:26,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:31,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:41,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:46,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:34:56,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:01,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:06,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:11,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:21,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:26,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:26,916][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-01 14:35:31,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:41,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:46,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:35:56,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:01,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:06,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:11,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:16,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:26,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:31,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:41,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:46,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:36:56,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:01,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:06,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:11,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:26,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:26,915][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-01 14:37:31,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:41,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:46,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:37:56,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:01,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:06,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:11,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. 
Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:16,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:26,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:31,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:36,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:41,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:46,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:51,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:38:56,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:39:01,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-01 14:39:06,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:39:11,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:39:16,913][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:39:21,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:39:26,912][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-01 14:39:26,915][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-01 14:39:26,917][11658] No heartbeat for components: InferenceWorker_p0-w0 (1174 seconds)
+[2024-09-01 14:39:26,918][11658] Stopping training due to lack of heartbeats from
+[2024-09-01 14:39:26,922][11658] Component InferenceWorker_p0-w0 process died already! Don't wait for it.
+[2024-09-01 14:39:26,923][11658] Component RolloutWorker_w6 stopped!
+[2024-09-01 14:39:26,925][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w2', 'RolloutWorker_w3', 'RolloutWorker_w4', 'RolloutWorker_w5', 'RolloutWorker_w7'] to stop...
+[2024-09-01 14:39:26,926][11658] Component RolloutWorker_w2 stopped!
+[2024-09-01 14:39:26,927][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w3', 'RolloutWorker_w4', 'RolloutWorker_w5', 'RolloutWorker_w7'] to stop...
+[2024-09-01 14:39:26,928][11658] Component RolloutWorker_w4 stopped!
+[2024-09-01 14:39:26,929][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w3', 'RolloutWorker_w5', 'RolloutWorker_w7'] to stop...
+[2024-09-01 14:39:26,930][11658] Component RolloutWorker_w5 stopped!
+[2024-09-01 14:39:26,932][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w3', 'RolloutWorker_w7'] to stop...
+[2024-09-01 14:39:26,924][12755] Stopping RolloutWorker_w6...
+[2024-09-01 14:39:26,935][12755] Loop rollout_proc6_evt_loop terminating...
+[2024-09-01 14:39:26,924][12753] Stopping RolloutWorker_w2...
+[2024-09-01 14:39:26,925][12754] Stopping RolloutWorker_w4...
+[2024-09-01 14:39:26,937][12753] Loop rollout_proc2_evt_loop terminating...
+[2024-09-01 14:39:26,938][12754] Loop rollout_proc4_evt_loop terminating...
+[2024-09-01 14:39:26,942][12736] Stopping Batcher_0...
+[2024-09-01 14:39:26,943][12736] Loop batcher_evt_loop terminating...
+[2024-09-01 14:39:26,943][11658] Component RolloutWorker_w7 stopped!
+[2024-09-01 14:39:26,934][12750] Stopping RolloutWorker_w0...
+[2024-09-01 14:39:26,947][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w3'] to stop...
+[2024-09-01 14:39:26,949][11658] Component RolloutWorker_w0 stopped!
+[2024-09-01 14:39:26,936][12757] Stopping RolloutWorker_w7...
+[2024-09-01 14:39:26,951][12750] Loop rollout_proc0_evt_loop terminating...
+[2024-09-01 14:39:26,951][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w1', 'RolloutWorker_w3'] to stop...
+[2024-09-01 14:39:26,953][12757] Loop rollout_proc7_evt_loop terminating...
+[2024-09-01 14:39:26,928][12756] Stopping RolloutWorker_w5...
+[2024-09-01 14:39:26,952][11658] Component RolloutWorker_w1 stopped!
+[2024-09-01 14:39:26,956][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0', 'RolloutWorker_w3'] to stop...
+[2024-09-01 14:39:26,958][12756] Loop rollout_proc5_evt_loop terminating...
+[2024-09-01 14:39:26,957][11658] Component RolloutWorker_w3 stopped!
+[2024-09-01 14:39:26,961][11658] Waiting for ['Batcher_0', 'LearnerWorker_p0'] to stop...
+[2024-09-01 14:39:26,965][11658] Component Batcher_0 stopped!
+[2024-09-01 14:39:26,971][11658] Waiting for ['LearnerWorker_p0'] to stop...
+[2024-09-01 14:39:27,002][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-01 14:39:26,943][12752] Stopping RolloutWorker_w3...
+[2024-09-01 14:39:27,017][12752] Loop rollout_proc3_evt_loop terminating...
+[2024-09-01 14:39:26,934][12751] Stopping RolloutWorker_w1...
+[2024-09-01 14:39:27,055][12751] Loop rollout_proc1_evt_loop terminating...
+[2024-09-01 14:39:27,049][12736] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-01 14:39:27,110][12736] Stopping LearnerWorker_p0...
+[2024-09-01 14:39:27,111][12736] Loop learner_proc0_evt_loop terminating...
+[2024-09-01 14:39:27,110][11658] Component LearnerWorker_p0 stopped!
+[2024-09-01 14:39:27,112][11658] Waiting for process learner_proc0 to stop...
+[2024-09-01 14:39:31,369][11658] Waiting for process inference_proc0-0 to join...
+[2024-09-01 14:39:31,370][11658] Waiting for process rollout_proc0 to join...
+[2024-09-01 14:39:31,372][11658] Waiting for process rollout_proc1 to join...
+[2024-09-01 14:39:31,373][11658] Waiting for process rollout_proc2 to join...
+[2024-09-01 14:39:31,374][11658] Waiting for process rollout_proc3 to join...
+[2024-09-01 14:39:31,375][11658] Waiting for process rollout_proc4 to join...
+[2024-09-01 14:39:31,376][11658] Waiting for process rollout_proc5 to join...
+[2024-09-01 14:39:31,378][11658] Waiting for process rollout_proc6 to join...
+[2024-09-01 14:39:31,379][11658] Waiting for process rollout_proc7 to join...
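+# Editor's note: the shutdown above is driven by Sample Factory's heartbeat watchdog —
+# InferenceWorker_p0-w0 had not reported for 1174 seconds, so the runner stopped every
+# remaining component and wrote a final checkpoint. A minimal sketch of the watchdog
+# pattern follows; names and structure are illustrative, not Sample Factory's actual
+# implementation (the real one also tracks process liveness and reporting intervals).

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat time per component and flag stale ones.

    Illustrative sketch only: a runner would call beat() whenever a component
    reports in, and poll stale() to decide whether to shut everything down.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable for testing
        self.last_beat = {}

    def beat(self, component):
        # Record that `component` is alive right now.
        self.last_beat[component] = self.clock()

    def stale(self):
        # Components whose last heartbeat is older than the timeout.
        now = self.clock()
        return [c for c, t in self.last_beat.items() if now - t > self.timeout_s]
```

+# In this log the stale check fired long after the inference process had already
+# died, which is why the watchdog (rather than an exception) ended the run.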
+[2024-09-01 14:39:31,380][11658] Batcher 0 profile tree view:
+[2024-09-01 14:39:31,380][11658] Learner 0 profile tree view:
+[2024-09-01 14:39:31,381][11658] RolloutWorker_w0 profile tree view:
+[2024-09-01 14:39:31,383][11658] RolloutWorker_w7 profile tree view:
+[2024-09-01 14:39:31,384][11658] Loop Runner_EvtLoop terminating...
+[2024-09-01 14:39:31,385][11658] Runner profile tree view:
+main_loop: 1199.2124
+[2024-09-01 14:39:31,386][11658] Collected {0: 0}, FPS: 0.0
+[2024-09-01 14:47:24,536][11658] Loading existing experiment configuration from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json
+[2024-09-01 14:47:24,538][11658] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-01 14:47:24,540][11658] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-01 14:47:24,541][11658] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-01 14:47:24,541][11658] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-01 14:47:24,542][11658] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-01 14:47:24,543][11658] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-01 14:47:24,543][11658] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-01 14:47:24,544][11658] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-09-01 14:47:24,545][11658] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-09-01 14:47:24,546][11658] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-01 14:47:24,547][11658] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-01 14:47:24,548][11658] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-01 14:47:24,549][11658] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-01 14:47:24,549][11658] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-01 14:47:24,610][11658] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-01 14:47:24,618][11658] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-01 14:47:24,630][11658] RunningMeanStd input shape: (1,)
+[2024-09-01 14:47:24,757][11658] ConvEncoder: input_channels=3
+[2024-09-01 14:47:25,427][11658] Conv encoder output size: 512
+[2024-09-01 14:47:25,429][11658] Policy head output size: 512
+[2024-09-01 14:47:36,923][11658] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-01 14:47:45,808][11658] Num frames 100...
+[2024-09-01 14:47:46,145][11658] Num frames 200...
+[2024-09-01 14:47:46,429][11658] Num frames 300...
+[2024-09-01 14:47:46,688][11658] Num frames 400...
+[2024-09-01 14:47:46,855][11658] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160
+[2024-09-01 14:47:46,857][11658] Avg episode reward: 5.160, avg true_objective: 4.160
+[2024-09-01 14:47:47,056][11658] Num frames 500...
+[2024-09-01 14:47:47,292][11658] Num frames 600...
+[2024-09-01 14:47:47,523][11658] Num frames 700...
+[2024-09-01 14:47:47,759][11658] Num frames 800...
+[2024-09-01 14:47:47,811][11658] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000
+[2024-09-01 14:47:47,813][11658] Avg episode reward: 4.500, avg true_objective: 4.000
+[2024-09-01 14:47:48,097][11658] Num frames 900...
+[2024-09-01 14:47:48,311][11658] Num frames 1000...
+[2024-09-01 14:47:48,515][11658] Num frames 1100...
+[2024-09-01 14:47:48,753][11658] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947
+[2024-09-01 14:47:48,755][11658] Avg episode reward: 4.280, avg true_objective: 3.947
+[2024-09-01 14:47:48,794][11658] Num frames 1200...
+[2024-09-01 14:47:49,014][11658] Num frames 1300...
+[2024-09-01 14:47:49,231][11658] Num frames 1400...
+[2024-09-01 14:47:49,452][11658] Num frames 1500...
+[2024-09-01 14:47:49,667][11658] Num frames 1600...
+[2024-09-01 14:47:49,879][11658] Num frames 1700...
+[2024-09-01 14:47:50,003][11658] Avg episode rewards: #0: 5.070, true rewards: #0: 4.320
+[2024-09-01 14:47:50,004][11658] Avg episode reward: 5.070, avg true_objective: 4.320
+[2024-09-01 14:47:50,161][11658] Num frames 1800...
+[2024-09-01 14:47:50,400][11658] Num frames 1900...
+[2024-09-01 14:47:50,651][11658] Num frames 2000...
+[2024-09-01 14:47:50,921][11658] Num frames 2100...
+[2024-09-01 14:47:51,007][11658] Avg episode rewards: #0: 4.824, true rewards: #0: 4.224
+[2024-09-01 14:47:51,009][11658] Avg episode reward: 4.824, avg true_objective: 4.224
+[2024-09-01 14:47:51,287][11658] Num frames 2200...
+[2024-09-01 14:47:51,552][11658] Num frames 2300...
+[2024-09-01 14:47:51,784][11658] Num frames 2400...
+[2024-09-01 14:47:52,096][11658] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
+[2024-09-01 14:47:52,099][11658] Avg episode reward: 4.660, avg true_objective: 4.160
+[2024-09-01 14:47:52,120][11658] Num frames 2500...
+[2024-09-01 14:47:52,397][11658] Num frames 2600...
+[2024-09-01 14:47:52,684][11658] Num frames 2700...
+[2024-09-01 14:47:52,963][11658] Num frames 2800...
+[2024-09-01 14:47:53,271][11658] Avg episode rewards: #0: 4.543, true rewards: #0: 4.114
+[2024-09-01 14:47:53,273][11658] Avg episode reward: 4.543, avg true_objective: 4.114
+[2024-09-01 14:47:53,379][11658] Num frames 2900...
+[2024-09-01 14:47:53,659][11658] Num frames 3000...
+[2024-09-01 14:47:53,916][11658] Num frames 3100...
+[2024-09-01 14:47:54,160][11658] Num frames 3200...
+[2024-09-01 14:47:54,389][11658] Num frames 3300...
+[2024-09-01 14:47:54,511][11658] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
+[2024-09-01 14:47:54,512][11658] Avg episode reward: 4.660, avg true_objective: 4.160
+[2024-09-01 14:47:54,681][11658] Num frames 3400...
+[2024-09-01 14:47:54,907][11658] Num frames 3500...
+[2024-09-01 14:47:55,118][11658] Num frames 3600...
+[2024-09-01 14:47:55,342][11658] Num frames 3700...
+[2024-09-01 14:47:55,420][11658] Avg episode rewards: #0: 4.569, true rewards: #0: 4.124
+[2024-09-01 14:47:55,421][11658] Avg episode reward: 4.569, avg true_objective: 4.124
+[2024-09-01 14:47:55,629][11658] Num frames 3800...
+[2024-09-01 14:47:55,807][11658] Num frames 3900...
+[2024-09-01 14:47:55,961][11658] Num frames 4000...
+[2024-09-01 14:47:56,113][11658] Num frames 4100...
+[2024-09-01 14:47:56,257][11658] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
+[2024-09-01 14:47:56,258][11658] Avg episode reward: 4.660, avg true_objective: 4.160
+[2024-09-01 14:48:04,794][11658] Replay video saved to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/replay.mp4!
+[2024-09-02 00:15:20,339][11658] Environment doom_basic already registered, overwriting...
+[2024-09-02 00:15:20,352][11658] Environment doom_two_colors_easy already registered, overwriting...
+[2024-09-02 00:15:20,354][11658] Environment doom_two_colors_hard already registered, overwriting...
+[2024-09-02 00:15:20,355][11658] Environment doom_dm already registered, overwriting...
+[2024-09-02 00:15:20,356][11658] Environment doom_dwango5 already registered, overwriting...
+[2024-09-02 00:15:20,357][11658] Environment doom_my_way_home_flat_actions already registered, overwriting...
+[2024-09-02 00:15:20,357][11658] Environment doom_defend_the_center_flat_actions already registered, overwriting...
+[2024-09-02 00:15:20,358][11658] Environment doom_my_way_home already registered, overwriting...
+[2024-09-02 00:15:20,359][11658] Environment doom_deadly_corridor already registered, overwriting...
+[2024-09-02 00:15:20,361][11658] Environment doom_defend_the_center already registered, overwriting...
+[2024-09-02 00:15:20,361][11658] Environment doom_defend_the_line already registered, overwriting...
+[2024-09-02 00:15:20,362][11658] Environment doom_health_gathering already registered, overwriting...
+[2024-09-02 00:15:20,363][11658] Environment doom_health_gathering_supreme already registered, overwriting...
+[2024-09-02 00:15:20,364][11658] Environment doom_battle already registered, overwriting...
+[2024-09-02 00:15:20,365][11658] Environment doom_battle2 already registered, overwriting...
+[2024-09-02 00:15:20,366][11658] Environment doom_duel_bots already registered, overwriting...
+[2024-09-02 00:15:20,367][11658] Environment doom_deathmatch_bots already registered, overwriting...
+[2024-09-02 00:15:20,368][11658] Environment doom_duel already registered, overwriting...
+[2024-09-02 00:15:20,370][11658] Environment doom_deathmatch_full already registered, overwriting...
+[2024-09-02 00:15:20,373][11658] Environment doom_benchmark already registered, overwriting...
+[2024-09-02 00:15:20,374][11658] register_encoder_factory:
+[2024-09-02 00:15:20,462][11658] Loading existing experiment configuration from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json
+[2024-09-02 00:15:20,490][11658] Experiment dir /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment already exists!
+[2024-09-02 00:15:20,491][11658] Resuming existing experiment from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment...
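+# Editor's note: the "Avg episode rewards" values printed during the evaluation run
+# above are cumulative means over the episodes finished so far — 5.160 after one
+# episode, 4.500 after two, 4.280 after three, and so on. A small sketch of that
+# bookkeeping; the example per-episode rewards are back-computed from the first
+# three logged averages and are illustrative only.

```python
def running_means(episode_rewards):
    """Cumulative mean after each finished episode, rounded like the log (3 decimals)."""
    means, total = [], 0.0
    for i, r in enumerate(episode_rewards, start=1):
        total += r
        means.append(round(total / i, 3))
    return means
```

+# With per-episode rewards of roughly 5.16, 3.84 and 3.84, this reproduces the
+# first three averages in the log (5.16, 4.5, 4.28).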
+[2024-09-02 00:15:20,492][11658] Weights and Biases integration disabled +[2024-09-02 00:15:20,518][11658] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-09-02 00:15:25,589][11658] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/home/montana/repos/deep-rl/unit8-ppo/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] 
+encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=e923ca6811d177eb3a7a4b268a75d06335cade44 +git_repo_name=https://github.com/monti-python/deep-rl.git +[2024-09-02 00:15:25,592][11658] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json... 
+[2024-09-02 00:15:25,601][11658] Rollout worker 0 uses device cpu +[2024-09-02 00:15:25,603][11658] Rollout worker 1 uses device cpu +[2024-09-02 00:15:25,604][11658] Rollout worker 2 uses device cpu +[2024-09-02 00:15:25,605][11658] Rollout worker 3 uses device cpu +[2024-09-02 00:15:25,606][11658] Rollout worker 4 uses device cpu +[2024-09-02 00:15:25,607][11658] Rollout worker 5 uses device cpu +[2024-09-02 00:15:25,607][11658] Rollout worker 6 uses device cpu +[2024-09-02 00:15:25,608][11658] Rollout worker 7 uses device cpu +[2024-09-02 00:15:25,817][11658] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:15:25,818][11658] InferenceWorker_p0-w0: min num requests: 2 +[2024-09-02 00:15:25,848][11658] Starting all processes... +[2024-09-02 00:15:25,849][11658] Starting process learner_proc0 +[2024-09-02 00:15:25,889][11658] Starting all processes... +[2024-09-02 00:15:25,901][11658] Starting process inference_proc0-0 +[2024-09-02 00:15:25,907][11658] Starting process rollout_proc0 +[2024-09-02 00:15:25,909][11658] Starting process rollout_proc1 +[2024-09-02 00:15:25,910][11658] Starting process rollout_proc2 +[2024-09-02 00:15:25,911][11658] Starting process rollout_proc3 +[2024-09-02 00:15:25,913][11658] Starting process rollout_proc4 +[2024-09-02 00:15:25,914][11658] Starting process rollout_proc5 +[2024-09-02 00:15:25,924][11658] Starting process rollout_proc6 +[2024-09-02 00:15:25,930][11658] Starting process rollout_proc7 +[2024-09-02 00:15:30,110][00805] Worker 5 uses CPU cores [5] +[2024-09-02 00:15:30,114][00808] Worker 6 uses CPU cores [6] +[2024-09-02 00:15:30,156][00795] Worker 1 uses CPU cores [1] +[2024-09-02 00:15:30,156][00794] Worker 0 uses CPU cores [0] +[2024-09-02 00:15:30,157][00793] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:15:30,157][00793] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-09-02 00:15:30,164][00780] Using GPUs [0] for 
process 0 (actually maps to GPUs [0]) +[2024-09-02 00:15:30,165][00780] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-09-02 00:15:30,217][00807] Worker 7 uses CPU cores [7] +[2024-09-02 00:15:30,256][00796] Worker 3 uses CPU cores [3] +[2024-09-02 00:15:30,306][00780] Num visible devices: 1 +[2024-09-02 00:15:30,306][00793] Num visible devices: 1 +[2024-09-02 00:15:30,381][00806] Worker 4 uses CPU cores [4] +[2024-09-02 00:15:30,401][00780] Starting seed is not provided +[2024-09-02 00:15:30,402][00780] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:15:30,402][00780] Initializing actor-critic model on device cuda:0 +[2024-09-02 00:15:30,402][00780] RunningMeanStd input shape: (3, 72, 128) +[2024-09-02 00:15:30,407][00780] RunningMeanStd input shape: (1,) +[2024-09-02 00:15:30,430][00780] ConvEncoder: input_channels=3 +[2024-09-02 00:15:30,499][00797] Worker 2 uses CPU cores [2] +[2024-09-02 00:15:30,804][00780] Conv encoder output size: 512 +[2024-09-02 00:15:30,804][00780] Policy head output size: 512 +[2024-09-02 00:15:30,851][00780] Created Actor Critic model with architecture: +[2024-09-02 00:15:30,851][00780] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): 
RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-09-02 00:15:37,309][00780] Using optimizer +[2024-09-02 00:15:37,310][00780] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-02 00:15:37,336][00780] Loading model from checkpoint +[2024-09-02 00:15:37,338][00780] Loaded experiment state at self.train_step=0, self.env_steps=0 +[2024-09-02 00:15:37,340][00780] Initialized policy 0 weights for model version 0 +[2024-09-02 00:15:37,348][00780] LearnerWorker_p0 finished initialization! +[2024-09-02 00:15:37,349][00780] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:15:38,194][00793] Unhandled exception CUDA error: unknown error +CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. +For debugging consider passing CUDA_LAUNCH_BLOCKING=1. +Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. + in evt loop inference_proc0-0_evt_loop +[2024-09-02 00:15:40,519][11658] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:15:45,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:15:45,810][11658] Heartbeat connected on Batcher_0 +[2024-09-02 00:15:45,812][11658] Heartbeat connected on LearnerWorker_p0 +[2024-09-02 00:15:45,824][11658] Heartbeat connected on RolloutWorker_w0 +[2024-09-02 00:15:45,827][11658] Heartbeat connected on RolloutWorker_w1 +[2024-09-02 00:15:45,829][11658] Heartbeat connected on RolloutWorker_w2 +[2024-09-02 00:15:45,831][11658] Heartbeat connected on RolloutWorker_w3 +[2024-09-02 00:15:45,834][11658] Heartbeat connected on RolloutWorker_w4 +[2024-09-02 00:15:45,838][11658] Heartbeat connected on RolloutWorker_w5 +[2024-09-02 00:15:45,841][11658] Heartbeat connected on RolloutWorker_w6 +[2024-09-02 00:15:45,850][11658] Heartbeat connected on RolloutWorker_w7 +[2024-09-02 00:15:50,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:15:55,519][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:00,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:05,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:10,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:15,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:20,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. 
Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:25,518][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:16:29,395][11658] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 11658], exiting... +[2024-09-02 00:16:29,397][00806] Stopping RolloutWorker_w4... +[2024-09-02 00:16:29,397][00807] Stopping RolloutWorker_w7... +[2024-09-02 00:16:29,398][00806] Loop rollout_proc4_evt_loop terminating... +[2024-09-02 00:16:29,397][00805] Stopping RolloutWorker_w5... +[2024-09-02 00:16:29,398][00780] Stopping Batcher_0... +[2024-09-02 00:16:29,398][00807] Loop rollout_proc7_evt_loop terminating... +[2024-09-02 00:16:29,398][00780] Loop batcher_evt_loop terminating... +[2024-09-02 00:16:29,398][00805] Loop rollout_proc5_evt_loop terminating... +[2024-09-02 00:16:29,397][00796] Stopping RolloutWorker_w3... +[2024-09-02 00:16:29,398][00796] Loop rollout_proc3_evt_loop terminating... +[2024-09-02 00:16:29,399][00808] Stopping RolloutWorker_w6... +[2024-09-02 00:16:29,397][11658] Runner profile tree view: +main_loop: 63.5517 +[2024-09-02 00:16:29,400][00780] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-02 00:16:29,400][00794] Stopping RolloutWorker_w0... +[2024-09-02 00:16:29,400][11658] Collected {0: 0}, FPS: 0.0 +[2024-09-02 00:16:29,401][00794] Loop rollout_proc0_evt_loop terminating... +[2024-09-02 00:16:29,398][00797] Stopping RolloutWorker_w2... +[2024-09-02 00:16:29,402][00797] Loop rollout_proc2_evt_loop terminating... +[2024-09-02 00:16:29,407][00808] Loop rollout_proc6_evt_loop terminating... +[2024-09-02 00:16:29,410][00795] Stopping RolloutWorker_w1... +[2024-09-02 00:16:29,410][00795] Loop rollout_proc1_evt_loop terminating... 
+[2024-09-02 00:16:29,482][00780] Stopping LearnerWorker_p0... +[2024-09-02 00:16:29,483][00780] Loop learner_proc0_evt_loop terminating... +[2024-09-02 00:21:05,147][11658] Environment doom_basic already registered, overwriting... +[2024-09-02 00:21:05,149][11658] Environment doom_two_colors_easy already registered, overwriting... +[2024-09-02 00:21:05,151][11658] Environment doom_two_colors_hard already registered, overwriting... +[2024-09-02 00:21:05,152][11658] Environment doom_dm already registered, overwriting... +[2024-09-02 00:21:05,153][11658] Environment doom_dwango5 already registered, overwriting... +[2024-09-02 00:21:05,154][11658] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-09-02 00:21:05,154][11658] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-09-02 00:21:05,155][11658] Environment doom_my_way_home already registered, overwriting... +[2024-09-02 00:21:05,156][11658] Environment doom_deadly_corridor already registered, overwriting... +[2024-09-02 00:21:05,157][11658] Environment doom_defend_the_center already registered, overwriting... +[2024-09-02 00:21:05,158][11658] Environment doom_defend_the_line already registered, overwriting... +[2024-09-02 00:21:05,159][11658] Environment doom_health_gathering already registered, overwriting... +[2024-09-02 00:21:05,159][11658] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-09-02 00:21:05,160][11658] Environment doom_battle already registered, overwriting... +[2024-09-02 00:21:05,167][11658] Environment doom_battle2 already registered, overwriting... +[2024-09-02 00:21:05,168][11658] Environment doom_duel_bots already registered, overwriting... +[2024-09-02 00:21:05,170][11658] Environment doom_deathmatch_bots already registered, overwriting... +[2024-09-02 00:21:05,171][11658] Environment doom_duel already registered, overwriting... 
+[2024-09-02 00:21:05,172][11658] Environment doom_deathmatch_full already registered, overwriting... +[2024-09-02 00:21:05,173][11658] Environment doom_benchmark already registered, overwriting... +[2024-09-02 00:21:05,174][11658] register_encoder_factory: +[2024-09-02 00:21:05,196][11658] Loading existing experiment configuration from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json +[2024-09-02 00:21:05,202][11658] Experiment dir /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment already exists! +[2024-09-02 00:21:05,203][11658] Resuming existing experiment from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment... +[2024-09-02 00:21:05,204][11658] Weights and Biases integration disabled +[2024-09-02 00:21:05,208][11658] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-09-02 00:21:07,147][11658] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/home/montana/repos/deep-rl/unit8-ppo/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 
+normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 
'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=e923ca6811d177eb3a7a4b268a75d06335cade44 +git_repo_name=https://github.com/monti-python/deep-rl.git +[2024-09-02 00:21:07,149][11658] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json... +[2024-09-02 00:21:07,152][11658] Rollout worker 0 uses device cpu +[2024-09-02 00:21:07,153][11658] Rollout worker 1 uses device cpu +[2024-09-02 00:21:07,154][11658] Rollout worker 2 uses device cpu +[2024-09-02 00:21:07,155][11658] Rollout worker 3 uses device cpu +[2024-09-02 00:21:07,155][11658] Rollout worker 4 uses device cpu +[2024-09-02 00:21:07,156][11658] Rollout worker 5 uses device cpu +[2024-09-02 00:21:07,157][11658] Rollout worker 6 uses device cpu +[2024-09-02 00:21:07,158][11658] Rollout worker 7 uses device cpu +[2024-09-02 00:21:07,210][11658] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:21:07,212][11658] InferenceWorker_p0-w0: min num requests: 2 +[2024-09-02 00:21:07,262][11658] Starting all processes... +[2024-09-02 00:21:07,263][11658] Starting process learner_proc0 +[2024-09-02 00:21:07,312][11658] Starting all processes... 
+[2024-09-02 00:21:07,316][11658] Starting process inference_proc0-0 +[2024-09-02 00:21:07,317][11658] Starting process rollout_proc0 +[2024-09-02 00:21:07,317][11658] Starting process rollout_proc1 +[2024-09-02 00:21:07,318][11658] Starting process rollout_proc2 +[2024-09-02 00:21:07,318][11658] Starting process rollout_proc3 +[2024-09-02 00:21:07,319][11658] Starting process rollout_proc4 +[2024-09-02 00:21:07,321][11658] Starting process rollout_proc5 +[2024-09-02 00:21:07,325][11658] Starting process rollout_proc6 +[2024-09-02 00:21:07,326][11658] Starting process rollout_proc7 +[2024-09-02 00:21:10,400][02982] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:21:10,400][02982] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-09-02 00:21:10,412][02995] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:21:10,412][02995] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-09-02 00:21:10,442][02982] Num visible devices: 1 +[2024-09-02 00:21:10,454][02995] Num visible devices: 1 +[2024-09-02 00:21:10,531][02982] Starting seed is not provided +[2024-09-02 00:21:10,531][02982] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:21:10,531][02982] Initializing actor-critic model on device cuda:0 +[2024-09-02 00:21:10,531][02982] RunningMeanStd input shape: (3, 72, 128) +[2024-09-02 00:21:10,532][02982] RunningMeanStd input shape: (1,) +[2024-09-02 00:21:10,581][02982] ConvEncoder: input_channels=3 +[2024-09-02 00:21:10,594][02996] Worker 0 uses CPU cores [0] +[2024-09-02 00:21:10,651][03000] Worker 4 uses CPU cores [4] +[2024-09-02 00:21:10,683][02999] Worker 3 uses CPU cores [3] +[2024-09-02 00:21:10,759][03010] Worker 7 uses CPU cores [7] +[2024-09-02 00:21:10,829][03002] Worker 5 uses CPU cores [5] +[2024-09-02 00:21:10,835][02982] Conv encoder output size: 512 +[2024-09-02 00:21:10,836][02982] Policy head 
output size: 512 +[2024-09-02 00:21:10,856][02982] Created Actor Critic model with architecture: +[2024-09-02 00:21:10,857][02982] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-09-02 00:21:10,889][02998] Worker 2 uses CPU cores [2] +[2024-09-02 00:21:10,910][02997] Worker 1 uses CPU cores [1] +[2024-09-02 00:21:11,001][03001] Worker 6 uses CPU cores [6] +[2024-09-02 00:21:12,348][02982] Using optimizer +[2024-09-02 00:21:12,349][02982] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... 
+[2024-09-02 00:21:12,369][02982] Loading model from checkpoint +[2024-09-02 00:21:12,371][02982] Loaded experiment state at self.train_step=0, self.env_steps=0 +[2024-09-02 00:21:12,372][02982] Initialized policy 0 weights for model version 0 +[2024-09-02 00:21:12,378][02982] LearnerWorker_p0 finished initialization! +[2024-09-02 00:21:12,379][02982] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-02 00:21:13,106][02995] Unhandled exception CUDA error: unknown error +Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. + in evt loop inference_proc0-0_evt_loop +[2024-09-02 00:21:15,209][11658] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:21:20,209][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:21:25,209][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:21:27,204][11658] Heartbeat connected on Batcher_0 +[2024-09-02 00:21:27,208][11658] Heartbeat connected on LearnerWorker_p0 +[2024-09-02 00:21:27,216][11658] Heartbeat connected on RolloutWorker_w0 +[2024-09-02 00:21:27,218][11658] Heartbeat connected on RolloutWorker_w1 +[2024-09-02 00:21:27,224][11658] Heartbeat connected on RolloutWorker_w3 +[2024-09-02 00:21:27,226][11658] Heartbeat connected on RolloutWorker_w4 +[2024-09-02 00:21:27,229][11658] Heartbeat connected on RolloutWorker_w5 +[2024-09-02 00:21:27,232][11658] Heartbeat connected on RolloutWorker_w6 +[2024-09-02 00:21:27,240][11658] Heartbeat connected on RolloutWorker_w2 +[2024-09-02 00:21:27,262][11658] Heartbeat connected on RolloutWorker_w7 +[2024-09-02 00:21:30,209][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. 
Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:21:35,209][11658] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-02 00:21:39,062][11658] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 11658], exiting... +[2024-09-02 00:21:39,064][02999] Stopping RolloutWorker_w3... +[2024-09-02 00:21:39,064][03001] Stopping RolloutWorker_w6... +[2024-09-02 00:21:39,064][03002] Stopping RolloutWorker_w5... +[2024-09-02 00:21:39,064][02999] Loop rollout_proc3_evt_loop terminating... +[2024-09-02 00:21:39,064][03001] Loop rollout_proc6_evt_loop terminating... +[2024-09-02 00:21:39,064][03002] Loop rollout_proc5_evt_loop terminating... +[2024-09-02 00:21:39,064][03010] Stopping RolloutWorker_w7... +[2024-09-02 00:21:39,064][03000] Stopping RolloutWorker_w4... +[2024-09-02 00:21:39,065][03010] Loop rollout_proc7_evt_loop terminating... +[2024-09-02 00:21:39,065][02997] Stopping RolloutWorker_w1... +[2024-09-02 00:21:39,065][02982] Stopping Batcher_0... +[2024-09-02 00:21:39,065][02997] Loop rollout_proc1_evt_loop terminating... +[2024-09-02 00:21:39,065][03000] Loop rollout_proc4_evt_loop terminating... +[2024-09-02 00:21:39,065][02982] Loop batcher_evt_loop terminating... +[2024-09-02 00:21:39,067][02982] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-09-02 00:21:39,064][11658] Runner profile tree view: +main_loop: 31.8019 +[2024-09-02 00:21:39,075][11658] Collected {0: 0}, FPS: 0.0 +[2024-09-02 00:21:39,079][02998] Stopping RolloutWorker_w2... +[2024-09-02 00:21:39,080][02998] Loop rollout_proc2_evt_loop terminating... +[2024-09-02 00:21:39,089][02996] Stopping RolloutWorker_w0... +[2024-09-02 00:21:39,110][02996] Loop rollout_proc0_evt_loop terminating... 
+[2024-09-02 00:21:39,177][02982] Stopping LearnerWorker_p0... +[2024-09-02 00:21:39,178][02982] Loop learner_proc0_evt_loop terminating... +[2024-09-02 00:22:45,206][11658] Environment doom_basic already registered, overwriting... +[2024-09-02 00:22:45,208][11658] Environment doom_two_colors_easy already registered, overwriting... +[2024-09-02 00:22:45,209][11658] Environment doom_two_colors_hard already registered, overwriting... +[2024-09-02 00:22:45,210][11658] Environment doom_dm already registered, overwriting... +[2024-09-02 00:22:45,211][11658] Environment doom_dwango5 already registered, overwriting... +[2024-09-02 00:22:45,213][11658] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-09-02 00:22:45,213][11658] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-09-02 00:22:45,214][11658] Environment doom_my_way_home already registered, overwriting... +[2024-09-02 00:22:45,216][11658] Environment doom_deadly_corridor already registered, overwriting... +[2024-09-02 00:22:45,217][11658] Environment doom_defend_the_center already registered, overwriting... +[2024-09-02 00:22:45,218][11658] Environment doom_defend_the_line already registered, overwriting... +[2024-09-02 00:22:45,219][11658] Environment doom_health_gathering already registered, overwriting... +[2024-09-02 00:22:45,222][11658] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-09-02 00:22:45,223][11658] Environment doom_battle already registered, overwriting... +[2024-09-02 00:22:45,223][11658] Environment doom_battle2 already registered, overwriting... +[2024-09-02 00:22:45,226][11658] Environment doom_duel_bots already registered, overwriting... +[2024-09-02 00:22:45,227][11658] Environment doom_deathmatch_bots already registered, overwriting... +[2024-09-02 00:22:45,228][11658] Environment doom_duel already registered, overwriting... 
+[2024-09-02 00:22:45,229][11658] Environment doom_deathmatch_full already registered, overwriting... +[2024-09-02 00:22:45,230][11658] Environment doom_benchmark already registered, overwriting... +[2024-09-02 00:22:45,231][11658] register_encoder_factory: +[2024-09-02 00:22:45,247][11658] Loading existing experiment configuration from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json +[2024-09-02 00:22:45,253][11658] Experiment dir /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment already exists! +[2024-09-02 00:22:45,254][11658] Resuming existing experiment from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment... +[2024-09-02 00:22:45,255][11658] Weights and Biases integration disabled +[2024-09-02 00:22:45,257][11658] Environment var CUDA_VISIBLE_DEVICES is 1 +[2024-09-02 00:22:46,924][11658] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/home/montana/repos/deep-rl/unit8-ppo/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 
+normalize_input=True
+normalize_input_keys=None
+decorrelate_experience_max_seconds=0
+decorrelate_envs_on_one_worker=True
+actor_worker_gpus=[]
+set_workers_cpu_affinity=True
+force_envs_single_thread=False
+default_niceness=0
+log_to_file=True
+experiment_summaries_interval=10
+flush_summaries_interval=30
+stats_avg=100
+summaries_use_frameskip=True
+heartbeat_interval=20
+heartbeat_reporting_interval=600
+train_for_env_steps=4000000
+train_for_seconds=10000000000
+save_every_sec=120
+keep_checkpoints=2
+load_checkpoint_kind=latest
+save_milestones_sec=-1
+save_best_every_sec=5
+save_best_metric=reward
+save_best_after=100000
+benchmark=False
+encoder_mlp_layers=[512, 512]
+encoder_conv_architecture=convnet_simple
+encoder_conv_mlp_layers=[512]
+use_rnn=True
+rnn_size=512
+rnn_type=gru
+rnn_num_layers=1
+decoder_mlp_layers=[]
+nonlinearity=elu
+policy_initialization=orthogonal
+policy_init_gain=1.0
+actor_critic_share_weights=True
+adaptive_stddev=True
+continuous_tanh_scale=0.0
+initial_stddev=1.0
+use_env_info_cache=False
+env_gpu_actions=False
+env_gpu_observations=True
+env_frameskip=4
+env_framestack=1
+pixel_format=CHW
+use_record_episode_statistics=False
+with_wandb=False
+wandb_user=None
+wandb_project=sample_factory
+wandb_group=None
+wandb_job_type=SF
+wandb_tags=[]
+with_pbt=False
+pbt_mix_policies_in_one_env=True
+pbt_period_env_steps=5000000
+pbt_start_mutation=20000000
+pbt_replace_fraction=0.3
+pbt_mutation_rate=0.15
+pbt_replace_reward_gap=0.1
+pbt_replace_reward_gap_absolute=1e-06
+pbt_optimize_gamma=False
+pbt_target_objective=true_objective
+pbt_perturb_min=1.1
+pbt_perturb_max=1.5
+num_agents=-1
+num_humans=0
+num_bots=-1
+start_bot_difficulty=None
+timelimit=None
+res_w=128
+res_h=72
+wide_aspect_ratio=False
+eval_env_frameskip=1
+fps=35
+command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
+cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
+git_hash=e923ca6811d177eb3a7a4b268a75d06335cade44
+git_repo_name=https://github.com/monti-python/deep-rl.git
+[2024-09-02 00:22:46,925][11658] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json...
+[2024-09-02 00:22:46,929][11658] Rollout worker 0 uses device cpu
+[2024-09-02 00:22:46,930][11658] Rollout worker 1 uses device cpu
+[2024-09-02 00:22:46,930][11658] Rollout worker 2 uses device cpu
+[2024-09-02 00:22:46,931][11658] Rollout worker 3 uses device cpu
+[2024-09-02 00:22:46,932][11658] Rollout worker 4 uses device cpu
+[2024-09-02 00:22:46,932][11658] Rollout worker 5 uses device cpu
+[2024-09-02 00:22:46,933][11658] Rollout worker 6 uses device cpu
+[2024-09-02 00:22:46,934][11658] Rollout worker 7 uses device cpu
+[2024-09-02 00:22:46,977][11658] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2024-09-02 00:22:46,978][11658] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-02 00:22:47,002][11658] Starting all processes...
+[2024-09-02 00:22:47,003][11658] Starting process learner_proc0
+[2024-09-02 00:22:47,052][11658] Starting all processes...
+[2024-09-02 00:22:47,058][11658] Starting process inference_proc0-0
+[2024-09-02 00:22:47,059][11658] Starting process rollout_proc0
+[2024-09-02 00:22:47,059][11658] Starting process rollout_proc1
+[2024-09-02 00:22:47,060][11658] Starting process rollout_proc2
+[2024-09-02 00:22:47,060][11658] Starting process rollout_proc3
+[2024-09-02 00:22:47,061][11658] Starting process rollout_proc4
+[2024-09-02 00:22:47,064][11658] Starting process rollout_proc5
+[2024-09-02 00:22:47,064][11658] Starting process rollout_proc6
+[2024-09-02 00:22:47,067][11658] Starting process rollout_proc7
+[2024-09-02 00:22:49,676][03756] Worker 3 uses CPU cores [3]
+[2024-09-02 00:22:49,779][03754] Worker 1 uses CPU cores [1]
+[2024-09-02 00:22:49,779][03755] Worker 2 uses CPU cores [2]
+[2024-09-02 00:22:50,035][03752] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2024-09-02 00:22:50,035][03752] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for inference process 0
+[2024-09-02 00:22:50,058][03752] Num visible devices: 0
+[2024-09-02 00:22:50,062][03757] Worker 4 uses CPU cores [4]
+[2024-09-02 00:22:50,102][03739] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2024-09-02 00:22:50,103][03739] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for learning process 0
+[2024-09-02 00:22:50,119][03766] Worker 5 uses CPU cores [5]
+[2024-09-02 00:22:50,124][03739] Num visible devices: 0
+[2024-09-02 00:22:50,160][03739] Starting seed is not provided
+[2024-09-02 00:22:50,161][03739] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2024-09-02 00:22:50,161][03739] Initializing actor-critic model on device cuda:0
+[2024-09-02 00:22:50,161][03739] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-02 00:22:50,163][03739] RunningMeanStd input shape: (1,)
+[2024-09-02 00:22:50,179][03739] ConvEncoder: input_channels=3
+[2024-09-02 00:22:50,233][03758] Worker 6 uses CPU cores [6]
+[2024-09-02 00:22:50,247][03767] Worker 7 uses CPU cores [7]
+[2024-09-02 00:22:50,247][03753] Worker 0 uses CPU cores [0]
+[2024-09-02 00:22:50,335][03739] Conv encoder output size: 512
+[2024-09-02 00:22:50,336][03739] Policy head output size: 512
+[2024-09-02 00:22:50,348][03739] Created Actor Critic model with architecture:
+[2024-09-02 00:22:50,349][03739] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-02 00:22:50,357][03739] EvtLoop [learner_proc0_evt_loop, process=learner_proc0] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=()
+Traceback (most recent call last):
+  File "/home/montana/.local/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
+    slot_callable(*args)
+  File "/home/montana/.local/lib/python3.10/site-packages/sample_factory/algo/learning/learner_worker.py", line 139, in init
+    init_model_data = self.learner.init()
+  File "/home/montana/.local/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 215, in init
+    self.actor_critic.model_to_device(self.device)
+  File "/home/montana/.local/lib/python3.10/site-packages/sample_factory/model/actor_critic.py", line 60, in model_to_device
+    module.to(device)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
+    return self._apply(convert)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
+    module._apply(fn)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
+    module._apply(fn)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
+    module._apply(fn)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 844, in _apply
+    self._buffers[key] = fn(buf)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
+    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
+  File "/home/montana/miniconda3/envs/deep-rl/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
+    torch._C._cuda_init()
+RuntimeError: No CUDA GPUs are available
+[2024-09-02 00:22:50,360][03739] Unhandled exception No CUDA GPUs are available in evt loop learner_proc0_evt_loop
+[2024-09-02 00:23:06,971][11658] Heartbeat connected on Batcher_0
+[2024-09-02 00:23:06,978][11658] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-02 00:23:06,983][11658] Heartbeat connected on RolloutWorker_w0
+[2024-09-02 00:23:06,985][11658] Heartbeat connected on RolloutWorker_w1
+[2024-09-02 00:23:06,987][11658] Heartbeat connected on RolloutWorker_w2
+[2024-09-02 00:23:06,989][11658] Heartbeat connected on RolloutWorker_w3
+[2024-09-02 00:23:06,992][11658] Heartbeat connected on RolloutWorker_w4
+[2024-09-02 00:23:06,995][11658] Heartbeat connected on RolloutWorker_w5
+[2024-09-02 00:23:07,000][11658] Heartbeat connected on RolloutWorker_w6
+[2024-09-02 00:23:07,030][11658] Heartbeat connected on RolloutWorker_w7
+[2024-09-02 00:23:09,252][11658] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 11658], exiting...
+[2024-09-02 00:23:09,254][03754] Stopping RolloutWorker_w1...
+[2024-09-02 00:23:09,254][03757] Stopping RolloutWorker_w4...
+[2024-09-02 00:23:09,254][03752] Stopping InferenceWorker_p0-w0...
+[2024-09-02 00:23:09,254][03754] Loop rollout_proc1_evt_loop terminating...
+[2024-09-02 00:23:09,254][03739] Stopping Batcher_0...
+[2024-09-02 00:23:09,254][03752] Loop inference_proc0-0_evt_loop terminating...
+[2024-09-02 00:23:09,254][03757] Loop rollout_proc4_evt_loop terminating...
+[2024-09-02 00:23:09,254][03753] Stopping RolloutWorker_w0...
+[2024-09-02 00:23:09,254][03739] Loop batcher_evt_loop terminating...
+[2024-09-02 00:23:09,254][03756] Stopping RolloutWorker_w3...
+[2024-09-02 00:23:09,253][11658] Runner profile tree view:
+main_loop: 22.2519
+[2024-09-02 00:23:09,254][03767] Stopping RolloutWorker_w7...
+[2024-09-02 00:23:09,255][03767] Loop rollout_proc7_evt_loop terminating...
+[2024-09-02 00:23:09,255][03753] Loop rollout_proc0_evt_loop terminating...
+[2024-09-02 00:23:09,255][03758] Stopping RolloutWorker_w6...
+[2024-09-02 00:23:09,255][03758] Loop rollout_proc6_evt_loop terminating...
+[2024-09-02 00:23:09,254][11658] Collected {}, FPS: 0.0
+[2024-09-02 00:23:09,255][03756] Loop rollout_proc3_evt_loop terminating...
+[2024-09-02 00:23:09,265][03766] Stopping RolloutWorker_w5...
+[2024-09-02 00:23:09,260][03755] Stopping RolloutWorker_w2...
+[2024-09-02 00:23:09,265][03766] Loop rollout_proc5_evt_loop terminating...
+[2024-09-02 00:23:09,266][03755] Loop rollout_proc2_evt_loop terminating...
+[2024-09-02 00:44:05,383][13975] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json...
+[2024-09-02 00:44:05,390][13975] Rollout worker 0 uses device cpu
+[2024-09-02 00:44:05,391][13975] Rollout worker 1 uses device cpu
+[2024-09-02 00:44:05,392][13975] Rollout worker 2 uses device cpu
+[2024-09-02 00:44:05,393][13975] Rollout worker 3 uses device cpu
+[2024-09-02 00:44:05,394][13975] Rollout worker 4 uses device cpu
+[2024-09-02 00:44:05,395][13975] Rollout worker 5 uses device cpu
+[2024-09-02 00:44:05,395][13975] Rollout worker 6 uses device cpu
+[2024-09-02 00:44:05,396][13975] Rollout worker 7 uses device cpu
+[2024-09-02 00:44:05,467][13975] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:44:05,468][13975] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-02 00:44:05,491][13975] Starting all processes...
+[2024-09-02 00:44:05,492][13975] Starting process learner_proc0
+[2024-09-02 00:44:05,601][13975] Starting all processes...
+[2024-09-02 00:44:05,610][13975] Starting process inference_proc0-0
+[2024-09-02 00:44:05,610][13975] Starting process rollout_proc0
+[2024-09-02 00:44:05,611][13975] Starting process rollout_proc1
+[2024-09-02 00:44:05,612][13975] Starting process rollout_proc2
+[2024-09-02 00:44:05,613][13975] Starting process rollout_proc3
+[2024-09-02 00:44:05,614][13975] Starting process rollout_proc4
+[2024-09-02 00:44:05,614][13975] Starting process rollout_proc5
+[2024-09-02 00:44:05,615][13975] Starting process rollout_proc6
+[2024-09-02 00:44:05,616][13975] Starting process rollout_proc7
+[2024-09-02 00:44:16,282][14226] Worker 2 uses CPU cores [2]
+[2024-09-02 00:44:16,391][14237] Worker 5 uses CPU cores [5]
+[2024-09-02 00:44:16,400][14210] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:44:16,401][14210] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-02 00:44:16,412][14225] Worker 1 uses CPU cores [1]
+[2024-09-02 00:44:16,534][14227] Worker 3 uses CPU cores [3]
+[2024-09-02 00:44:16,599][14228] Worker 4 uses CPU cores [4]
+[2024-09-02 00:44:16,638][14238] Worker 7 uses CPU cores [7]
+[2024-09-02 00:44:16,719][14210] Num visible devices: 1
+[2024-09-02 00:44:16,759][14224] Worker 0 uses CPU cores [0]
+[2024-09-02 00:44:16,771][14210] Starting seed is not provided
+[2024-09-02 00:44:16,771][14210] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:44:16,771][14210] Initializing actor-critic model on device cuda:0
+[2024-09-02 00:44:16,772][14210] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-02 00:44:16,774][14210] RunningMeanStd input shape: (1,)
+[2024-09-02 00:44:16,789][14229] Worker 6 uses CPU cores [6]
+[2024-09-02 00:44:16,801][14210] ConvEncoder: input_channels=3
+[2024-09-02 00:44:16,828][14223] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:44:16,828][14223] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-02 00:44:16,907][14223] Num visible devices: 1
+[2024-09-02 00:44:17,064][14210] Conv encoder output size: 512
+[2024-09-02 00:44:17,065][14210] Policy head output size: 512
+[2024-09-02 00:44:17,097][14210] Created Actor Critic model with architecture:
+[2024-09-02 00:44:17,097][14210] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-02 00:44:25,462][13975] Heartbeat connected on Batcher_0
+[2024-09-02 00:44:25,468][13975] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-02 00:44:25,473][13975] Heartbeat connected on RolloutWorker_w0
+[2024-09-02 00:44:25,476][13975] Heartbeat connected on RolloutWorker_w1
+[2024-09-02 00:44:25,478][13975] Heartbeat connected on RolloutWorker_w2
+[2024-09-02 00:44:25,485][13975] Heartbeat connected on RolloutWorker_w5
+[2024-09-02 00:44:25,488][13975] Heartbeat connected on RolloutWorker_w6
+[2024-09-02 00:44:25,491][13975] Heartbeat connected on RolloutWorker_w7
+[2024-09-02 00:44:25,500][13975] Heartbeat connected on RolloutWorker_w4
+[2024-09-02 00:44:25,519][13975] Heartbeat connected on RolloutWorker_w3
+[2024-09-02 00:44:40,951][14210] Using optimizer
+[2024-09-02 00:44:40,953][14210] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-02 00:44:41,217][14210] Loading model from checkpoint
+[2024-09-02 00:44:41,218][14210] Loaded experiment state at self.train_step=0, self.env_steps=0
+[2024-09-02 00:44:41,224][14210] Initialized policy 0 weights for model version 0
+[2024-09-02 00:44:41,233][14210] LearnerWorker_p0 finished initialization!
+[2024-09-02 00:44:41,234][14210] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:44:41,235][13975] Heartbeat connected on LearnerWorker_p0
+[2024-09-02 00:44:42,424][13975] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:44:47,424][13975] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:44:50,417][14223] Unhandled exception CUDA error: unknown error
+CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
+For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
+Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
+ in evt loop inference_proc0-0_evt_loop
+[2024-09-02 00:44:52,424][13975] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:44:55,291][13975] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 13975], exiting...
+[2024-09-02 00:44:55,293][14226] Stopping RolloutWorker_w2...
+[2024-09-02 00:44:55,293][14228] Stopping RolloutWorker_w4...
+[2024-09-02 00:44:55,293][14228] Loop rollout_proc4_evt_loop terminating...
+[2024-09-02 00:44:55,293][14238] Stopping RolloutWorker_w7...
+[2024-09-02 00:44:55,293][14226] Loop rollout_proc2_evt_loop terminating...
+[2024-09-02 00:44:55,293][14210] Stopping Batcher_0...
+[2024-09-02 00:44:55,293][14229] Stopping RolloutWorker_w6...
+[2024-09-02 00:44:55,293][14237] Stopping RolloutWorker_w5...
+[2024-09-02 00:44:55,293][13975] Runner profile tree view:
+main_loop: 49.8020
+[2024-09-02 00:44:55,294][14229] Loop rollout_proc6_evt_loop terminating...
+[2024-09-02 00:44:55,294][14210] Loop batcher_evt_loop terminating...
+[2024-09-02 00:44:55,294][14238] Loop rollout_proc7_evt_loop terminating...
+[2024-09-02 00:44:55,294][14224] Stopping RolloutWorker_w0...
+[2024-09-02 00:44:55,294][14225] Stopping RolloutWorker_w1...
+[2024-09-02 00:44:55,294][13975] Collected {0: 0}, FPS: 0.0
+[2024-09-02 00:44:55,295][14225] Loop rollout_proc1_evt_loop terminating...
+[2024-09-02 00:44:55,294][14237] Loop rollout_proc5_evt_loop terminating...
+[2024-09-02 00:44:55,295][14210] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-02 00:44:55,295][14224] Loop rollout_proc0_evt_loop terminating...
+[2024-09-02 00:44:55,299][14227] Stopping RolloutWorker_w3...
+[2024-09-02 00:44:55,300][14227] Loop rollout_proc3_evt_loop terminating...
+[2024-09-02 00:44:55,410][14210] Stopping LearnerWorker_p0...
+[2024-09-02 00:44:55,411][14210] Loop learner_proc0_evt_loop terminating...
+[2024-09-02 00:48:10,686][15596] Saving configuration to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json...
+[2024-09-02 00:48:10,692][15596] Rollout worker 0 uses device cpu
+[2024-09-02 00:48:10,693][15596] Rollout worker 1 uses device cpu
+[2024-09-02 00:48:10,694][15596] Rollout worker 2 uses device cpu
+[2024-09-02 00:48:10,695][15596] Rollout worker 3 uses device cpu
+[2024-09-02 00:48:10,696][15596] Rollout worker 4 uses device cpu
+[2024-09-02 00:48:10,697][15596] Rollout worker 5 uses device cpu
+[2024-09-02 00:48:10,697][15596] Rollout worker 6 uses device cpu
+[2024-09-02 00:48:10,698][15596] Rollout worker 7 uses device cpu
+[2024-09-02 00:48:10,741][15596] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:48:10,742][15596] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-02 00:48:10,764][15596] Starting all processes...
+[2024-09-02 00:48:10,765][15596] Starting process learner_proc0
+[2024-09-02 00:48:10,837][15596] Starting all processes...
+[2024-09-02 00:48:10,844][15596] Starting process inference_proc0-0
+[2024-09-02 00:48:10,846][15596] Starting process rollout_proc0
+[2024-09-02 00:48:10,846][15596] Starting process rollout_proc1
+[2024-09-02 00:48:10,847][15596] Starting process rollout_proc2
+[2024-09-02 00:48:10,848][15596] Starting process rollout_proc3
+[2024-09-02 00:48:10,849][15596] Starting process rollout_proc4
+[2024-09-02 00:48:10,849][15596] Starting process rollout_proc5
+[2024-09-02 00:48:10,849][15596] Starting process rollout_proc6
+[2024-09-02 00:48:10,849][15596] Starting process rollout_proc7
+[2024-09-02 00:48:13,462][15849] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:48:13,462][15849] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-02 00:48:13,539][15849] Num visible devices: 1
+[2024-09-02 00:48:13,613][15849] Starting seed is not provided
+[2024-09-02 00:48:13,614][15849] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:48:13,614][15849] Initializing actor-critic model on device cuda:0
+[2024-09-02 00:48:13,614][15849] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-02 00:48:13,615][15849] RunningMeanStd input shape: (1,)
+[2024-09-02 00:48:13,619][15863] Worker 0 uses CPU cores [0]
+[2024-09-02 00:48:13,652][15849] ConvEncoder: input_channels=3
+[2024-09-02 00:48:13,709][15866] Worker 3 uses CPU cores [3]
+[2024-09-02 00:48:13,897][15849] Conv encoder output size: 512
+[2024-09-02 00:48:13,898][15849] Policy head output size: 512
+[2024-09-02 00:48:13,919][15849] Created Actor Critic model with architecture:
+[2024-09-02 00:48:13,919][15849] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-02 00:48:14,019][15864] Worker 1 uses CPU cores [1]
+[2024-09-02 00:48:14,071][15869] Worker 6 uses CPU cores [6]
+[2024-09-02 00:48:14,139][15865] Worker 2 uses CPU cores [2]
+[2024-09-02 00:48:14,149][15868] Worker 5 uses CPU cores [5]
+[2024-09-02 00:48:14,164][15867] Worker 4 uses CPU cores [4]
+[2024-09-02 00:48:14,229][15870] Worker 7 uses CPU cores [7]
+[2024-09-02 00:48:14,291][15862] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:48:14,291][15862] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-02 00:48:14,310][15862] Num visible devices: 1
+[2024-09-02 00:48:17,232][15849] Using optimizer
+[2024-09-02 00:48:17,233][15849] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-02 00:48:17,259][15849] Loading model from checkpoint
+[2024-09-02 00:48:17,261][15849] Loaded experiment state at self.train_step=0, self.env_steps=0
+[2024-09-02 00:48:17,261][15849] Initialized policy 0 weights for model version 0
+[2024-09-02 00:48:17,267][15849] LearnerWorker_p0 finished initialization!
+[2024-09-02 00:48:17,267][15849] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-02 00:48:17,296][15596] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:48:20,485][15862] Unhandled exception CUDA error: unknown error
+CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
+For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
+Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
+ in evt loop inference_proc0-0_evt_loop
+[2024-09-02 00:48:22,296][15596] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:48:27,295][15596] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:48:30,736][15596] Heartbeat connected on Batcher_0
+[2024-09-02 00:48:30,738][15596] Heartbeat connected on LearnerWorker_p0
+[2024-09-02 00:48:30,747][15596] Heartbeat connected on RolloutWorker_w0
+[2024-09-02 00:48:30,750][15596] Heartbeat connected on RolloutWorker_w1
+[2024-09-02 00:48:30,754][15596] Heartbeat connected on RolloutWorker_w3
+[2024-09-02 00:48:30,756][15596] Heartbeat connected on RolloutWorker_w4
+[2024-09-02 00:48:30,759][15596] Heartbeat connected on RolloutWorker_w5
+[2024-09-02 00:48:30,760][15596] Heartbeat connected on RolloutWorker_w2
+[2024-09-02 00:48:30,762][15596] Heartbeat connected on RolloutWorker_w6
+[2024-09-02 00:48:30,764][15596] Heartbeat connected on RolloutWorker_w7
+[2024-09-02 00:48:32,295][15596] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-02 00:48:34,014][15596] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 15596], exiting...
+[2024-09-02 00:48:34,015][15869] Stopping RolloutWorker_w6...
+[2024-09-02 00:48:34,015][15865] Stopping RolloutWorker_w2...
+[2024-09-02 00:48:34,015][15864] Stopping RolloutWorker_w1...
+[2024-09-02 00:48:34,015][15866] Stopping RolloutWorker_w3...
+[2024-09-02 00:48:34,016][15869] Loop rollout_proc6_evt_loop terminating...
+[2024-09-02 00:48:34,016][15865] Loop rollout_proc2_evt_loop terminating...
+[2024-09-02 00:48:34,016][15868] Stopping RolloutWorker_w5...
+[2024-09-02 00:48:34,016][15864] Loop rollout_proc1_evt_loop terminating...
+[2024-09-02 00:48:34,016][15866] Loop rollout_proc3_evt_loop terminating...
+[2024-09-02 00:48:34,016][15849] Stopping Batcher_0...
+[2024-09-02 00:48:34,016][15868] Loop rollout_proc5_evt_loop terminating...
+[2024-09-02 00:48:34,016][15849] Loop batcher_evt_loop terminating...
+[2024-09-02 00:48:34,016][15870] Stopping RolloutWorker_w7...
+[2024-09-02 00:48:34,016][15870] Loop rollout_proc7_evt_loop terminating...
+[2024-09-02 00:48:34,016][15867] Stopping RolloutWorker_w4...
+[2024-09-02 00:48:34,017][15867] Loop rollout_proc4_evt_loop terminating...
+[2024-09-02 00:48:34,018][15849] Saving /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-02 00:48:34,015][15596] Runner profile tree view:
+main_loop: 23.2514
+[2024-09-02 00:48:34,020][15596] Collected {0: 0}, FPS: 0.0
+[2024-09-02 00:48:34,030][15863] Stopping RolloutWorker_w0...
+[2024-09-02 00:48:34,030][15863] Loop rollout_proc0_evt_loop terminating...
+[2024-09-02 00:48:34,100][15849] Stopping LearnerWorker_p0...
+[2024-09-02 00:48:34,101][15849] Loop learner_proc0_evt_loop terminating...
+[2024-09-02 01:04:25,924][15596] Loading existing experiment configuration from /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/config.json
+[2024-09-02 01:04:25,926][15596] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-02 01:04:25,926][15596] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-02 01:04:25,927][15596] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-02 01:04:25,928][15596] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-02 01:04:25,930][15596] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-02 01:04:25,930][15596] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-09-02 01:04:25,931][15596] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-02 01:04:25,932][15596] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-09-02 01:04:25,933][15596] Adding new argument 'hf_repository'='monti-python/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-09-02 01:04:25,933][15596] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-02 01:04:25,934][15596] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-02 01:04:25,935][15596] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-02 01:04:25,935][15596] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-02 01:04:25,936][15596] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-02 01:04:25,962][15596] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-02 01:04:25,966][15596] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-02 01:04:25,971][15596] RunningMeanStd input shape: (1,)
+[2024-09-02 01:04:26,012][15596] ConvEncoder: input_channels=3
+[2024-09-02 01:04:26,159][15596] Conv encoder output size: 512
+[2024-09-02 01:04:26,160][15596] Policy head output size: 512
+[2024-09-02 01:04:34,377][15596] Loading state from checkpoint /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-02 01:04:41,601][15596] Num frames 100...
+[2024-09-02 01:04:41,846][15596] Num frames 200...
+[2024-09-02 01:04:42,022][15596] Num frames 300...
+[2024-09-02 01:04:42,234][15596] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2024-09-02 01:04:42,236][15596] Avg episode reward: 3.840, avg true_objective: 3.840
+[2024-09-02 01:04:42,270][15596] Num frames 400...
+[2024-09-02 01:04:42,440][15596] Num frames 500...
+[2024-09-02 01:04:42,609][15596] Num frames 600...
+[2024-09-02 01:04:42,762][15596] Num frames 700...
+[2024-09-02 01:04:42,951][15596] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2024-09-02 01:04:42,953][15596] Avg episode reward: 3.840, avg true_objective: 3.840
+[2024-09-02 01:04:43,018][15596] Num frames 800...
+[2024-09-02 01:04:43,189][15596] Num frames 900...
+[2024-09-02 01:04:43,357][15596] Num frames 1000...
+[2024-09-02 01:04:43,516][15596] Num frames 1100...
+[2024-09-02 01:04:43,648][15596] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2024-09-02 01:04:43,650][15596] Avg episode reward: 3.840, avg true_objective: 3.840
+[2024-09-02 01:04:43,733][15596] Num frames 1200...
+[2024-09-02 01:04:43,915][15596] Num frames 1300...
+[2024-09-02 01:04:44,067][15596] Num frames 1400...
+[2024-09-02 01:04:44,229][15596] Num frames 1500...
+[2024-09-02 01:04:44,397][15596] Num frames 1600...
+[2024-09-02 01:04:44,511][15596] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080
+[2024-09-02 01:04:44,513][15596] Avg episode reward: 4.580, avg true_objective: 4.080
+[2024-09-02 01:04:44,633][15596] Num frames 1700...
+[2024-09-02 01:04:44,778][15596] Num frames 1800...
+[2024-09-02 01:04:44,947][15596] Num frames 1900...
+[2024-09-02 01:04:45,117][15596] Num frames 2000...
+[2024-09-02 01:04:45,307][15596] Avg episode rewards: #0: 4.760, true rewards: #0: 4.160
+[2024-09-02 01:04:45,308][15596] Avg episode reward: 4.760, avg true_objective: 4.160
+[2024-09-02 01:04:45,347][15596] Num frames 2100...
+[2024-09-02 01:04:45,520][15596] Num frames 2200...
+[2024-09-02 01:04:45,687][15596] Num frames 2300...
+[2024-09-02 01:04:45,859][15596] Num frames 2400...
+[2024-09-02 01:04:46,030][15596] Num frames 2500...
+[2024-09-02 01:04:46,134][15596] Avg episode rewards: #0: 4.880, true rewards: #0: 4.213
+[2024-09-02 01:04:46,135][15596] Avg episode reward: 4.880, avg true_objective: 4.213
+[2024-09-02 01:04:46,268][15596] Num frames 2600...
+[2024-09-02 01:04:46,459][15596] Num frames 2700...
+[2024-09-02 01:04:46,644][15596] Num frames 2800...
+[2024-09-02 01:04:46,826][15596] Num frames 2900...
+[2024-09-02 01:04:47,022][15596] Avg episode rewards: #0: 4.966, true rewards: #0: 4.251
+[2024-09-02 01:04:47,024][15596] Avg episode reward: 4.966, avg true_objective: 4.251
+[2024-09-02 01:04:47,072][15596] Num frames 3000...
+[2024-09-02 01:04:47,278][15596] Num frames 3100...
+[2024-09-02 01:04:47,512][15596] Num frames 3200...
+[2024-09-02 01:04:47,746][15596] Num frames 3300...
+[2024-09-02 01:04:47,946][15596] Avg episode rewards: #0: 4.825, true rewards: #0: 4.200
+[2024-09-02 01:04:47,947][15596] Avg episode reward: 4.825, avg true_objective: 4.200
+[2024-09-02 01:04:48,042][15596] Num frames 3400...
+[2024-09-02 01:04:48,273][15596] Num frames 3500...
+[2024-09-02 01:04:48,524][15596] Num frames 3600...
+[2024-09-02 01:04:48,750][15596] Num frames 3700...
+[2024-09-02 01:04:48,932][15596] Avg episode rewards: #0: 4.716, true rewards: #0: 4.160
+[2024-09-02 01:04:48,933][15596] Avg episode reward: 4.716, avg true_objective: 4.160
+[2024-09-02 01:04:49,061][15596] Num frames 3800...
+[2024-09-02 01:04:49,288][15596] Num frames 3900...
+[2024-09-02 01:04:49,509][15596] Num frames 4000...
+[2024-09-02 01:04:49,672][15596] Num frames 4100...
+[2024-09-02 01:04:49,776][15596] Avg episode rewards: #0: 4.628, true rewards: #0: 4.128
+[2024-09-02 01:04:49,777][15596] Avg episode reward: 4.628, avg true_objective: 4.128
+[2024-09-02 01:04:57,844][15596] Replay video saved to /home/montana/repos/deep-rl/unit8-ppo/train_dir/default_experiment/replay.mp4!