diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1130 @@
+[2024-09-21 12:48:30,381][00197] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2024-09-21 12:48:30,385][00197] Rollout worker 0 uses device cpu
+[2024-09-21 12:48:30,388][00197] Rollout worker 1 uses device cpu
+[2024-09-21 12:48:30,391][00197] Rollout worker 2 uses device cpu
+[2024-09-21 12:48:30,392][00197] Rollout worker 3 uses device cpu
+[2024-09-21 12:48:30,393][00197] Rollout worker 4 uses device cpu
+[2024-09-21 12:48:30,395][00197] Rollout worker 5 uses device cpu
+[2024-09-21 12:48:30,396][00197] Rollout worker 6 uses device cpu
+[2024-09-21 12:48:30,397][00197] Rollout worker 7 uses device cpu
+[2024-09-21 12:48:30,558][00197] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-21 12:48:30,560][00197] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-21 12:48:30,593][00197] Starting all processes...
+[2024-09-21 12:48:30,594][00197] Starting process learner_proc0
+[2024-09-21 12:48:30,637][00197] Starting all processes...
+[2024-09-21 12:48:30,646][00197] Starting process inference_proc0-0
+[2024-09-21 12:48:30,646][00197] Starting process rollout_proc0
+[2024-09-21 12:48:30,648][00197] Starting process rollout_proc1
+[2024-09-21 12:48:30,648][00197] Starting process rollout_proc2
+[2024-09-21 12:48:30,648][00197] Starting process rollout_proc3
+[2024-09-21 12:48:30,648][00197] Starting process rollout_proc4
+[2024-09-21 12:48:30,648][00197] Starting process rollout_proc5
+[2024-09-21 12:48:30,648][00197] Starting process rollout_proc6
+[2024-09-21 12:48:30,648][00197] Starting process rollout_proc7
+[2024-09-21 12:48:43,166][02682] Worker 5 uses CPU cores [1]
+[2024-09-21 12:48:43,188][02684] Worker 7 uses CPU cores [1]
+[2024-09-21 12:48:43,255][02663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-21 12:48:43,255][02663] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-21 12:48:43,293][02681] Worker 3 uses CPU cores [1]
+[2024-09-21 12:48:43,312][02663] Num visible devices: 1
+[2024-09-21 12:48:43,345][02663] Starting seed is not provided
+[2024-09-21 12:48:43,346][02663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-21 12:48:43,347][02663] Initializing actor-critic model on device cuda:0
+[2024-09-21 12:48:43,348][02663] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-21 12:48:43,349][02663] RunningMeanStd input shape: (1,)
+[2024-09-21 12:48:43,379][02676] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-21 12:48:43,380][02676] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-21 12:48:43,400][02676] Num visible devices: 1
+[2024-09-21 12:48:43,440][02663] ConvEncoder: input_channels=3
+[2024-09-21 12:48:43,475][02679] Worker 0 uses CPU cores [0]
+[2024-09-21 12:48:43,494][02680] Worker 4 uses CPU cores [0]
+[2024-09-21 12:48:43,560][02678] Worker 1 uses CPU cores [1]
+[2024-09-21 12:48:43,570][02677] Worker 2 uses CPU cores [0]
+[2024-09-21 12:48:43,617][02683] Worker 6 uses CPU cores [0]
+[2024-09-21 12:48:43,718][02663] Conv encoder output size: 512
+[2024-09-21 12:48:43,718][02663] Policy head output size: 512
+[2024-09-21 12:48:43,734][02663] Created Actor Critic model with architecture:
+[2024-09-21 12:48:43,734][02663] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-21 12:48:48,331][02663] Using optimizer
+[2024-09-21 12:48:48,332][02663] No checkpoints found
+[2024-09-21 12:48:48,332][02663] Did not load from checkpoint, starting from scratch!
+[2024-09-21 12:48:48,332][02663] Initialized policy 0 weights for model version 0
+[2024-09-21 12:48:48,338][02663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-21 12:48:48,347][02663] LearnerWorker_p0 finished initialization!
+[2024-09-21 12:48:48,670][02676] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-21 12:48:48,672][02676] RunningMeanStd input shape: (1,)
+[2024-09-21 12:48:48,693][02676] ConvEncoder: input_channels=3
+[2024-09-21 12:48:48,865][02676] Conv encoder output size: 512
+[2024-09-21 12:48:48,865][02676] Policy head output size: 512
+[2024-09-21 12:48:49,339][00197] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-21 12:48:50,519][00197] Inference worker 0-0 is ready!
+[2024-09-21 12:48:50,520][00197] All inference workers are ready! Signal rollout workers to start!
+[2024-09-21 12:48:50,548][00197] Heartbeat connected on Batcher_0
+[2024-09-21 12:48:50,552][00197] Heartbeat connected on LearnerWorker_p0
+[2024-09-21 12:48:50,601][00197] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-21 12:48:50,705][02679] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-21 12:48:50,710][02677] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-21 12:48:50,713][02681] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-21 12:48:50,710][02678] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-21 12:48:50,712][02683] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-21 12:48:50,715][02680] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-21 12:48:50,722][02682] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-21 12:48:50,729][02684] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-21 12:48:51,817][02683] Decorrelating experience for 0 frames...
+[2024-09-21 12:48:51,819][02678] Decorrelating experience for 0 frames...
+[2024-09-21 12:48:51,818][02682] Decorrelating experience for 0 frames...
+[2024-09-21 12:48:52,224][02683] Decorrelating experience for 32 frames...
+[2024-09-21 12:48:52,605][02684] Decorrelating experience for 0 frames...
+[2024-09-21 12:48:52,617][02678] Decorrelating experience for 32 frames...
+[2024-09-21 12:48:52,773][02683] Decorrelating experience for 64 frames...
+[2024-09-21 12:48:53,711][02680] Decorrelating experience for 0 frames...
+[2024-09-21 12:48:53,897][02683] Decorrelating experience for 96 frames...
+[2024-09-21 12:48:54,099][00197] Heartbeat connected on RolloutWorker_w6
+[2024-09-21 12:48:54,284][02684] Decorrelating experience for 32 frames...
+[2024-09-21 12:48:54,339][00197] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-21 12:48:54,345][02682] Decorrelating experience for 32 frames...
+[2024-09-21 12:48:54,502][02678] Decorrelating experience for 64 frames...
+[2024-09-21 12:48:54,533][02680] Decorrelating experience for 32 frames...
+[2024-09-21 12:48:55,195][02680] Decorrelating experience for 64 frames...
+[2024-09-21 12:48:55,400][02684] Decorrelating experience for 64 frames...
+[2024-09-21 12:48:55,544][02678] Decorrelating experience for 96 frames...
+[2024-09-21 12:48:55,721][00197] Heartbeat connected on RolloutWorker_w1
+[2024-09-21 12:48:55,786][02680] Decorrelating experience for 96 frames...
+[2024-09-21 12:48:55,923][00197] Heartbeat connected on RolloutWorker_w4
+[2024-09-21 12:48:56,073][02682] Decorrelating experience for 64 frames...
+[2024-09-21 12:48:56,711][02684] Decorrelating experience for 96 frames...
+[2024-09-21 12:48:56,753][02682] Decorrelating experience for 96 frames...
+[2024-09-21 12:48:56,844][00197] Heartbeat connected on RolloutWorker_w7
+[2024-09-21 12:48:56,874][00197] Heartbeat connected on RolloutWorker_w5
+[2024-09-21 12:48:59,339][00197] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 24. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-21 12:48:59,345][00197] Avg episode reward: [(0, '1.873')]
+[2024-09-21 12:49:01,793][02663] Signal inference workers to stop experience collection...
+[2024-09-21 12:49:01,812][02676] InferenceWorker_p0-w0: stopping experience collection
+[2024-09-21 12:49:03,091][02663] Signal inference workers to resume experience collection...
+[2024-09-21 12:49:03,094][02676] InferenceWorker_p0-w0: resuming experience collection
+[2024-09-21 12:49:04,339][00197] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 163.6. Samples: 2454. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2024-09-21 12:49:04,341][00197] Avg episode reward: [(0, '3.073')]
+[2024-09-21 12:49:09,339][00197] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 233.6. Samples: 4672. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-09-21 12:49:09,341][00197] Avg episode reward: [(0, '3.540')]
+[2024-09-21 12:49:14,339][00197] Fps is (10 sec: 3276.8, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 36864. Throughput: 0: 290.7. Samples: 7268. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:49:14,343][00197] Avg episode reward: [(0, '3.981')]
+[2024-09-21 12:49:15,288][02676] Updated weights for policy 0, policy_version 10 (0.0669)
+[2024-09-21 12:49:19,339][00197] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 49152. Throughput: 0: 402.5. Samples: 12074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:49:19,346][00197] Avg episode reward: [(0, '4.221')]
+[2024-09-21 12:49:24,341][00197] Fps is (10 sec: 2866.6, 60 sec: 1872.3, 300 sec: 1872.3). Total num frames: 65536. Throughput: 0: 473.4. Samples: 16570. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:49:24,354][00197] Avg episode reward: [(0, '4.367')]
+[2024-09-21 12:49:28,180][02676] Updated weights for policy 0, policy_version 20 (0.0018)
+[2024-09-21 12:49:29,347][00197] Fps is (10 sec: 3683.2, 60 sec: 2149.9, 300 sec: 2149.9). Total num frames: 86016. Throughput: 0: 480.5. Samples: 19226. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:49:29,354][00197] Avg episode reward: [(0, '4.519')]
+[2024-09-21 12:49:34,339][00197] Fps is (10 sec: 3277.5, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 98304. Throughput: 0: 548.0. Samples: 24660. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:49:34,346][00197] Avg episode reward: [(0, '4.422')]
+[2024-09-21 12:49:39,339][00197] Fps is (10 sec: 2459.7, 60 sec: 2211.8, 300 sec: 2211.8). Total num frames: 110592. Throughput: 0: 623.0. Samples: 28034. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:49:39,346][00197] Avg episode reward: [(0, '4.319')]
+[2024-09-21 12:49:39,348][02663] Saving new best policy, reward=4.319!
+[2024-09-21 12:49:41,840][02676] Updated weights for policy 0, policy_version 30 (0.0028)
+[2024-09-21 12:49:44,339][00197] Fps is (10 sec: 3276.8, 60 sec: 2383.1, 300 sec: 2383.1). Total num frames: 131072. Throughput: 0: 685.4. Samples: 30866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-21 12:49:44,347][00197] Avg episode reward: [(0, '4.258')]
+[2024-09-21 12:49:49,339][00197] Fps is (10 sec: 3686.5, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 147456. Throughput: 0: 756.1. Samples: 36480. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:49:49,340][00197] Avg episode reward: [(0, '4.371')]
+[2024-09-21 12:49:49,346][02663] Saving new best policy, reward=4.371!
+[2024-09-21 12:49:54,340][00197] Fps is (10 sec: 2866.9, 60 sec: 2662.4, 300 sec: 2457.6). Total num frames: 159744. Throughput: 0: 787.1. Samples: 40094. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:49:54,343][00197] Avg episode reward: [(0, '4.304')]
+[2024-09-21 12:49:54,983][02676] Updated weights for policy 0, policy_version 40 (0.0014)
+[2024-09-21 12:49:59,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2574.6). Total num frames: 180224. Throughput: 0: 790.0. Samples: 42820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:49:59,346][00197] Avg episode reward: [(0, '4.211')]
+[2024-09-21 12:50:04,339][00197] Fps is (10 sec: 3686.8, 60 sec: 3208.5, 300 sec: 2621.4). Total num frames: 196608. Throughput: 0: 811.6. Samples: 48598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:50:04,342][00197] Avg episode reward: [(0, '4.299')]
+[2024-09-21 12:50:06,991][02676] Updated weights for policy 0, policy_version 50 (0.0019)
+[2024-09-21 12:50:09,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2611.2). Total num frames: 208896. Throughput: 0: 798.8. Samples: 52512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:50:09,343][00197] Avg episode reward: [(0, '4.440')]
+[2024-09-21 12:50:09,348][02663] Saving new best policy, reward=4.440!
+[2024-09-21 12:50:14,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 2698.5). Total num frames: 229376. Throughput: 0: 794.8. Samples: 54984. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:50:14,342][00197] Avg episode reward: [(0, '4.481')]
+[2024-09-21 12:50:14,350][02663] Saving new best policy, reward=4.481!
+[2024-09-21 12:50:18,516][02676] Updated weights for policy 0, policy_version 60 (0.0013)
+[2024-09-21 12:50:19,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 2730.7). Total num frames: 245760. Throughput: 0: 801.1. Samples: 60708. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:50:19,344][00197] Avg episode reward: [(0, '4.668')]
+[2024-09-21 12:50:19,348][02663] Saving new best policy, reward=4.668!
+[2024-09-21 12:50:24,340][00197] Fps is (10 sec: 2867.0, 60 sec: 3208.6, 300 sec: 2716.3). Total num frames: 258048. Throughput: 0: 823.9. Samples: 65112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:50:24,342][00197] Avg episode reward: [(0, '4.668')]
+[2024-09-21 12:50:24,355][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000063_258048.pth...
+[2024-09-21 12:50:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.7, 300 sec: 2744.3). Total num frames: 274432. Throughput: 0: 802.6. Samples: 66982. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:50:29,347][00197] Avg episode reward: [(0, '4.484')]
+[2024-09-21 12:50:31,902][02676] Updated weights for policy 0, policy_version 70 (0.0016)
+[2024-09-21 12:50:34,339][00197] Fps is (10 sec: 3686.8, 60 sec: 3276.8, 300 sec: 2808.7). Total num frames: 294912. Throughput: 0: 806.9. Samples: 72790. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:50:34,347][00197] Avg episode reward: [(0, '4.335')]
+[2024-09-21 12:50:39,343][00197] Fps is (10 sec: 3275.4, 60 sec: 3276.6, 300 sec: 2792.6). Total num frames: 307200. Throughput: 0: 834.9. Samples: 77666. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:50:39,347][00197] Avg episode reward: [(0, '4.423')]
+[2024-09-21 12:50:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2813.8). Total num frames: 323584. Throughput: 0: 814.1. Samples: 79454. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:50:44,341][00197] Avg episode reward: [(0, '4.471')]
+[2024-09-21 12:50:44,625][02676] Updated weights for policy 0, policy_version 80 (0.0015)
+[2024-09-21 12:50:49,339][00197] Fps is (10 sec: 3688.0, 60 sec: 3276.8, 300 sec: 2867.2). Total num frames: 344064. Throughput: 0: 805.3. Samples: 84836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:50:49,342][00197] Avg episode reward: [(0, '4.471')]
+[2024-09-21 12:50:54,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2883.6). Total num frames: 360448. Throughput: 0: 836.6. Samples: 90160. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:50:54,342][00197] Avg episode reward: [(0, '4.462')]
+[2024-09-21 12:50:57,769][02676] Updated weights for policy 0, policy_version 90 (0.0017)
+[2024-09-21 12:50:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2867.2). Total num frames: 372736. Throughput: 0: 819.2. Samples: 91846. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:50:59,342][00197] Avg episode reward: [(0, '4.406')]
+[2024-09-21 12:51:04,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2882.4). Total num frames: 389120. Throughput: 0: 801.3. Samples: 96768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:51:04,342][00197] Avg episode reward: [(0, '4.585')]
+[2024-09-21 12:51:08,922][02676] Updated weights for policy 0, policy_version 100 (0.0015)
+[2024-09-21 12:51:09,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2925.7). Total num frames: 409600. Throughput: 0: 831.8. Samples: 102542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:51:09,341][00197] Avg episode reward: [(0, '4.465')]
+[2024-09-21 12:51:14,340][00197] Fps is (10 sec: 3276.3, 60 sec: 3208.5, 300 sec: 2909.5). Total num frames: 421888. Throughput: 0: 832.7. Samples: 104456. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:51:14,343][00197] Avg episode reward: [(0, '4.544')]
+[2024-09-21 12:51:19,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2921.8). Total num frames: 438272. Throughput: 0: 802.8. Samples: 108914. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:51:19,344][00197] Avg episode reward: [(0, '4.672')]
+[2024-09-21 12:51:19,346][02663] Saving new best policy, reward=4.672!
+[2024-09-21 12:51:21,840][02676] Updated weights for policy 0, policy_version 110 (0.0013)
+[2024-09-21 12:51:24,339][00197] Fps is (10 sec: 3687.0, 60 sec: 3345.1, 300 sec: 2959.7). Total num frames: 458752. Throughput: 0: 819.4. Samples: 114534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:51:24,342][00197] Avg episode reward: [(0, '4.662')]
+[2024-09-21 12:51:29,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 2944.0). Total num frames: 471040. Throughput: 0: 832.7. Samples: 116928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:51:29,342][00197] Avg episode reward: [(0, '4.495')]
+[2024-09-21 12:51:34,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2954.1). Total num frames: 487424. Throughput: 0: 800.0. Samples: 120836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:51:34,344][00197] Avg episode reward: [(0, '4.518')]
+[2024-09-21 12:51:35,074][02676] Updated weights for policy 0, policy_version 120 (0.0015)
+[2024-09-21 12:51:39,339][00197] Fps is (10 sec: 3686.5, 60 sec: 3345.3, 300 sec: 2987.7). Total num frames: 507904. Throughput: 0: 809.3. Samples: 126580. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:51:39,341][00197] Avg episode reward: [(0, '4.659')]
+[2024-09-21 12:51:44,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 2972.5). Total num frames: 520192. Throughput: 0: 834.9. Samples: 129416. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:51:44,344][00197] Avg episode reward: [(0, '4.648')]
+[2024-09-21 12:51:47,979][02676] Updated weights for policy 0, policy_version 130 (0.0024)
+[2024-09-21 12:51:49,339][00197] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 2981.0). Total num frames: 536576. Throughput: 0: 809.1. Samples: 133176. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:51:49,343][00197] Avg episode reward: [(0, '4.776')]
+[2024-09-21 12:51:49,350][02663] Saving new best policy, reward=4.776!
+[2024-09-21 12:51:54,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 2989.0). Total num frames: 552960. Throughput: 0: 801.5. Samples: 138610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:51:54,348][00197] Avg episode reward: [(0, '4.740')]
+[2024-09-21 12:51:59,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 2996.5). Total num frames: 569344. Throughput: 0: 819.1. Samples: 141314. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:51:59,343][00197] Avg episode reward: [(0, '4.549')]
+[2024-09-21 12:51:59,721][02676] Updated weights for policy 0, policy_version 140 (0.0013)
+[2024-09-21 12:52:04,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2982.7). Total num frames: 581632. Throughput: 0: 812.3. Samples: 145468. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:52:04,346][00197] Avg episode reward: [(0, '4.582')]
+[2024-09-21 12:52:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3010.6). Total num frames: 602112. Throughput: 0: 802.0. Samples: 150622. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:52:09,346][00197] Avg episode reward: [(0, '4.607')]
+[2024-09-21 12:52:12,064][02676] Updated weights for policy 0, policy_version 150 (0.0015)
+[2024-09-21 12:52:14,339][00197] Fps is (10 sec: 4096.0, 60 sec: 3345.2, 300 sec: 3037.0). Total num frames: 622592. Throughput: 0: 810.9. Samples: 153418. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:52:14,346][00197] Avg episode reward: [(0, '4.729')]
+[2024-09-21 12:52:19,340][00197] Fps is (10 sec: 2866.8, 60 sec: 3208.5, 300 sec: 3003.7). Total num frames: 630784. Throughput: 0: 822.9. Samples: 157866. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:52:19,342][00197] Avg episode reward: [(0, '4.702')]
+[2024-09-21 12:52:24,339][00197] Fps is (10 sec: 2047.9, 60 sec: 3072.0, 300 sec: 2991.0). Total num frames: 643072. Throughput: 0: 766.1. Samples: 161056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:52:24,347][00197] Avg episode reward: [(0, '4.597')]
+[2024-09-21 12:52:24,361][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000157_643072.pth...
+[2024-09-21 12:52:27,845][02676] Updated weights for policy 0, policy_version 160 (0.0028)
+[2024-09-21 12:52:29,339][00197] Fps is (10 sec: 2867.6, 60 sec: 3140.3, 300 sec: 2997.5). Total num frames: 659456. Throughput: 0: 748.4. Samples: 163094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:52:29,341][00197] Avg episode reward: [(0, '4.575')]
+[2024-09-21 12:52:34,339][00197] Fps is (10 sec: 2867.3, 60 sec: 3072.0, 300 sec: 2985.5). Total num frames: 671744. Throughput: 0: 779.4. Samples: 168248. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:52:34,342][00197] Avg episode reward: [(0, '4.618')]
+[2024-09-21 12:52:39,339][00197] Fps is (10 sec: 2867.1, 60 sec: 3003.7, 300 sec: 2991.9). Total num frames: 688128. Throughput: 0: 745.1. Samples: 172140. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:52:39,344][00197] Avg episode reward: [(0, '4.784')]
+[2024-09-21 12:52:39,347][02663] Saving new best policy, reward=4.784!
+[2024-09-21 12:52:40,976][02676] Updated weights for policy 0, policy_version 170 (0.0036)
+[2024-09-21 12:52:44,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3015.4). Total num frames: 708608. Throughput: 0: 746.8. Samples: 174920. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:52:44,341][00197] Avg episode reward: [(0, '4.826')]
+[2024-09-21 12:52:44,350][02663] Saving new best policy, reward=4.826!
+[2024-09-21 12:52:49,344][00197] Fps is (10 sec: 3684.6, 60 sec: 3140.0, 300 sec: 3020.7). Total num frames: 724992. Throughput: 0: 779.2. Samples: 180536. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:52:49,348][00197] Avg episode reward: [(0, '4.925')]
+[2024-09-21 12:52:49,350][02663] Saving new best policy, reward=4.925!
+[2024-09-21 12:52:54,339][00197] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2992.6). Total num frames: 733184. Throughput: 0: 742.6. Samples: 184040. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:52:54,341][00197] Avg episode reward: [(0, '4.762')]
+[2024-09-21 12:52:54,415][02676] Updated weights for policy 0, policy_version 180 (0.0013)
+[2024-09-21 12:52:59,339][00197] Fps is (10 sec: 2868.7, 60 sec: 3072.0, 300 sec: 3014.7). Total num frames: 753664. Throughput: 0: 742.3. Samples: 186820. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:52:59,341][00197] Avg episode reward: [(0, '4.948')]
+[2024-09-21 12:52:59,348][02663] Saving new best policy, reward=4.948!
+[2024-09-21 12:53:04,339][00197] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3035.9). Total num frames: 774144. Throughput: 0: 769.1. Samples: 192476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:53:04,347][00197] Avg episode reward: [(0, '4.775')]
+[2024-09-21 12:53:05,579][02676] Updated weights for policy 0, policy_version 190 (0.0021)
+[2024-09-21 12:53:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3024.7). Total num frames: 786432. Throughput: 0: 791.2. Samples: 196658. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:53:09,349][00197] Avg episode reward: [(0, '4.998')]
+[2024-09-21 12:53:09,351][02663] Saving new best policy, reward=4.998!
+[2024-09-21 12:53:14,339][00197] Fps is (10 sec: 2867.1, 60 sec: 3003.7, 300 sec: 3029.5). Total num frames: 802816. Throughput: 0: 795.3. Samples: 198882. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:53:14,342][00197] Avg episode reward: [(0, '4.838')]
+[2024-09-21 12:53:18,060][02676] Updated weights for policy 0, policy_version 200 (0.0016)
+[2024-09-21 12:53:19,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.6, 300 sec: 3049.2). Total num frames: 823296. Throughput: 0: 809.5. Samples: 204676. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:53:19,342][00197] Avg episode reward: [(0, '5.020')]
+[2024-09-21 12:53:19,346][02663] Saving new best policy, reward=5.020!
+[2024-09-21 12:53:24,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3038.5). Total num frames: 835584. Throughput: 0: 822.0. Samples: 209128. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:53:24,344][00197] Avg episode reward: [(0, '4.933')]
+[2024-09-21 12:53:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3042.7). Total num frames: 851968. Throughput: 0: 803.1. Samples: 211060. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:53:29,347][00197] Avg episode reward: [(0, '4.906')]
+[2024-09-21 12:53:31,006][02676] Updated weights for policy 0, policy_version 210 (0.0026)
+[2024-09-21 12:53:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3046.8). Total num frames: 868352. Throughput: 0: 804.8. Samples: 216746. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:53:34,341][00197] Avg episode reward: [(0, '4.907')]
+[2024-09-21 12:53:39,339][00197] Fps is (10 sec: 3276.6, 60 sec: 3276.8, 300 sec: 3050.8). Total num frames: 884736. Throughput: 0: 835.4. Samples: 221632. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:53:39,346][00197] Avg episode reward: [(0, '4.861')]
+[2024-09-21 12:53:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3040.8). Total num frames: 897024. Throughput: 0: 812.3. Samples: 223372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:53:44,346][00197] Avg episode reward: [(0, '5.129')]
+[2024-09-21 12:53:44,354][02663] Saving new best policy, reward=5.129!
+[2024-09-21 12:53:44,629][02676] Updated weights for policy 0, policy_version 220 (0.0013)
+[2024-09-21 12:53:49,339][00197] Fps is (10 sec: 3277.0, 60 sec: 3208.8, 300 sec: 3110.2). Total num frames: 917504. Throughput: 0: 800.3. Samples: 228490. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:53:49,349][00197] Avg episode reward: [(0, '5.154')]
+[2024-09-21 12:53:49,352][02663] Saving new best policy, reward=5.154!
+[2024-09-21 12:53:54,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3165.7). Total num frames: 933888. Throughput: 0: 822.4. Samples: 233664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:53:54,344][00197] Avg episode reward: [(0, '4.935')]
+[2024-09-21 12:53:57,040][02676] Updated weights for policy 0, policy_version 230 (0.0016)
+[2024-09-21 12:53:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 946176. Throughput: 0: 811.6. Samples: 235404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:53:59,341][00197] Avg episode reward: [(0, '4.885')]
+[2024-09-21 12:54:04,339][00197] Fps is (10 sec: 2867.1, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 962560. Throughput: 0: 785.8. Samples: 240036. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:54:04,342][00197] Avg episode reward: [(0, '4.929')]
+[2024-09-21 12:54:08,855][02676] Updated weights for policy 0, policy_version 240 (0.0014)
+[2024-09-21 12:54:09,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 983040. Throughput: 0: 816.6. Samples: 245874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:54:09,344][00197] Avg episode reward: [(0, '5.045')]
+[2024-09-21 12:54:14,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 995328. Throughput: 0: 820.7. Samples: 247990. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:54:14,343][00197] Avg episode reward: [(0, '4.970')]
+[2024-09-21 12:54:19,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1011712. Throughput: 0: 788.4. Samples: 252224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:54:19,346][00197] Avg episode reward: [(0, '5.005')]
+[2024-09-21 12:54:21,835][02676] Updated weights for policy 0, policy_version 250 (0.0018)
+[2024-09-21 12:54:24,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.5). Total num frames: 1032192. Throughput: 0: 806.5. Samples: 257922. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:54:24,346][00197] Avg episode reward: [(0, '5.164')]
+[2024-09-21 12:54:24,358][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000252_1032192.pth...
+[2024-09-21 12:54:24,473][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000063_258048.pth
+[2024-09-21 12:54:24,496][02663] Saving new best policy, reward=5.164!
+[2024-09-21 12:54:29,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1044480. Throughput: 0: 820.7. Samples: 260302. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:54:29,342][00197] Avg episode reward: [(0, '5.441')]
+[2024-09-21 12:54:29,351][02663] Saving new best policy, reward=5.441!
+[2024-09-21 12:54:34,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1060864. Throughput: 0: 789.8. Samples: 264030. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:54:34,345][00197] Avg episode reward: [(0, '5.451')]
+[2024-09-21 12:54:34,354][02663] Saving new best policy, reward=5.451!
+[2024-09-21 12:54:35,510][02676] Updated weights for policy 0, policy_version 260 (0.0013)
+[2024-09-21 12:54:39,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 3207.4). Total num frames: 1077248. Throughput: 0: 796.6. Samples: 269512. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:54:39,341][00197] Avg episode reward: [(0, '5.562')]
+[2024-09-21 12:54:39,344][02663] Saving new best policy, reward=5.562!
+[2024-09-21 12:54:44,339][00197] Fps is (10 sec: 3276.6, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1093632. Throughput: 0: 820.3. Samples: 272316. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:54:44,345][00197] Avg episode reward: [(0, '5.382')]
+[2024-09-21 12:54:48,565][02676] Updated weights for policy 0, policy_version 270 (0.0022)
+[2024-09-21 12:54:49,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1105920. Throughput: 0: 799.4. Samples: 276010. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:54:49,342][00197] Avg episode reward: [(0, '5.612')]
+[2024-09-21 12:54:49,346][02663] Saving new best policy, reward=5.612!
+[2024-09-21 12:54:54,339][00197] Fps is (10 sec: 3277.0, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1126400. Throughput: 0: 791.4. Samples: 281486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:54:54,341][00197] Avg episode reward: [(0, '5.640')]
+[2024-09-21 12:54:54,356][02663] Saving new best policy, reward=5.640!
+[2024-09-21 12:54:59,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1142784. Throughput: 0: 806.7. Samples: 284290. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:54:59,347][00197] Avg episode reward: [(0, '5.716')]
+[2024-09-21 12:54:59,354][02663] Saving new best policy, reward=5.716!
+[2024-09-21 12:55:00,287][02676] Updated weights for policy 0, policy_version 280 (0.0015)
+[2024-09-21 12:55:04,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1155072. Throughput: 0: 800.2. Samples: 288232. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:55:04,342][00197] Avg episode reward: [(0, '5.555')]
+[2024-09-21 12:55:09,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 1171456. Throughput: 0: 785.3. Samples: 293262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:55:09,341][00197] Avg episode reward: [(0, '5.777')]
+[2024-09-21 12:55:09,347][02663] Saving new best policy, reward=5.777!
+[2024-09-21 12:55:12,890][02676] Updated weights for policy 0, policy_version 290 (0.0020)
+[2024-09-21 12:55:14,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1191936. Throughput: 0: 794.2. Samples: 296042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:55:14,347][00197] Avg episode reward: [(0, '5.940')]
+[2024-09-21 12:55:14,377][02663] Saving new best policy, reward=5.940!
+[2024-09-21 12:55:19,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1204224. Throughput: 0: 811.8. Samples: 300560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:55:19,342][00197] Avg episode reward: [(0, '6.001')]
+[2024-09-21 12:55:19,344][02663] Saving new best policy, reward=6.001!
+[2024-09-21 12:55:24,340][00197] Fps is (10 sec: 2867.0, 60 sec: 3140.2, 300 sec: 3207.4). Total num frames: 1220608. Throughput: 0: 794.5. Samples: 305266. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:55:24,342][00197] Avg episode reward: [(0, '6.040')]
+[2024-09-21 12:55:24,355][02663] Saving new best policy, reward=6.040!
+[2024-09-21 12:55:26,059][02676] Updated weights for policy 0, policy_version 300 (0.0016)
+[2024-09-21 12:55:29,339][00197] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1241088. Throughput: 0: 795.2. Samples: 308100. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:55:29,345][00197] Avg episode reward: [(0, '6.406')]
+[2024-09-21 12:55:29,348][02663] Saving new best policy, reward=6.406!
+[2024-09-21 12:55:34,339][00197] Fps is (10 sec: 3277.1, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1253376. Throughput: 0: 824.1. Samples: 313096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:55:34,344][00197] Avg episode reward: [(0, '6.396')]
+[2024-09-21 12:55:39,339][00197] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 3179.6). Total num frames: 1261568. Throughput: 0: 772.2. Samples: 316236. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:55:39,346][00197] Avg episode reward: [(0, '6.359')]
+[2024-09-21 12:55:41,132][02676] Updated weights for policy 0, policy_version 310 (0.0035)
+[2024-09-21 12:55:44,339][00197] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 1277952. Throughput: 0: 748.5. Samples: 317972. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:55:44,346][00197] Avg episode reward: [(0, '6.559')]
+[2024-09-21 12:55:44,355][02663] Saving new best policy, reward=6.559!
+[2024-09-21 12:55:49,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1294336. Throughput: 0: 778.6. Samples: 323270. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:55:49,344][00197] Avg episode reward: [(0, '6.945')]
+[2024-09-21 12:55:49,348][02663] Saving new best policy, reward=6.945!
+[2024-09-21 12:55:54,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 1306624. Throughput: 0: 746.7. Samples: 326864. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:55:54,341][00197] Avg episode reward: [(0, '7.119')]
+[2024-09-21 12:55:54,349][02663] Saving new best policy, reward=7.119!
+[2024-09-21 12:55:55,138][02676] Updated weights for policy 0, policy_version 320 (0.0014)
+[2024-09-21 12:55:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 1323008. Throughput: 0: 738.7. Samples: 329284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:55:59,341][00197] Avg episode reward: [(0, '7.178')]
+[2024-09-21 12:55:59,344][02663] Saving new best policy, reward=7.178!
+[2024-09-21 12:56:04,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1343488. Throughput: 0: 763.2. Samples: 334904. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:56:04,341][00197] Avg episode reward: [(0, '7.374')]
+[2024-09-21 12:56:04,353][02663] Saving new best policy, reward=7.374!
+[2024-09-21 12:56:07,037][02676] Updated weights for policy 0, policy_version 330 (0.0013)
+[2024-09-21 12:56:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 1355776. Throughput: 0: 752.9. Samples: 339144. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:56:09,345][00197] Avg episode reward: [(0, '7.841')]
+[2024-09-21 12:56:09,347][02663] Saving new best policy, reward=7.841!
+[2024-09-21 12:56:14,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 1372160. Throughput: 0: 737.5. Samples: 341288. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:56:14,347][00197] Avg episode reward: [(0, '8.040')]
+[2024-09-21 12:56:14,358][02663] Saving new best policy, reward=8.040!
+[2024-09-21 12:56:19,052][02676] Updated weights for policy 0, policy_version 340 (0.0016)
+[2024-09-21 12:56:19,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1392640. Throughput: 0: 754.3. Samples: 347038. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:56:19,342][00197] Avg episode reward: [(0, '8.763')]
+[2024-09-21 12:56:19,348][02663] Saving new best policy, reward=8.763!
+[2024-09-21 12:56:24,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 1404928. Throughput: 0: 785.9. Samples: 351602. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:56:24,344][00197] Avg episode reward: [(0, '8.392')]
+[2024-09-21 12:56:24,355][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000343_1404928.pth...
+[2024-09-21 12:56:24,511][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000157_643072.pth
+[2024-09-21 12:56:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 1421312. Throughput: 0: 787.4. Samples: 353406. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:56:29,342][00197] Avg episode reward: [(0, '8.066')]
+[2024-09-21 12:56:32,363][02676] Updated weights for policy 0, policy_version 350 (0.0015)
+[2024-09-21 12:56:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3151.8). Total num frames: 1437696. Throughput: 0: 792.1. Samples: 358916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:56:34,349][00197] Avg episode reward: [(0, '7.630')]
+[2024-09-21 12:56:39,340][00197] Fps is (10 sec: 3276.4, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 1454080. Throughput: 0: 830.1. Samples: 364218. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:56:39,347][00197] Avg episode reward: [(0, '7.400')]
+[2024-09-21 12:56:44,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 1470464. Throughput: 0: 815.8. Samples: 365996. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:56:44,341][00197] Avg episode reward: [(0, '7.762')]
+[2024-09-21 12:56:45,331][02676] Updated weights for policy 0, policy_version 360 (0.0013)
+[2024-09-21 12:56:49,339][00197] Fps is (10 sec: 3277.2, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 1486848. Throughput: 0: 805.0. Samples: 371130. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:56:49,342][00197] Avg episode reward: [(0, '8.366')]
+[2024-09-21 12:56:54,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3179.6). Total num frames: 1507328. Throughput: 0: 836.6. Samples: 376790. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:56:54,345][00197] Avg episode reward: [(0, '9.184')]
+[2024-09-21 12:56:54,359][02663] Saving new best policy, reward=9.184!
+[2024-09-21 12:56:57,699][02676] Updated weights for policy 0, policy_version 370 (0.0013)
+[2024-09-21 12:56:59,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3179.6). Total num frames: 1519616. Throughput: 0: 827.3. Samples: 378516. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:56:59,346][00197] Avg episode reward: [(0, '9.815')]
+[2024-09-21 12:56:59,350][02663] Saving new best policy, reward=9.815!
+[2024-09-21 12:57:04,339][00197] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 1536000. Throughput: 0: 803.4. Samples: 383192. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:57:04,347][00197] Avg episode reward: [(0, '10.820')]
+[2024-09-21 12:57:04,356][02663] Saving new best policy, reward=10.820!
+[2024-09-21 12:57:08,897][02676] Updated weights for policy 0, policy_version 380 (0.0013)
+[2024-09-21 12:57:09,339][00197] Fps is (10 sec: 3686.6, 60 sec: 3345.1, 300 sec: 3165.7). Total num frames: 1556480. Throughput: 0: 831.5. Samples: 389020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:57:09,344][00197] Avg episode reward: [(0, '11.399')]
+[2024-09-21 12:57:09,348][02663] Saving new best policy, reward=11.399!
+[2024-09-21 12:57:14,340][00197] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3179.6). Total num frames: 1568768. Throughput: 0: 841.0. Samples: 391252. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:57:14,343][00197] Avg episode reward: [(0, '11.544')]
+[2024-09-21 12:57:14,366][02663] Saving new best policy, reward=11.544!
+[2024-09-21 12:57:19,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 1585152. Throughput: 0: 814.1. Samples: 395550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:57:19,341][00197] Avg episode reward: [(0, '11.526')]
+[2024-09-21 12:57:21,705][02676] Updated weights for policy 0, policy_version 390 (0.0024)
+[2024-09-21 12:57:24,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 1605632. Throughput: 0: 827.2. Samples: 401442. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:57:24,343][00197] Avg episode reward: [(0, '10.818')]
+[2024-09-21 12:57:29,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1617920. Throughput: 0: 847.0. Samples: 404112. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:57:29,341][00197] Avg episode reward: [(0, '10.785')]
+[2024-09-21 12:57:34,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1634304. Throughput: 0: 817.9. Samples: 407936. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:57:34,344][00197] Avg episode reward: [(0, '10.517')]
+[2024-09-21 12:57:34,586][02676] Updated weights for policy 0, policy_version 400 (0.0012)
+[2024-09-21 12:57:39,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 1654784. Throughput: 0: 823.6. Samples: 413850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:57:39,345][00197] Avg episode reward: [(0, '10.234')]
+[2024-09-21 12:57:44,341][00197] Fps is (10 sec: 3685.5, 60 sec: 3344.9, 300 sec: 3207.4). Total num frames: 1671168. Throughput: 0: 849.4. Samples: 416740. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:57:44,360][00197] Avg episode reward: [(0, '10.840')]
+[2024-09-21 12:57:46,890][02676] Updated weights for policy 0, policy_version 410 (0.0013)
+[2024-09-21 12:57:49,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1683456. Throughput: 0: 830.4. Samples: 420558. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:57:49,346][00197] Avg episode reward: [(0, '11.751')]
+[2024-09-21 12:57:49,349][02663] Saving new best policy, reward=11.751!
+[2024-09-21 12:57:54,341][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.7, 300 sec: 3221.2). Total num frames: 1703936. Throughput: 0: 824.3. Samples: 426116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:57:54,350][00197] Avg episode reward: [(0, '12.529')]
+[2024-09-21 12:57:54,363][02663] Saving new best policy, reward=12.529!
+[2024-09-21 12:57:58,265][02676] Updated weights for policy 0, policy_version 420 (0.0013)
+[2024-09-21 12:57:59,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 1720320. Throughput: 0: 835.7. Samples: 428858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:57:59,352][00197] Avg episode reward: [(0, '13.322')]
+[2024-09-21 12:57:59,357][02663] Saving new best policy, reward=13.322!
+[2024-09-21 12:58:04,344][00197] Fps is (10 sec: 2866.3, 60 sec: 3276.5, 300 sec: 3207.3). Total num frames: 1732608. Throughput: 0: 834.1. Samples: 433088. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:58:04,352][00197] Avg episode reward: [(0, '13.278')]
+[2024-09-21 12:58:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1753088. Throughput: 0: 818.5. Samples: 438276. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:58:09,343][00197] Avg episode reward: [(0, '13.342')]
+[2024-09-21 12:58:09,348][02663] Saving new best policy, reward=13.342!
+[2024-09-21 12:58:10,985][02676] Updated weights for policy 0, policy_version 430 (0.0014)
+[2024-09-21 12:58:14,339][00197] Fps is (10 sec: 4098.3, 60 sec: 3413.3, 300 sec: 3221.3). Total num frames: 1773568. Throughput: 0: 823.2. Samples: 441156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:58:14,344][00197] Avg episode reward: [(0, '14.351')]
+[2024-09-21 12:58:14,355][02663] Saving new best policy, reward=14.351!
+[2024-09-21 12:58:19,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1785856. Throughput: 0: 842.1. Samples: 445830. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:58:19,347][00197] Avg episode reward: [(0, '14.216')]
+[2024-09-21 12:58:23,918][02676] Updated weights for policy 0, policy_version 440 (0.0013)
+[2024-09-21 12:58:24,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1802240. Throughput: 0: 816.6. Samples: 450596. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:58:24,340][00197] Avg episode reward: [(0, '15.297')]
+[2024-09-21 12:58:24,349][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_1802240.pth...
+[2024-09-21 12:58:24,352][00197] Components not started: RolloutWorker_w0, RolloutWorker_w2, RolloutWorker_w3, wait_time=600.0 seconds
+[2024-09-21 12:58:24,466][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000252_1032192.pth
+[2024-09-21 12:58:24,479][02663] Saving new best policy, reward=15.297!
+[2024-09-21 12:58:29,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1818624. Throughput: 0: 813.7. Samples: 453354. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:58:29,341][00197] Avg episode reward: [(0, '15.638')]
+[2024-09-21 12:58:29,345][02663] Saving new best policy, reward=15.638!
+[2024-09-21 12:58:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1835008. Throughput: 0: 839.8. Samples: 458350. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-09-21 12:58:34,341][00197] Avg episode reward: [(0, '15.349')]
+[2024-09-21 12:58:36,930][02676] Updated weights for policy 0, policy_version 450 (0.0015)
+[2024-09-21 12:58:39,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1851392. Throughput: 0: 814.1. Samples: 462748. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:58:39,347][00197] Avg episode reward: [(0, '14.796')]
+[2024-09-21 12:58:44,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3221.3). Total num frames: 1867776. Throughput: 0: 817.4. Samples: 465640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 2.0)
+[2024-09-21 12:58:44,343][00197] Avg episode reward: [(0, '13.785')]
+[2024-09-21 12:58:47,802][02676] Updated weights for policy 0, policy_version 460 (0.0017)
+[2024-09-21 12:58:49,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1884160. Throughput: 0: 847.2. Samples: 471208. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2024-09-21 12:58:49,341][00197] Avg episode reward: [(0, '12.993')]
+[2024-09-21 12:58:54,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.7, 300 sec: 3221.3). Total num frames: 1896448. Throughput: 0: 806.7. Samples: 474578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-21 12:58:54,344][00197] Avg episode reward: [(0, '12.421')]
+[2024-09-21 12:58:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1912832. Throughput: 0: 782.8. Samples: 476382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-21 12:58:59,341][00197] Avg episode reward: [(0, '11.258')]
+[2024-09-21 12:59:02,467][02676] Updated weights for policy 0, policy_version 470 (0.0014)
+[2024-09-21 12:59:04,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3277.1, 300 sec: 3207.4). Total num frames: 1929216. Throughput: 0: 792.2. Samples: 481478. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2024-09-21 12:59:04,344][00197] Avg episode reward: [(0, '12.164')]
+[2024-09-21 12:59:09,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1941504. Throughput: 0: 781.6. Samples: 485770. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:59:09,343][00197] Avg episode reward: [(0, '13.065')]
+[2024-09-21 12:59:14,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 1961984. Throughput: 0: 772.2. Samples: 488104. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:59:14,341][00197] Avg episode reward: [(0, '13.984')]
+[2024-09-21 12:59:15,470][02676] Updated weights for policy 0, policy_version 480 (0.0017)
+[2024-09-21 12:59:19,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1978368. Throughput: 0: 790.6. Samples: 493926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:59:19,341][00197] Avg episode reward: [(0, '15.054')]
+[2024-09-21 12:59:24,344][00197] Fps is (10 sec: 2865.6, 60 sec: 3140.0, 300 sec: 3207.3). Total num frames: 1990656. Throughput: 0: 792.8. Samples: 498426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:59:24,347][00197] Avg episode reward: [(0, '14.439')]
+[2024-09-21 12:59:28,340][02676] Updated weights for policy 0, policy_version 490 (0.0023)
+[2024-09-21 12:59:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 2007040. Throughput: 0: 770.0. Samples: 500290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:59:29,341][00197] Avg episode reward: [(0, '14.703')]
+[2024-09-21 12:59:34,339][00197] Fps is (10 sec: 3688.4, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2027520. Throughput: 0: 774.8. Samples: 506074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:59:34,344][00197] Avg episode reward: [(0, '14.537')]
+[2024-09-21 12:59:39,340][00197] Fps is (10 sec: 3686.0, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2043904. Throughput: 0: 814.2. Samples: 511220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:59:39,342][00197] Avg episode reward: [(0, '14.103')]
+[2024-09-21 12:59:40,450][02676] Updated weights for policy 0, policy_version 500 (0.0013)
+[2024-09-21 12:59:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 2056192. Throughput: 0: 814.8. Samples: 513046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 12:59:44,343][00197] Avg episode reward: [(0, '15.235')]
+[2024-09-21 12:59:49,339][00197] Fps is (10 sec: 3277.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2076672. Throughput: 0: 823.4. Samples: 518530. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:59:49,341][00197] Avg episode reward: [(0, '15.993')]
+[2024-09-21 12:59:49,345][02663] Saving new best policy, reward=15.993!
+[2024-09-21 12:59:51,708][02676] Updated weights for policy 0, policy_version 510 (0.0013)
+[2024-09-21 12:59:54,339][00197] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 2093056. Throughput: 0: 845.4. Samples: 523812. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 12:59:54,349][00197] Avg episode reward: [(0, '15.430')]
+[2024-09-21 12:59:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2105344. Throughput: 0: 833.3. Samples: 525602. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 12:59:59,343][00197] Avg episode reward: [(0, '15.367')]
+[2024-09-21 13:00:04,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2125824. Throughput: 0: 812.9. Samples: 530508. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:00:04,341][00197] Avg episode reward: [(0, '15.635')]
+[2024-09-21 13:00:04,902][02676] Updated weights for policy 0, policy_version 520 (0.0019)
+[2024-09-21 13:00:09,339][00197] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3235.1). Total num frames: 2146304. Throughput: 0: 843.3. Samples: 536370. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-09-21 13:00:09,342][00197] Avg episode reward: [(0, '14.817')]
+[2024-09-21 13:00:14,342][00197] Fps is (10 sec: 2866.2, 60 sec: 3208.3, 300 sec: 3221.2). Total num frames: 2154496. Throughput: 0: 843.5. Samples: 538252. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:00:14,348][00197] Avg episode reward: [(0, '15.768')]
+[2024-09-21 13:00:17,642][02676] Updated weights for policy 0, policy_version 530 (0.0014)
+[2024-09-21 13:00:19,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3235.2). Total num frames: 2174976. Throughput: 0: 819.1. Samples: 542932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-09-21 13:00:19,341][00197] Avg episode reward: [(0, '16.211')]
+[2024-09-21 13:00:19,344][02663] Saving new best policy, reward=16.211!
+[2024-09-21 13:00:24,339][00197] Fps is (10 sec: 4097.4, 60 sec: 3413.6, 300 sec: 3235.1). Total num frames: 2195456. Throughput: 0: 830.2. Samples: 548576. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 13:00:24,341][00197] Avg episode reward: [(0, '17.823')]
+[2024-09-21 13:00:24,358][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000536_2195456.pth...
+[2024-09-21 13:00:24,483][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000343_1404928.pth
+[2024-09-21 13:00:24,499][02663] Saving new best policy, reward=17.823!
+[2024-09-21 13:00:29,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 2207744. Throughput: 0: 837.3. Samples: 550726. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:00:29,346][00197] Avg episode reward: [(0, '18.162')]
+[2024-09-21 13:00:29,349][02663] Saving new best policy, reward=18.162!
+[2024-09-21 13:00:30,845][02676] Updated weights for policy 0, policy_version 540 (0.0021)
+[2024-09-21 13:00:34,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 2224128. Throughput: 0: 804.0. Samples: 554712. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 13:00:34,346][00197] Avg episode reward: [(0, '18.307')]
+[2024-09-21 13:00:34,356][02663] Saving new best policy, reward=18.307!
+[2024-09-21 13:00:39,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3262.9). Total num frames: 2240512. Throughput: 0: 814.2. Samples: 560450. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 13:00:39,346][00197] Avg episode reward: [(0, '19.099')]
+[2024-09-21 13:00:39,349][02663] Saving new best policy, reward=19.099!
+[2024-09-21 13:00:42,260][02676] Updated weights for policy 0, policy_version 550 (0.0013)
+[2024-09-21 13:00:44,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2256896. Throughput: 0: 831.9. Samples: 563038. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:00:44,345][00197] Avg episode reward: [(0, '19.093')]
+[2024-09-21 13:00:49,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2269184. Throughput: 0: 806.0. Samples: 566780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 13:00:49,341][00197] Avg episode reward: [(0, '18.423')]
+[2024-09-21 13:00:54,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2289664. Throughput: 0: 805.0. Samples: 572596. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:00:54,347][00197] Avg episode reward: [(0, '18.280')]
+[2024-09-21 13:00:54,854][02676] Updated weights for policy 0, policy_version 560 (0.0018)
+[2024-09-21 13:00:59,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2306048. Throughput: 0: 825.6. Samples: 575402. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:00:59,342][00197] Avg episode reward: [(0, '19.026')]
+[2024-09-21 13:01:04,339][00197] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2318336. Throughput: 0: 805.2. Samples: 579164. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:01:04,347][00197] Avg episode reward: [(0, '18.312')]
+[2024-09-21 13:01:07,764][02676] Updated weights for policy 0, policy_version 570 (0.0014)
+[2024-09-21 13:01:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 2338816. Throughput: 0: 803.2. Samples: 584718. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:01:09,342][00197] Avg episode reward: [(0, '17.515')]
+[2024-09-21 13:01:14,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.3, 300 sec: 3262.9). Total num frames: 2355200. Throughput: 0: 820.0. Samples: 587624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 13:01:14,342][00197] Avg episode reward: [(0, '17.064')]
+[2024-09-21 13:01:19,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2367488. Throughput: 0: 825.4. Samples: 591856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:01:19,346][00197] Avg episode reward: [(0, '16.955')]
+[2024-09-21 13:01:20,594][02676] Updated weights for policy 0, policy_version 580 (0.0017)
+[2024-09-21 13:01:24,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 2387968. Throughput: 0: 815.5. Samples: 597148. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:01:24,342][00197] Avg episode reward: [(0, '16.705')]
+[2024-09-21 13:01:29,339][00197] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2408448. Throughput: 0: 822.3. Samples: 600040. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 13:01:29,349][00197] Avg episode reward: [(0, '16.183')]
+[2024-09-21 13:01:32,487][02676] Updated weights for policy 0, policy_version 590 (0.0016)
+[2024-09-21 13:01:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2420736. Throughput: 0: 840.1. Samples: 604584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 13:01:34,341][00197] Avg episode reward: [(0, '16.198')]
+[2024-09-21 13:01:39,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2437120. Throughput: 0: 816.6. Samples: 609344. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:01:39,344][00197] Avg episode reward: [(0, '16.723')]
+[2024-09-21 13:01:44,206][02676] Updated weights for policy 0, policy_version 600 (0.0015)
+[2024-09-21 13:01:44,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2457600. Throughput: 0: 818.2. Samples: 612222. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:01:44,344][00197] Avg episode reward: [(0, '16.832')]
+[2024-09-21 13:01:49,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2469888. Throughput: 0: 847.4. Samples: 617296. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 13:01:49,343][00197] Avg episode reward: [(0, '16.909')]
+[2024-09-21 13:01:54,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2486272. Throughput: 0: 819.1. Samples: 621578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 13:01:54,341][00197] Avg episode reward: [(0, '17.403')]
+[2024-09-21 13:01:57,232][02676] Updated weights for policy 0, policy_version 610 (0.0014)
+[2024-09-21 13:01:59,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2506752. Throughput: 0: 817.0. Samples: 624388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 13:01:59,341][00197] Avg episode reward: [(0, '18.103')]
+[2024-09-21 13:02:04,341][00197] Fps is (10 sec: 3276.2, 60 sec: 3345.0, 300 sec: 3262.9). Total num frames: 2519040. Throughput: 0: 844.9. Samples: 629878. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:02:04,343][00197] Avg episode reward: [(0, '18.355')]
+[2024-09-21 13:02:09,339][00197] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2531328. Throughput: 0: 805.4. Samples: 633392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:02:09,347][00197] Avg episode reward: [(0, '17.487')]
+[2024-09-21 13:02:11,496][02676] Updated weights for policy 0, policy_version 620 (0.0017)
+[2024-09-21 13:02:14,339][00197] Fps is (10 sec: 2458.1, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 2543616. Throughput: 0: 780.5. Samples: 635164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 13:02:14,341][00197] Avg episode reward: [(0, '17.676')]
+[2024-09-21 13:02:19,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2564096. Throughput: 0: 787.1. Samples: 640002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 13:02:19,341][00197] Avg episode reward: [(0, '18.051')]
+[2024-09-21 13:02:24,345][00197] Fps is (10 sec: 3274.7, 60 sec: 3139.9, 300 sec: 3249.0). Total num frames: 2576384. Throughput: 0: 769.7. Samples: 643986. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:02:24,360][00197] Avg episode reward: [(0, '18.511')]
+[2024-09-21 13:02:24,373][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth...
+[2024-09-21 13:02:24,538][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_1802240.pth
+[2024-09-21 13:02:25,684][02676] Updated weights for policy 0, policy_version 630 (0.0018)
+[2024-09-21 13:02:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3249.0). Total num frames: 2592768. Throughput: 0: 759.1. Samples: 646380. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:02:29,341][00197] Avg episode reward: [(0, '18.504')]
+[2024-09-21 13:02:34,339][00197] Fps is (10 sec: 3688.7, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 2613248. Throughput: 0: 773.7. Samples: 652114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 13:02:34,350][00197] Avg episode reward: [(0, '18.525')]
+[2024-09-21 13:02:36,993][02676] Updated weights for policy 0, policy_version 640 (0.0013)
+[2024-09-21 13:02:39,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3235.2). Total num frames: 2625536. Throughput: 0: 778.2. Samples: 656598. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:02:39,344][00197] Avg episode reward: [(0, '19.517')]
+[2024-09-21 13:02:39,348][02663] Saving new best policy, reward=19.517!
+[2024-09-21 13:02:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3249.0). Total num frames: 2641920. Throughput: 0: 758.8. Samples: 658534. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:02:44,341][00197] Avg episode reward: [(0, '19.526')]
+[2024-09-21 13:02:44,350][02663] Saving new best policy, reward=19.526!
+[2024-09-21 13:02:49,342][00197] Fps is (10 sec: 3275.7, 60 sec: 3140.1, 300 sec: 3235.1). Total num frames: 2658304. Throughput: 0: 762.9. Samples: 664208. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-09-21 13:02:49,345][00197] Avg episode reward: [(0, '19.726')]
+[2024-09-21 13:02:49,346][02663] Saving new best policy, reward=19.726!
+[2024-09-21 13:02:49,542][02676] Updated weights for policy 0, policy_version 650 (0.0019)
+[2024-09-21 13:02:54,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 2674688. Throughput: 0: 793.2. Samples: 669088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 13:02:54,341][00197] Avg episode reward: [(0, '19.984')]
+[2024-09-21 13:02:54,350][02663] Saving new best policy, reward=19.984!
+[2024-09-21 13:02:59,339][00197] Fps is (10 sec: 3277.9, 60 sec: 3072.0, 300 sec: 3249.1). Total num frames: 2691072. Throughput: 0: 794.0. Samples: 670892. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 13:02:59,344][00197] Avg episode reward: [(0, '19.492')]
+[2024-09-21 13:03:02,599][02676] Updated weights for policy 0, policy_version 660 (0.0023)
+[2024-09-21 13:03:04,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.4, 300 sec: 3235.1). Total num frames: 2707456. Throughput: 0: 804.4. Samples: 676200. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-21 13:03:04,345][00197] Avg episode reward: [(0, '19.700')]
+[2024-09-21 13:03:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2723840. Throughput: 0: 836.5. Samples: 681622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-21 13:03:09,342][00197] Avg episode reward: [(0, '20.278')]
+[2024-09-21 13:03:09,352][02663] Saving new best policy, reward=20.278!
+[2024-09-21 13:03:14,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2736128. Throughput: 0: 821.6. Samples: 683354. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:03:14,346][00197] Avg episode reward: [(0, '21.450')] +[2024-09-21 13:03:14,357][02663] Saving new best policy, reward=21.450! +[2024-09-21 13:03:15,757][02676] Updated weights for policy 0, policy_version 670 (0.0014) +[2024-09-21 13:03:19,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 2756608. Throughput: 0: 803.1. Samples: 688254. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2024-09-21 13:03:19,344][00197] Avg episode reward: [(0, '20.901')] +[2024-09-21 13:03:24,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3277.1, 300 sec: 3235.1). Total num frames: 2772992. Throughput: 0: 831.6. Samples: 694018. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:03:24,349][00197] Avg episode reward: [(0, '21.246')] +[2024-09-21 13:03:27,703][02676] Updated weights for policy 0, policy_version 680 (0.0017) +[2024-09-21 13:03:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2785280. Throughput: 0: 830.6. Samples: 695910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:03:29,341][00197] Avg episode reward: [(0, '22.096')] +[2024-09-21 13:03:29,344][02663] Saving new best policy, reward=22.096! +[2024-09-21 13:03:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 2805760. Throughput: 0: 805.3. Samples: 700444. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:03:34,341][00197] Avg episode reward: [(0, '22.424')] +[2024-09-21 13:03:34,357][02663] Saving new best policy, reward=22.424! +[2024-09-21 13:03:39,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2822144. Throughput: 0: 822.5. Samples: 706102. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:03:39,346][00197] Avg episode reward: [(0, '21.657')] +[2024-09-21 13:03:39,371][02676] Updated weights for policy 0, policy_version 690 (0.0015) +[2024-09-21 13:03:44,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2838528. Throughput: 0: 835.0. Samples: 708468. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:03:44,341][00197] Avg episode reward: [(0, '20.963')] +[2024-09-21 13:03:49,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3277.0, 300 sec: 3249.0). Total num frames: 2854912. Throughput: 0: 808.0. Samples: 712562. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2024-09-21 13:03:49,346][00197] Avg episode reward: [(0, '20.822')] +[2024-09-21 13:03:52,268][02676] Updated weights for policy 0, policy_version 700 (0.0013) +[2024-09-21 13:03:54,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2875392. Throughput: 0: 817.5. Samples: 718408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:03:54,346][00197] Avg episode reward: [(0, '20.828')] +[2024-09-21 13:03:59,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2887680. Throughput: 0: 841.4. Samples: 721218. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:03:59,345][00197] Avg episode reward: [(0, '19.314')] +[2024-09-21 13:04:04,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 2904064. Throughput: 0: 814.2. Samples: 724894. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:04:04,345][00197] Avg episode reward: [(0, '19.743')] +[2024-09-21 13:04:05,063][02676] Updated weights for policy 0, policy_version 710 (0.0016) +[2024-09-21 13:04:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2920448. Throughput: 0: 813.3. Samples: 730616. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:04:09,342][00197] Avg episode reward: [(0, '20.805')] +[2024-09-21 13:04:14,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 2936832. Throughput: 0: 835.2. Samples: 733492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:04:14,341][00197] Avg episode reward: [(0, '20.469')] +[2024-09-21 13:04:17,585][02676] Updated weights for policy 0, policy_version 720 (0.0016) +[2024-09-21 13:04:19,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3263.0). Total num frames: 2953216. Throughput: 0: 824.8. Samples: 737558. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:04:19,343][00197] Avg episode reward: [(0, '20.896')] +[2024-09-21 13:04:24,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 2973696. Throughput: 0: 818.7. Samples: 742944. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:04:24,341][00197] Avg episode reward: [(0, '21.289')] +[2024-09-21 13:04:24,355][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000726_2973696.pth... +[2024-09-21 13:04:24,486][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000536_2195456.pth +[2024-09-21 13:04:28,757][02676] Updated weights for policy 0, policy_version 730 (0.0013) +[2024-09-21 13:04:29,342][00197] Fps is (10 sec: 3685.2, 60 sec: 3413.1, 300 sec: 3262.9). Total num frames: 2990080. Throughput: 0: 829.0. Samples: 745778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:04:29,355][00197] Avg episode reward: [(0, '22.329')] +[2024-09-21 13:04:34,340][00197] Fps is (10 sec: 2866.8, 60 sec: 3276.7, 300 sec: 3249.0). Total num frames: 3002368. Throughput: 0: 836.3. Samples: 750198. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:04:34,348][00197] Avg episode reward: [(0, '22.840')] +[2024-09-21 13:04:34,362][02663] Saving new best policy, reward=22.840! 
+[2024-09-21 13:04:39,339][00197] Fps is (10 sec: 2868.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3018752. Throughput: 0: 812.9. Samples: 754990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:04:39,348][00197] Avg episode reward: [(0, '21.519')] +[2024-09-21 13:04:41,818][02676] Updated weights for policy 0, policy_version 740 (0.0018) +[2024-09-21 13:04:44,339][00197] Fps is (10 sec: 3687.0, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3039232. Throughput: 0: 815.2. Samples: 757904. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:04:44,343][00197] Avg episode reward: [(0, '21.301')] +[2024-09-21 13:04:49,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3051520. Throughput: 0: 843.2. Samples: 762838. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:04:49,344][00197] Avg episode reward: [(0, '20.429')] +[2024-09-21 13:04:54,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3067904. Throughput: 0: 814.5. Samples: 767270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:04:54,341][00197] Avg episode reward: [(0, '21.624')] +[2024-09-21 13:04:54,674][02676] Updated weights for policy 0, policy_version 750 (0.0019) +[2024-09-21 13:04:59,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3088384. Throughput: 0: 814.2. Samples: 770130. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:04:59,341][00197] Avg episode reward: [(0, '21.478')] +[2024-09-21 13:05:04,342][00197] Fps is (10 sec: 3686.3, 60 sec: 3345.0, 300 sec: 3249.0). Total num frames: 3104768. Throughput: 0: 843.3. Samples: 775506. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:05:04,349][00197] Avg episode reward: [(0, '20.999')] +[2024-09-21 13:05:07,486][02676] Updated weights for policy 0, policy_version 760 (0.0013) +[2024-09-21 13:05:09,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3263.0). Total num frames: 3117056. Throughput: 0: 810.6. Samples: 779420. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:05:09,341][00197] Avg episode reward: [(0, '20.828')] +[2024-09-21 13:05:14,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3137536. Throughput: 0: 811.4. Samples: 782290. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:05:14,342][00197] Avg episode reward: [(0, '19.957')] +[2024-09-21 13:05:18,322][02676] Updated weights for policy 0, policy_version 770 (0.0013) +[2024-09-21 13:05:19,344][00197] Fps is (10 sec: 3684.3, 60 sec: 3344.7, 300 sec: 3249.0). Total num frames: 3153920. Throughput: 0: 842.3. Samples: 788106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:05:19,347][00197] Avg episode reward: [(0, '21.288')] +[2024-09-21 13:05:24,342][00197] Fps is (10 sec: 2866.2, 60 sec: 3208.3, 300 sec: 3249.0). Total num frames: 3166208. Throughput: 0: 819.2. Samples: 791856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:05:24,345][00197] Avg episode reward: [(0, '20.063')] +[2024-09-21 13:05:29,339][00197] Fps is (10 sec: 2459.0, 60 sec: 3140.4, 300 sec: 3235.1). Total num frames: 3178496. Throughput: 0: 793.5. Samples: 793610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:05:29,341][00197] Avg episode reward: [(0, '19.998')] +[2024-09-21 13:05:33,432][02676] Updated weights for policy 0, policy_version 780 (0.0015) +[2024-09-21 13:05:34,339][00197] Fps is (10 sec: 2868.2, 60 sec: 3208.6, 300 sec: 3235.1). Total num frames: 3194880. Throughput: 0: 785.3. Samples: 798176. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:05:34,342][00197] Avg episode reward: [(0, '20.043')] +[2024-09-21 13:05:39,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 3207168. Throughput: 0: 782.8. Samples: 802496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:05:39,345][00197] Avg episode reward: [(0, '20.908')] +[2024-09-21 13:05:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3235.1). Total num frames: 3223552. Throughput: 0: 763.7. Samples: 804496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:05:44,341][00197] Avg episode reward: [(0, '20.351')] +[2024-09-21 13:05:46,564][02676] Updated weights for policy 0, policy_version 790 (0.0019) +[2024-09-21 13:05:49,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3244032. Throughput: 0: 772.3. Samples: 810260. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:05:49,346][00197] Avg episode reward: [(0, '18.912')] +[2024-09-21 13:05:54,339][00197] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3260416. Throughput: 0: 794.9. Samples: 815192. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:05:54,346][00197] Avg episode reward: [(0, '19.815')] +[2024-09-21 13:05:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3235.1). Total num frames: 3272704. Throughput: 0: 771.2. Samples: 816992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:05:59,341][00197] Avg episode reward: [(0, '20.632')] +[2024-09-21 13:05:59,430][02676] Updated weights for policy 0, policy_version 800 (0.0012) +[2024-09-21 13:06:04,341][00197] Fps is (10 sec: 3276.2, 60 sec: 3140.2, 300 sec: 3235.1). Total num frames: 3293184. Throughput: 0: 765.5. Samples: 822550. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:06:04,354][00197] Avg episode reward: [(0, '19.979')] +[2024-09-21 13:06:09,342][00197] Fps is (10 sec: 3685.1, 60 sec: 3208.3, 300 sec: 3235.1). Total num frames: 3309568. Throughput: 0: 795.3. Samples: 827644. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:06:09,345][00197] Avg episode reward: [(0, '21.047')] +[2024-09-21 13:06:12,455][02676] Updated weights for policy 0, policy_version 810 (0.0026) +[2024-09-21 13:06:14,339][00197] Fps is (10 sec: 2867.7, 60 sec: 3072.0, 300 sec: 3235.1). Total num frames: 3321856. Throughput: 0: 794.7. Samples: 829370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:06:14,343][00197] Avg episode reward: [(0, '21.386')] +[2024-09-21 13:06:19,339][00197] Fps is (10 sec: 3277.9, 60 sec: 3140.6, 300 sec: 3235.1). Total num frames: 3342336. Throughput: 0: 805.3. Samples: 834416. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:06:19,341][00197] Avg episode reward: [(0, '21.754')] +[2024-09-21 13:06:23,300][02676] Updated weights for policy 0, policy_version 820 (0.0014) +[2024-09-21 13:06:24,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.7, 300 sec: 3221.3). Total num frames: 3358720. Throughput: 0: 835.7. Samples: 840102. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:06:24,344][00197] Avg episode reward: [(0, '21.167')] +[2024-09-21 13:06:24,359][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth... +[2024-09-21 13:06:24,504][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth +[2024-09-21 13:06:29,339][00197] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3371008. Throughput: 0: 831.1. Samples: 841896. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:06:29,345][00197] Avg episode reward: [(0, '21.691')] +[2024-09-21 13:06:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 3391488. Throughput: 0: 810.0. Samples: 846710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:06:34,346][00197] Avg episode reward: [(0, '22.301')] +[2024-09-21 13:06:36,463][02676] Updated weights for policy 0, policy_version 830 (0.0016) +[2024-09-21 13:06:39,341][00197] Fps is (10 sec: 3685.5, 60 sec: 3344.9, 300 sec: 3221.2). Total num frames: 3407872. Throughput: 0: 822.8. Samples: 852222. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:06:39,350][00197] Avg episode reward: [(0, '21.705')] +[2024-09-21 13:06:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3420160. Throughput: 0: 828.9. Samples: 854292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:06:44,342][00197] Avg episode reward: [(0, '22.008')] +[2024-09-21 13:06:49,339][00197] Fps is (10 sec: 2867.9, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3436544. Throughput: 0: 797.6. Samples: 858442. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:06:49,348][00197] Avg episode reward: [(0, '22.216')] +[2024-09-21 13:06:49,591][02676] Updated weights for policy 0, policy_version 840 (0.0013) +[2024-09-21 13:06:54,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3457024. Throughput: 0: 814.5. Samples: 864292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:06:54,346][00197] Avg episode reward: [(0, '22.438')] +[2024-09-21 13:06:59,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3469312. Throughput: 0: 832.8. Samples: 866848. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:06:59,341][00197] Avg episode reward: [(0, '21.336')] +[2024-09-21 13:07:02,373][02676] Updated weights for policy 0, policy_version 850 (0.0018) +[2024-09-21 13:07:04,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3235.1). Total num frames: 3485696. Throughput: 0: 809.4. Samples: 870840. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:07:04,341][00197] Avg episode reward: [(0, '21.819')] +[2024-09-21 13:07:09,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3277.0, 300 sec: 3262.9). Total num frames: 3506176. Throughput: 0: 811.9. Samples: 876638. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:07:09,347][00197] Avg episode reward: [(0, '21.699')] +[2024-09-21 13:07:13,673][02676] Updated weights for policy 0, policy_version 860 (0.0019) +[2024-09-21 13:07:14,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 3522560. Throughput: 0: 837.8. Samples: 879596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:07:14,343][00197] Avg episode reward: [(0, '21.429')] +[2024-09-21 13:07:19,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3263.0). Total num frames: 3538944. Throughput: 0: 812.9. Samples: 883290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:07:19,341][00197] Avg episode reward: [(0, '23.461')] +[2024-09-21 13:07:19,350][02663] Saving new best policy, reward=23.461! +[2024-09-21 13:07:24,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3555328. Throughput: 0: 819.8. Samples: 889110. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:07:24,349][00197] Avg episode reward: [(0, '21.392')] +[2024-09-21 13:07:25,831][02676] Updated weights for policy 0, policy_version 870 (0.0013) +[2024-09-21 13:07:29,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 3575808. Throughput: 0: 840.2. Samples: 892102. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:07:29,345][00197] Avg episode reward: [(0, '20.838')] +[2024-09-21 13:07:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3588096. Throughput: 0: 841.6. Samples: 896314. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:07:34,341][00197] Avg episode reward: [(0, '19.682')] +[2024-09-21 13:07:38,389][02676] Updated weights for policy 0, policy_version 880 (0.0012) +[2024-09-21 13:07:39,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3345.2, 300 sec: 3276.8). Total num frames: 3608576. Throughput: 0: 830.1. Samples: 901648. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:07:39,341][00197] Avg episode reward: [(0, '20.984')] +[2024-09-21 13:07:44,344][00197] Fps is (10 sec: 3684.3, 60 sec: 3413.0, 300 sec: 3276.8). Total num frames: 3624960. Throughput: 0: 839.5. Samples: 904628. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:07:44,352][00197] Avg episode reward: [(0, '19.768')] +[2024-09-21 13:07:49,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3637248. Throughput: 0: 855.3. Samples: 909328. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:07:49,341][00197] Avg episode reward: [(0, '19.448')] +[2024-09-21 13:07:50,986][02676] Updated weights for policy 0, policy_version 890 (0.0018) +[2024-09-21 13:07:54,339][00197] Fps is (10 sec: 3278.6, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3657728. Throughput: 0: 836.6. Samples: 914284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:07:54,341][00197] Avg episode reward: [(0, '21.025')] +[2024-09-21 13:07:59,339][00197] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3290.7). Total num frames: 3678208. Throughput: 0: 836.0. Samples: 917216. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:07:59,341][00197] Avg episode reward: [(0, '21.265')] +[2024-09-21 13:08:01,954][02676] Updated weights for policy 0, policy_version 900 (0.0020) +[2024-09-21 13:08:04,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 3690496. Throughput: 0: 867.2. Samples: 922316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:08:04,341][00197] Avg episode reward: [(0, '22.501')] +[2024-09-21 13:08:09,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 3706880. Throughput: 0: 835.5. Samples: 926706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:08:09,341][00197] Avg episode reward: [(0, '21.782')] +[2024-09-21 13:08:14,127][02676] Updated weights for policy 0, policy_version 910 (0.0016) +[2024-09-21 13:08:14,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 3727360. Throughput: 0: 834.7. Samples: 929662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:08:14,346][00197] Avg episode reward: [(0, '22.119')] +[2024-09-21 13:08:19,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 3743744. Throughput: 0: 863.6. Samples: 935176. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:08:19,343][00197] Avg episode reward: [(0, '22.512')] +[2024-09-21 13:08:24,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 3756032. Throughput: 0: 835.0. Samples: 939224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:08:24,345][00197] Avg episode reward: [(0, '23.418')] +[2024-09-21 13:08:24,359][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000917_3756032.pth... 
+[2024-09-21 13:08:24,358][00197] Components not started: RolloutWorker_w0, RolloutWorker_w2, RolloutWorker_w3, wait_time=1200.0 seconds +[2024-09-21 13:08:24,468][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000726_2973696.pth +[2024-09-21 13:08:26,906][02676] Updated weights for policy 0, policy_version 920 (0.0016) +[2024-09-21 13:08:29,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3345.0, 300 sec: 3290.7). Total num frames: 3776512. Throughput: 0: 832.2. Samples: 942072. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:08:29,343][00197] Avg episode reward: [(0, '23.401')] +[2024-09-21 13:08:34,339][00197] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 3792896. Throughput: 0: 856.2. Samples: 947856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:08:34,344][00197] Avg episode reward: [(0, '23.724')] +[2024-09-21 13:08:34,359][02663] Saving new best policy, reward=23.724! +[2024-09-21 13:08:39,339][00197] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3805184. Throughput: 0: 829.0. Samples: 951588. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:08:39,346][00197] Avg episode reward: [(0, '23.869')] +[2024-09-21 13:08:39,349][02663] Saving new best policy, reward=23.869! +[2024-09-21 13:08:40,121][02676] Updated weights for policy 0, policy_version 930 (0.0023) +[2024-09-21 13:08:44,342][00197] Fps is (10 sec: 2866.2, 60 sec: 3276.9, 300 sec: 3276.8). Total num frames: 3821568. Throughput: 0: 816.2. Samples: 953950. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:08:44,345][00197] Avg episode reward: [(0, '24.772')] +[2024-09-21 13:08:44,363][02663] Saving new best policy, reward=24.772! +[2024-09-21 13:08:49,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3833856. Throughput: 0: 785.7. Samples: 957672. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:08:49,343][00197] Avg episode reward: [(0, '25.477')] +[2024-09-21 13:08:49,345][02663] Saving new best policy, reward=25.477! +[2024-09-21 13:08:54,339][00197] Fps is (10 sec: 2458.5, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 3846144. Throughput: 0: 782.8. Samples: 961930. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:08:54,345][00197] Avg episode reward: [(0, '25.083')] +[2024-09-21 13:08:55,189][02676] Updated weights for policy 0, policy_version 940 (0.0013) +[2024-09-21 13:08:59,342][00197] Fps is (10 sec: 2866.2, 60 sec: 3071.8, 300 sec: 3249.0). Total num frames: 3862528. Throughput: 0: 766.2. Samples: 964142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:08:59,349][00197] Avg episode reward: [(0, '23.910')] +[2024-09-21 13:09:04,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3883008. Throughput: 0: 775.4. Samples: 970068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-09-21 13:09:04,346][00197] Avg episode reward: [(0, '23.625')] +[2024-09-21 13:09:05,646][02676] Updated weights for policy 0, policy_version 950 (0.0014) +[2024-09-21 13:09:09,339][00197] Fps is (10 sec: 3687.6, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3899392. Throughput: 0: 792.2. Samples: 974872. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:09:09,342][00197] Avg episode reward: [(0, '23.307')] +[2024-09-21 13:09:14,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3262.9). Total num frames: 3915776. Throughput: 0: 770.0. Samples: 976722. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:09:14,346][00197] Avg episode reward: [(0, '22.475')] +[2024-09-21 13:09:18,340][02676] Updated weights for policy 0, policy_version 960 (0.0022) +[2024-09-21 13:09:19,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 3932160. Throughput: 0: 771.0. Samples: 982550. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:09:19,344][00197] Avg episode reward: [(0, '21.378')] +[2024-09-21 13:09:24,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.1). Total num frames: 3948544. Throughput: 0: 806.2. Samples: 987866. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:09:24,344][00197] Avg episode reward: [(0, '21.922')] +[2024-09-21 13:09:29,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3262.9). Total num frames: 3964928. Throughput: 0: 795.4. Samples: 989740. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:09:29,341][00197] Avg episode reward: [(0, '22.044')] +[2024-09-21 13:09:30,853][02676] Updated weights for policy 0, policy_version 970 (0.0016) +[2024-09-21 13:09:34,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 3985408. Throughput: 0: 835.2. Samples: 995258. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-21 13:09:34,341][00197] Avg episode reward: [(0, '21.862')] +[2024-09-21 13:09:39,340][00197] Fps is (10 sec: 3685.9, 60 sec: 3276.7, 300 sec: 3262.9). Total num frames: 4001792. Throughput: 0: 864.1. Samples: 1000814. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-21 13:09:39,348][00197] Avg episode reward: [(0, '21.783')] +[2024-09-21 13:09:40,076][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-21 13:09:40,119][02663] Stopping Batcher_0... +[2024-09-21 13:09:40,120][02663] Loop batcher_evt_loop terminating... +[2024-09-21 13:09:40,120][00197] Component Batcher_0 stopped! +[2024-09-21 13:09:40,126][00197] Component RolloutWorker_w0 process died already! Don't wait for it. +[2024-09-21 13:09:40,130][00197] Component RolloutWorker_w2 process died already! Don't wait for it. +[2024-09-21 13:09:40,135][00197] Component RolloutWorker_w3 process died already! Don't wait for it. 
+[2024-09-21 13:09:40,204][02676] Weights refcount: 2 0 +[2024-09-21 13:09:40,219][02676] Stopping InferenceWorker_p0-w0... +[2024-09-21 13:09:40,219][02676] Loop inference_proc0-0_evt_loop terminating... +[2024-09-21 13:09:40,219][00197] Component InferenceWorker_p0-w0 stopped! +[2024-09-21 13:09:40,283][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth +[2024-09-21 13:09:40,308][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-21 13:09:40,504][00197] Component LearnerWorker_p0 stopped! +[2024-09-21 13:09:40,521][02663] Stopping LearnerWorker_p0... +[2024-09-21 13:09:40,525][02663] Loop learner_proc0_evt_loop terminating... +[2024-09-21 13:09:40,762][00197] Component RolloutWorker_w6 stopped! +[2024-09-21 13:09:40,768][02683] Stopping RolloutWorker_w6... +[2024-09-21 13:09:40,769][02683] Loop rollout_proc6_evt_loop terminating... +[2024-09-21 13:09:40,790][02678] Stopping RolloutWorker_w1... +[2024-09-21 13:09:40,790][00197] Component RolloutWorker_w1 stopped! +[2024-09-21 13:09:40,793][02678] Loop rollout_proc1_evt_loop terminating... +[2024-09-21 13:09:40,807][00197] Component RolloutWorker_w4 stopped! +[2024-09-21 13:09:40,813][02680] Stopping RolloutWorker_w4... +[2024-09-21 13:09:40,821][02680] Loop rollout_proc4_evt_loop terminating... +[2024-09-21 13:09:40,868][00197] Component RolloutWorker_w7 stopped! +[2024-09-21 13:09:40,871][02684] Stopping RolloutWorker_w7... +[2024-09-21 13:09:40,881][02684] Loop rollout_proc7_evt_loop terminating... +[2024-09-21 13:09:40,895][02682] Stopping RolloutWorker_w5... +[2024-09-21 13:09:40,895][00197] Component RolloutWorker_w5 stopped! +[2024-09-21 13:09:40,899][00197] Waiting for process learner_proc0 to stop... +[2024-09-21 13:09:40,898][02682] Loop rollout_proc5_evt_loop terminating... +[2024-09-21 13:09:42,822][00197] Waiting for process inference_proc0-0 to join... 
+[2024-09-21 13:09:43,217][00197] Waiting for process rollout_proc0 to join... +[2024-09-21 13:09:43,219][00197] Waiting for process rollout_proc1 to join... +[2024-09-21 13:09:43,944][00197] Waiting for process rollout_proc2 to join... +[2024-09-21 13:09:43,950][00197] Waiting for process rollout_proc3 to join... +[2024-09-21 13:09:43,952][00197] Waiting for process rollout_proc4 to join... +[2024-09-21 13:09:43,956][00197] Waiting for process rollout_proc5 to join... +[2024-09-21 13:09:43,961][00197] Waiting for process rollout_proc6 to join... +[2024-09-21 13:09:43,965][00197] Waiting for process rollout_proc7 to join... +[2024-09-21 13:09:43,970][00197] Batcher 0 profile tree view: +batching: 24.9618, releasing_batches: 0.0299 +[2024-09-21 13:09:43,971][00197] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 549.8861 +update_model: 9.6287 + weight_update: 0.0015 +one_step: 0.0065 + handle_policy_step: 638.9649 + deserialize: 16.9350, stack: 3.9038, obs_to_device_normalize: 139.3524, forward: 326.2632, send_messages: 26.0760 + prepare_outputs: 93.2817 + to_cpu: 57.0874 +[2024-09-21 13:09:43,973][00197] Learner 0 profile tree view: +misc: 0.0066, prepare_batch: 16.8214 +train: 72.6075 + epoch_init: 0.0061, minibatch_init: 0.0105, losses_postprocess: 0.5752, kl_divergence: 0.5265, after_optimizer: 32.9221 + calculate_losses: 23.5307 + losses_init: 0.0052, forward_head: 1.7769, bptt_initial: 15.3279, tail: 1.0012, advantages_returns: 0.2803, losses: 2.6819 + bptt: 2.1621 + bptt_forward_core: 2.0823 + update: 14.4247 + clip: 1.5584 +[2024-09-21 13:09:43,974][00197] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.4161, enqueue_policy_requests: 150.2985, env_step: 948.2443, overhead: 21.2826, complete_rollouts: 9.3829 +save_policy_outputs: 37.1383 + split_output_tensors: 12.6333 +[2024-09-21 13:09:43,976][00197] Loop Runner_EvtLoop terminating... 
+[2024-09-21 13:09:43,978][00197] Runner profile tree view:
+main_loop: 1273.3851
+[2024-09-21 13:09:43,979][00197] Collected {0: 4005888}, FPS: 3145.9
+[2024-09-21 13:09:44,222][00197] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-09-21 13:09:44,226][00197] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-21 13:09:44,229][00197] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-21 13:09:44,231][00197] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-21 13:09:44,234][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-21 13:09:44,235][00197] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-21 13:09:44,236][00197] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-21 13:09:44,238][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-21 13:09:44,239][00197] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-09-21 13:09:44,240][00197] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-09-21 13:09:44,242][00197] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-21 13:09:44,243][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-21 13:09:44,244][00197] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-21 13:09:44,245][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-21 13:09:44,247][00197] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-21 13:09:44,268][00197] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-21 13:09:44,270][00197] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-21 13:09:44,274][00197] RunningMeanStd input shape: (1,)
+[2024-09-21 13:09:44,289][00197] ConvEncoder: input_channels=3
+[2024-09-21 13:09:44,422][00197] Conv encoder output size: 512
+[2024-09-21 13:09:44,424][00197] Policy head output size: 512
+[2024-09-21 13:09:46,158][00197] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-09-21 13:09:47,069][00197] Num frames 100...
+[2024-09-21 13:09:47,206][00197] Num frames 200...
+[2024-09-21 13:09:47,338][00197] Num frames 300...
+[2024-09-21 13:09:47,467][00197] Num frames 400...
+[2024-09-21 13:09:47,602][00197] Num frames 500...
+[2024-09-21 13:09:47,724][00197] Num frames 600...
+[2024-09-21 13:09:47,860][00197] Num frames 700...
+[2024-09-21 13:09:47,963][00197] Avg episode rewards: #0: 15.360, true rewards: #0: 7.360
+[2024-09-21 13:09:47,965][00197] Avg episode reward: 15.360, avg true_objective: 7.360
+[2024-09-21 13:09:48,047][00197] Num frames 800...
+[2024-09-21 13:09:48,182][00197] Num frames 900...
+[2024-09-21 13:09:48,314][00197] Num frames 1000...
+[2024-09-21 13:09:48,435][00197] Num frames 1100...
+[2024-09-21 13:09:48,560][00197] Num frames 1200...
+[2024-09-21 13:09:48,724][00197] Avg episode rewards: #0: 11.400, true rewards: #0: 6.400
+[2024-09-21 13:09:48,726][00197] Avg episode reward: 11.400, avg true_objective: 6.400
+[2024-09-21 13:09:48,756][00197] Num frames 1300...
+[2024-09-21 13:09:48,893][00197] Num frames 1400...
+[2024-09-21 13:09:49,021][00197] Num frames 1500...
+[2024-09-21 13:09:49,144][00197] Num frames 1600...
+[2024-09-21 13:09:49,286][00197] Num frames 1700...
+[2024-09-21 13:09:49,414][00197] Num frames 1800...
+[2024-09-21 13:09:49,491][00197] Avg episode rewards: #0: 10.720, true rewards: #0: 6.053
+[2024-09-21 13:09:49,493][00197] Avg episode reward: 10.720, avg true_objective: 6.053
+[2024-09-21 13:09:49,614][00197] Num frames 1900...
+[2024-09-21 13:09:49,758][00197] Num frames 2000...
+[2024-09-21 13:09:49,864][00197] Avg episode rewards: #0: 8.850, true rewards: #0: 5.100
+[2024-09-21 13:09:49,866][00197] Avg episode reward: 8.850, avg true_objective: 5.100
+[2024-09-21 13:09:49,955][00197] Num frames 2100...
+[2024-09-21 13:09:50,087][00197] Num frames 2200...
+[2024-09-21 13:09:50,219][00197] Num frames 2300...
+[2024-09-21 13:09:50,366][00197] Num frames 2400...
+[2024-09-21 13:09:50,497][00197] Num frames 2500...
+[2024-09-21 13:09:50,610][00197] Avg episode rewards: #0: 8.686, true rewards: #0: 5.086
+[2024-09-21 13:09:50,611][00197] Avg episode reward: 8.686, avg true_objective: 5.086
+[2024-09-21 13:09:50,696][00197] Num frames 2600...
+[2024-09-21 13:09:50,820][00197] Num frames 2700...
+[2024-09-21 13:09:50,953][00197] Num frames 2800...
+[2024-09-21 13:09:51,079][00197] Num frames 2900...
+[2024-09-21 13:09:51,204][00197] Num frames 3000...
+[2024-09-21 13:09:51,334][00197] Num frames 3100...
+[2024-09-21 13:09:51,497][00197] Avg episode rewards: #0: 9.140, true rewards: #0: 5.307
+[2024-09-21 13:09:51,498][00197] Avg episode reward: 9.140, avg true_objective: 5.307
+[2024-09-21 13:09:51,526][00197] Num frames 3200...
+[2024-09-21 13:09:51,652][00197] Num frames 3300...
+[2024-09-21 13:09:51,795][00197] Num frames 3400...
+[2024-09-21 13:09:51,935][00197] Num frames 3500...
+[2024-09-21 13:09:52,074][00197] Num frames 3600...
+[2024-09-21 13:09:52,199][00197] Num frames 3700...
+[2024-09-21 13:09:52,340][00197] Num frames 3800...
+[2024-09-21 13:09:52,467][00197] Num frames 3900...
+[2024-09-21 13:09:52,600][00197] Num frames 4000...
+[2024-09-21 13:09:52,733][00197] Num frames 4100...
+[2024-09-21 13:09:52,873][00197] Num frames 4200...
+[2024-09-21 13:09:52,982][00197] Avg episode rewards: #0: 11.772, true rewards: #0: 6.057
+[2024-09-21 13:09:52,984][00197] Avg episode reward: 11.772, avg true_objective: 6.057
+[2024-09-21 13:09:53,096][00197] Num frames 4300...
+[2024-09-21 13:09:53,278][00197] Num frames 4400...
+[2024-09-21 13:09:53,451][00197] Num frames 4500...
+[2024-09-21 13:09:53,629][00197] Num frames 4600...
+[2024-09-21 13:09:53,801][00197] Num frames 4700...
+[2024-09-21 13:09:53,976][00197] Num frames 4800...
+[2024-09-21 13:09:54,147][00197] Num frames 4900...
+[2024-09-21 13:09:54,225][00197] Avg episode rewards: #0: 12.140, true rewards: #0: 6.140
+[2024-09-21 13:09:54,227][00197] Avg episode reward: 12.140, avg true_objective: 6.140
+[2024-09-21 13:09:54,392][00197] Num frames 5000...
+[2024-09-21 13:09:54,582][00197] Num frames 5100...
+[2024-09-21 13:09:54,763][00197] Num frames 5200...
+[2024-09-21 13:09:54,957][00197] Num frames 5300...
+[2024-09-21 13:09:55,178][00197] Avg episode rewards: #0: 11.547, true rewards: #0: 5.991
+[2024-09-21 13:09:55,180][00197] Avg episode reward: 11.547, avg true_objective: 5.991
+[2024-09-21 13:09:55,198][00197] Num frames 5400...
+[2024-09-21 13:09:55,384][00197] Num frames 5500...
+[2024-09-21 13:09:55,531][00197] Num frames 5600...
+[2024-09-21 13:09:55,663][00197] Num frames 5700...
+[2024-09-21 13:09:55,785][00197] Num frames 5800...
+[2024-09-21 13:09:55,927][00197] Num frames 5900...
+[2024-09-21 13:09:56,058][00197] Num frames 6000...
+[2024-09-21 13:09:56,153][00197] Avg episode rewards: #0: 11.732, true rewards: #0: 6.032
+[2024-09-21 13:09:56,155][00197] Avg episode reward: 11.732, avg true_objective: 6.032
+[2024-09-21 13:10:35,170][00197] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-09-21 13:24:42,347][00197] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-09-21 13:24:42,349][00197] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-21 13:24:42,351][00197] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-21 13:24:42,354][00197] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-21 13:24:42,356][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-21 13:24:42,358][00197] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-21 13:24:42,360][00197] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-09-21 13:24:42,361][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-21 13:24:42,362][00197] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-09-21 13:24:42,363][00197] Adding new argument 'hf_repository'='yhyeo0202/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-09-21 13:24:42,364][00197] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-21 13:24:42,365][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-21 13:24:42,366][00197] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-21 13:24:42,368][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-21 13:24:42,369][00197] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-21 13:24:42,379][00197] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-21 13:24:42,388][00197] RunningMeanStd input shape: (1,)
+[2024-09-21 13:24:42,403][00197] ConvEncoder: input_channels=3
+[2024-09-21 13:24:42,442][00197] Conv encoder output size: 512
+[2024-09-21 13:24:42,443][00197] Policy head output size: 512
+[2024-09-21 13:24:42,463][00197] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-09-21 13:24:42,979][00197] Num frames 100...
+[2024-09-21 13:24:43,104][00197] Num frames 200...
+[2024-09-21 13:24:43,227][00197] Num frames 300...
+[2024-09-21 13:24:43,355][00197] Num frames 400...
+[2024-09-21 13:24:43,476][00197] Num frames 500...
+[2024-09-21 13:24:43,607][00197] Num frames 600...
+[2024-09-21 13:24:43,730][00197] Num frames 700...
+[2024-09-21 13:24:43,857][00197] Num frames 800...
+[2024-09-21 13:24:43,954][00197] Avg episode rewards: #0: 17.320, true rewards: #0: 8.320
+[2024-09-21 13:24:43,956][00197] Avg episode reward: 17.320, avg true_objective: 8.320
+[2024-09-21 13:24:44,043][00197] Num frames 900...
+[2024-09-21 13:24:44,182][00197] Num frames 1000...
+[2024-09-21 13:24:44,308][00197] Num frames 1100...
+[2024-09-21 13:24:44,436][00197] Num frames 1200...
+[2024-09-21 13:24:44,564][00197] Num frames 1300...
+[2024-09-21 13:24:44,693][00197] Num frames 1400...
+[2024-09-21 13:24:44,819][00197] Num frames 1500...
+[2024-09-21 13:24:44,949][00197] Num frames 1600...
+[2024-09-21 13:24:45,073][00197] Num frames 1700...
+[2024-09-21 13:24:45,201][00197] Num frames 1800...
+[2024-09-21 13:24:45,332][00197] Num frames 1900...
+[2024-09-21 13:24:45,477][00197] Avg episode rewards: #0: 22.345, true rewards: #0: 9.845
+[2024-09-21 13:24:45,479][00197] Avg episode reward: 22.345, avg true_objective: 9.845
+[2024-09-21 13:24:45,523][00197] Num frames 2000...
+[2024-09-21 13:24:45,653][00197] Num frames 2100...
+[2024-09-21 13:24:45,789][00197] Num frames 2200...
+[2024-09-21 13:24:45,918][00197] Num frames 2300...
+[2024-09-21 13:24:46,068][00197] Avg episode rewards: #0: 17.913, true rewards: #0: 7.913
+[2024-09-21 13:24:46,071][00197] Avg episode reward: 17.913, avg true_objective: 7.913
+[2024-09-21 13:24:46,105][00197] Num frames 2400...
+[2024-09-21 13:24:46,227][00197] Num frames 2500...
+[2024-09-21 13:24:46,354][00197] Num frames 2600...
+[2024-09-21 13:24:46,477][00197] Num frames 2700...
+[2024-09-21 13:24:46,604][00197] Num frames 2800...
+[2024-09-21 13:24:46,736][00197] Num frames 2900...
+[2024-09-21 13:24:46,873][00197] Num frames 3000...
+[2024-09-21 13:24:46,998][00197] Num frames 3100...
+[2024-09-21 13:24:47,126][00197] Num frames 3200...
+[2024-09-21 13:24:47,191][00197] Avg episode rewards: #0: 18.265, true rewards: #0: 8.015
+[2024-09-21 13:24:47,193][00197] Avg episode reward: 18.265, avg true_objective: 8.015
+[2024-09-21 13:24:47,314][00197] Num frames 3300...
+[2024-09-21 13:24:47,442][00197] Num frames 3400...
+[2024-09-21 13:24:47,609][00197] Num frames 3500...
+[2024-09-21 13:24:47,796][00197] Num frames 3600...
+[2024-09-21 13:24:48,008][00197] Num frames 3700...
+[2024-09-21 13:24:48,382][00197] Num frames 3800...
+[2024-09-21 13:24:48,694][00197] Num frames 3900...
+[2024-09-21 13:24:49,122][00197] Num frames 4000...
+[2024-09-21 13:24:49,460][00197] Num frames 4100...
+[2024-09-21 13:24:49,979][00197] Num frames 4200...
+[2024-09-21 13:24:50,413][00197] Num frames 4300...
+[2024-09-21 13:24:50,808][00197] Num frames 4400...
+[2024-09-21 13:24:51,031][00197] Num frames 4500...
+[2024-09-21 13:24:51,286][00197] Num frames 4600...
+[2024-09-21 13:24:51,541][00197] Num frames 4700...
+[2024-09-21 13:24:51,820][00197] Num frames 4800...
+[2024-09-21 13:24:52,065][00197] Num frames 4900...
+[2024-09-21 13:24:52,323][00197] Num frames 5000...
+[2024-09-21 13:24:52,596][00197] Num frames 5100...
+[2024-09-21 13:24:52,856][00197] Num frames 5200...
+[2024-09-21 13:24:53,130][00197] Num frames 5300...
+[2024-09-21 13:24:53,225][00197] Avg episode rewards: #0: 25.412, true rewards: #0: 10.612
+[2024-09-21 13:24:53,230][00197] Avg episode reward: 25.412, avg true_objective: 10.612
+[2024-09-21 13:24:53,360][00197] Num frames 5400...
+[2024-09-21 13:24:53,487][00197] Num frames 5500...
+[2024-09-21 13:24:53,619][00197] Num frames 5600...
+[2024-09-21 13:24:53,757][00197] Num frames 5700...
+[2024-09-21 13:24:53,902][00197] Num frames 5800...
+[2024-09-21 13:24:54,041][00197] Num frames 5900...
+[2024-09-21 13:24:54,165][00197] Num frames 6000...
+[2024-09-21 13:24:54,292][00197] Num frames 6100...
+[2024-09-21 13:24:54,423][00197] Num frames 6200...
+[2024-09-21 13:24:54,553][00197] Num frames 6300...
+[2024-09-21 13:24:54,687][00197] Avg episode rewards: #0: 25.103, true rewards: #0: 10.603
+[2024-09-21 13:24:54,689][00197] Avg episode reward: 25.103, avg true_objective: 10.603
+[2024-09-21 13:24:54,738][00197] Num frames 6400...
+[2024-09-21 13:24:54,870][00197] Num frames 6500...
+[2024-09-21 13:24:54,991][00197] Num frames 6600...
+[2024-09-21 13:24:55,132][00197] Num frames 6700...
+[2024-09-21 13:24:55,270][00197] Num frames 6800...
+[2024-09-21 13:24:55,394][00197] Num frames 6900...
+[2024-09-21 13:24:55,522][00197] Num frames 7000...
+[2024-09-21 13:24:55,653][00197] Num frames 7100...
+[2024-09-21 13:24:55,750][00197] Avg episode rewards: #0: 23.757, true rewards: #0: 10.186
+[2024-09-21 13:24:55,753][00197] Avg episode reward: 23.757, avg true_objective: 10.186
+[2024-09-21 13:24:55,839][00197] Num frames 7200...
+[2024-09-21 13:24:55,970][00197] Num frames 7300...
+[2024-09-21 13:24:56,105][00197] Num frames 7400...
+[2024-09-21 13:24:56,223][00197] Avg episode rewards: #0: 21.312, true rewards: #0: 9.312
+[2024-09-21 13:24:56,225][00197] Avg episode reward: 21.312, avg true_objective: 9.312
+[2024-09-21 13:24:56,290][00197] Num frames 7500...
+[2024-09-21 13:24:56,412][00197] Num frames 7600...
+[2024-09-21 13:24:56,542][00197] Num frames 7700...
+[2024-09-21 13:24:56,669][00197] Num frames 7800...
+[2024-09-21 13:24:56,768][00197] Avg episode rewards: #0: 19.927, true rewards: #0: 8.704
+[2024-09-21 13:24:56,769][00197] Avg episode reward: 19.927, avg true_objective: 8.704
+[2024-09-21 13:24:56,866][00197] Num frames 7900...
+[2024-09-21 13:24:56,995][00197] Num frames 8000...
+[2024-09-21 13:24:57,127][00197] Num frames 8100...
+[2024-09-21 13:24:57,252][00197] Num frames 8200...
+[2024-09-21 13:24:57,380][00197] Num frames 8300...
+[2024-09-21 13:24:57,505][00197] Num frames 8400...
+[2024-09-21 13:24:57,627][00197] Num frames 8500...
+[2024-09-21 13:24:57,758][00197] Num frames 8600...
+[2024-09-21 13:24:57,891][00197] Num frames 8700...
+[2024-09-21 13:24:58,017][00197] Num frames 8800...
+[2024-09-21 13:24:58,159][00197] Num frames 8900...
+[2024-09-21 13:24:58,284][00197] Num frames 9000...
+[2024-09-21 13:24:58,375][00197] Avg episode rewards: #0: 20.328, true rewards: #0: 9.028
+[2024-09-21 13:24:58,376][00197] Avg episode reward: 20.328, avg true_objective: 9.028
+[2024-09-21 13:25:58,200][00197] Replay video saved to /content/train_dir/default_experiment/replay.mp4!