[2024-09-21 12:48:30,381][00197] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-21 12:48:30,385][00197] Rollout worker 0 uses device cpu
[2024-09-21 12:48:30,388][00197] Rollout worker 1 uses device cpu
[2024-09-21 12:48:30,391][00197] Rollout worker 2 uses device cpu
[2024-09-21 12:48:30,392][00197] Rollout worker 3 uses device cpu
[2024-09-21 12:48:30,393][00197] Rollout worker 4 uses device cpu
[2024-09-21 12:48:30,395][00197] Rollout worker 5 uses device cpu
[2024-09-21 12:48:30,396][00197] Rollout worker 6 uses device cpu
[2024-09-21 12:48:30,397][00197] Rollout worker 7 uses device cpu
[2024-09-21 12:48:30,558][00197] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-21 12:48:30,560][00197] InferenceWorker_p0-w0: min num requests: 2
[2024-09-21 12:48:30,593][00197] Starting all processes...
[2024-09-21 12:48:30,594][00197] Starting process learner_proc0
[2024-09-21 12:48:30,637][00197] Starting all processes...
[2024-09-21 12:48:30,646][00197] Starting process inference_proc0-0
[2024-09-21 12:48:30,646][00197] Starting process rollout_proc0
[2024-09-21 12:48:30,648][00197] Starting process rollout_proc1
[2024-09-21 12:48:30,648][00197] Starting process rollout_proc2
[2024-09-21 12:48:30,648][00197] Starting process rollout_proc3
[2024-09-21 12:48:30,648][00197] Starting process rollout_proc4
[2024-09-21 12:48:30,648][00197] Starting process rollout_proc5
[2024-09-21 12:48:30,648][00197] Starting process rollout_proc6
[2024-09-21 12:48:30,648][00197] Starting process rollout_proc7
[2024-09-21 12:48:43,166][02682] Worker 5 uses CPU cores [1]
[2024-09-21 12:48:43,188][02684] Worker 7 uses CPU cores [1]
[2024-09-21 12:48:43,255][02663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-21 12:48:43,255][02663] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-21 12:48:43,293][02681] Worker 3 uses CPU cores [1]
[2024-09-21 12:48:43,312][02663] Num visible devices: 1
[2024-09-21 12:48:43,345][02663] Starting seed is not provided
[2024-09-21 12:48:43,346][02663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-21 12:48:43,347][02663] Initializing actor-critic model on device cuda:0
[2024-09-21 12:48:43,348][02663] RunningMeanStd input shape: (3, 72, 128)
[2024-09-21 12:48:43,349][02663] RunningMeanStd input shape: (1,)
[2024-09-21 12:48:43,379][02676] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-21 12:48:43,380][02676] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-21 12:48:43,400][02676] Num visible devices: 1
[2024-09-21 12:48:43,440][02663] ConvEncoder: input_channels=3
[2024-09-21 12:48:43,475][02679] Worker 0 uses CPU cores [0]
[2024-09-21 12:48:43,494][02680] Worker 4 uses CPU cores [0]
[2024-09-21 12:48:43,560][02678] Worker 1 uses CPU cores [1]
[2024-09-21 12:48:43,570][02677] Worker 2 uses CPU cores [0]
[2024-09-21 12:48:43,617][02683] Worker 6 uses CPU cores [0]
[2024-09-21 12:48:43,718][02663] Conv encoder output size: 512
[2024-09-21 12:48:43,718][02663] Policy head output size: 512
[2024-09-21 12:48:43,734][02663] Created Actor Critic model with architecture:
[2024-09-21 12:48:43,734][02663] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-21 12:48:48,331][02663] Using optimizer
[2024-09-21 12:48:48,332][02663] No checkpoints found
[2024-09-21 12:48:48,332][02663] Did not load from checkpoint, starting from scratch!
[2024-09-21 12:48:48,332][02663] Initialized policy 0 weights for model version 0
[2024-09-21 12:48:48,338][02663] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-21 12:48:48,347][02663] LearnerWorker_p0 finished initialization!
[2024-09-21 12:48:48,670][02676] RunningMeanStd input shape: (3, 72, 128)
[2024-09-21 12:48:48,672][02676] RunningMeanStd input shape: (1,)
[2024-09-21 12:48:48,693][02676] ConvEncoder: input_channels=3
[2024-09-21 12:48:48,865][02676] Conv encoder output size: 512
[2024-09-21 12:48:48,865][02676] Policy head output size: 512
[2024-09-21 12:48:49,339][00197] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-21 12:48:50,519][00197] Inference worker 0-0 is ready!
[2024-09-21 12:48:50,520][00197] All inference workers are ready! Signal rollout workers to start!
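The architecture dump above can be approximated by a small standalone PyTorch module. This is a hedged sketch, not Sample Factory's actual implementation: the conv kernel sizes and strides (8/4, 4/2, 3/2) are assumed defaults and the observation/returns normalizers are omitted. Only the shapes printed in the log are taken from the source: (3, 72, 128) observations, a 512-d encoder output, a GRU(512, 512) core, a 1-unit critic head, and a 5-action policy head.

```python
import torch
from torch import nn


class ActorCriticSketch(nn.Module):
    """Approximate re-creation of the ActorCriticSharedWeights printout."""

    def __init__(self, num_actions: int = 5, obs_shape=(3, 72, 128)):
        super().__init__()
        # Three Conv2d+ELU pairs, as in the logged conv_head
        # (kernel/stride values are assumptions, not from the log).
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, 8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
        )
        # Infer the flattened conv output size with a dummy forward pass.
        with torch.no_grad():
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).numel()
        # mlp_layers: Linear + ELU, projecting to the 512-d encoder output.
        self.mlp_layers = nn.Sequential(nn.Flatten(), nn.Linear(n_flat, 512), nn.ELU())
        # core: GRU(512, 512), as printed in the log.
        self.core = nn.GRU(512, 512)
        # Shared-weights heads: critic (512 -> 1) and action logits (512 -> 5).
        self.critic_linear = nn.Linear(512, 1)
        self.distribution_linear = nn.Linear(512, num_actions)

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```

A forward pass with a batch of two observations yields action logits of shape (2, 5), values of shape (2, 1), and a (1, 2, 512) recurrent state, matching the head sizes in the printout.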
[2024-09-21 12:48:50,548][00197] Heartbeat connected on Batcher_0
[2024-09-21 12:48:50,552][00197] Heartbeat connected on LearnerWorker_p0
[2024-09-21 12:48:50,601][00197] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-21 12:48:50,705][02679] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-21 12:48:50,710][02677] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-21 12:48:50,713][02681] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-21 12:48:50,710][02678] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-21 12:48:50,712][02683] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-21 12:48:50,715][02680] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-21 12:48:50,722][02682] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-21 12:48:50,729][02684] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-21 12:48:51,817][02683] Decorrelating experience for 0 frames...
[2024-09-21 12:48:51,819][02678] Decorrelating experience for 0 frames...
[2024-09-21 12:48:51,818][02682] Decorrelating experience for 0 frames...
[2024-09-21 12:48:52,224][02683] Decorrelating experience for 32 frames...
[2024-09-21 12:48:52,605][02684] Decorrelating experience for 0 frames...
[2024-09-21 12:48:52,617][02678] Decorrelating experience for 32 frames...
[2024-09-21 12:48:52,773][02683] Decorrelating experience for 64 frames...
[2024-09-21 12:48:53,711][02680] Decorrelating experience for 0 frames...
[2024-09-21 12:48:53,897][02683] Decorrelating experience for 96 frames...
[2024-09-21 12:48:54,099][00197] Heartbeat connected on RolloutWorker_w6
[2024-09-21 12:48:54,284][02684] Decorrelating experience for 32 frames...
[2024-09-21 12:48:54,339][00197] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-21 12:48:54,345][02682] Decorrelating experience for 32 frames...
[2024-09-21 12:48:54,502][02678] Decorrelating experience for 64 frames...
[2024-09-21 12:48:54,533][02680] Decorrelating experience for 32 frames...
[2024-09-21 12:48:55,195][02680] Decorrelating experience for 64 frames...
[2024-09-21 12:48:55,400][02684] Decorrelating experience for 64 frames...
[2024-09-21 12:48:55,544][02678] Decorrelating experience for 96 frames...
[2024-09-21 12:48:55,721][00197] Heartbeat connected on RolloutWorker_w1
[2024-09-21 12:48:55,786][02680] Decorrelating experience for 96 frames...
[2024-09-21 12:48:55,923][00197] Heartbeat connected on RolloutWorker_w4
[2024-09-21 12:48:56,073][02682] Decorrelating experience for 64 frames...
[2024-09-21 12:48:56,711][02684] Decorrelating experience for 96 frames...
[2024-09-21 12:48:56,753][02682] Decorrelating experience for 96 frames...
[2024-09-21 12:48:56,844][00197] Heartbeat connected on RolloutWorker_w7
[2024-09-21 12:48:56,874][00197] Heartbeat connected on RolloutWorker_w5
[2024-09-21 12:48:59,339][00197] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 24. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-21 12:48:59,345][00197] Avg episode reward: [(0, '1.873')]
[2024-09-21 12:49:01,793][02663] Signal inference workers to stop experience collection...
[2024-09-21 12:49:01,812][02676] InferenceWorker_p0-w0: stopping experience collection
[2024-09-21 12:49:03,091][02663] Signal inference workers to resume experience collection...
[2024-09-21 12:49:03,094][02676] InferenceWorker_p0-w0: resuming experience collection
[2024-09-21 12:49:04,339][00197] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 163.6. Samples: 2454. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-21 12:49:04,341][00197] Avg episode reward: [(0, '3.073')]
[2024-09-21 12:49:09,339][00197] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 233.6. Samples: 4672. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-21 12:49:09,341][00197] Avg episode reward: [(0, '3.540')]
[2024-09-21 12:49:14,339][00197] Fps is (10 sec: 3276.8, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 36864. Throughput: 0: 290.7. Samples: 7268. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:49:14,343][00197] Avg episode reward: [(0, '3.981')]
[2024-09-21 12:49:15,288][02676] Updated weights for policy 0, policy_version 10 (0.0669)
[2024-09-21 12:49:19,339][00197] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 49152. Throughput: 0: 402.5. Samples: 12074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:49:19,346][00197] Avg episode reward: [(0, '4.221')]
[2024-09-21 12:49:24,341][00197] Fps is (10 sec: 2866.6, 60 sec: 1872.3, 300 sec: 1872.3). Total num frames: 65536. Throughput: 0: 473.4. Samples: 16570. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:49:24,354][00197] Avg episode reward: [(0, '4.367')]
[2024-09-21 12:49:28,180][02676] Updated weights for policy 0, policy_version 20 (0.0018)
[2024-09-21 12:49:29,347][00197] Fps is (10 sec: 3683.2, 60 sec: 2149.9, 300 sec: 2149.9). Total num frames: 86016. Throughput: 0: 480.5. Samples: 19226. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:49:29,354][00197] Avg episode reward: [(0, '4.519')]
[2024-09-21 12:49:34,339][00197] Fps is (10 sec: 3277.5, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 98304. Throughput: 0: 548.0. Samples: 24660. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:49:34,346][00197] Avg episode reward: [(0, '4.422')]
[2024-09-21 12:49:39,339][00197] Fps is (10 sec: 2459.7, 60 sec: 2211.8, 300 sec: 2211.8). Total num frames: 110592. Throughput: 0: 623.0. Samples: 28034. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:49:39,346][00197] Avg episode reward: [(0, '4.319')]
[2024-09-21 12:49:39,348][02663] Saving new best policy, reward=4.319!
[2024-09-21 12:49:41,840][02676] Updated weights for policy 0, policy_version 30 (0.0028)
[2024-09-21 12:49:44,339][00197] Fps is (10 sec: 3276.8, 60 sec: 2383.1, 300 sec: 2383.1). Total num frames: 131072. Throughput: 0: 685.4. Samples: 30866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-21 12:49:44,347][00197] Avg episode reward: [(0, '4.258')]
[2024-09-21 12:49:49,339][00197] Fps is (10 sec: 3686.5, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 147456. Throughput: 0: 756.1. Samples: 36480. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:49:49,340][00197] Avg episode reward: [(0, '4.371')]
[2024-09-21 12:49:49,346][02663] Saving new best policy, reward=4.371!
[2024-09-21 12:49:54,340][00197] Fps is (10 sec: 2866.9, 60 sec: 2662.4, 300 sec: 2457.6). Total num frames: 159744. Throughput: 0: 787.1. Samples: 40094. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:49:54,343][00197] Avg episode reward: [(0, '4.304')]
[2024-09-21 12:49:54,983][02676] Updated weights for policy 0, policy_version 40 (0.0014)
[2024-09-21 12:49:59,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2574.6). Total num frames: 180224. Throughput: 0: 790.0. Samples: 42820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:49:59,346][00197] Avg episode reward: [(0, '4.211')]
[2024-09-21 12:50:04,339][00197] Fps is (10 sec: 3686.8, 60 sec: 3208.5, 300 sec: 2621.4). Total num frames: 196608. Throughput: 0: 811.6. Samples: 48598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:50:04,342][00197] Avg episode reward: [(0, '4.299')]
[2024-09-21 12:50:06,991][02676] Updated weights for policy 0, policy_version 50 (0.0019)
[2024-09-21 12:50:09,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2611.2). Total num frames: 208896. Throughput: 0: 798.8. Samples: 52512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:50:09,343][00197] Avg episode reward: [(0, '4.440')]
[2024-09-21 12:50:09,348][02663] Saving new best policy, reward=4.440!
[2024-09-21 12:50:14,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 2698.5). Total num frames: 229376. Throughput: 0: 794.8. Samples: 54984. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:50:14,342][00197] Avg episode reward: [(0, '4.481')]
[2024-09-21 12:50:14,350][02663] Saving new best policy, reward=4.481!
[2024-09-21 12:50:18,516][02676] Updated weights for policy 0, policy_version 60 (0.0013)
[2024-09-21 12:50:19,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 2730.7). Total num frames: 245760. Throughput: 0: 801.1. Samples: 60708. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:50:19,344][00197] Avg episode reward: [(0, '4.668')]
[2024-09-21 12:50:19,348][02663] Saving new best policy, reward=4.668!
[2024-09-21 12:50:24,340][00197] Fps is (10 sec: 2867.0, 60 sec: 3208.6, 300 sec: 2716.3). Total num frames: 258048. Throughput: 0: 823.9. Samples: 65112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:50:24,342][00197] Avg episode reward: [(0, '4.668')]
[2024-09-21 12:50:24,355][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000063_258048.pth...
[2024-09-21 12:50:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.7, 300 sec: 2744.3). Total num frames: 274432. Throughput: 0: 802.6. Samples: 66982. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:50:29,347][00197] Avg episode reward: [(0, '4.484')]
[2024-09-21 12:50:31,902][02676] Updated weights for policy 0, policy_version 70 (0.0016)
[2024-09-21 12:50:34,339][00197] Fps is (10 sec: 3686.8, 60 sec: 3276.8, 300 sec: 2808.7). Total num frames: 294912. Throughput: 0: 806.9. Samples: 72790. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:50:34,347][00197] Avg episode reward: [(0, '4.335')]
[2024-09-21 12:50:39,343][00197] Fps is (10 sec: 3275.4, 60 sec: 3276.6, 300 sec: 2792.6). Total num frames: 307200. Throughput: 0: 834.9. Samples: 77666. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:50:39,347][00197] Avg episode reward: [(0, '4.423')]
[2024-09-21 12:50:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2813.8). Total num frames: 323584. Throughput: 0: 814.1. Samples: 79454. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:50:44,341][00197] Avg episode reward: [(0, '4.471')]
[2024-09-21 12:50:44,625][02676] Updated weights for policy 0, policy_version 80 (0.0015)
[2024-09-21 12:50:49,339][00197] Fps is (10 sec: 3688.0, 60 sec: 3276.8, 300 sec: 2867.2). Total num frames: 344064. Throughput: 0: 805.3. Samples: 84836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:50:49,342][00197] Avg episode reward: [(0, '4.471')]
[2024-09-21 12:50:54,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2883.6). Total num frames: 360448. Throughput: 0: 836.6. Samples: 90160. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:50:54,342][00197] Avg episode reward: [(0, '4.462')]
[2024-09-21 12:50:57,769][02676] Updated weights for policy 0, policy_version 90 (0.0017)
[2024-09-21 12:50:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2867.2). Total num frames: 372736. Throughput: 0: 819.2. Samples: 91846. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:50:59,342][00197] Avg episode reward: [(0, '4.406')]
[2024-09-21 12:51:04,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2882.4). Total num frames: 389120. Throughput: 0: 801.3. Samples: 96768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:51:04,342][00197] Avg episode reward: [(0, '4.585')]
[2024-09-21 12:51:08,922][02676] Updated weights for policy 0, policy_version 100 (0.0015)
[2024-09-21 12:51:09,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2925.7). Total num frames: 409600. Throughput: 0: 831.8. Samples: 102542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:51:09,341][00197] Avg episode reward: [(0, '4.465')]
[2024-09-21 12:51:14,340][00197] Fps is (10 sec: 3276.3, 60 sec: 3208.5, 300 sec: 2909.5). Total num frames: 421888. Throughput: 0: 832.7. Samples: 104456. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:51:14,343][00197] Avg episode reward: [(0, '4.544')]
[2024-09-21 12:51:19,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2921.8). Total num frames: 438272. Throughput: 0: 802.8. Samples: 108914. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:51:19,344][00197] Avg episode reward: [(0, '4.672')]
[2024-09-21 12:51:19,346][02663] Saving new best policy, reward=4.672!
[2024-09-21 12:51:21,840][02676] Updated weights for policy 0, policy_version 110 (0.0013)
[2024-09-21 12:51:24,339][00197] Fps is (10 sec: 3687.0, 60 sec: 3345.1, 300 sec: 2959.7). Total num frames: 458752. Throughput: 0: 819.4. Samples: 114534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:51:24,342][00197] Avg episode reward: [(0, '4.662')]
[2024-09-21 12:51:29,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 2944.0). Total num frames: 471040. Throughput: 0: 832.7. Samples: 116928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:51:29,342][00197] Avg episode reward: [(0, '4.495')]
[2024-09-21 12:51:34,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2954.1). Total num frames: 487424. Throughput: 0: 800.0. Samples: 120836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:51:34,344][00197] Avg episode reward: [(0, '4.518')]
[2024-09-21 12:51:35,074][02676] Updated weights for policy 0, policy_version 120 (0.0015)
[2024-09-21 12:51:39,339][00197] Fps is (10 sec: 3686.5, 60 sec: 3345.3, 300 sec: 2987.7). Total num frames: 507904. Throughput: 0: 809.3. Samples: 126580. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:51:39,341][00197] Avg episode reward: [(0, '4.659')]
[2024-09-21 12:51:44,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 2972.5). Total num frames: 520192. Throughput: 0: 834.9. Samples: 129416. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:51:44,344][00197] Avg episode reward: [(0, '4.648')]
[2024-09-21 12:51:47,979][02676] Updated weights for policy 0, policy_version 130 (0.0024)
[2024-09-21 12:51:49,339][00197] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 2981.0). Total num frames: 536576. Throughput: 0: 809.1. Samples: 133176. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:51:49,343][00197] Avg episode reward: [(0, '4.776')]
[2024-09-21 12:51:49,350][02663] Saving new best policy, reward=4.776!
[2024-09-21 12:51:54,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 2989.0). Total num frames: 552960. Throughput: 0: 801.5. Samples: 138610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:51:54,348][00197] Avg episode reward: [(0, '4.740')]
[2024-09-21 12:51:59,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 2996.5). Total num frames: 569344. Throughput: 0: 819.1. Samples: 141314. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:51:59,343][00197] Avg episode reward: [(0, '4.549')]
[2024-09-21 12:51:59,721][02676] Updated weights for policy 0, policy_version 140 (0.0013)
[2024-09-21 12:52:04,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2982.7). Total num frames: 581632. Throughput: 0: 812.3. Samples: 145468. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:52:04,346][00197] Avg episode reward: [(0, '4.582')]
[2024-09-21 12:52:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3010.6). Total num frames: 602112. Throughput: 0: 802.0. Samples: 150622. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:52:09,346][00197] Avg episode reward: [(0, '4.607')]
[2024-09-21 12:52:12,064][02676] Updated weights for policy 0, policy_version 150 (0.0015)
[2024-09-21 12:52:14,339][00197] Fps is (10 sec: 4096.0, 60 sec: 3345.2, 300 sec: 3037.0). Total num frames: 622592. Throughput: 0: 810.9. Samples: 153418. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:52:14,346][00197] Avg episode reward: [(0, '4.729')]
[2024-09-21 12:52:19,340][00197] Fps is (10 sec: 2866.8, 60 sec: 3208.5, 300 sec: 3003.7). Total num frames: 630784. Throughput: 0: 822.9. Samples: 157866. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:52:19,342][00197] Avg episode reward: [(0, '4.702')]
[2024-09-21 12:52:24,339][00197] Fps is (10 sec: 2047.9, 60 sec: 3072.0, 300 sec: 2991.0). Total num frames: 643072. Throughput: 0: 766.1. Samples: 161056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:52:24,347][00197] Avg episode reward: [(0, '4.597')]
[2024-09-21 12:52:24,361][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000157_643072.pth...
[2024-09-21 12:52:27,845][02676] Updated weights for policy 0, policy_version 160 (0.0028)
[2024-09-21 12:52:29,339][00197] Fps is (10 sec: 2867.6, 60 sec: 3140.3, 300 sec: 2997.5). Total num frames: 659456. Throughput: 0: 748.4. Samples: 163094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:52:29,341][00197] Avg episode reward: [(0, '4.575')]
[2024-09-21 12:52:34,339][00197] Fps is (10 sec: 2867.3, 60 sec: 3072.0, 300 sec: 2985.5). Total num frames: 671744. Throughput: 0: 779.4. Samples: 168248. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:52:34,342][00197] Avg episode reward: [(0, '4.618')]
[2024-09-21 12:52:39,339][00197] Fps is (10 sec: 2867.1, 60 sec: 3003.7, 300 sec: 2991.9). Total num frames: 688128. Throughput: 0: 745.1. Samples: 172140. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:52:39,344][00197] Avg episode reward: [(0, '4.784')]
[2024-09-21 12:52:39,347][02663] Saving new best policy, reward=4.784!
[2024-09-21 12:52:40,976][02676] Updated weights for policy 0, policy_version 170 (0.0036)
[2024-09-21 12:52:44,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3015.4). Total num frames: 708608. Throughput: 0: 746.8. Samples: 174920. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:52:44,341][00197] Avg episode reward: [(0, '4.826')]
[2024-09-21 12:52:44,350][02663] Saving new best policy, reward=4.826!
[2024-09-21 12:52:49,344][00197] Fps is (10 sec: 3684.6, 60 sec: 3140.0, 300 sec: 3020.7). Total num frames: 724992. Throughput: 0: 779.2. Samples: 180536. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:52:49,348][00197] Avg episode reward: [(0, '4.925')]
[2024-09-21 12:52:49,350][02663] Saving new best policy, reward=4.925!
[2024-09-21 12:52:54,339][00197] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2992.6). Total num frames: 733184. Throughput: 0: 742.6. Samples: 184040. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:52:54,341][00197] Avg episode reward: [(0, '4.762')]
[2024-09-21 12:52:54,415][02676] Updated weights for policy 0, policy_version 180 (0.0013)
[2024-09-21 12:52:59,339][00197] Fps is (10 sec: 2868.7, 60 sec: 3072.0, 300 sec: 3014.7). Total num frames: 753664. Throughput: 0: 742.3. Samples: 186820. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:52:59,341][00197] Avg episode reward: [(0, '4.948')]
[2024-09-21 12:52:59,348][02663] Saving new best policy, reward=4.948!
[2024-09-21 12:53:04,339][00197] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3035.9). Total num frames: 774144. Throughput: 0: 769.1. Samples: 192476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:53:04,347][00197] Avg episode reward: [(0, '4.775')]
[2024-09-21 12:53:05,579][02676] Updated weights for policy 0, policy_version 190 (0.0021)
[2024-09-21 12:53:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3024.7). Total num frames: 786432. Throughput: 0: 791.2. Samples: 196658. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:53:09,349][00197] Avg episode reward: [(0, '4.998')]
[2024-09-21 12:53:09,351][02663] Saving new best policy, reward=4.998!
[2024-09-21 12:53:14,339][00197] Fps is (10 sec: 2867.1, 60 sec: 3003.7, 300 sec: 3029.5). Total num frames: 802816. Throughput: 0: 795.3. Samples: 198882. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:53:14,342][00197] Avg episode reward: [(0, '4.838')]
[2024-09-21 12:53:18,060][02676] Updated weights for policy 0, policy_version 200 (0.0016)
[2024-09-21 12:53:19,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.6, 300 sec: 3049.2). Total num frames: 823296. Throughput: 0: 809.5. Samples: 204676. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:53:19,342][00197] Avg episode reward: [(0, '5.020')]
[2024-09-21 12:53:19,346][02663] Saving new best policy, reward=5.020!
[2024-09-21 12:53:24,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3038.5). Total num frames: 835584. Throughput: 0: 822.0. Samples: 209128. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:53:24,344][00197] Avg episode reward: [(0, '4.933')]
[2024-09-21 12:53:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3042.7). Total num frames: 851968. Throughput: 0: 803.1. Samples: 211060. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:53:29,347][00197] Avg episode reward: [(0, '4.906')]
[2024-09-21 12:53:31,006][02676] Updated weights for policy 0, policy_version 210 (0.0026)
[2024-09-21 12:53:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3046.8). Total num frames: 868352. Throughput: 0: 804.8. Samples: 216746. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:53:34,341][00197] Avg episode reward: [(0, '4.907')]
[2024-09-21 12:53:39,339][00197] Fps is (10 sec: 3276.6, 60 sec: 3276.8, 300 sec: 3050.8). Total num frames: 884736. Throughput: 0: 835.4. Samples: 221632. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:53:39,346][00197] Avg episode reward: [(0, '4.861')]
[2024-09-21 12:53:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3040.8). Total num frames: 897024. Throughput: 0: 812.3. Samples: 223372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:53:44,346][00197] Avg episode reward: [(0, '5.129')]
[2024-09-21 12:53:44,354][02663] Saving new best policy, reward=5.129!
[2024-09-21 12:53:44,629][02676] Updated weights for policy 0, policy_version 220 (0.0013)
[2024-09-21 12:53:49,339][00197] Fps is (10 sec: 3277.0, 60 sec: 3208.8, 300 sec: 3110.2). Total num frames: 917504. Throughput: 0: 800.3. Samples: 228490. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:53:49,349][00197] Avg episode reward: [(0, '5.154')]
[2024-09-21 12:53:49,352][02663] Saving new best policy, reward=5.154!
[2024-09-21 12:53:54,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3165.7). Total num frames: 933888. Throughput: 0: 822.4. Samples: 233664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:53:54,344][00197] Avg episode reward: [(0, '4.935')]
[2024-09-21 12:53:57,040][02676] Updated weights for policy 0, policy_version 230 (0.0016)
[2024-09-21 12:53:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 946176. Throughput: 0: 811.6. Samples: 235404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:53:59,341][00197] Avg episode reward: [(0, '4.885')]
[2024-09-21 12:54:04,339][00197] Fps is (10 sec: 2867.1, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 962560. Throughput: 0: 785.8. Samples: 240036. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:54:04,342][00197] Avg episode reward: [(0, '4.929')]
[2024-09-21 12:54:08,855][02676] Updated weights for policy 0, policy_version 240 (0.0014)
[2024-09-21 12:54:09,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 983040. Throughput: 0: 816.6. Samples: 245874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:54:09,344][00197] Avg episode reward: [(0, '5.045')]
[2024-09-21 12:54:14,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 995328. Throughput: 0: 820.7. Samples: 247990. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:54:14,343][00197] Avg episode reward: [(0, '4.970')]
[2024-09-21 12:54:19,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1011712. Throughput: 0: 788.4. Samples: 252224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:54:19,346][00197] Avg episode reward: [(0, '5.005')]
[2024-09-21 12:54:21,835][02676] Updated weights for policy 0, policy_version 250 (0.0018)
[2024-09-21 12:54:24,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.5). Total num frames: 1032192. Throughput: 0: 806.5. Samples: 257922. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:54:24,346][00197] Avg episode reward: [(0, '5.164')]
[2024-09-21 12:54:24,358][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000252_1032192.pth...
[2024-09-21 12:54:24,473][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000063_258048.pth
[2024-09-21 12:54:24,496][02663] Saving new best policy, reward=5.164!
[2024-09-21 12:54:29,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1044480. Throughput: 0: 820.7. Samples: 260302. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:54:29,342][00197] Avg episode reward: [(0, '5.441')]
[2024-09-21 12:54:29,351][02663] Saving new best policy, reward=5.441!
[2024-09-21 12:54:34,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1060864. Throughput: 0: 789.8. Samples: 264030. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:54:34,345][00197] Avg episode reward: [(0, '5.451')]
[2024-09-21 12:54:34,354][02663] Saving new best policy, reward=5.451!
[2024-09-21 12:54:35,510][02676] Updated weights for policy 0, policy_version 260 (0.0013)
[2024-09-21 12:54:39,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 3207.4). Total num frames: 1077248. Throughput: 0: 796.6. Samples: 269512. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:54:39,341][00197] Avg episode reward: [(0, '5.562')]
[2024-09-21 12:54:39,344][02663] Saving new best policy, reward=5.562!
[2024-09-21 12:54:44,339][00197] Fps is (10 sec: 3276.6, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1093632. Throughput: 0: 820.3. Samples: 272316. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:54:44,345][00197] Avg episode reward: [(0, '5.382')]
[2024-09-21 12:54:48,565][02676] Updated weights for policy 0, policy_version 270 (0.0022)
[2024-09-21 12:54:49,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1105920. Throughput: 0: 799.4. Samples: 276010. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:54:49,342][00197] Avg episode reward: [(0, '5.612')]
[2024-09-21 12:54:49,346][02663] Saving new best policy, reward=5.612!
[2024-09-21 12:54:54,339][00197] Fps is (10 sec: 3277.0, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1126400. Throughput: 0: 791.4. Samples: 281486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:54:54,341][00197] Avg episode reward: [(0, '5.640')]
[2024-09-21 12:54:54,356][02663] Saving new best policy, reward=5.640!
[2024-09-21 12:54:59,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1142784. Throughput: 0: 806.7. Samples: 284290. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:54:59,347][00197] Avg episode reward: [(0, '5.716')]
[2024-09-21 12:54:59,354][02663] Saving new best policy, reward=5.716!
[2024-09-21 12:55:00,287][02676] Updated weights for policy 0, policy_version 280 (0.0015)
[2024-09-21 12:55:04,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1155072. Throughput: 0: 800.2. Samples: 288232. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:55:04,342][00197] Avg episode reward: [(0, '5.555')]
[2024-09-21 12:55:09,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 1171456. Throughput: 0: 785.3. Samples: 293262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:55:09,341][00197] Avg episode reward: [(0, '5.777')]
[2024-09-21 12:55:09,347][02663] Saving new best policy, reward=5.777!
[2024-09-21 12:55:12,890][02676] Updated weights for policy 0, policy_version 290 (0.0020)
[2024-09-21 12:55:14,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1191936. Throughput: 0: 794.2. Samples: 296042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:55:14,347][00197] Avg episode reward: [(0, '5.940')]
[2024-09-21 12:55:14,377][02663] Saving new best policy, reward=5.940!
[2024-09-21 12:55:19,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1204224. Throughput: 0: 811.8. Samples: 300560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:55:19,342][00197] Avg episode reward: [(0, '6.001')]
[2024-09-21 12:55:19,344][02663] Saving new best policy, reward=6.001!
[2024-09-21 12:55:24,340][00197] Fps is (10 sec: 2867.0, 60 sec: 3140.2, 300 sec: 3207.4). Total num frames: 1220608. Throughput: 0: 794.5. Samples: 305266. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:55:24,342][00197] Avg episode reward: [(0, '6.040')]
[2024-09-21 12:55:24,355][02663] Saving new best policy, reward=6.040!
[2024-09-21 12:55:26,059][02676] Updated weights for policy 0, policy_version 300 (0.0016)
[2024-09-21 12:55:29,339][00197] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1241088. Throughput: 0: 795.2. Samples: 308100. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:55:29,345][00197] Avg episode reward: [(0, '6.406')]
[2024-09-21 12:55:29,348][02663] Saving new best policy, reward=6.406!
[2024-09-21 12:55:34,339][00197] Fps is (10 sec: 3277.1, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1253376. Throughput: 0: 824.1. Samples: 313096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:55:34,344][00197] Avg episode reward: [(0, '6.396')]
[2024-09-21 12:55:39,339][00197] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 3179.6). Total num frames: 1261568. Throughput: 0: 772.2. Samples: 316236. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:55:39,346][00197] Avg episode reward: [(0, '6.359')]
[2024-09-21 12:55:41,132][02676] Updated weights for policy 0, policy_version 310 (0.0035)
[2024-09-21 12:55:44,339][00197] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 1277952. Throughput: 0: 748.5. Samples: 317972. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:55:44,346][00197] Avg episode reward: [(0, '6.559')]
[2024-09-21 12:55:44,355][02663] Saving new best policy, reward=6.559!
[2024-09-21 12:55:49,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1294336. Throughput: 0: 778.6. Samples: 323270. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:55:49,344][00197] Avg episode reward: [(0, '6.945')]
[2024-09-21 12:55:49,348][02663] Saving new best policy, reward=6.945!
[2024-09-21 12:55:54,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 1306624. Throughput: 0: 746.7. Samples: 326864. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:55:54,341][00197] Avg episode reward: [(0, '7.119')]
[2024-09-21 12:55:54,349][02663] Saving new best policy, reward=7.119!
[2024-09-21 12:55:55,138][02676] Updated weights for policy 0, policy_version 320 (0.0014)
[2024-09-21 12:55:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 1323008. Throughput: 0: 738.7. Samples: 329284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:55:59,341][00197] Avg episode reward: [(0, '7.178')]
[2024-09-21 12:55:59,344][02663] Saving new best policy, reward=7.178!
[2024-09-21 12:56:04,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1343488. Throughput: 0: 763.2. Samples: 334904. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:56:04,341][00197] Avg episode reward: [(0, '7.374')]
[2024-09-21 12:56:04,353][02663] Saving new best policy, reward=7.374!
[2024-09-21 12:56:07,037][02676] Updated weights for policy 0, policy_version 330 (0.0013)
[2024-09-21 12:56:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 1355776. Throughput: 0: 752.9. Samples: 339144. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:56:09,345][00197] Avg episode reward: [(0, '7.841')]
[2024-09-21 12:56:09,347][02663] Saving new best policy, reward=7.841!
[2024-09-21 12:56:14,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 1372160. Throughput: 0: 737.5. Samples: 341288. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:56:14,347][00197] Avg episode reward: [(0, '8.040')]
[2024-09-21 12:56:14,358][02663] Saving new best policy, reward=8.040!
[2024-09-21 12:56:19,052][02676] Updated weights for policy 0, policy_version 340 (0.0016)
[2024-09-21 12:56:19,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1392640. Throughput: 0: 754.3. Samples: 347038. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:56:19,342][00197] Avg episode reward: [(0, '8.763')]
[2024-09-21 12:56:19,348][02663] Saving new best policy, reward=8.763!
[2024-09-21 12:56:24,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 1404928. Throughput: 0: 785.9. Samples: 351602. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:56:24,344][00197] Avg episode reward: [(0, '8.392')]
[2024-09-21 12:56:24,355][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000343_1404928.pth...
[2024-09-21 12:56:24,511][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000157_643072.pth
[2024-09-21 12:56:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 1421312. Throughput: 0: 787.4. Samples: 353406. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:56:29,342][00197] Avg episode reward: [(0, '8.066')]
[2024-09-21 12:56:32,363][02676] Updated weights for policy 0, policy_version 350 (0.0015)
[2024-09-21 12:56:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3151.8). Total num frames: 1437696. Throughput: 0: 792.1. Samples: 358916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:56:34,349][00197] Avg episode reward: [(0, '7.630')]
[2024-09-21 12:56:39,340][00197] Fps is (10 sec: 3276.4, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 1454080. Throughput: 0: 830.1. Samples: 364218. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:56:39,347][00197] Avg episode reward: [(0, '7.400')]
[2024-09-21 12:56:44,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 1470464. Throughput: 0: 815.8. Samples: 365996. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:56:44,341][00197] Avg episode reward: [(0, '7.762')]
[2024-09-21 12:56:45,331][02676] Updated weights for policy 0, policy_version 360 (0.0013)
[2024-09-21 12:56:49,339][00197] Fps is (10 sec: 3277.2, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 1486848. Throughput: 0: 805.0. Samples: 371130. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:56:49,342][00197] Avg episode reward: [(0, '8.366')]
[2024-09-21 12:56:54,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3179.6). Total num frames: 1507328. Throughput: 0: 836.6. Samples: 376790. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:56:54,345][00197] Avg episode reward: [(0, '9.184')]
[2024-09-21 12:56:54,359][02663] Saving new best policy, reward=9.184!
[2024-09-21 12:56:57,699][02676] Updated weights for policy 0, policy_version 370 (0.0013)
[2024-09-21 12:56:59,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3179.6). Total num frames: 1519616. Throughput: 0: 827.3. Samples: 378516. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:56:59,346][00197] Avg episode reward: [(0, '9.815')]
[2024-09-21 12:56:59,350][02663] Saving new best policy, reward=9.815!
[2024-09-21 12:57:04,339][00197] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 1536000. Throughput: 0: 803.4. Samples: 383192. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:57:04,347][00197] Avg episode reward: [(0, '10.820')]
[2024-09-21 12:57:04,356][02663] Saving new best policy, reward=10.820!
[2024-09-21 12:57:08,897][02676] Updated weights for policy 0, policy_version 380 (0.0013)
[2024-09-21 12:57:09,339][00197] Fps is (10 sec: 3686.6, 60 sec: 3345.1, 300 sec: 3165.7). Total num frames: 1556480. Throughput: 0: 831.5. Samples: 389020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:57:09,344][00197] Avg episode reward: [(0, '11.399')]
[2024-09-21 12:57:09,348][02663] Saving new best policy, reward=11.399!
[2024-09-21 12:57:14,340][00197] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3179.6). Total num frames: 1568768. Throughput: 0: 841.0. Samples: 391252. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:57:14,343][00197] Avg episode reward: [(0, '11.544')]
[2024-09-21 12:57:14,366][02663] Saving new best policy, reward=11.544!
[2024-09-21 12:57:19,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 1585152. Throughput: 0: 814.1. Samples: 395550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:57:19,341][00197] Avg episode reward: [(0, '11.526')]
[2024-09-21 12:57:21,705][02676] Updated weights for policy 0, policy_version 390 (0.0024)
[2024-09-21 12:57:24,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 1605632. Throughput: 0: 827.2. Samples: 401442. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:57:24,343][00197] Avg episode reward: [(0, '10.818')]
[2024-09-21 12:57:29,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1617920. Throughput: 0: 847.0. Samples: 404112. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:57:29,341][00197] Avg episode reward: [(0, '10.785')]
[2024-09-21 12:57:34,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1634304. Throughput: 0: 817.9. Samples: 407936. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:57:34,344][00197] Avg episode reward: [(0, '10.517')]
[2024-09-21 12:57:34,586][02676] Updated weights for policy 0, policy_version 400 (0.0012)
[2024-09-21 12:57:39,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 1654784. Throughput: 0: 823.6. Samples: 413850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:57:39,345][00197] Avg episode reward: [(0, '10.234')]
[2024-09-21 12:57:44,341][00197] Fps is (10 sec: 3685.5, 60 sec: 3344.9, 300 sec: 3207.4). Total num frames: 1671168. Throughput: 0: 849.4. Samples: 416740. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:57:44,360][00197] Avg episode reward: [(0, '10.840')]
[2024-09-21 12:57:46,890][02676] Updated weights for policy 0, policy_version 410 (0.0013)
[2024-09-21 12:57:49,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1683456. Throughput: 0: 830.4. Samples: 420558. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:57:49,346][00197] Avg episode reward: [(0, '11.751')]
[2024-09-21 12:57:49,349][02663] Saving new best policy, reward=11.751!
[2024-09-21 12:57:54,341][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.7, 300 sec: 3221.2). Total num frames: 1703936. Throughput: 0: 824.3. Samples: 426116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:57:54,350][00197] Avg episode reward: [(0, '12.529')]
[2024-09-21 12:57:54,363][02663] Saving new best policy, reward=12.529!
[2024-09-21 12:57:58,265][02676] Updated weights for policy 0, policy_version 420 (0.0013)
[2024-09-21 12:57:59,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 1720320. Throughput: 0: 835.7. Samples: 428858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:57:59,352][00197] Avg episode reward: [(0, '13.322')]
[2024-09-21 12:57:59,357][02663] Saving new best policy, reward=13.322!
[2024-09-21 12:58:04,344][00197] Fps is (10 sec: 2866.3, 60 sec: 3276.5, 300 sec: 3207.3). Total num frames: 1732608. Throughput: 0: 834.1. Samples: 433088. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:58:04,352][00197] Avg episode reward: [(0, '13.278')]
[2024-09-21 12:58:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1753088. Throughput: 0: 818.5. Samples: 438276. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:58:09,343][00197] Avg episode reward: [(0, '13.342')]
[2024-09-21 12:58:09,348][02663] Saving new best policy, reward=13.342!
[2024-09-21 12:58:10,985][02676] Updated weights for policy 0, policy_version 430 (0.0014)
[2024-09-21 12:58:14,339][00197] Fps is (10 sec: 4098.3, 60 sec: 3413.3, 300 sec: 3221.3). Total num frames: 1773568. Throughput: 0: 823.2. Samples: 441156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:58:14,344][00197] Avg episode reward: [(0, '14.351')]
[2024-09-21 12:58:14,355][02663] Saving new best policy, reward=14.351!
[2024-09-21 12:58:19,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1785856. Throughput: 0: 842.1. Samples: 445830. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:58:19,347][00197] Avg episode reward: [(0, '14.216')]
[2024-09-21 12:58:23,918][02676] Updated weights for policy 0, policy_version 440 (0.0013)
[2024-09-21 12:58:24,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1802240. Throughput: 0: 816.6. Samples: 450596. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:58:24,340][00197] Avg episode reward: [(0, '15.297')]
[2024-09-21 12:58:24,349][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_1802240.pth...
[2024-09-21 12:58:24,352][00197] Components not started: RolloutWorker_w0, RolloutWorker_w2, RolloutWorker_w3, wait_time=600.0 seconds
[2024-09-21 12:58:24,466][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000252_1032192.pth
[2024-09-21 12:58:24,479][02663] Saving new best policy, reward=15.297!
[2024-09-21 12:58:29,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1818624. Throughput: 0: 813.7. Samples: 453354. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:58:29,341][00197] Avg episode reward: [(0, '15.638')]
[2024-09-21 12:58:29,345][02663] Saving new best policy, reward=15.638!
[2024-09-21 12:58:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1835008. Throughput: 0: 839.8. Samples: 458350. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-21 12:58:34,341][00197] Avg episode reward: [(0, '15.349')]
[2024-09-21 12:58:36,930][02676] Updated weights for policy 0, policy_version 450 (0.0015)
[2024-09-21 12:58:39,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1851392. Throughput: 0: 814.1. Samples: 462748. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:58:39,347][00197] Avg episode reward: [(0, '14.796')]
[2024-09-21 12:58:44,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3221.3). Total num frames: 1867776. Throughput: 0: 817.4. Samples: 465640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 2.0)
[2024-09-21 12:58:44,343][00197] Avg episode reward: [(0, '13.785')]
[2024-09-21 12:58:47,802][02676] Updated weights for policy 0, policy_version 460 (0.0017)
[2024-09-21 12:58:49,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1884160. Throughput: 0: 847.2. Samples: 471208. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-21 12:58:49,341][00197] Avg episode reward: [(0, '12.993')]
[2024-09-21 12:58:54,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.7, 300 sec: 3221.3). Total num frames: 1896448. Throughput: 0: 806.7. Samples: 474578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-21 12:58:54,344][00197] Avg episode reward: [(0, '12.421')]
[2024-09-21 12:58:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1912832. Throughput: 0: 782.8. Samples: 476382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-21 12:58:59,341][00197] Avg episode reward: [(0, '11.258')]
[2024-09-21 12:59:02,467][02676] Updated weights for policy 0, policy_version 470 (0.0014)
[2024-09-21 12:59:04,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3277.1, 300 sec: 3207.4). Total num frames: 1929216. Throughput: 0: 792.2. Samples: 481478. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-21 12:59:04,344][00197] Avg episode reward: [(0, '12.164')]
[2024-09-21 12:59:09,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1941504. Throughput: 0: 781.6. Samples: 485770. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:59:09,343][00197] Avg episode reward: [(0, '13.065')]
[2024-09-21 12:59:14,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 1961984. Throughput: 0: 772.2. Samples: 488104. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:59:14,341][00197] Avg episode reward: [(0, '13.984')]
[2024-09-21 12:59:15,470][02676] Updated weights for policy 0, policy_version 480 (0.0017)
[2024-09-21 12:59:19,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1978368. Throughput: 0: 790.6. Samples: 493926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:59:19,341][00197] Avg episode reward: [(0, '15.054')]
[2024-09-21 12:59:24,344][00197] Fps is (10 sec: 2865.6, 60 sec: 3140.0, 300 sec: 3207.3). Total num frames: 1990656. Throughput: 0: 792.8. Samples: 498426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:59:24,347][00197] Avg episode reward: [(0, '14.439')]
[2024-09-21 12:59:28,340][02676] Updated weights for policy 0, policy_version 490 (0.0023)
[2024-09-21 12:59:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 2007040. Throughput: 0: 770.0. Samples: 500290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:59:29,341][00197] Avg episode reward: [(0, '14.703')]
[2024-09-21 12:59:34,339][00197] Fps is (10 sec: 3688.4, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2027520. Throughput: 0: 774.8. Samples: 506074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:59:34,344][00197] Avg episode reward: [(0, '14.537')]
[2024-09-21 12:59:39,340][00197] Fps is (10 sec: 3686.0, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2043904. Throughput: 0: 814.2. Samples: 511220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:59:39,342][00197] Avg episode reward: [(0, '14.103')]
[2024-09-21 12:59:40,450][02676] Updated weights for policy 0, policy_version 500 (0.0013)
[2024-09-21 12:59:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 2056192. Throughput: 0: 814.8. Samples: 513046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 12:59:44,343][00197] Avg episode reward: [(0, '15.235')]
[2024-09-21 12:59:49,339][00197] Fps is (10 sec: 3277.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2076672. Throughput: 0: 823.4. Samples: 518530. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:59:49,341][00197] Avg episode reward: [(0, '15.993')]
[2024-09-21 12:59:49,345][02663] Saving new best policy, reward=15.993!
[2024-09-21 12:59:51,708][02676] Updated weights for policy 0, policy_version 510 (0.0013)
[2024-09-21 12:59:54,339][00197] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 2093056. Throughput: 0: 845.4. Samples: 523812. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 12:59:54,349][00197] Avg episode reward: [(0, '15.430')]
[2024-09-21 12:59:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2105344. Throughput: 0: 833.3. Samples: 525602. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 12:59:59,343][00197] Avg episode reward: [(0, '15.367')]
[2024-09-21 13:00:04,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2125824. Throughput: 0: 812.9. Samples: 530508. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:00:04,341][00197] Avg episode reward: [(0, '15.635')]
[2024-09-21 13:00:04,902][02676] Updated weights for policy 0, policy_version 520 (0.0019)
[2024-09-21 13:00:09,339][00197] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3235.1). Total num frames: 2146304. Throughput: 0: 843.3. Samples: 536370. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-21 13:00:09,342][00197] Avg episode reward: [(0, '14.817')]
[2024-09-21 13:00:14,342][00197] Fps is (10 sec: 2866.2, 60 sec: 3208.3, 300 sec: 3221.2). Total num frames: 2154496. Throughput: 0: 843.5. Samples: 538252. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:00:14,348][00197] Avg episode reward: [(0, '15.768')]
[2024-09-21 13:00:17,642][02676] Updated weights for policy 0, policy_version 530 (0.0014)
[2024-09-21 13:00:19,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3235.2). Total num frames: 2174976. Throughput: 0: 819.1. Samples: 542932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-21 13:00:19,341][00197] Avg episode reward: [(0, '16.211')]
[2024-09-21 13:00:19,344][02663] Saving new best policy, reward=16.211!
[2024-09-21 13:00:24,339][00197] Fps is (10 sec: 4097.4, 60 sec: 3413.6, 300 sec: 3235.1). Total num frames: 2195456. Throughput: 0: 830.2. Samples: 548576. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:00:24,341][00197] Avg episode reward: [(0, '17.823')]
[2024-09-21 13:00:24,358][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000536_2195456.pth...
[2024-09-21 13:00:24,483][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000343_1404928.pth
[2024-09-21 13:00:24,499][02663] Saving new best policy, reward=17.823!
[2024-09-21 13:00:29,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 2207744. Throughput: 0: 837.3. Samples: 550726. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:00:29,346][00197] Avg episode reward: [(0, '18.162')]
[2024-09-21 13:00:29,349][02663] Saving new best policy, reward=18.162!
[2024-09-21 13:00:30,845][02676] Updated weights for policy 0, policy_version 540 (0.0021)
[2024-09-21 13:00:34,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 2224128. Throughput: 0: 804.0. Samples: 554712. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:00:34,346][00197] Avg episode reward: [(0, '18.307')]
[2024-09-21 13:00:34,356][02663] Saving new best policy, reward=18.307!
[2024-09-21 13:00:39,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3262.9). Total num frames: 2240512. Throughput: 0: 814.2. Samples: 560450. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:00:39,346][00197] Avg episode reward: [(0, '19.099')]
[2024-09-21 13:00:39,349][02663] Saving new best policy, reward=19.099!
[2024-09-21 13:00:42,260][02676] Updated weights for policy 0, policy_version 550 (0.0013)
[2024-09-21 13:00:44,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2256896. Throughput: 0: 831.9. Samples: 563038. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:00:44,345][00197] Avg episode reward: [(0, '19.093')]
[2024-09-21 13:00:49,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2269184. Throughput: 0: 806.0. Samples: 566780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:00:49,341][00197] Avg episode reward: [(0, '18.423')]
[2024-09-21 13:00:54,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2289664. Throughput: 0: 805.0. Samples: 572596. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:00:54,347][00197] Avg episode reward: [(0, '18.280')]
[2024-09-21 13:00:54,854][02676] Updated weights for policy 0, policy_version 560 (0.0018)
[2024-09-21 13:00:59,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2306048. Throughput: 0: 825.6. Samples: 575402. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:00:59,342][00197] Avg episode reward: [(0, '19.026')]
[2024-09-21 13:01:04,339][00197] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2318336. Throughput: 0: 805.2. Samples: 579164. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:01:04,347][00197] Avg episode reward: [(0, '18.312')]
[2024-09-21 13:01:07,764][02676] Updated weights for policy 0, policy_version 570 (0.0014)
[2024-09-21 13:01:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 2338816. Throughput: 0: 803.2. Samples: 584718. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:01:09,342][00197] Avg episode reward: [(0, '17.515')]
[2024-09-21 13:01:14,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.3, 300 sec: 3262.9). Total num frames: 2355200. Throughput: 0: 820.0. Samples: 587624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:01:14,342][00197] Avg episode reward: [(0, '17.064')]
[2024-09-21 13:01:19,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2367488. Throughput: 0: 825.4. Samples: 591856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:01:19,346][00197] Avg episode reward: [(0, '16.955')]
[2024-09-21 13:01:20,594][02676] Updated weights for policy 0, policy_version 580 (0.0017)
[2024-09-21 13:01:24,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 2387968. Throughput: 0: 815.5. Samples: 597148. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:01:24,342][00197] Avg episode reward: [(0, '16.705')]
[2024-09-21 13:01:29,339][00197] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2408448. Throughput: 0: 822.3. Samples: 600040. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:01:29,349][00197] Avg episode reward: [(0, '16.183')]
[2024-09-21 13:01:32,487][02676] Updated weights for policy 0, policy_version 590 (0.0016)
[2024-09-21 13:01:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2420736. Throughput: 0: 840.1. Samples: 604584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:01:34,341][00197] Avg episode reward: [(0, '16.198')]
[2024-09-21 13:01:39,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2437120. Throughput: 0: 816.6. Samples: 609344. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:01:39,344][00197] Avg episode reward: [(0, '16.723')]
[2024-09-21 13:01:44,206][02676] Updated weights for policy 0, policy_version 600 (0.0015)
[2024-09-21 13:01:44,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2457600. Throughput: 0: 818.2. Samples: 612222. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:01:44,344][00197] Avg episode reward: [(0, '16.832')]
[2024-09-21 13:01:49,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2469888. Throughput: 0: 847.4. Samples: 617296. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:01:49,343][00197] Avg episode reward: [(0, '16.909')]
[2024-09-21 13:01:54,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2486272. Throughput: 0: 819.1. Samples: 621578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:01:54,341][00197] Avg episode reward: [(0, '17.403')]
[2024-09-21 13:01:57,232][02676] Updated weights for policy 0, policy_version 610 (0.0014)
[2024-09-21 13:01:59,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2506752. Throughput: 0: 817.0. Samples: 624388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:01:59,341][00197] Avg episode reward: [(0, '18.103')]
[2024-09-21 13:02:04,341][00197] Fps is (10 sec: 3276.2, 60 sec: 3345.0, 300 sec: 3262.9). Total num frames: 2519040. Throughput: 0: 844.9. Samples: 629878. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:02:04,343][00197] Avg episode reward: [(0, '18.355')]
[2024-09-21 13:02:09,339][00197] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 2531328. Throughput: 0: 805.4. Samples: 633392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:02:09,347][00197] Avg episode reward: [(0, '17.487')]
[2024-09-21 13:02:11,496][02676] Updated weights for policy 0, policy_version 620 (0.0017)
[2024-09-21 13:02:14,339][00197] Fps is (10 sec: 2458.1, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 2543616. Throughput: 0: 780.5. Samples: 635164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:02:14,341][00197] Avg episode reward: [(0, '17.676')]
[2024-09-21 13:02:19,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2564096. Throughput: 0: 787.1. Samples: 640002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:02:19,341][00197] Avg episode reward: [(0, '18.051')]
[2024-09-21 13:02:24,345][00197] Fps is (10 sec: 3274.7, 60 sec: 3139.9, 300 sec: 3249.0). Total num frames: 2576384. Throughput: 0: 769.7. Samples: 643986. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:02:24,360][00197] Avg episode reward: [(0, '18.511')]
[2024-09-21 13:02:24,373][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth...
[2024-09-21 13:02:24,538][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_1802240.pth
[2024-09-21 13:02:25,684][02676] Updated weights for policy 0, policy_version 630 (0.0018)
[2024-09-21 13:02:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3249.0). Total num frames: 2592768. Throughput: 0: 759.1. Samples: 646380. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:02:29,341][00197] Avg episode reward: [(0, '18.504')]
[2024-09-21 13:02:34,339][00197] Fps is (10 sec: 3688.7, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 2613248. Throughput: 0: 773.7. Samples: 652114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:02:34,350][00197] Avg episode reward: [(0, '18.525')]
[2024-09-21 13:02:36,993][02676] Updated weights for policy 0, policy_version 640 (0.0013)
[2024-09-21 13:02:39,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3235.2). Total num frames: 2625536. Throughput: 0: 778.2. Samples: 656598. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:02:39,344][00197] Avg episode reward: [(0, '19.517')]
[2024-09-21 13:02:39,348][02663] Saving new best policy, reward=19.517!
[2024-09-21 13:02:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3249.0). Total num frames: 2641920. Throughput: 0: 758.8. Samples: 658534. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:02:44,341][00197] Avg episode reward: [(0, '19.526')]
[2024-09-21 13:02:44,350][02663] Saving new best policy, reward=19.526!
[2024-09-21 13:02:49,342][00197] Fps is (10 sec: 3275.7, 60 sec: 3140.1, 300 sec: 3235.1). Total num frames: 2658304. Throughput: 0: 762.9. Samples: 664208. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:02:49,345][00197] Avg episode reward: [(0, '19.726')]
[2024-09-21 13:02:49,346][02663] Saving new best policy, reward=19.726!
[2024-09-21 13:02:49,542][02676] Updated weights for policy 0, policy_version 650 (0.0019)
[2024-09-21 13:02:54,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 2674688. Throughput: 0: 793.2. Samples: 669088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:02:54,341][00197] Avg episode reward: [(0, '19.984')]
[2024-09-21 13:02:54,350][02663] Saving new best policy, reward=19.984!
[2024-09-21 13:02:59,339][00197] Fps is (10 sec: 3277.9, 60 sec: 3072.0, 300 sec: 3249.1). Total num frames: 2691072. Throughput: 0: 794.0. Samples: 670892. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:02:59,344][00197] Avg episode reward: [(0, '19.492')]
[2024-09-21 13:03:02,599][02676] Updated weights for policy 0, policy_version 660 (0.0023)
[2024-09-21 13:03:04,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.4, 300 sec: 3235.1). Total num frames: 2707456. Throughput: 0: 804.4. Samples: 676200. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:03:04,345][00197] Avg episode reward: [(0, '19.700')]
[2024-09-21 13:03:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2723840. Throughput: 0: 836.5. Samples: 681622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:03:09,342][00197] Avg episode reward: [(0, '20.278')]
[2024-09-21 13:03:09,352][02663] Saving new best policy, reward=20.278!
[2024-09-21 13:03:14,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2736128. Throughput: 0: 821.6. Samples: 683354. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:03:14,346][00197] Avg episode reward: [(0, '21.450')]
[2024-09-21 13:03:14,357][02663] Saving new best policy, reward=21.450!
[2024-09-21 13:03:15,757][02676] Updated weights for policy 0, policy_version 670 (0.0014)
[2024-09-21 13:03:19,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 2756608. Throughput: 0: 803.1. Samples: 688254. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-21 13:03:19,344][00197] Avg episode reward: [(0, '20.901')]
[2024-09-21 13:03:24,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3277.1, 300 sec: 3235.1). Total num frames: 2772992. Throughput: 0: 831.6. Samples: 694018. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:03:24,349][00197] Avg episode reward: [(0, '21.246')]
[2024-09-21 13:03:27,703][02676] Updated weights for policy 0, policy_version 680 (0.0017)
[2024-09-21 13:03:29,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 2785280. Throughput: 0: 830.6. Samples: 695910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:03:29,341][00197] Avg episode reward: [(0, '22.096')]
[2024-09-21 13:03:29,344][02663] Saving new best policy, reward=22.096!
[2024-09-21 13:03:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 2805760. Throughput: 0: 805.3. Samples: 700444. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:03:34,341][00197] Avg episode reward: [(0, '22.424')]
[2024-09-21 13:03:34,357][02663] Saving new best policy, reward=22.424!
[2024-09-21 13:03:39,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2822144. Throughput: 0: 822.5. Samples: 706102. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:03:39,346][00197] Avg episode reward: [(0, '21.657')]
[2024-09-21 13:03:39,371][02676] Updated weights for policy 0, policy_version 690 (0.0015)
[2024-09-21 13:03:44,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2838528. Throughput: 0: 835.0. Samples: 708468. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:03:44,341][00197] Avg episode reward: [(0, '20.963')]
[2024-09-21 13:03:49,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3277.0, 300 sec: 3249.0). Total num frames: 2854912. Throughput: 0: 808.0. Samples: 712562. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-21 13:03:49,346][00197] Avg episode reward: [(0, '20.822')]
[2024-09-21 13:03:52,268][02676] Updated weights for policy 0, policy_version 700 (0.0013)
[2024-09-21 13:03:54,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2875392. Throughput: 0: 817.5. Samples: 718408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:03:54,346][00197] Avg episode reward: [(0, '20.828')]
[2024-09-21 13:03:59,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2887680. Throughput: 0: 841.4. Samples: 721218. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:03:59,345][00197] Avg episode reward: [(0, '19.314')]
[2024-09-21 13:04:04,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 2904064. Throughput: 0: 814.2. Samples: 724894. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:04:04,345][00197] Avg episode reward: [(0, '19.743')]
[2024-09-21 13:04:05,063][02676] Updated weights for policy 0, policy_version 710 (0.0016)
[2024-09-21 13:04:09,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2920448. Throughput: 0: 813.3. Samples: 730616. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:04:09,342][00197] Avg episode reward: [(0, '20.805')]
[2024-09-21 13:04:14,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 2936832. Throughput: 0: 835.2. Samples: 733492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:04:14,341][00197] Avg episode reward: [(0, '20.469')]
[2024-09-21 13:04:17,585][02676] Updated weights for policy 0, policy_version 720 (0.0016)
[2024-09-21 13:04:19,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3263.0). Total num frames: 2953216. Throughput: 0: 824.8. Samples: 737558. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:04:19,343][00197] Avg episode reward: [(0, '20.896')]
[2024-09-21 13:04:24,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 2973696. Throughput: 0: 818.7. Samples: 742944. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:04:24,341][00197] Avg episode reward: [(0, '21.289')]
[2024-09-21 13:04:24,355][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000726_2973696.pth...
[2024-09-21 13:04:24,486][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000536_2195456.pth
[2024-09-21 13:04:28,757][02676] Updated weights for policy 0, policy_version 730 (0.0013)
[2024-09-21 13:04:29,342][00197] Fps is (10 sec: 3685.2, 60 sec: 3413.1, 300 sec: 3262.9). Total num frames: 2990080. Throughput: 0: 829.0. Samples: 745778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:04:29,355][00197] Avg episode reward: [(0, '22.329')]
[2024-09-21 13:04:34,340][00197] Fps is (10 sec: 2866.8, 60 sec: 3276.7, 300 sec: 3249.0). Total num frames: 3002368. Throughput: 0: 836.3. Samples: 750198. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:04:34,348][00197] Avg episode reward: [(0, '22.840')]
[2024-09-21 13:04:34,362][02663] Saving new best policy, reward=22.840!
[2024-09-21 13:04:39,339][00197] Fps is (10 sec: 2868.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3018752. Throughput: 0: 812.9. Samples: 754990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:04:39,348][00197] Avg episode reward: [(0, '21.519')]
[2024-09-21 13:04:41,818][02676] Updated weights for policy 0, policy_version 740 (0.0018)
[2024-09-21 13:04:44,339][00197] Fps is (10 sec: 3687.0, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3039232. Throughput: 0: 815.2. Samples: 757904. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:04:44,343][00197] Avg episode reward: [(0, '21.301')]
[2024-09-21 13:04:49,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3051520. Throughput: 0: 843.2. Samples: 762838. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:04:49,344][00197] Avg episode reward: [(0, '20.429')]
[2024-09-21 13:04:54,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3067904. Throughput: 0: 814.5. Samples: 767270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:04:54,341][00197] Avg episode reward: [(0, '21.624')]
[2024-09-21 13:04:54,674][02676] Updated weights for policy 0, policy_version 750 (0.0019)
[2024-09-21 13:04:59,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3088384. Throughput: 0: 814.2. Samples: 770130. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:04:59,341][00197] Avg episode reward: [(0, '21.478')]
[2024-09-21 13:05:04,342][00197] Fps is (10 sec: 3686.3, 60 sec: 3345.0, 300 sec: 3249.0). Total num frames: 3104768. Throughput: 0: 843.3. Samples: 775506. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:05:04,349][00197] Avg episode reward: [(0, '20.999')]
[2024-09-21 13:05:07,486][02676] Updated weights for policy 0, policy_version 760 (0.0013)
[2024-09-21 13:05:09,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3263.0). Total num frames: 3117056. Throughput: 0: 810.6. Samples: 779420. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:05:09,341][00197] Avg episode reward: [(0, '20.828')]
[2024-09-21 13:05:14,339][00197] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3137536. Throughput: 0: 811.4. Samples: 782290. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:05:14,342][00197] Avg episode reward: [(0, '19.957')]
[2024-09-21 13:05:18,322][02676] Updated weights for policy 0, policy_version 770 (0.0013)
[2024-09-21 13:05:19,344][00197] Fps is (10 sec: 3684.3, 60 sec: 3344.7, 300 sec: 3249.0). Total num frames: 3153920. Throughput: 0: 842.3. Samples: 788106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:05:19,347][00197] Avg episode reward: [(0, '21.288')]
[2024-09-21 13:05:24,342][00197] Fps is (10 sec: 2866.2, 60 sec: 3208.3, 300 sec: 3249.0). Total num frames: 3166208. Throughput: 0: 819.2. Samples: 791856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:05:24,345][00197] Avg episode reward: [(0, '20.063')]
[2024-09-21 13:05:29,339][00197] Fps is (10 sec: 2459.0, 60 sec: 3140.4, 300 sec: 3235.1). Total num frames: 3178496. Throughput: 0: 793.5. Samples: 793610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:05:29,341][00197] Avg episode reward: [(0, '19.998')]
[2024-09-21 13:05:33,432][02676] Updated weights for policy 0, policy_version 780 (0.0015)
[2024-09-21 13:05:34,339][00197] Fps is (10 sec: 2868.2, 60 sec: 3208.6, 300 sec: 3235.1). Total num frames: 3194880. Throughput: 0: 785.3. Samples: 798176. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:05:34,342][00197] Avg episode reward: [(0, '20.043')]
[2024-09-21 13:05:39,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 3207168. Throughput: 0: 782.8. Samples: 802496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:05:39,345][00197] Avg episode reward: [(0, '20.908')]
[2024-09-21 13:05:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3235.1). Total num frames: 3223552. Throughput: 0: 763.7. Samples: 804496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:05:44,341][00197] Avg episode reward: [(0, '20.351')]
[2024-09-21 13:05:46,564][02676] Updated weights for policy 0, policy_version 790 (0.0019)
[2024-09-21 13:05:49,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3244032. Throughput: 0: 772.3. Samples: 810260. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:05:49,346][00197] Avg episode reward: [(0, '18.912')]
[2024-09-21 13:05:54,339][00197] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3260416. Throughput: 0: 794.9. Samples: 815192. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:05:54,346][00197] Avg episode reward: [(0, '19.815')]
[2024-09-21 13:05:59,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3235.1). Total num frames: 3272704. Throughput: 0: 771.2. Samples: 816992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:05:59,341][00197] Avg episode reward: [(0, '20.632')]
[2024-09-21 13:05:59,430][02676] Updated weights for policy 0, policy_version 800 (0.0012)
[2024-09-21 13:06:04,341][00197] Fps is (10 sec: 3276.2, 60 sec: 3140.2, 300 sec: 3235.1). Total num frames: 3293184. Throughput: 0: 765.5. Samples: 822550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:06:04,354][00197] Avg episode reward: [(0, '19.979')]
[2024-09-21 13:06:09,342][00197] Fps is (10 sec: 3685.1, 60 sec: 3208.3, 300 sec: 3235.1). Total num frames: 3309568. Throughput: 0: 795.3. Samples: 827644. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:06:09,345][00197] Avg episode reward: [(0, '21.047')]
[2024-09-21 13:06:12,455][02676] Updated weights for policy 0, policy_version 810 (0.0026)
[2024-09-21 13:06:14,339][00197] Fps is (10 sec: 2867.7, 60 sec: 3072.0, 300 sec: 3235.1). Total num frames: 3321856. Throughput: 0: 794.7. Samples: 829370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:06:14,343][00197] Avg episode reward: [(0, '21.386')]
[2024-09-21 13:06:19,339][00197] Fps is (10 sec: 3277.9, 60 sec: 3140.6, 300 sec: 3235.1). Total num frames: 3342336. Throughput: 0: 805.3. Samples: 834416. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:06:19,341][00197] Avg episode reward: [(0, '21.754')]
[2024-09-21 13:06:23,300][02676] Updated weights for policy 0, policy_version 820 (0.0014)
[2024-09-21 13:06:24,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.7, 300 sec: 3221.3). Total num frames: 3358720. Throughput: 0: 835.7. Samples: 840102. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:06:24,344][00197] Avg episode reward: [(0, '21.167')]
[2024-09-21 13:06:24,359][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth...
[2024-09-21 13:06:24,504][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth
[2024-09-21 13:06:29,339][00197] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3371008. Throughput: 0: 831.1. Samples: 841896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:06:29,345][00197] Avg episode reward: [(0, '21.691')]
[2024-09-21 13:06:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 3391488. Throughput: 0: 810.0. Samples: 846710.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:06:34,346][00197] Avg episode reward: [(0, '22.301')]
[2024-09-21 13:06:36,463][02676] Updated weights for policy 0, policy_version 830 (0.0016)
[2024-09-21 13:06:39,341][00197] Fps is (10 sec: 3685.5, 60 sec: 3344.9, 300 sec: 3221.2). Total num frames: 3407872. Throughput: 0: 822.8. Samples: 852222. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:06:39,350][00197] Avg episode reward: [(0, '21.705')]
[2024-09-21 13:06:44,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3420160. Throughput: 0: 828.9. Samples: 854292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:06:44,342][00197] Avg episode reward: [(0, '22.008')]
[2024-09-21 13:06:49,339][00197] Fps is (10 sec: 2867.9, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3436544. Throughput: 0: 797.6. Samples: 858442. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:06:49,348][00197] Avg episode reward: [(0, '22.216')]
[2024-09-21 13:06:49,591][02676] Updated weights for policy 0, policy_version 840 (0.0013)
[2024-09-21 13:06:54,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3457024. Throughput: 0: 814.5. Samples: 864292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:06:54,346][00197] Avg episode reward: [(0, '22.438')]
[2024-09-21 13:06:59,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3469312. Throughput: 0: 832.8. Samples: 866848. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:06:59,341][00197] Avg episode reward: [(0, '21.336')]
[2024-09-21 13:07:02,373][02676] Updated weights for policy 0, policy_version 850 (0.0018)
[2024-09-21 13:07:04,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3235.1). Total num frames: 3485696. Throughput: 0: 809.4. Samples: 870840. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:07:04,341][00197] Avg episode reward: [(0, '21.819')]
[2024-09-21 13:07:09,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3277.0, 300 sec: 3262.9). Total num frames: 3506176. Throughput: 0: 811.9. Samples: 876638. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:07:09,347][00197] Avg episode reward: [(0, '21.699')]
[2024-09-21 13:07:13,673][02676] Updated weights for policy 0, policy_version 860 (0.0019)
[2024-09-21 13:07:14,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 3522560. Throughput: 0: 837.8. Samples: 879596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:07:14,343][00197] Avg episode reward: [(0, '21.429')]
[2024-09-21 13:07:19,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3263.0). Total num frames: 3538944. Throughput: 0: 812.9. Samples: 883290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:07:19,341][00197] Avg episode reward: [(0, '23.461')]
[2024-09-21 13:07:19,350][02663] Saving new best policy, reward=23.461!
[2024-09-21 13:07:24,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3555328. Throughput: 0: 819.8. Samples: 889110. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:07:24,349][00197] Avg episode reward: [(0, '21.392')]
[2024-09-21 13:07:25,831][02676] Updated weights for policy 0, policy_version 870 (0.0013)
[2024-09-21 13:07:29,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 3575808. Throughput: 0: 840.2. Samples: 892102. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:07:29,345][00197] Avg episode reward: [(0, '20.838')]
[2024-09-21 13:07:34,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3588096. Throughput: 0: 841.6. Samples: 896314. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:07:34,341][00197] Avg episode reward: [(0, '19.682')]
[2024-09-21 13:07:38,389][02676] Updated weights for policy 0, policy_version 880 (0.0012)
[2024-09-21 13:07:39,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3345.2, 300 sec: 3276.8). Total num frames: 3608576. Throughput: 0: 830.1. Samples: 901648. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:07:39,341][00197] Avg episode reward: [(0, '20.984')]
[2024-09-21 13:07:44,344][00197] Fps is (10 sec: 3684.3, 60 sec: 3413.0, 300 sec: 3276.8). Total num frames: 3624960. Throughput: 0: 839.5. Samples: 904628. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:07:44,352][00197] Avg episode reward: [(0, '19.768')]
[2024-09-21 13:07:49,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3637248. Throughput: 0: 855.3. Samples: 909328. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:07:49,341][00197] Avg episode reward: [(0, '19.448')]
[2024-09-21 13:07:50,986][02676] Updated weights for policy 0, policy_version 890 (0.0018)
[2024-09-21 13:07:54,339][00197] Fps is (10 sec: 3278.6, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3657728. Throughput: 0: 836.6. Samples: 914284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:07:54,341][00197] Avg episode reward: [(0, '21.025')]
[2024-09-21 13:07:59,339][00197] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3290.7). Total num frames: 3678208. Throughput: 0: 836.0. Samples: 917216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:07:59,341][00197] Avg episode reward: [(0, '21.265')]
[2024-09-21 13:08:01,954][02676] Updated weights for policy 0, policy_version 900 (0.0020)
[2024-09-21 13:08:04,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 3690496. Throughput: 0: 867.2. Samples: 922316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:08:04,341][00197] Avg episode reward: [(0, '22.501')]
[2024-09-21 13:08:09,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 3706880. Throughput: 0: 835.5. Samples: 926706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:08:09,341][00197] Avg episode reward: [(0, '21.782')]
[2024-09-21 13:08:14,127][02676] Updated weights for policy 0, policy_version 910 (0.0016)
[2024-09-21 13:08:14,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 3727360. Throughput: 0: 834.7. Samples: 929662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:08:14,346][00197] Avg episode reward: [(0, '22.119')]
[2024-09-21 13:08:19,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 3743744. Throughput: 0: 863.6. Samples: 935176. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:08:19,343][00197] Avg episode reward: [(0, '22.512')]
[2024-09-21 13:08:24,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 3756032. Throughput: 0: 835.0. Samples: 939224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:08:24,345][00197] Avg episode reward: [(0, '23.418')]
[2024-09-21 13:08:24,359][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000917_3756032.pth...
[2024-09-21 13:08:24,358][00197] Components not started: RolloutWorker_w0, RolloutWorker_w2, RolloutWorker_w3, wait_time=1200.0 seconds
[2024-09-21 13:08:24,468][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000726_2973696.pth
[2024-09-21 13:08:26,906][02676] Updated weights for policy 0, policy_version 920 (0.0016)
[2024-09-21 13:08:29,339][00197] Fps is (10 sec: 3276.7, 60 sec: 3345.0, 300 sec: 3290.7). Total num frames: 3776512. Throughput: 0: 832.2. Samples: 942072. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:08:29,343][00197] Avg episode reward: [(0, '23.401')]
[2024-09-21 13:08:34,339][00197] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 3792896. Throughput: 0: 856.2. Samples: 947856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:08:34,344][00197] Avg episode reward: [(0, '23.724')]
[2024-09-21 13:08:34,359][02663] Saving new best policy, reward=23.724!
[2024-09-21 13:08:39,339][00197] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3805184. Throughput: 0: 829.0. Samples: 951588. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:08:39,346][00197] Avg episode reward: [(0, '23.869')]
[2024-09-21 13:08:39,349][02663] Saving new best policy, reward=23.869!
[2024-09-21 13:08:40,121][02676] Updated weights for policy 0, policy_version 930 (0.0023)
[2024-09-21 13:08:44,342][00197] Fps is (10 sec: 2866.2, 60 sec: 3276.9, 300 sec: 3276.8). Total num frames: 3821568. Throughput: 0: 816.2. Samples: 953950. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:08:44,345][00197] Avg episode reward: [(0, '24.772')]
[2024-09-21 13:08:44,363][02663] Saving new best policy, reward=24.772!
[2024-09-21 13:08:49,339][00197] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3833856. Throughput: 0: 785.7. Samples: 957672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:08:49,343][00197] Avg episode reward: [(0, '25.477')]
[2024-09-21 13:08:49,345][02663] Saving new best policy, reward=25.477!
[2024-09-21 13:08:54,339][00197] Fps is (10 sec: 2458.5, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 3846144. Throughput: 0: 782.8. Samples: 961930. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:08:54,345][00197] Avg episode reward: [(0, '25.083')]
[2024-09-21 13:08:55,189][02676] Updated weights for policy 0, policy_version 940 (0.0013)
[2024-09-21 13:08:59,342][00197] Fps is (10 sec: 2866.2, 60 sec: 3071.8, 300 sec: 3249.0). Total num frames: 3862528. Throughput: 0: 766.2. Samples: 964142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:08:59,349][00197] Avg episode reward: [(0, '23.910')]
[2024-09-21 13:09:04,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3883008. Throughput: 0: 775.4. Samples: 970068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-21 13:09:04,346][00197] Avg episode reward: [(0, '23.625')]
[2024-09-21 13:09:05,646][02676] Updated weights for policy 0, policy_version 950 (0.0014)
[2024-09-21 13:09:09,339][00197] Fps is (10 sec: 3687.6, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3899392. Throughput: 0: 792.2. Samples: 974872. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:09:09,342][00197] Avg episode reward: [(0, '23.307')]
[2024-09-21 13:09:14,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3262.9). Total num frames: 3915776. Throughput: 0: 770.0. Samples: 976722. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:09:14,346][00197] Avg episode reward: [(0, '22.475')]
[2024-09-21 13:09:18,340][02676] Updated weights for policy 0, policy_version 960 (0.0022)
[2024-09-21 13:09:19,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 3932160. Throughput: 0: 771.0. Samples: 982550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:09:19,344][00197] Avg episode reward: [(0, '21.378')]
[2024-09-21 13:09:24,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.1). Total num frames: 3948544. Throughput: 0: 806.2. Samples: 987866. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:09:24,344][00197] Avg episode reward: [(0, '21.922')]
[2024-09-21 13:09:29,339][00197] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3262.9). Total num frames: 3964928. Throughput: 0: 795.4. Samples: 989740. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:09:29,341][00197] Avg episode reward: [(0, '22.044')]
[2024-09-21 13:09:30,853][02676] Updated weights for policy 0, policy_version 970 (0.0016)
[2024-09-21 13:09:34,339][00197] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 3985408. Throughput: 0: 835.2. Samples: 995258. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-21 13:09:34,341][00197] Avg episode reward: [(0, '21.862')]
[2024-09-21 13:09:39,340][00197] Fps is (10 sec: 3685.9, 60 sec: 3276.7, 300 sec: 3262.9). Total num frames: 4001792. Throughput: 0: 864.1. Samples: 1000814. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-21 13:09:39,348][00197] Avg episode reward: [(0, '21.783')]
[2024-09-21 13:09:40,076][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-21 13:09:40,119][02663] Stopping Batcher_0...
[2024-09-21 13:09:40,120][02663] Loop batcher_evt_loop terminating...
[2024-09-21 13:09:40,120][00197] Component Batcher_0 stopped!
[2024-09-21 13:09:40,126][00197] Component RolloutWorker_w0 process died already! Don't wait for it.
[2024-09-21 13:09:40,130][00197] Component RolloutWorker_w2 process died already! Don't wait for it.
[2024-09-21 13:09:40,135][00197] Component RolloutWorker_w3 process died already! Don't wait for it.
[2024-09-21 13:09:40,204][02676] Weights refcount: 2 0
[2024-09-21 13:09:40,219][02676] Stopping InferenceWorker_p0-w0...
[2024-09-21 13:09:40,219][02676] Loop inference_proc0-0_evt_loop terminating...
[2024-09-21 13:09:40,219][00197] Component InferenceWorker_p0-w0 stopped!
[2024-09-21 13:09:40,283][02663] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth
[2024-09-21 13:09:40,308][02663] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-21 13:09:40,504][00197] Component LearnerWorker_p0 stopped!
[2024-09-21 13:09:40,521][02663] Stopping LearnerWorker_p0...
[2024-09-21 13:09:40,525][02663] Loop learner_proc0_evt_loop terminating...
[2024-09-21 13:09:40,762][00197] Component RolloutWorker_w6 stopped!
[2024-09-21 13:09:40,768][02683] Stopping RolloutWorker_w6...
[2024-09-21 13:09:40,769][02683] Loop rollout_proc6_evt_loop terminating...
[2024-09-21 13:09:40,790][02678] Stopping RolloutWorker_w1...
[2024-09-21 13:09:40,790][00197] Component RolloutWorker_w1 stopped!
[2024-09-21 13:09:40,793][02678] Loop rollout_proc1_evt_loop terminating...
[2024-09-21 13:09:40,807][00197] Component RolloutWorker_w4 stopped!
[2024-09-21 13:09:40,813][02680] Stopping RolloutWorker_w4...
[2024-09-21 13:09:40,821][02680] Loop rollout_proc4_evt_loop terminating...
[2024-09-21 13:09:40,868][00197] Component RolloutWorker_w7 stopped!
[2024-09-21 13:09:40,871][02684] Stopping RolloutWorker_w7...
[2024-09-21 13:09:40,881][02684] Loop rollout_proc7_evt_loop terminating...
[2024-09-21 13:09:40,895][02682] Stopping RolloutWorker_w5...
[2024-09-21 13:09:40,895][00197] Component RolloutWorker_w5 stopped!
[2024-09-21 13:09:40,899][00197] Waiting for process learner_proc0 to stop...
[2024-09-21 13:09:40,898][02682] Loop rollout_proc5_evt_loop terminating...
[2024-09-21 13:09:42,822][00197] Waiting for process inference_proc0-0 to join...
[2024-09-21 13:09:43,217][00197] Waiting for process rollout_proc0 to join...
[2024-09-21 13:09:43,219][00197] Waiting for process rollout_proc1 to join...
[2024-09-21 13:09:43,944][00197] Waiting for process rollout_proc2 to join...
[2024-09-21 13:09:43,950][00197] Waiting for process rollout_proc3 to join...
[2024-09-21 13:09:43,952][00197] Waiting for process rollout_proc4 to join...
[2024-09-21 13:09:43,956][00197] Waiting for process rollout_proc5 to join...
[2024-09-21 13:09:43,961][00197] Waiting for process rollout_proc6 to join...
[2024-09-21 13:09:43,965][00197] Waiting for process rollout_proc7 to join...
[2024-09-21 13:09:43,970][00197] Batcher 0 profile tree view:
batching: 24.9618, releasing_batches: 0.0299
[2024-09-21 13:09:43,971][00197] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 549.8861
update_model: 9.6287
  weight_update: 0.0015
one_step: 0.0065
  handle_policy_step: 638.9649
    deserialize: 16.9350, stack: 3.9038, obs_to_device_normalize: 139.3524, forward: 326.2632, send_messages: 26.0760
    prepare_outputs: 93.2817
      to_cpu: 57.0874
[2024-09-21 13:09:43,973][00197] Learner 0 profile tree view:
misc: 0.0066, prepare_batch: 16.8214
train: 72.6075
  epoch_init: 0.0061, minibatch_init: 0.0105, losses_postprocess: 0.5752, kl_divergence: 0.5265, after_optimizer: 32.9221
  calculate_losses: 23.5307
    losses_init: 0.0052, forward_head: 1.7769, bptt_initial: 15.3279, tail: 1.0012, advantages_returns: 0.2803, losses: 2.6819
    bptt: 2.1621
      bptt_forward_core: 2.0823
  update: 14.4247
    clip: 1.5584
[2024-09-21 13:09:43,974][00197] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4161, enqueue_policy_requests: 150.2985, env_step: 948.2443, overhead: 21.2826, complete_rollouts: 9.3829
save_policy_outputs: 37.1383
  split_output_tensors: 12.6333
[2024-09-21 13:09:43,976][00197] Loop Runner_EvtLoop terminating...
[2024-09-21 13:09:43,978][00197] Runner profile tree view:
main_loop: 1273.3851
[2024-09-21 13:09:43,979][00197] Collected {0: 4005888}, FPS: 3145.9
[2024-09-21 13:09:44,222][00197] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-21 13:09:44,226][00197] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-21 13:09:44,229][00197] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-21 13:09:44,231][00197] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-21 13:09:44,234][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-21 13:09:44,235][00197] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-21 13:09:44,236][00197] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-21 13:09:44,238][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-21 13:09:44,239][00197] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-21 13:09:44,240][00197] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-21 13:09:44,242][00197] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-21 13:09:44,243][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-21 13:09:44,244][00197] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-21 13:09:44,245][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-21 13:09:44,247][00197] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-21 13:09:44,268][00197] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-21 13:09:44,270][00197] RunningMeanStd input shape: (3, 72, 128) [2024-09-21 13:09:44,274][00197] RunningMeanStd input shape: (1,) [2024-09-21 13:09:44,289][00197] ConvEncoder: input_channels=3 [2024-09-21 13:09:44,422][00197] Conv encoder output size: 512 [2024-09-21 13:09:44,424][00197] Policy head output size: 512 [2024-09-21 13:09:46,158][00197] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-21 13:09:47,069][00197] Num frames 100... [2024-09-21 13:09:47,206][00197] Num frames 200... [2024-09-21 13:09:47,338][00197] Num frames 300... [2024-09-21 13:09:47,467][00197] Num frames 400... [2024-09-21 13:09:47,602][00197] Num frames 500... [2024-09-21 13:09:47,724][00197] Num frames 600... [2024-09-21 13:09:47,860][00197] Num frames 700... [2024-09-21 13:09:47,963][00197] Avg episode rewards: #0: 15.360, true rewards: #0: 7.360 [2024-09-21 13:09:47,965][00197] Avg episode reward: 15.360, avg true_objective: 7.360 [2024-09-21 13:09:48,047][00197] Num frames 800... [2024-09-21 13:09:48,182][00197] Num frames 900... [2024-09-21 13:09:48,314][00197] Num frames 1000... [2024-09-21 13:09:48,435][00197] Num frames 1100... [2024-09-21 13:09:48,560][00197] Num frames 1200... [2024-09-21 13:09:48,724][00197] Avg episode rewards: #0: 11.400, true rewards: #0: 6.400 [2024-09-21 13:09:48,726][00197] Avg episode reward: 11.400, avg true_objective: 6.400 [2024-09-21 13:09:48,756][00197] Num frames 1300... [2024-09-21 13:09:48,893][00197] Num frames 1400... [2024-09-21 13:09:49,021][00197] Num frames 1500... [2024-09-21 13:09:49,144][00197] Num frames 1600... [2024-09-21 13:09:49,286][00197] Num frames 1700... [2024-09-21 13:09:49,414][00197] Num frames 1800... 
[2024-09-21 13:09:49,491][00197] Avg episode rewards: #0: 10.720, true rewards: #0: 6.053 [2024-09-21 13:09:49,493][00197] Avg episode reward: 10.720, avg true_objective: 6.053 [2024-09-21 13:09:49,614][00197] Num frames 1900... [2024-09-21 13:09:49,758][00197] Num frames 2000... [2024-09-21 13:09:49,864][00197] Avg episode rewards: #0: 8.850, true rewards: #0: 5.100 [2024-09-21 13:09:49,866][00197] Avg episode reward: 8.850, avg true_objective: 5.100 [2024-09-21 13:09:49,955][00197] Num frames 2100... [2024-09-21 13:09:50,087][00197] Num frames 2200... [2024-09-21 13:09:50,219][00197] Num frames 2300... [2024-09-21 13:09:50,366][00197] Num frames 2400... [2024-09-21 13:09:50,497][00197] Num frames 2500... [2024-09-21 13:09:50,610][00197] Avg episode rewards: #0: 8.686, true rewards: #0: 5.086 [2024-09-21 13:09:50,611][00197] Avg episode reward: 8.686, avg true_objective: 5.086 [2024-09-21 13:09:50,696][00197] Num frames 2600... [2024-09-21 13:09:50,820][00197] Num frames 2700... [2024-09-21 13:09:50,953][00197] Num frames 2800... [2024-09-21 13:09:51,079][00197] Num frames 2900... [2024-09-21 13:09:51,204][00197] Num frames 3000... [2024-09-21 13:09:51,334][00197] Num frames 3100... [2024-09-21 13:09:51,497][00197] Avg episode rewards: #0: 9.140, true rewards: #0: 5.307 [2024-09-21 13:09:51,498][00197] Avg episode reward: 9.140, avg true_objective: 5.307 [2024-09-21 13:09:51,526][00197] Num frames 3200... [2024-09-21 13:09:51,652][00197] Num frames 3300... [2024-09-21 13:09:51,795][00197] Num frames 3400... [2024-09-21 13:09:51,935][00197] Num frames 3500... [2024-09-21 13:09:52,074][00197] Num frames 3600... [2024-09-21 13:09:52,199][00197] Num frames 3700... [2024-09-21 13:09:52,340][00197] Num frames 3800... [2024-09-21 13:09:52,467][00197] Num frames 3900... [2024-09-21 13:09:52,600][00197] Num frames 4000... [2024-09-21 13:09:52,733][00197] Num frames 4100... [2024-09-21 13:09:52,873][00197] Num frames 4200... 
[2024-09-21 13:09:52,982][00197] Avg episode rewards: #0: 11.772, true rewards: #0: 6.057 [2024-09-21 13:09:52,984][00197] Avg episode reward: 11.772, avg true_objective: 6.057 [2024-09-21 13:09:53,096][00197] Num frames 4300... [2024-09-21 13:09:53,278][00197] Num frames 4400... [2024-09-21 13:09:53,451][00197] Num frames 4500... [2024-09-21 13:09:53,629][00197] Num frames 4600... [2024-09-21 13:09:53,801][00197] Num frames 4700... [2024-09-21 13:09:53,976][00197] Num frames 4800... [2024-09-21 13:09:54,147][00197] Num frames 4900... [2024-09-21 13:09:54,225][00197] Avg episode rewards: #0: 12.140, true rewards: #0: 6.140 [2024-09-21 13:09:54,227][00197] Avg episode reward: 12.140, avg true_objective: 6.140 [2024-09-21 13:09:54,392][00197] Num frames 5000... [2024-09-21 13:09:54,582][00197] Num frames 5100... [2024-09-21 13:09:54,763][00197] Num frames 5200... [2024-09-21 13:09:54,957][00197] Num frames 5300... [2024-09-21 13:09:55,178][00197] Avg episode rewards: #0: 11.547, true rewards: #0: 5.991 [2024-09-21 13:09:55,180][00197] Avg episode reward: 11.547, avg true_objective: 5.991 [2024-09-21 13:09:55,198][00197] Num frames 5400... [2024-09-21 13:09:55,384][00197] Num frames 5500... [2024-09-21 13:09:55,531][00197] Num frames 5600... [2024-09-21 13:09:55,663][00197] Num frames 5700... [2024-09-21 13:09:55,785][00197] Num frames 5800... [2024-09-21 13:09:55,927][00197] Num frames 5900... [2024-09-21 13:09:56,058][00197] Num frames 6000... [2024-09-21 13:09:56,153][00197] Avg episode rewards: #0: 11.732, true rewards: #0: 6.032 [2024-09-21 13:09:56,155][00197] Avg episode reward: 11.732, avg true_objective: 6.032 [2024-09-21 13:10:35,170][00197] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
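The "Avg episode rewards" entries are running means over all episodes completed so far, so the individual episode scores can be recovered by differencing consecutive averages. A small illustrative helper (not part of Sample Factory; the input values come from the first three episodes of the evaluation run above):

```python
def episode_rewards_from_running_avgs(avgs):
    """Recover per-episode rewards from a series of running averages,
    where avgs[n-1] is the mean reward after n completed episodes."""
    rewards = []
    total = 0.0
    for n, avg in enumerate(avgs, start=1):
        reward = round(n * avg - total, 2)  # n*avg = sum of first n rewards
        rewards.append(reward)
        total += reward
    return rewards

# Logged running averages: 15.360, 11.400, 10.720
print(episode_rewards_from_running_avgs([15.36, 11.40, 10.72]))
# episode 2 scored 2*11.40 - 15.36 = 7.44; episode 3 scored 9.36
```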
[2024-09-21 13:24:42,347][00197] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-21 13:24:42,349][00197] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-21 13:24:42,351][00197] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-21 13:24:42,354][00197] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-21 13:24:42,356][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-21 13:24:42,358][00197] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-21 13:24:42,360][00197] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-21 13:24:42,361][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-21 13:24:42,362][00197] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-21 13:24:42,363][00197] Adding new argument 'hf_repository'='yhyeo0202/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-21 13:24:42,364][00197] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-21 13:24:42,365][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-21 13:24:42,366][00197] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-21 13:24:42,368][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-09-21 13:24:42,369][00197] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-21 13:24:42,379][00197] RunningMeanStd input shape: (3, 72, 128) [2024-09-21 13:24:42,388][00197] RunningMeanStd input shape: (1,) [2024-09-21 13:24:42,403][00197] ConvEncoder: input_channels=3 [2024-09-21 13:24:42,442][00197] Conv encoder output size: 512 [2024-09-21 13:24:42,443][00197] Policy head output size: 512 [2024-09-21 13:24:42,463][00197] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-21 13:24:42,979][00197] Num frames 100... [2024-09-21 13:24:43,104][00197] Num frames 200... [2024-09-21 13:24:43,227][00197] Num frames 300... [2024-09-21 13:24:43,355][00197] Num frames 400... [2024-09-21 13:24:43,476][00197] Num frames 500... [2024-09-21 13:24:43,607][00197] Num frames 600... [2024-09-21 13:24:43,730][00197] Num frames 700... [2024-09-21 13:24:43,857][00197] Num frames 800... [2024-09-21 13:24:43,954][00197] Avg episode rewards: #0: 17.320, true rewards: #0: 8.320 [2024-09-21 13:24:43,956][00197] Avg episode reward: 17.320, avg true_objective: 8.320 [2024-09-21 13:24:44,043][00197] Num frames 900... [2024-09-21 13:24:44,182][00197] Num frames 1000... [2024-09-21 13:24:44,308][00197] Num frames 1100... [2024-09-21 13:24:44,436][00197] Num frames 1200... [2024-09-21 13:24:44,564][00197] Num frames 1300... [2024-09-21 13:24:44,693][00197] Num frames 1400... [2024-09-21 13:24:44,819][00197] Num frames 1500... [2024-09-21 13:24:44,949][00197] Num frames 1600... [2024-09-21 13:24:45,073][00197] Num frames 1700... [2024-09-21 13:24:45,201][00197] Num frames 1800... [2024-09-21 13:24:45,332][00197] Num frames 1900... [2024-09-21 13:24:45,477][00197] Avg episode rewards: #0: 22.345, true rewards: #0: 9.845 [2024-09-21 13:24:45,479][00197] Avg episode reward: 22.345, avg true_objective: 9.845 [2024-09-21 13:24:45,523][00197] Num frames 2000... 
[2024-09-21 13:24:45,653][00197] Num frames 2100... [2024-09-21 13:24:45,789][00197] Num frames 2200... [2024-09-21 13:24:45,918][00197] Num frames 2300... [2024-09-21 13:24:46,068][00197] Avg episode rewards: #0: 17.913, true rewards: #0: 7.913 [2024-09-21 13:24:46,071][00197] Avg episode reward: 17.913, avg true_objective: 7.913 [2024-09-21 13:24:46,105][00197] Num frames 2400... [2024-09-21 13:24:46,227][00197] Num frames 2500... [2024-09-21 13:24:46,354][00197] Num frames 2600... [2024-09-21 13:24:46,477][00197] Num frames 2700... [2024-09-21 13:24:46,604][00197] Num frames 2800... [2024-09-21 13:24:46,736][00197] Num frames 2900... [2024-09-21 13:24:46,873][00197] Num frames 3000... [2024-09-21 13:24:46,998][00197] Num frames 3100... [2024-09-21 13:24:47,126][00197] Num frames 3200... [2024-09-21 13:24:47,191][00197] Avg episode rewards: #0: 18.265, true rewards: #0: 8.015 [2024-09-21 13:24:47,193][00197] Avg episode reward: 18.265, avg true_objective: 8.015 [2024-09-21 13:24:47,314][00197] Num frames 3300... [2024-09-21 13:24:47,442][00197] Num frames 3400... [2024-09-21 13:24:47,609][00197] Num frames 3500... [2024-09-21 13:24:47,796][00197] Num frames 3600... [2024-09-21 13:24:48,008][00197] Num frames 3700... [2024-09-21 13:24:48,382][00197] Num frames 3800... [2024-09-21 13:24:48,694][00197] Num frames 3900... [2024-09-21 13:24:49,122][00197] Num frames 4000... [2024-09-21 13:24:49,460][00197] Num frames 4100... [2024-09-21 13:24:49,979][00197] Num frames 4200... [2024-09-21 13:24:50,413][00197] Num frames 4300... [2024-09-21 13:24:50,808][00197] Num frames 4400... [2024-09-21 13:24:51,031][00197] Num frames 4500... [2024-09-21 13:24:51,286][00197] Num frames 4600... [2024-09-21 13:24:51,541][00197] Num frames 4700... [2024-09-21 13:24:51,820][00197] Num frames 4800... [2024-09-21 13:24:52,065][00197] Num frames 4900... [2024-09-21 13:24:52,323][00197] Num frames 5000... [2024-09-21 13:24:52,596][00197] Num frames 5100... 
[2024-09-21 13:24:52,856][00197] Num frames 5200... [2024-09-21 13:24:53,130][00197] Num frames 5300... [2024-09-21 13:24:53,225][00197] Avg episode rewards: #0: 25.412, true rewards: #0: 10.612 [2024-09-21 13:24:53,230][00197] Avg episode reward: 25.412, avg true_objective: 10.612 [2024-09-21 13:24:53,360][00197] Num frames 5400... [2024-09-21 13:24:53,487][00197] Num frames 5500... [2024-09-21 13:24:53,619][00197] Num frames 5600... [2024-09-21 13:24:53,757][00197] Num frames 5700... [2024-09-21 13:24:53,902][00197] Num frames 5800... [2024-09-21 13:24:54,041][00197] Num frames 5900... [2024-09-21 13:24:54,165][00197] Num frames 6000... [2024-09-21 13:24:54,292][00197] Num frames 6100... [2024-09-21 13:24:54,423][00197] Num frames 6200... [2024-09-21 13:24:54,553][00197] Num frames 6300... [2024-09-21 13:24:54,687][00197] Avg episode rewards: #0: 25.103, true rewards: #0: 10.603 [2024-09-21 13:24:54,689][00197] Avg episode reward: 25.103, avg true_objective: 10.603 [2024-09-21 13:24:54,738][00197] Num frames 6400... [2024-09-21 13:24:54,870][00197] Num frames 6500... [2024-09-21 13:24:54,991][00197] Num frames 6600... [2024-09-21 13:24:55,132][00197] Num frames 6700... [2024-09-21 13:24:55,270][00197] Num frames 6800... [2024-09-21 13:24:55,394][00197] Num frames 6900... [2024-09-21 13:24:55,522][00197] Num frames 7000... [2024-09-21 13:24:55,653][00197] Num frames 7100... [2024-09-21 13:24:55,750][00197] Avg episode rewards: #0: 23.757, true rewards: #0: 10.186 [2024-09-21 13:24:55,753][00197] Avg episode reward: 23.757, avg true_objective: 10.186 [2024-09-21 13:24:55,839][00197] Num frames 7200... [2024-09-21 13:24:55,970][00197] Num frames 7300... [2024-09-21 13:24:56,105][00197] Num frames 7400... [2024-09-21 13:24:56,223][00197] Avg episode rewards: #0: 21.312, true rewards: #0: 9.312 [2024-09-21 13:24:56,225][00197] Avg episode reward: 21.312, avg true_objective: 9.312 [2024-09-21 13:24:56,290][00197] Num frames 7500... 
[2024-09-21 13:24:56,412][00197] Num frames 7600... [2024-09-21 13:24:56,542][00197] Num frames 7700... [2024-09-21 13:24:56,669][00197] Num frames 7800... [2024-09-21 13:24:56,768][00197] Avg episode rewards: #0: 19.927, true rewards: #0: 8.704 [2024-09-21 13:24:56,769][00197] Avg episode reward: 19.927, avg true_objective: 8.704 [2024-09-21 13:24:56,866][00197] Num frames 7900... [2024-09-21 13:24:56,995][00197] Num frames 8000... [2024-09-21 13:24:57,127][00197] Num frames 8100... [2024-09-21 13:24:57,252][00197] Num frames 8200... [2024-09-21 13:24:57,380][00197] Num frames 8300... [2024-09-21 13:24:57,505][00197] Num frames 8400... [2024-09-21 13:24:57,627][00197] Num frames 8500... [2024-09-21 13:24:57,758][00197] Num frames 8600... [2024-09-21 13:24:57,891][00197] Num frames 8700... [2024-09-21 13:24:58,017][00197] Num frames 8800... [2024-09-21 13:24:58,159][00197] Num frames 8900... [2024-09-21 13:24:58,284][00197] Num frames 9000... [2024-09-21 13:24:58,375][00197] Avg episode rewards: #0: 20.328, true rewards: #0: 9.028 [2024-09-21 13:24:58,376][00197] Avg episode reward: 20.328, avg true_objective: 9.028 [2024-09-21 13:25:58,200][00197] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
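The checkpoint loaded for these evaluation runs, checkpoint_000000978_4005888.pth, follows a checkpoint_<iteration>_<env_steps>.pth naming pattern; the second field (4005888) matches the total frame count collected during training. A hypothetical parser for that scheme (the field meanings are inferred from this log, not taken from Sample Factory's documentation):

```python
import re

def parse_checkpoint_name(path):
    """Extract (iteration, env_steps) from a checkpoint filename like
    checkpoint_000000978_4005888.pth. Field meanings inferred from the log."""
    match = re.search(r"checkpoint_(\d+)_(\d+)\.pth$", path)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {path}")
    return int(match.group(1)), int(match.group(2))

iteration, env_steps = parse_checkpoint_name(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"
)
print(iteration, env_steps)
```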
[2024-09-21 13:26:03,118][00197] The model has been pushed to https://huggingface.co/yhyeo0202/rl_course_vizdoom_health_gathering_supreme [2024-09-21 13:29:37,213][00197] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json [2024-09-21 13:29:37,216][00197] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json [2024-09-21 13:29:37,218][00197] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line [2024-09-21 13:29:37,222][00197] Overriding arg 'train_dir' with value 'train_dir' passed from command line [2024-09-21 13:29:37,225][00197] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-21 13:29:37,227][00197] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! [2024-09-21 13:29:37,229][00197] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! [2024-09-21 13:29:37,231][00197] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! [2024-09-21 13:29:37,232][00197] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-21 13:29:37,233][00197] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-21 13:29:37,234][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-21 13:29:37,235][00197] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-21 13:29:37,236][00197] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-09-21 13:29:37,241][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-21 13:29:37,242][00197] Adding new argument 'push_to_hub'=False that is not in the saved config file! 
[2024-09-21 13:29:37,243][00197] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-09-21 13:29:37,244][00197] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-21 13:29:37,245][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-21 13:29:37,246][00197] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-21 13:29:37,247][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-21 13:29:37,248][00197] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-21 13:29:37,267][00197] RunningMeanStd input shape: (3, 72, 128) [2024-09-21 13:29:37,269][00197] RunningMeanStd input shape: (1,) [2024-09-21 13:29:37,309][00197] ConvEncoder: input_channels=3 [2024-09-21 13:29:37,386][00197] Conv encoder output size: 512 [2024-09-21 13:29:37,388][00197] Policy head output size: 512 [2024-09-21 13:29:37,426][00197] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth... [2024-09-21 13:29:38,229][00197] Num frames 100... [2024-09-21 13:29:38,429][00197] Num frames 200... [2024-09-21 13:29:38,588][00197] Num frames 300... [2024-09-21 13:29:38,715][00197] Num frames 400... [2024-09-21 13:29:38,849][00197] Num frames 500... [2024-09-21 13:29:38,975][00197] Num frames 600... [2024-09-21 13:29:39,106][00197] Num frames 700... [2024-09-21 13:29:39,232][00197] Num frames 800... [2024-09-21 13:29:39,369][00197] Num frames 900... [2024-09-21 13:29:39,498][00197] Num frames 1000... [2024-09-21 13:29:39,623][00197] Num frames 1100... [2024-09-21 13:29:39,756][00197] Num frames 1200... [2024-09-21 13:29:39,894][00197] Num frames 1300... [2024-09-21 13:29:40,020][00197] Num frames 1400... [2024-09-21 13:29:40,152][00197] Num frames 1500... [2024-09-21 13:29:40,291][00197] Num frames 1600... 
[2024-09-21 13:29:40,433][00197] Num frames 1700... [2024-09-21 13:29:40,571][00197] Num frames 1800... [2024-09-21 13:29:40,731][00197] Num frames 1900... [2024-09-21 13:29:40,866][00197] Num frames 2000... [2024-09-21 13:29:41,006][00197] Num frames 2100... [2024-09-21 13:29:41,058][00197] Avg episode rewards: #0: 65.999, true rewards: #0: 21.000 [2024-09-21 13:29:41,061][00197] Avg episode reward: 65.999, avg true_objective: 21.000 [2024-09-21 13:29:41,201][00197] Num frames 2200... [2024-09-21 13:29:41,331][00197] Num frames 2300... [2024-09-21 13:29:41,470][00197] Num frames 2400... [2024-09-21 13:29:41,601][00197] Num frames 2500... [2024-09-21 13:29:41,726][00197] Num frames 2600... [2024-09-21 13:29:41,870][00197] Num frames 2700... [2024-09-21 13:29:41,998][00197] Num frames 2800... [2024-09-21 13:29:42,126][00197] Num frames 2900... [2024-09-21 13:29:42,264][00197] Num frames 3000... [2024-09-21 13:29:42,402][00197] Num frames 3100... [2024-09-21 13:29:42,542][00197] Num frames 3200... [2024-09-21 13:29:42,672][00197] Num frames 3300... [2024-09-21 13:29:42,802][00197] Num frames 3400... [2024-09-21 13:29:42,938][00197] Num frames 3500... [2024-09-21 13:29:43,078][00197] Num frames 3600... [2024-09-21 13:29:43,206][00197] Num frames 3700... [2024-09-21 13:29:43,338][00197] Num frames 3800... [2024-09-21 13:29:43,481][00197] Num frames 3900... [2024-09-21 13:29:43,613][00197] Num frames 4000... [2024-09-21 13:29:43,743][00197] Num frames 4100... [2024-09-21 13:29:43,889][00197] Num frames 4200... [2024-09-21 13:29:43,942][00197] Avg episode rewards: #0: 65.999, true rewards: #0: 21.000 [2024-09-21 13:29:43,944][00197] Avg episode reward: 65.999, avg true_objective: 21.000 [2024-09-21 13:29:44,073][00197] Num frames 4300... [2024-09-21 13:29:44,202][00197] Num frames 4400... [2024-09-21 13:29:44,339][00197] Num frames 4500... [2024-09-21 13:29:44,469][00197] Num frames 4600... [2024-09-21 13:29:44,613][00197] Num frames 4700... 
[2024-09-21 13:29:44,741][00197] Num frames 4800... [2024-09-21 13:29:44,880][00197] Num frames 4900... [2024-09-21 13:29:45,007][00197] Num frames 5000... [2024-09-21 13:29:45,145][00197] Num frames 5100... [2024-09-21 13:29:45,286][00197] Num frames 5200... [2024-09-21 13:29:45,418][00197] Num frames 5300... [2024-09-21 13:29:45,560][00197] Num frames 5400... [2024-09-21 13:29:45,690][00197] Num frames 5500... [2024-09-21 13:29:45,817][00197] Num frames 5600... [2024-09-21 13:29:45,956][00197] Num frames 5700... [2024-09-21 13:29:46,092][00197] Num frames 5800... [2024-09-21 13:29:46,225][00197] Num frames 5900... [2024-09-21 13:29:46,367][00197] Num frames 6000... [2024-09-21 13:29:46,498][00197] Num frames 6100... [2024-09-21 13:29:46,643][00197] Num frames 6200... [2024-09-21 13:29:46,771][00197] Num frames 6300... [2024-09-21 13:29:46,824][00197] Avg episode rewards: #0: 66.999, true rewards: #0: 21.000 [2024-09-21 13:29:46,826][00197] Avg episode reward: 66.999, avg true_objective: 21.000 [2024-09-21 13:29:46,970][00197] Num frames 6400... [2024-09-21 13:29:47,107][00197] Num frames 6500... [2024-09-21 13:29:47,237][00197] Num frames 6600... [2024-09-21 13:29:47,377][00197] Num frames 6700... [2024-09-21 13:29:47,515][00197] Num frames 6800... [2024-09-21 13:29:47,653][00197] Num frames 6900... [2024-09-21 13:29:47,776][00197] Num frames 7000... [2024-09-21 13:29:47,922][00197] Num frames 7100... [2024-09-21 13:29:48,059][00197] Num frames 7200... [2024-09-21 13:29:48,195][00197] Num frames 7300... [2024-09-21 13:29:48,330][00197] Num frames 7400... [2024-09-21 13:29:48,468][00197] Num frames 7500... [2024-09-21 13:29:48,637][00197] Num frames 7600... [2024-09-21 13:29:48,834][00197] Num frames 7700... [2024-09-21 13:29:49,024][00197] Num frames 7800... [2024-09-21 13:29:49,205][00197] Num frames 7900... [2024-09-21 13:29:49,394][00197] Num frames 8000... [2024-09-21 13:29:49,576][00197] Num frames 8100... [2024-09-21 13:29:49,771][00197] Num frames 8200... 
[2024-09-21 13:29:49,965][00197] Num frames 8300... [2024-09-21 13:29:50,156][00197] Num frames 8400... [2024-09-21 13:29:50,213][00197] Avg episode rewards: #0: 65.499, true rewards: #0: 21.000 [2024-09-21 13:29:50,215][00197] Avg episode reward: 65.499, avg true_objective: 21.000 [2024-09-21 13:29:50,403][00197] Num frames 8500... [2024-09-21 13:29:50,597][00197] Num frames 8600... [2024-09-21 13:29:50,786][00197] Num frames 8700... [2024-09-21 13:29:50,973][00197] Num frames 8800... [2024-09-21 13:29:51,156][00197] Num frames 8900... [2024-09-21 13:29:51,339][00197] Num frames 9000... [2024-09-21 13:29:51,484][00197] Num frames 9100... [2024-09-21 13:29:51,616][00197] Num frames 9200... [2024-09-21 13:29:51,751][00197] Num frames 9300... [2024-09-21 13:29:51,898][00197] Num frames 9400... [2024-09-21 13:29:52,031][00197] Num frames 9500... [2024-09-21 13:29:52,174][00197] Num frames 9600... [2024-09-21 13:29:52,309][00197] Num frames 9700... [2024-09-21 13:29:52,443][00197] Num frames 9800... [2024-09-21 13:29:52,575][00197] Num frames 9900... [2024-09-21 13:29:52,704][00197] Num frames 10000... [2024-09-21 13:29:52,838][00197] Num frames 10100... [2024-09-21 13:29:52,980][00197] Num frames 10200... [2024-09-21 13:29:53,108][00197] Num frames 10300... [2024-09-21 13:29:53,240][00197] Num frames 10400... [2024-09-21 13:29:53,380][00197] Num frames 10500... [2024-09-21 13:29:53,433][00197] Avg episode rewards: #0: 65.999, true rewards: #0: 21.000 [2024-09-21 13:29:53,435][00197] Avg episode reward: 65.999, avg true_objective: 21.000 [2024-09-21 13:29:53,579][00197] Num frames 10600... [2024-09-21 13:29:53,727][00197] Num frames 10700... [2024-09-21 13:29:53,876][00197] Num frames 10800... [2024-09-21 13:29:54,011][00197] Num frames 10900... [2024-09-21 13:29:54,142][00197] Num frames 11000... [2024-09-21 13:29:54,286][00197] Num frames 11100... [2024-09-21 13:29:54,412][00197] Num frames 11200... [2024-09-21 13:29:54,552][00197] Num frames 11300... 
[2024-09-21 13:29:54,693][00197] Num frames 11400... [2024-09-21 13:29:54,840][00197] Num frames 11500... [2024-09-21 13:29:54,985][00197] Num frames 11600... [2024-09-21 13:29:55,123][00197] Num frames 11700... [2024-09-21 13:29:55,259][00197] Num frames 11800... [2024-09-21 13:29:55,391][00197] Num frames 11900... [2024-09-21 13:29:55,520][00197] Num frames 12000... [2024-09-21 13:29:55,649][00197] Num frames 12100... [2024-09-21 13:29:55,777][00197] Num frames 12200... [2024-09-21 13:29:55,921][00197] Num frames 12300... [2024-09-21 13:29:56,060][00197] Num frames 12400... [2024-09-21 13:29:56,187][00197] Num frames 12500... [2024-09-21 13:29:56,329][00197] Num frames 12600... [2024-09-21 13:29:56,381][00197] Avg episode rewards: #0: 65.665, true rewards: #0: 21.000 [2024-09-21 13:29:56,384][00197] Avg episode reward: 65.665, avg true_objective: 21.000 [2024-09-21 13:29:56,515][00197] Num frames 12700... [2024-09-21 13:29:56,659][00197] Num frames 12800... [2024-09-21 13:29:56,791][00197] Num frames 12900... [2024-09-21 13:29:56,928][00197] Num frames 13000... [2024-09-21 13:29:57,063][00197] Num frames 13100... [2024-09-21 13:29:57,195][00197] Num frames 13200... [2024-09-21 13:29:57,333][00197] Num frames 13300... [2024-09-21 13:29:57,465][00197] Num frames 13400... [2024-09-21 13:29:57,605][00197] Num frames 13500... [2024-09-21 13:29:57,733][00197] Num frames 13600... [2024-09-21 13:29:57,872][00197] Num frames 13700... [2024-09-21 13:29:58,018][00197] Num frames 13800... [2024-09-21 13:29:58,151][00197] Num frames 13900... [2024-09-21 13:29:58,283][00197] Num frames 14000... [2024-09-21 13:29:58,422][00197] Num frames 14100... [2024-09-21 13:29:58,554][00197] Num frames 14200... [2024-09-21 13:29:58,696][00197] Num frames 14300... [2024-09-21 13:29:58,832][00197] Num frames 14400... [2024-09-21 13:29:58,969][00197] Num frames 14500... [2024-09-21 13:29:59,125][00197] Num frames 14600... [2024-09-21 13:29:59,267][00197] Num frames 14700... 
[2024-09-21 13:29:59,320][00197] Avg episode rewards: #0: 65.427, true rewards: #0: 21.000 [2024-09-21 13:29:59,322][00197] Avg episode reward: 65.427, avg true_objective: 21.000 [2024-09-21 13:29:59,448][00197] Num frames 14800... [2024-09-21 13:29:59,584][00197] Num frames 14900... [2024-09-21 13:29:59,710][00197] Num frames 15000... [2024-09-21 13:29:59,835][00197] Num frames 15100... [2024-09-21 13:29:59,977][00197] Num frames 15200... [2024-09-21 13:30:00,116][00197] Num frames 15300... [2024-09-21 13:30:00,260][00197] Num frames 15400... [2024-09-21 13:30:00,391][00197] Num frames 15500... [2024-09-21 13:30:00,521][00197] Num frames 15600... [2024-09-21 13:30:00,647][00197] Num frames 15700... [2024-09-21 13:30:00,777][00197] Num frames 15800... [2024-09-21 13:30:00,911][00197] Num frames 15900... [2024-09-21 13:30:01,040][00197] Num frames 16000... [2024-09-21 13:30:01,177][00197] Num frames 16100... [2024-09-21 13:30:01,313][00197] Num frames 16200... [2024-09-21 13:30:01,510][00197] Num frames 16300... [2024-09-21 13:30:01,690][00197] Num frames 16400... [2024-09-21 13:30:01,888][00197] Num frames 16500... [2024-09-21 13:30:02,070][00197] Num frames 16600... [2024-09-21 13:30:02,252][00197] Num frames 16700... [2024-09-21 13:30:02,436][00197] Num frames 16800... [2024-09-21 13:30:02,493][00197] Avg episode rewards: #0: 64.749, true rewards: #0: 21.000 [2024-09-21 13:30:02,495][00197] Avg episode reward: 64.749, avg true_objective: 21.000 [2024-09-21 13:30:02,685][00197] Num frames 16900... [2024-09-21 13:30:02,870][00197] Num frames 17000... [2024-09-21 13:30:03,063][00197] Num frames 17100... [2024-09-21 13:30:03,266][00197] Num frames 17200... [2024-09-21 13:30:03,450][00197] Num frames 17300... [2024-09-21 13:30:03,646][00197] Num frames 17400... [2024-09-21 13:30:03,845][00197] Num frames 17500... [2024-09-21 13:30:04,051][00197] Num frames 17600... [2024-09-21 13:30:04,205][00197] Num frames 17700... 
[2024-09-21 13:30:04,347][00197] Num frames 17800... [2024-09-21 13:30:04,474][00197] Num frames 17900... [2024-09-21 13:30:04,601][00197] Num frames 18000... [2024-09-21 13:30:04,735][00197] Num frames 18100... [2024-09-21 13:30:04,875][00197] Num frames 18200... [2024-09-21 13:30:05,007][00197] Num frames 18300... [2024-09-21 13:30:05,136][00197] Num frames 18400... [2024-09-21 13:30:05,282][00197] Num frames 18500... [2024-09-21 13:30:05,417][00197] Num frames 18600... [2024-09-21 13:30:05,546][00197] Num frames 18700... [2024-09-21 13:30:05,680][00197] Num frames 18800... [2024-09-21 13:30:05,808][00197] Num frames 18900... [2024-09-21 13:30:05,860][00197] Avg episode rewards: #0: 64.665, true rewards: #0: 21.000 [2024-09-21 13:30:05,862][00197] Avg episode reward: 64.665, avg true_objective: 21.000 [2024-09-21 13:30:05,992][00197] Num frames 19000... [2024-09-21 13:30:06,121][00197] Num frames 19100... [2024-09-21 13:30:06,250][00197] Num frames 19200... [2024-09-21 13:30:06,388][00197] Num frames 19300... [2024-09-21 13:30:06,522][00197] Num frames 19400... [2024-09-21 13:30:06,653][00197] Num frames 19500... [2024-09-21 13:30:06,783][00197] Num frames 19600... [2024-09-21 13:30:06,930][00197] Num frames 19700... [2024-09-21 13:30:07,062][00197] Num frames 19800... [2024-09-21 13:30:07,188][00197] Num frames 19900... [2024-09-21 13:30:07,330][00197] Num frames 20000... [2024-09-21 13:30:07,460][00197] Num frames 20100... [2024-09-21 13:30:07,589][00197] Num frames 20200... [2024-09-21 13:30:07,720][00197] Num frames 20300... [2024-09-21 13:30:07,856][00197] Num frames 20400... [2024-09-21 13:30:07,988][00197] Num frames 20500... [2024-09-21 13:30:08,126][00197] Num frames 20600... [2024-09-21 13:30:08,263][00197] Num frames 20700... [2024-09-21 13:30:08,405][00197] Num frames 20800... [2024-09-21 13:30:08,542][00197] Num frames 20900... [2024-09-21 13:30:08,682][00197] Num frames 21000... 
[2024-09-21 13:30:08,735][00197] Avg episode rewards: #0: 64.899, true rewards: #0: 21.000 [2024-09-21 13:30:08,737][00197] Avg episode reward: 64.899, avg true_objective: 21.000 [2024-09-21 13:32:23,807][00197] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4! [2024-09-21 13:46:20,436][00197] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-21 13:46:20,438][00197] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-21 13:46:20,440][00197] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-21 13:46:20,442][00197] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-21 13:46:20,443][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-21 13:46:20,445][00197] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-21 13:46:20,446][00197] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-21 13:46:20,447][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-21 13:46:20,449][00197] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-21 13:46:20,450][00197] Adding new argument 'hf_repository'='yhyeo0202/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-21 13:46:20,452][00197] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-21 13:46:20,453][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-21 13:46:20,455][00197] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-21 13:46:20,456][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
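As the entries above show, evaluation reuses the saved config.json and layers command-line values on top of it: keys already in the saved config are "overridden", while unknown keys are "added". A minimal sketch of that merge logic (a hypothetical helper, not Sample Factory's actual implementation):

```python
def merge_eval_args(saved_cfg, cli_args):
    """Layer evaluation-time CLI args over a saved training config,
    logging overrides vs. additions the way this log does."""
    cfg = dict(saved_cfg)
    for key, value in cli_args.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg

saved = {"num_workers": 8, "env": "doom_health_gathering_supreme"}
merged = merge_eval_args(saved, {"num_workers": 1, "no_render": True, "save_video": True})
```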
[2024-09-21 13:46:20,458][00197] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-21 13:46:20,476][00197] RunningMeanStd input shape: (3, 72, 128)
[2024-09-21 13:46:20,479][00197] RunningMeanStd input shape: (1,)
[2024-09-21 13:46:20,492][00197] ConvEncoder: input_channels=3
[2024-09-21 13:46:20,529][00197] Conv encoder output size: 512
[2024-09-21 13:46:20,532][00197] Policy head output size: 512
[2024-09-21 13:46:20,554][00197] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-21 13:46:21,077][00197] Num frames 100...
[2024-09-21 13:46:21,203][00197] Num frames 200...
[2024-09-21 13:46:21,327][00197] Num frames 300...
[2024-09-21 13:46:21,451][00197] Num frames 400...
[2024-09-21 13:46:21,575][00197] Num frames 500...
[2024-09-21 13:46:21,697][00197] Num frames 600...
[2024-09-21 13:46:21,834][00197] Num frames 700...
[2024-09-21 13:46:21,963][00197] Num frames 800...
[2024-09-21 13:46:22,099][00197] Avg episode rewards: #0: 15.640, true rewards: #0: 8.640
[2024-09-21 13:46:22,101][00197] Avg episode reward: 15.640, avg true_objective: 8.640
[2024-09-21 13:46:22,149][00197] Num frames 900...
[2024-09-21 13:46:22,281][00197] Num frames 1000...
[2024-09-21 13:46:22,462][00197] Num frames 1100...
[2024-09-21 13:46:22,626][00197] Num frames 1200...
[2024-09-21 13:46:22,804][00197] Num frames 1300...
[2024-09-21 13:46:22,973][00197] Num frames 1400...
[2024-09-21 13:46:23,149][00197] Num frames 1500...
[2024-09-21 13:46:23,314][00197] Num frames 1600...
[2024-09-21 13:46:23,485][00197] Num frames 1700...
[2024-09-21 13:46:23,664][00197] Num frames 1800...
[2024-09-21 13:46:23,854][00197] Num frames 1900...
[2024-09-21 13:46:24,028][00197] Num frames 2000...
[2024-09-21 13:46:24,212][00197] Num frames 2100...
[2024-09-21 13:46:24,390][00197] Num frames 2200...
[2024-09-21 13:46:24,566][00197] Num frames 2300...
[2024-09-21 13:46:24,632][00197] Avg episode rewards: #0: 23.020, true rewards: #0: 11.520
[2024-09-21 13:46:24,635][00197] Avg episode reward: 23.020, avg true_objective: 11.520
[2024-09-21 13:46:24,823][00197] Num frames 2400...
[2024-09-21 13:46:24,981][00197] Num frames 2500...
[2024-09-21 13:46:25,106][00197] Num frames 2600...
[2024-09-21 13:46:25,239][00197] Num frames 2700...
[2024-09-21 13:46:25,373][00197] Num frames 2800...
[2024-09-21 13:46:25,498][00197] Num frames 2900...
[2024-09-21 13:46:25,622][00197] Num frames 3000...
[2024-09-21 13:46:25,755][00197] Num frames 3100...
[2024-09-21 13:46:25,883][00197] Num frames 3200...
[2024-09-21 13:46:26,016][00197] Num frames 3300...
[2024-09-21 13:46:26,144][00197] Num frames 3400...
[2024-09-21 13:46:26,277][00197] Num frames 3500...
[2024-09-21 13:46:26,361][00197] Avg episode rewards: #0: 23.400, true rewards: #0: 11.733
[2024-09-21 13:46:26,363][00197] Avg episode reward: 23.400, avg true_objective: 11.733
[2024-09-21 13:46:26,470][00197] Num frames 3600...
[2024-09-21 13:46:26,596][00197] Num frames 3700...
[2024-09-21 13:46:26,727][00197] Num frames 3800...
[2024-09-21 13:46:26,852][00197] Num frames 3900...
[2024-09-21 13:46:27,015][00197] Avg episode rewards: #0: 20.183, true rewards: #0: 9.932
[2024-09-21 13:46:27,016][00197] Avg episode reward: 20.183, avg true_objective: 9.932
[2024-09-21 13:46:27,058][00197] Num frames 4000...
[2024-09-21 13:46:27,180][00197] Num frames 4100...
[2024-09-21 13:46:27,309][00197] Num frames 4200...
[2024-09-21 13:46:27,435][00197] Num frames 4300...
[2024-09-21 13:46:27,563][00197] Num frames 4400...
[2024-09-21 13:46:27,697][00197] Num frames 4500...
[2024-09-21 13:46:27,825][00197] Num frames 4600...
[2024-09-21 13:46:27,966][00197] Num frames 4700...
[2024-09-21 13:46:28,095][00197] Num frames 4800...
[2024-09-21 13:46:28,246][00197] Num frames 4900...
[2024-09-21 13:46:28,356][00197] Avg episode rewards: #0: 21.274, true rewards: #0: 9.874
[2024-09-21 13:46:28,357][00197] Avg episode reward: 21.274, avg true_objective: 9.874
[2024-09-21 13:46:28,444][00197] Num frames 5000...
[2024-09-21 13:46:28,571][00197] Num frames 5100...
[2024-09-21 13:46:28,712][00197] Num frames 5200...
[2024-09-21 13:46:28,844][00197] Num frames 5300...
[2024-09-21 13:46:28,976][00197] Num frames 5400...
[2024-09-21 13:46:29,116][00197] Num frames 5500...
[2024-09-21 13:46:29,241][00197] Num frames 5600...
[2024-09-21 13:46:29,371][00197] Num frames 5700...
[2024-09-21 13:46:29,502][00197] Num frames 5800...
[2024-09-21 13:46:29,628][00197] Num frames 5900...
[2024-09-21 13:46:29,760][00197] Num frames 6000...
[2024-09-21 13:46:29,891][00197] Num frames 6100...
[2024-09-21 13:46:30,069][00197] Avg episode rewards: #0: 23.154, true rewards: #0: 10.320
[2024-09-21 13:46:30,071][00197] Avg episode reward: 23.154, avg true_objective: 10.320
[2024-09-21 13:46:30,086][00197] Num frames 6200...
[2024-09-21 13:46:30,217][00197] Num frames 6300...
[2024-09-21 13:46:30,349][00197] Num frames 6400...
[2024-09-21 13:46:30,485][00197] Num frames 6500...
[2024-09-21 13:46:30,608][00197] Num frames 6600...
[2024-09-21 13:46:30,757][00197] Avg episode rewards: #0: 20.954, true rewards: #0: 9.526
[2024-09-21 13:46:30,758][00197] Avg episode reward: 20.954, avg true_objective: 9.526
[2024-09-21 13:46:30,803][00197] Num frames 6700...
[2024-09-21 13:46:30,929][00197] Num frames 6800...
[2024-09-21 13:46:31,072][00197] Num frames 6900...
[2024-09-21 13:46:31,196][00197] Num frames 7000...
[2024-09-21 13:46:31,328][00197] Num frames 7100...
[2024-09-21 13:46:31,458][00197] Num frames 7200...
[2024-09-21 13:46:31,608][00197] Avg episode rewards: #0: 20.220, true rewards: #0: 9.095
[2024-09-21 13:46:31,610][00197] Avg episode reward: 20.220, avg true_objective: 9.095
[2024-09-21 13:46:31,643][00197] Num frames 7300...
[2024-09-21 13:46:31,777][00197] Num frames 7400...
[2024-09-21 13:46:31,910][00197] Num frames 7500...
[2024-09-21 13:46:32,041][00197] Num frames 7600...
[2024-09-21 13:46:32,182][00197] Num frames 7700...
[2024-09-21 13:46:32,270][00197] Avg episode rewards: #0: 18.805, true rewards: #0: 8.582
[2024-09-21 13:46:32,271][00197] Avg episode reward: 18.805, avg true_objective: 8.582
[2024-09-21 13:46:32,372][00197] Num frames 7800...
[2024-09-21 13:46:32,511][00197] Num frames 7900...
[2024-09-21 13:46:32,646][00197] Num frames 8000...
[2024-09-21 13:46:32,793][00197] Num frames 8100...
[2024-09-21 13:46:32,889][00197] Avg episode rewards: #0: 17.530, true rewards: #0: 8.130
[2024-09-21 13:46:32,891][00197] Avg episode reward: 17.530, avg true_objective: 8.130
[2024-09-21 13:47:25,447][00197] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-21 13:50:04,195][00197] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-21 13:50:04,196][00197] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-21 13:50:04,198][00197] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-21 13:50:04,200][00197] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-21 13:50:04,201][00197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-21 13:50:04,203][00197] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-21 13:50:04,206][00197] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-21 13:50:04,207][00197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-21 13:50:04,209][00197] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-21 13:50:04,211][00197] Adding new argument 'hf_repository'='yhyeo0202/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-21 13:50:04,213][00197] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-21 13:50:04,215][00197] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-21 13:50:04,217][00197] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-21 13:50:04,218][00197] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-21 13:50:04,219][00197] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-21 13:50:04,236][00197] RunningMeanStd input shape: (3, 72, 128)
[2024-09-21 13:50:04,238][00197] RunningMeanStd input shape: (1,)
[2024-09-21 13:50:04,251][00197] ConvEncoder: input_channels=3
[2024-09-21 13:50:04,291][00197] Conv encoder output size: 512
[2024-09-21 13:50:04,293][00197] Policy head output size: 512
[2024-09-21 13:50:04,315][00197] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-21 13:50:04,862][00197] Num frames 100...
[2024-09-21 13:50:04,984][00197] Num frames 200...
[2024-09-21 13:50:05,111][00197] Num frames 300...
[2024-09-21 13:50:05,241][00197] Num frames 400...
[2024-09-21 13:50:05,401][00197] Avg episode rewards: #0: 8.750, true rewards: #0: 4.750
[2024-09-21 13:50:05,403][00197] Avg episode reward: 8.750, avg true_objective: 4.750
[2024-09-21 13:50:05,437][00197] Num frames 500...
[2024-09-21 13:50:05,566][00197] Num frames 600...
[2024-09-21 13:50:05,701][00197] Num frames 700...
[2024-09-21 13:50:05,826][00197] Num frames 800...
[2024-09-21 13:50:05,949][00197] Num frames 900...
[2024-09-21 13:50:06,075][00197] Num frames 1000...
[2024-09-21 13:50:06,204][00197] Num frames 1100...
[2024-09-21 13:50:06,345][00197] Num frames 1200...
[2024-09-21 13:50:06,485][00197] Num frames 1300...
[2024-09-21 13:50:06,613][00197] Num frames 1400...
[2024-09-21 13:50:06,741][00197] Avg episode rewards: #0: 16.760, true rewards: #0: 7.260
[2024-09-21 13:50:06,744][00197] Avg episode reward: 16.760, avg true_objective: 7.260
[2024-09-21 13:50:06,806][00197] Num frames 1500...
[2024-09-21 13:50:06,935][00197] Num frames 1600...
[2024-09-21 13:50:07,066][00197] Num frames 1700...
[2024-09-21 13:50:07,198][00197] Num frames 1800...
[2024-09-21 13:50:07,330][00197] Num frames 1900...
[2024-09-21 13:50:07,464][00197] Num frames 2000...
[2024-09-21 13:50:07,600][00197] Num frames 2100...
[2024-09-21 13:50:07,732][00197] Num frames 2200...
[2024-09-21 13:50:07,858][00197] Avg episode rewards: #0: 16.507, true rewards: #0: 7.507
[2024-09-21 13:50:07,861][00197] Avg episode reward: 16.507, avg true_objective: 7.507
[2024-09-21 13:50:07,925][00197] Num frames 2300...
[2024-09-21 13:50:08,049][00197] Num frames 2400...
[2024-09-21 13:50:08,181][00197] Num frames 2500...
[2024-09-21 13:50:08,315][00197] Num frames 2600...
[2024-09-21 13:50:08,467][00197] Num frames 2700...
[2024-09-21 13:50:08,603][00197] Num frames 2800...
[2024-09-21 13:50:08,753][00197] Num frames 2900...
[2024-09-21 13:50:08,888][00197] Num frames 3000...
[2024-09-21 13:50:09,028][00197] Num frames 3100...
[2024-09-21 13:50:09,181][00197] Num frames 3200...
[2024-09-21 13:50:09,374][00197] Num frames 3300...
[2024-09-21 13:50:09,551][00197] Num frames 3400...
[2024-09-21 13:50:09,736][00197] Num frames 3500...
[2024-09-21 13:50:09,851][00197] Avg episode rewards: #0: 20.080, true rewards: #0: 8.830
[2024-09-21 13:50:09,853][00197] Avg episode reward: 20.080, avg true_objective: 8.830
[2024-09-21 13:50:09,974][00197] Num frames 3600...
[2024-09-21 13:50:10,156][00197] Num frames 3700...
[2024-09-21 13:50:10,333][00197] Num frames 3800...
[2024-09-21 13:50:10,530][00197] Num frames 3900...
[2024-09-21 13:50:10,711][00197] Num frames 4000...
[2024-09-21 13:50:10,891][00197] Num frames 4100...
[2024-09-21 13:50:11,111][00197] Num frames 4200...
[2024-09-21 13:50:11,181][00197] Avg episode rewards: #0: 19.008, true rewards: #0: 8.408
[2024-09-21 13:50:11,184][00197] Avg episode reward: 19.008, avg true_objective: 8.408
[2024-09-21 13:50:11,381][00197] Num frames 4300...
[2024-09-21 13:50:11,596][00197] Num frames 4400...
[2024-09-21 13:50:11,795][00197] Num frames 4500...
[2024-09-21 13:50:12,009][00197] Num frames 4600...
[2024-09-21 13:50:12,257][00197] Num frames 4700...
[2024-09-21 13:50:12,481][00197] Num frames 4800...
[2024-09-21 13:50:12,658][00197] Num frames 4900...
[2024-09-21 13:50:12,837][00197] Num frames 5000...
[2024-09-21 13:50:13,010][00197] Num frames 5100...
[2024-09-21 13:50:13,199][00197] Num frames 5200...
[2024-09-21 13:50:13,387][00197] Num frames 5300...
[2024-09-21 13:50:13,584][00197] Num frames 5400...
[2024-09-21 13:50:13,776][00197] Num frames 5500...
[2024-09-21 13:50:13,960][00197] Num frames 5600...
[2024-09-21 13:50:14,149][00197] Num frames 5700...
[2024-09-21 13:50:14,335][00197] Num frames 5800...
[2024-09-21 13:50:14,527][00197] Num frames 5900...
[2024-09-21 13:50:14,634][00197] Avg episode rewards: #0: 23.887, true rewards: #0: 9.887
[2024-09-21 13:50:14,636][00197] Avg episode reward: 23.887, avg true_objective: 9.887
[2024-09-21 13:50:14,734][00197] Num frames 6000...
[2024-09-21 13:50:14,875][00197] Num frames 6100...
[2024-09-21 13:50:14,998][00197] Num frames 6200...
[2024-09-21 13:50:15,123][00197] Num frames 6300...
[2024-09-21 13:50:15,289][00197] Num frames 6400...
[2024-09-21 13:50:15,423][00197] Num frames 6500...
[2024-09-21 13:50:15,547][00197] Num frames 6600...
[2024-09-21 13:50:15,686][00197] Num frames 6700...
[2024-09-21 13:50:15,822][00197] Num frames 6800...
[2024-09-21 13:50:15,949][00197] Num frames 6900...
[2024-09-21 13:50:16,075][00197] Num frames 7000...
[2024-09-21 13:50:16,201][00197] Num frames 7100...
[2024-09-21 13:50:16,333][00197] Num frames 7200...
[2024-09-21 13:50:16,456][00197] Num frames 7300...
[2024-09-21 13:50:16,587][00197] Num frames 7400...
[2024-09-21 13:50:16,718][00197] Num frames 7500...
[2024-09-21 13:50:16,854][00197] Num frames 7600...
[2024-09-21 13:50:16,947][00197] Avg episode rewards: #0: 25.897, true rewards: #0: 10.897
[2024-09-21 13:50:16,952][00197] Avg episode reward: 25.897, avg true_objective: 10.897
[2024-09-21 13:50:17,043][00197] Num frames 7700...
[2024-09-21 13:50:17,171][00197] Num frames 7800...
[2024-09-21 13:50:17,300][00197] Num frames 7900...
[2024-09-21 13:50:17,427][00197] Num frames 8000...
[2024-09-21 13:50:17,559][00197] Num frames 8100...
[2024-09-21 13:50:17,709][00197] Num frames 8200...
[2024-09-21 13:50:17,793][00197] Avg episode rewards: #0: 24.149, true rewards: #0: 10.274
[2024-09-21 13:50:17,796][00197] Avg episode reward: 24.149, avg true_objective: 10.274
[2024-09-21 13:50:17,920][00197] Num frames 8300...
[2024-09-21 13:50:18,059][00197] Num frames 8400...
[2024-09-21 13:50:18,204][00197] Num frames 8500...
[2024-09-21 13:50:18,335][00197] Num frames 8600...
[2024-09-21 13:50:18,456][00197] Num frames 8700...
[2024-09-21 13:50:18,586][00197] Num frames 8800...
[2024-09-21 13:50:18,728][00197] Num frames 8900...
[2024-09-21 13:50:18,890][00197] Avg episode rewards: #0: 23.208, true rewards: #0: 9.986
[2024-09-21 13:50:18,892][00197] Avg episode reward: 23.208, avg true_objective: 9.986
[2024-09-21 13:50:18,915][00197] Num frames 9000...
[2024-09-21 13:50:19,040][00197] Num frames 9100...
[2024-09-21 13:50:19,161][00197] Num frames 9200...
[2024-09-21 13:50:19,293][00197] Num frames 9300...
[2024-09-21 13:50:19,422][00197] Num frames 9400...
[2024-09-21 13:50:19,552][00197] Num frames 9500...
[2024-09-21 13:50:19,686][00197] Num frames 9600...
[2024-09-21 13:50:19,853][00197] Avg episode rewards: #0: 22.091, true rewards: #0: 9.691
[2024-09-21 13:50:19,855][00197] Avg episode reward: 22.091, avg true_objective: 9.691
[2024-09-21 13:51:21,987][00197] Replay video saved to /content/train_dir/default_experiment/replay.mp4!