[2024-11-23 14:45:22,117][09965] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-23 14:45:22,124][09965] Rollout worker 0 uses device cpu
[2024-11-23 14:45:22,127][09965] Rollout worker 1 uses device cpu
[2024-11-23 14:45:22,131][09965] Rollout worker 2 uses device cpu
[2024-11-23 14:45:22,134][09965] Rollout worker 3 uses device cpu
[2024-11-23 14:45:22,155][09965] Rollout worker 4 uses device cpu
[2024-11-23 14:45:22,163][09965] Rollout worker 5 uses device cpu
[2024-11-23 14:45:22,169][09965] Rollout worker 6 uses device cpu
[2024-11-23 14:45:22,175][09965] Rollout worker 7 uses device cpu
[2024-11-23 14:45:22,407][09965] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-23 14:45:22,412][09965] InferenceWorker_p0-w0: min num requests: 2
[2024-11-23 14:45:22,456][09965] Starting all processes...
[2024-11-23 14:45:22,460][09965] Starting process learner_proc0
[2024-11-23 14:45:22,513][09965] Starting all processes...
[2024-11-23 14:45:22,525][09965] Starting process inference_proc0-0
[2024-11-23 14:45:22,532][09965] Starting process rollout_proc0
[2024-11-23 14:45:22,532][09965] Starting process rollout_proc1
[2024-11-23 14:45:22,532][09965] Starting process rollout_proc2
[2024-11-23 14:45:22,532][09965] Starting process rollout_proc3
[2024-11-23 14:45:22,532][09965] Starting process rollout_proc4
[2024-11-23 14:45:22,532][09965] Starting process rollout_proc5
[2024-11-23 14:45:22,532][09965] Starting process rollout_proc6
[2024-11-23 14:45:22,532][09965] Starting process rollout_proc7
[2024-11-23 14:45:33,425][11197] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-23 14:45:33,427][11197] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-23 14:45:33,513][11197] Num visible devices: 1
[2024-11-23 14:45:33,615][11184] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-23 14:45:33,616][11184] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-23 14:45:33,676][11184] Num visible devices: 1
[2024-11-23 14:45:33,708][11184] Starting seed is not provided
[2024-11-23 14:45:33,709][11199] Worker 0 uses CPU cores [0]
[2024-11-23 14:45:33,709][11184] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-23 14:45:33,710][11184] Initializing actor-critic model on device cuda:0
[2024-11-23 14:45:33,711][11184] RunningMeanStd input shape: (3, 72, 128)
[2024-11-23 14:45:33,712][11184] RunningMeanStd input shape: (1,)
[2024-11-23 14:45:33,742][11198] Worker 1 uses CPU cores [1]
[2024-11-23 14:45:33,797][11184] ConvEncoder: input_channels=3
[2024-11-23 14:45:33,813][11200] Worker 2 uses CPU cores [0]
[2024-11-23 14:45:33,899][11205] Worker 7 uses CPU cores [1]
[2024-11-23 14:45:33,908][11201] Worker 3 uses CPU cores [1]
[2024-11-23 14:45:33,925][11203] Worker 4 uses CPU cores [0]
[2024-11-23 14:45:33,960][11202] Worker 5 uses CPU cores [1]
[2024-11-23 14:45:33,964][11204] Worker 6 uses CPU cores [0]
[2024-11-23 14:45:34,071][11184] Conv encoder output size: 512
[2024-11-23 14:45:34,071][11184] Policy head output size: 512
[2024-11-23 14:45:34,085][11184] Created Actor Critic model with architecture:
[2024-11-23 14:45:34,086][11184] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-23 14:45:38,471][11184] Using optimizer
[2024-11-23 14:45:38,472][11184] No checkpoints found
[2024-11-23 14:45:38,472][11184] Did not load from checkpoint, starting from scratch!
[2024-11-23 14:45:38,473][11184] Initialized policy 0 weights for model version 0
[2024-11-23 14:45:38,475][11184] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-23 14:45:38,482][11184] LearnerWorker_p0 finished initialization!
[2024-11-23 14:45:38,668][11197] RunningMeanStd input shape: (3, 72, 128)
[2024-11-23 14:45:38,671][11197] RunningMeanStd input shape: (1,)
[2024-11-23 14:45:38,685][11197] ConvEncoder: input_channels=3
[2024-11-23 14:45:38,786][11197] Conv encoder output size: 512
[2024-11-23 14:45:38,787][11197] Policy head output size: 512
[2024-11-23 14:45:40,310][09965] Inference worker 0-0 is ready!
[2024-11-23 14:45:40,313][09965] All inference workers are ready! Signal rollout workers to start!
[2024-11-23 14:45:40,447][11202] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-23 14:45:40,469][11199] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-23 14:45:40,472][11203] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-23 14:45:40,478][11201] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-23 14:45:40,480][11205] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-23 14:45:40,482][11198] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-23 14:45:40,487][11204] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-23 14:45:40,502][11200] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-23 14:45:42,028][11200] Decorrelating experience for 0 frames...
[2024-11-23 14:45:42,026][11202] Decorrelating experience for 0 frames...
[2024-11-23 14:45:42,027][11198] Decorrelating experience for 0 frames...
[2024-11-23 14:45:42,028][11199] Decorrelating experience for 0 frames...
[2024-11-23 14:45:42,028][11201] Decorrelating experience for 0 frames...
[2024-11-23 14:45:42,027][11203] Decorrelating experience for 0 frames...
[2024-11-23 14:45:42,031][11204] Decorrelating experience for 0 frames...
[2024-11-23 14:45:42,365][09965] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-23 14:45:42,397][09965] Heartbeat connected on Batcher_0
[2024-11-23 14:45:42,401][09965] Heartbeat connected on LearnerWorker_p0
[2024-11-23 14:45:42,444][09965] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-23 14:45:42,798][11198] Decorrelating experience for 32 frames...
[2024-11-23 14:45:42,818][11203] Decorrelating experience for 32 frames...
[2024-11-23 14:45:42,820][11199] Decorrelating experience for 32 frames...
[2024-11-23 14:45:42,821][11205] Decorrelating experience for 0 frames...
[2024-11-23 14:45:43,343][11204] Decorrelating experience for 32 frames...
[2024-11-23 14:45:44,142][11203] Decorrelating experience for 64 frames...
[2024-11-23 14:45:44,218][11204] Decorrelating experience for 64 frames...
[2024-11-23 14:45:44,308][11201] Decorrelating experience for 32 frames...
[2024-11-23 14:45:44,323][11205] Decorrelating experience for 32 frames...
[2024-11-23 14:45:44,381][11202] Decorrelating experience for 32 frames...
[2024-11-23 14:45:44,541][11198] Decorrelating experience for 64 frames...
[2024-11-23 14:45:45,622][11203] Decorrelating experience for 96 frames...
[2024-11-23 14:45:45,728][11201] Decorrelating experience for 64 frames...
[2024-11-23 14:45:45,726][11204] Decorrelating experience for 96 frames...
[2024-11-23 14:45:45,781][11202] Decorrelating experience for 64 frames...
[2024-11-23 14:45:45,837][11198] Decorrelating experience for 96 frames...
[2024-11-23 14:45:45,861][11199] Decorrelating experience for 64 frames...
[2024-11-23 14:45:45,877][09965] Heartbeat connected on RolloutWorker_w4
[2024-11-23 14:45:45,986][11200] Decorrelating experience for 32 frames...
[2024-11-23 14:45:45,996][09965] Heartbeat connected on RolloutWorker_w6
[2024-11-23 14:45:46,105][09965] Heartbeat connected on RolloutWorker_w1
[2024-11-23 14:45:46,999][11199] Decorrelating experience for 96 frames...
[2024-11-23 14:45:47,190][11200] Decorrelating experience for 64 frames...
[2024-11-23 14:45:47,214][09965] Heartbeat connected on RolloutWorker_w0
[2024-11-23 14:45:47,250][11201] Decorrelating experience for 96 frames...
[2024-11-23 14:45:47,316][11202] Decorrelating experience for 96 frames...
[2024-11-23 14:45:47,365][09965] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-23 14:45:47,376][11205] Decorrelating experience for 64 frames...
[2024-11-23 14:45:47,511][09965] Heartbeat connected on RolloutWorker_w3
[2024-11-23 14:45:47,570][09965] Heartbeat connected on RolloutWorker_w5
[2024-11-23 14:45:48,049][11200] Decorrelating experience for 96 frames...
[2024-11-23 14:45:48,177][09965] Heartbeat connected on RolloutWorker_w2
[2024-11-23 14:45:48,216][11205] Decorrelating experience for 96 frames...
[2024-11-23 14:45:48,304][09965] Heartbeat connected on RolloutWorker_w7
[2024-11-23 14:45:52,370][09965] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 153.5. Samples: 1536. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-23 14:45:52,373][09965] Avg episode reward: [(0, '1.280')]
[2024-11-23 14:45:53,848][11184] Signal inference workers to stop experience collection...
[2024-11-23 14:45:53,873][11197] InferenceWorker_p0-w0: stopping experience collection
[2024-11-23 14:45:55,885][11184] Signal inference workers to resume experience collection...
[2024-11-23 14:45:55,886][11197] InferenceWorker_p0-w0: resuming experience collection
[2024-11-23 14:45:57,365][09965] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 148.5. Samples: 2228. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-11-23 14:45:57,367][09965] Avg episode reward: [(0, '2.435')]
[2024-11-23 14:46:02,365][09965] Fps is (10 sec: 2868.7, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 369.4. Samples: 7388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 14:46:02,367][09965] Avg episode reward: [(0, '3.711')]
[2024-11-23 14:46:04,204][11197] Updated weights for policy 0, policy_version 10 (0.0538)
[2024-11-23 14:46:07,365][09965] Fps is (10 sec: 4505.6, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 518.0. Samples: 12950. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-23 14:46:07,367][09965] Avg episode reward: [(0, '4.342')]
[2024-11-23 14:46:12,365][09965] Fps is (10 sec: 3686.4, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 65536. Throughput: 0: 504.4. Samples: 15132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-11-23 14:46:12,370][09965] Avg episode reward: [(0, '4.397')]
[2024-11-23 14:46:15,901][11197] Updated weights for policy 0, policy_version 20 (0.0022)
[2024-11-23 14:46:17,365][09965] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 86016. Throughput: 0: 607.3. Samples: 21256. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-11-23 14:46:17,367][09965] Avg episode reward: [(0, '4.263')]
[2024-11-23 14:46:22,365][09965] Fps is (10 sec: 4505.5, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592. Throughput: 0: 703.3. Samples: 28132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-11-23 14:46:22,372][09965] Avg episode reward: [(0, '4.079')]
[2024-11-23 14:46:22,374][11184] Saving new best policy, reward=4.079!
[2024-11-23 14:46:26,629][11197] Updated weights for policy 0, policy_version 30 (0.0023)
[2024-11-23 14:46:27,365][09965] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 672.7. Samples: 30270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:46:27,367][09965] Avg episode reward: [(0, '4.189')]
[2024-11-23 14:46:27,379][11184] Saving new best policy, reward=4.189!
[2024-11-23 14:46:32,365][09965] Fps is (10 sec: 3276.9, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 143360. Throughput: 0: 789.9. Samples: 35544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:46:32,370][09965] Avg episode reward: [(0, '4.342')]
[2024-11-23 14:46:32,374][11184] Saving new best policy, reward=4.342!
[2024-11-23 14:46:36,189][11197] Updated weights for policy 0, policy_version 40 (0.0022)
[2024-11-23 14:46:37,365][09965] Fps is (10 sec: 4505.5, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 912.3. Samples: 42586. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-23 14:46:37,370][09965] Avg episode reward: [(0, '4.400')]
[2024-11-23 14:46:37,376][11184] Saving new best policy, reward=4.400!
[2024-11-23 14:46:42,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 962.8. Samples: 45552. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 14:46:42,369][09965] Avg episode reward: [(0, '4.477')]
[2024-11-23 14:46:42,376][11184] Saving new best policy, reward=4.477!
[2024-11-23 14:46:47,365][09965] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3087.7). Total num frames: 200704. Throughput: 0: 945.4. Samples: 49930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:46:47,369][09965] Avg episode reward: [(0, '4.343')]
[2024-11-23 14:46:47,836][11197] Updated weights for policy 0, policy_version 50 (0.0028)
[2024-11-23 14:46:52,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3755.0, 300 sec: 3218.3). Total num frames: 225280. Throughput: 0: 978.1. Samples: 56964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:46:52,367][09965] Avg episode reward: [(0, '4.175')]
[2024-11-23 14:46:56,533][11197] Updated weights for policy 0, policy_version 60 (0.0034)
[2024-11-23 14:46:57,365][09965] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3276.8). Total num frames: 245760. Throughput: 0: 1009.0. Samples: 60538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 14:46:57,370][09965] Avg episode reward: [(0, '4.380')]
[2024-11-23 14:47:02,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 262144. Throughput: 0: 980.2. Samples: 65364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 14:47:02,372][09965] Avg episode reward: [(0, '4.658')]
[2024-11-23 14:47:02,375][11184] Saving new best policy, reward=4.658!
[2024-11-23 14:47:07,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 278528. Throughput: 0: 957.8. Samples: 71234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 14:47:07,368][09965] Avg episode reward: [(0, '4.477')]
[2024-11-23 14:47:08,289][11197] Updated weights for policy 0, policy_version 70 (0.0033)
[2024-11-23 14:47:12,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3367.8). Total num frames: 303104. Throughput: 0: 988.0. Samples: 74728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:47:12,367][09965] Avg episode reward: [(0, '4.533')]
[2024-11-23 14:47:17,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 1003.1. Samples: 80682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:47:17,368][09965] Avg episode reward: [(0, '4.468')]
[2024-11-23 14:47:17,404][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000079_323584.pth...
[2024-11-23 14:47:18,926][11197] Updated weights for policy 0, policy_version 80 (0.0014)
[2024-11-23 14:47:22,365][09965] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3399.7). Total num frames: 339968. Throughput: 0: 954.8. Samples: 85552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:47:22,368][09965] Avg episode reward: [(0, '4.234')]
[2024-11-23 14:47:27,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3432.8). Total num frames: 360448. Throughput: 0: 967.7. Samples: 89100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 14:47:27,368][09965] Avg episode reward: [(0, '4.363')]
[2024-11-23 14:47:28,402][11197] Updated weights for policy 0, policy_version 90 (0.0026)
[2024-11-23 14:47:32,365][09965] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3500.2). Total num frames: 385024. Throughput: 0: 1023.1. Samples: 95968. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 14:47:32,368][09965] Avg episode reward: [(0, '4.560')]
[2024-11-23 14:47:37,365][09965] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3454.9). Total num frames: 397312. Throughput: 0: 960.7. Samples: 100196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 14:47:37,369][09965] Avg episode reward: [(0, '4.723')]
[2024-11-23 14:47:37,381][11184] Saving new best policy, reward=4.723!
[2024-11-23 14:47:40,118][11197] Updated weights for policy 0, policy_version 100 (0.0012)
[2024-11-23 14:47:42,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3481.6). Total num frames: 417792. Throughput: 0: 946.1. Samples: 103112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 14:47:42,370][09965] Avg episode reward: [(0, '4.529')]
[2024-11-23 14:47:47,365][09965] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3440.6). Total num frames: 430080. Throughput: 0: 933.3. Samples: 107362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:47:47,369][09965] Avg episode reward: [(0, '4.508')]
[2024-11-23 14:47:52,366][09965] Fps is (10 sec: 2867.0, 60 sec: 3686.3, 300 sec: 3434.3). Total num frames: 446464. Throughput: 0: 915.7. Samples: 112440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:47:52,368][09965] Avg episode reward: [(0, '4.575')]
[2024-11-23 14:47:53,412][11197] Updated weights for policy 0, policy_version 110 (0.0018)
[2024-11-23 14:47:57,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3428.5). Total num frames: 462848. Throughput: 0: 888.9. Samples: 114730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:47:57,367][09965] Avg episode reward: [(0, '4.750')]
[2024-11-23 14:47:57,377][11184] Saving new best policy, reward=4.750!
[2024-11-23 14:48:02,365][09965] Fps is (10 sec: 4096.3, 60 sec: 3754.7, 300 sec: 3481.6). Total num frames: 487424. Throughput: 0: 906.0. Samples: 121450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:48:02,371][09965] Avg episode reward: [(0, '4.598')]
[2024-11-23 14:48:03,035][11197] Updated weights for policy 0, policy_version 120 (0.0019)
[2024-11-23 14:48:07,368][09965] Fps is (10 sec: 4504.2, 60 sec: 3822.7, 300 sec: 3502.7). Total num frames: 507904. Throughput: 0: 938.2. Samples: 127776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:48:07,371][09965] Avg episode reward: [(0, '4.517')]
[2024-11-23 14:48:12,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3467.9). Total num frames: 520192. Throughput: 0: 908.7. Samples: 129990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 14:48:12,371][09965] Avg episode reward: [(0, '4.539')]
[2024-11-23 14:48:14,572][11197] Updated weights for policy 0, policy_version 130 (0.0024)
[2024-11-23 14:48:17,365][09965] Fps is (10 sec: 3687.6, 60 sec: 3754.7, 300 sec: 3514.6). Total num frames: 544768. Throughput: 0: 886.9. Samples: 135878. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 14:48:17,370][09965] Avg episode reward: [(0, '4.545')]
[2024-11-23 14:48:22,370][09965] Fps is (10 sec: 4914.3, 60 sec: 3822.8, 300 sec: 3558.4). Total num frames: 569344. Throughput: 0: 950.9. Samples: 142986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 14:48:22,373][09965] Avg episode reward: [(0, '4.472')]
[2024-11-23 14:48:23,062][11197] Updated weights for policy 0, policy_version 140 (0.0025)
[2024-11-23 14:48:27,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 585728. Throughput: 0: 942.5. Samples: 145524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 14:48:27,370][09965] Avg episode reward: [(0, '4.346')]
[2024-11-23 14:48:32,365][09965] Fps is (10 sec: 3277.4, 60 sec: 3618.1, 300 sec: 3541.8). Total num frames: 602112. Throughput: 0: 952.8. Samples: 150236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 14:48:32,367][09965] Avg episode reward: [(0, '4.300')]
[2024-11-23 14:48:34,510][11197] Updated weights for policy 0, policy_version 150 (0.0016)
[2024-11-23 14:48:37,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3581.1). Total num frames: 626688. Throughput: 0: 999.0. Samples: 157394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 14:48:37,369][09965] Avg episode reward: [(0, '4.649')]
[2024-11-23 14:48:42,365][09965] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3595.4). Total num frames: 647168. Throughput: 0: 1026.6. Samples: 160928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:48:42,367][09965] Avg episode reward: [(0, '4.754')]
[2024-11-23 14:48:42,375][11184] Saving new best policy, reward=4.754!
[2024-11-23 14:48:45,609][11197] Updated weights for policy 0, policy_version 160 (0.0017)
[2024-11-23 14:48:47,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3564.6). Total num frames: 659456. Throughput: 0: 969.0. Samples: 165056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:48:47,370][09965] Avg episode reward: [(0, '4.674')]
[2024-11-23 14:48:52,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3600.2). Total num frames: 684032. Throughput: 0: 973.0. Samples: 171556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:48:52,369][09965] Avg episode reward: [(0, '4.574')]
[2024-11-23 14:48:55,102][11197] Updated weights for policy 0, policy_version 170 (0.0013)
[2024-11-23 14:48:57,365][09965] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3612.9). Total num frames: 704512. Throughput: 0: 996.9. Samples: 174852. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 14:48:57,369][09965] Avg episode reward: [(0, '4.605')]
[2024-11-23 14:49:02,365][09965] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3604.5). Total num frames: 720896. Throughput: 0: 987.6. Samples: 180318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:49:02,368][09965] Avg episode reward: [(0, '4.695')]
[2024-11-23 14:49:06,663][11197] Updated weights for policy 0, policy_version 180 (0.0021)
[2024-11-23 14:49:07,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3596.5). Total num frames: 737280. Throughput: 0: 947.8. Samples: 185636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:49:07,367][09965] Avg episode reward: [(0, '4.949')]
[2024-11-23 14:49:07,380][11184] Saving new best policy, reward=4.949!
[2024-11-23 14:49:12,365][09965] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3627.9). Total num frames: 761856. Throughput: 0: 968.7. Samples: 189116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 14:49:12,372][09965] Avg episode reward: [(0, '5.096')]
[2024-11-23 14:49:12,374][11184] Saving new best policy, reward=5.096!
[2024-11-23 14:49:15,596][11197] Updated weights for policy 0, policy_version 190 (0.0019)
[2024-11-23 14:49:17,365][09965] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3638.8). Total num frames: 782336. Throughput: 0: 1008.4. Samples: 195612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:49:17,372][09965] Avg episode reward: [(0, '5.120')]
[2024-11-23 14:49:17,392][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000191_782336.pth...
[2024-11-23 14:49:17,559][11184] Saving new best policy, reward=5.120!
[2024-11-23 14:49:22,367][09965] Fps is (10 sec: 3276.1, 60 sec: 3754.6, 300 sec: 3611.9). Total num frames: 794624. Throughput: 0: 944.0. Samples: 199876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:49:22,371][09965] Avg episode reward: [(0, '5.234')]
[2024-11-23 14:49:22,374][11184] Saving new best policy, reward=5.234!
[2024-11-23 14:49:26,994][11197] Updated weights for policy 0, policy_version 200 (0.0021)
[2024-11-23 14:49:27,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3640.9). Total num frames: 819200. Throughput: 0: 939.9. Samples: 203224. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 14:49:27,369][09965] Avg episode reward: [(0, '5.201')]
[2024-11-23 14:49:32,365][09965] Fps is (10 sec: 4506.6, 60 sec: 3959.5, 300 sec: 3650.8). Total num frames: 839680. Throughput: 0: 1001.6. Samples: 210130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 14:49:32,370][09965] Avg episode reward: [(0, '5.320')]
[2024-11-23 14:49:32,428][11184] Saving new best policy, reward=5.320!
[2024-11-23 14:49:37,367][09965] Fps is (10 sec: 3685.8, 60 sec: 3822.8, 300 sec: 3642.8). Total num frames: 856064. Throughput: 0: 965.2. Samples: 214994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:49:37,369][09965] Avg episode reward: [(0, '5.514')]
[2024-11-23 14:49:37,384][11184] Saving new best policy, reward=5.514!
[2024-11-23 14:49:38,270][11197] Updated weights for policy 0, policy_version 210 (0.0014)
[2024-11-23 14:49:42,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3652.3). Total num frames: 876544. Throughput: 0: 947.5. Samples: 217488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:49:42,369][09965] Avg episode reward: [(0, '5.464')]
[2024-11-23 14:49:47,365][09965] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3661.3). Total num frames: 897024. Throughput: 0: 984.0. Samples: 224598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:49:47,371][09965] Avg episode reward: [(0, '5.474')]
[2024-11-23 14:49:47,471][11197] Updated weights for policy 0, policy_version 220 (0.0023)
[2024-11-23 14:49:52,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3670.0). Total num frames: 917504. Throughput: 0: 998.7. Samples: 230578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:49:52,372][09965] Avg episode reward: [(0, '5.803')]
[2024-11-23 14:49:52,379][11184] Saving new best policy, reward=5.803!
[2024-11-23 14:49:57,367][09965] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3662.3). Total num frames: 933888. Throughput: 0: 968.6. Samples: 232704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:49:57,370][09965] Avg episode reward: [(0, '5.632')]
[2024-11-23 14:49:58,799][11197] Updated weights for policy 0, policy_version 230 (0.0020)
[2024-11-23 14:50:02,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3670.6). Total num frames: 954368. Throughput: 0: 964.0. Samples: 238992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:50:02,369][09965] Avg episode reward: [(0, '5.522')]
[2024-11-23 14:50:07,365][09965] Fps is (10 sec: 4506.4, 60 sec: 4027.7, 300 sec: 3694.1). Total num frames: 978944. Throughput: 0: 1022.2. Samples: 245874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:50:07,367][09965] Avg episode reward: [(0, '5.764')]
[2024-11-23 14:50:08,085][11197] Updated weights for policy 0, policy_version 240 (0.0014)
[2024-11-23 14:50:12,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3686.4). Total num frames: 995328. Throughput: 0: 996.4. Samples: 248060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:50:12,370][09965] Avg episode reward: [(0, '5.949')]
[2024-11-23 14:50:12,375][11184] Saving new best policy, reward=5.949!
[2024-11-23 14:50:17,365][09965] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3693.8). Total num frames: 1015808. Throughput: 0: 962.9. Samples: 253460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:50:17,368][09965] Avg episode reward: [(0, '6.257')]
[2024-11-23 14:50:17,374][11184] Saving new best policy, reward=6.257!
[2024-11-23 14:50:19,093][11197] Updated weights for policy 0, policy_version 250 (0.0017)
[2024-11-23 14:50:22,365][09965] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3701.0). Total num frames: 1036288. Throughput: 0: 1011.7. Samples: 260518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 14:50:22,370][09965] Avg episode reward: [(0, '6.156')]
[2024-11-23 14:50:27,368][09965] Fps is (10 sec: 4094.9, 60 sec: 3959.3, 300 sec: 3707.9). Total num frames: 1056768. Throughput: 0: 1024.1. Samples: 263576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 14:50:27,371][09965] Avg episode reward: [(0, '6.466')]
[2024-11-23 14:50:27,382][11184] Saving new best policy, reward=6.466!
[2024-11-23 14:50:30,361][11197] Updated weights for policy 0, policy_version 260 (0.0015)
[2024-11-23 14:50:32,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3686.4). Total num frames: 1069056. Throughput: 0: 959.5. Samples: 267776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 14:50:32,370][09965] Avg episode reward: [(0, '6.602')]
[2024-11-23 14:50:32,375][11184] Saving new best policy, reward=6.602!
[2024-11-23 14:50:37,365][09965] Fps is (10 sec: 3277.6, 60 sec: 3891.3, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 960.2. Samples: 273788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:50:37,372][09965] Avg episode reward: [(0, '6.828')]
[2024-11-23 14:50:37,383][11184] Saving new best policy, reward=6.828!
[2024-11-23 14:50:40,594][11197] Updated weights for policy 0, policy_version 270 (0.0020)
[2024-11-23 14:50:42,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 1110016. Throughput: 0: 978.2. Samples: 276720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:50:42,368][09965] Avg episode reward: [(0, '6.718')]
[2024-11-23 14:50:47,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.4). Total num frames: 1126400. Throughput: 0: 944.8. Samples: 281510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:50:47,368][09965] Avg episode reward: [(0, '6.795')]
[2024-11-23 14:50:52,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 1142784. Throughput: 0: 908.7. Samples: 286766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:50:52,372][09965] Avg episode reward: [(0, '6.213')]
[2024-11-23 14:50:52,737][11197] Updated weights for policy 0, policy_version 280 (0.0022)
[2024-11-23 14:50:57,365][09965] Fps is (10 sec: 4096.1, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 1167360. Throughput: 0: 934.6. Samples: 290116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 14:50:57,373][09965] Avg episode reward: [(0, '6.754')]
[2024-11-23 14:51:02,366][09965] Fps is (10 sec: 4095.7, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1183744. Throughput: 0: 948.4. Samples: 296138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:51:02,372][09965] Avg episode reward: [(0, '6.914')]
[2024-11-23 14:51:02,376][11184] Saving new best policy, reward=6.914!
[2024-11-23 14:51:03,642][11197] Updated weights for policy 0, policy_version 290 (0.0014)
[2024-11-23 14:51:07,365][09965] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 1196032. Throughput: 0: 878.9. Samples: 300070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:51:07,370][09965] Avg episode reward: [(0, '7.090')]
[2024-11-23 14:51:07,382][11184] Saving new best policy, reward=7.090!
[2024-11-23 14:51:12,368][09965] Fps is (10 sec: 3276.2, 60 sec: 3686.2, 300 sec: 3832.2). Total num frames: 1216512. Throughput: 0: 882.2. Samples: 303274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:51:12,372][09965] Avg episode reward: [(0, '7.170')]
[2024-11-23 14:51:12,374][11184] Saving new best policy, reward=7.170!
[2024-11-23 14:51:14,649][11197] Updated weights for policy 0, policy_version 300 (0.0021)
[2024-11-23 14:51:17,366][09965] Fps is (10 sec: 4095.6, 60 sec: 3686.3, 300 sec: 3818.3). Total num frames: 1236992. Throughput: 0: 927.8. Samples: 309530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:51:17,369][09965] Avg episode reward: [(0, '7.232')]
[2024-11-23 14:51:17,390][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth...
[2024-11-23 14:51:17,605][11184] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000079_323584.pth
[2024-11-23 14:51:17,620][11184] Saving new best policy, reward=7.232!
[2024-11-23 14:51:22,365][09965] Fps is (10 sec: 3277.7, 60 sec: 3549.9, 300 sec: 3818.3). Total num frames: 1249280. Throughput: 0: 887.6. Samples: 313732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:51:22,370][09965] Avg episode reward: [(0, '7.177')]
[2024-11-23 14:51:27,349][11197] Updated weights for policy 0, policy_version 310 (0.0044)
[2024-11-23 14:51:27,365][09965] Fps is (10 sec: 3277.1, 60 sec: 3550.0, 300 sec: 3818.3). Total num frames: 1269760. Throughput: 0: 869.2. Samples: 315836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:51:27,368][09965] Avg episode reward: [(0, '7.227')]
[2024-11-23 14:51:32,367][09965] Fps is (10 sec: 4095.3, 60 sec: 3686.3, 300 sec: 3804.4). Total num frames: 1290240. Throughput: 0: 901.7. Samples: 322090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:51:32,369][09965] Avg episode reward: [(0, '7.630')]
[2024-11-23 14:51:32,372][11184] Saving new best policy, reward=7.630!
[2024-11-23 14:51:37,365][09965] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1306624. Throughput: 0: 905.5. Samples: 327512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 14:51:37,368][09965] Avg episode reward: [(0, '8.049')]
[2024-11-23 14:51:37,376][11184] Saving new best policy, reward=8.049!
[2024-11-23 14:51:38,375][11197] Updated weights for policy 0, policy_version 320 (0.0027)
[2024-11-23 14:51:42,367][09965] Fps is (10 sec: 2867.1, 60 sec: 3481.5, 300 sec: 3790.5). Total num frames: 1318912. Throughput: 0: 876.8. Samples: 329574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:51:42,370][09965] Avg episode reward: [(0, '8.133')]
[2024-11-23 14:51:42,372][11184] Saving new best policy, reward=8.133!
[2024-11-23 14:51:47,365][09965] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 1343488. Throughput: 0: 875.0. Samples: 335514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:51:47,370][09965] Avg episode reward: [(0, '8.120')]
[2024-11-23 14:51:48,864][11197] Updated weights for policy 0, policy_version 330 (0.0017)
[2024-11-23 14:51:52,367][09965] Fps is (10 sec: 4915.2, 60 sec: 3754.5, 300 sec: 3804.4). Total num frames: 1368064. Throughput: 0: 944.6. Samples: 342580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:51:52,372][09965] Avg episode reward: [(0, '8.819')]
[2024-11-23 14:51:52,375][11184] Saving new best policy, reward=8.819!
[2024-11-23 14:51:57,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3790.5). Total num frames: 1380352. Throughput: 0: 921.3. Samples: 344728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:51:57,370][09965] Avg episode reward: [(0, '9.331')]
[2024-11-23 14:51:57,382][11184] Saving new best policy, reward=9.331!
[2024-11-23 14:52:00,562][11197] Updated weights for policy 0, policy_version 340 (0.0014)
[2024-11-23 14:52:02,365][09965] Fps is (10 sec: 2867.7, 60 sec: 3549.9, 300 sec: 3790.5). Total num frames: 1396736. Throughput: 0: 890.3. Samples: 349592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:52:02,367][09965] Avg episode reward: [(0, '9.663')]
[2024-11-23 14:52:02,393][11184] Saving new best policy, reward=9.663!
[2024-11-23 14:52:07,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1421312. Throughput: 0: 949.0. Samples: 356438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:52:07,370][09965] Avg episode reward: [(0, '9.039')]
[2024-11-23 14:52:09,586][11197] Updated weights for policy 0, policy_version 350 (0.0027)
[2024-11-23 14:52:12,365][09965] Fps is (10 sec: 4505.8, 60 sec: 3754.8, 300 sec: 3804.4). Total num frames: 1441792. Throughput: 0: 971.4. Samples: 359550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:52:12,369][09965] Avg episode reward: [(0, '9.300')]
[2024-11-23 14:52:17,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3776.7). Total num frames: 1454080. Throughput: 0: 928.4. Samples: 363868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:52:17,372][09965] Avg episode reward: [(0, '9.451')]
[2024-11-23 14:52:21,166][11197] Updated weights for policy 0, policy_version 360 (0.0016)
[2024-11-23 14:52:22,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1478656. Throughput: 0: 956.0. Samples: 370534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:52:22,367][09965] Avg episode reward: [(0, '10.423')]
[2024-11-23 14:52:22,376][11184] Saving new best policy, reward=10.423!
[2024-11-23 14:52:27,367][09965] Fps is (10 sec: 4504.8, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 1499136. Throughput: 0: 984.8. Samples: 373888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:52:27,373][09965] Avg episode reward: [(0, '11.375')]
[2024-11-23 14:52:27,389][11184] Saving new best policy, reward=11.375!
[2024-11-23 14:52:32,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3776.7). Total num frames: 1511424.
Throughput: 0: 955.2. Samples: 378500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:52:32,367][09965] Avg episode reward: [(0, '12.022')] [2024-11-23 14:52:32,374][11184] Saving new best policy, reward=12.022! [2024-11-23 14:52:32,782][11197] Updated weights for policy 0, policy_version 370 (0.0046) [2024-11-23 14:52:37,365][09965] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1531904. Throughput: 0: 918.8. Samples: 383926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:52:37,372][09965] Avg episode reward: [(0, '12.286')] [2024-11-23 14:52:37,383][11184] Saving new best policy, reward=12.286! [2024-11-23 14:52:42,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 1552384. Throughput: 0: 944.4. Samples: 387226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:52:42,372][09965] Avg episode reward: [(0, '12.281')] [2024-11-23 14:52:42,514][11197] Updated weights for policy 0, policy_version 380 (0.0013) [2024-11-23 14:52:47,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1572864. Throughput: 0: 970.8. Samples: 393276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-23 14:52:47,372][09965] Avg episode reward: [(0, '11.894')] [2024-11-23 14:52:52,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3818.3). Total num frames: 1589248. Throughput: 0: 919.1. Samples: 397796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-23 14:52:52,368][09965] Avg episode reward: [(0, '12.624')] [2024-11-23 14:52:52,369][11184] Saving new best policy, reward=12.624! [2024-11-23 14:52:54,217][11197] Updated weights for policy 0, policy_version 390 (0.0038) [2024-11-23 14:52:57,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1609728. Throughput: 0: 924.0. Samples: 401128. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:52:57,368][09965] Avg episode reward: [(0, '12.644')] [2024-11-23 14:52:57,376][11184] Saving new best policy, reward=12.644! [2024-11-23 14:53:02,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.5). Total num frames: 1630208. Throughput: 0: 974.7. Samples: 407730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:53:02,373][09965] Avg episode reward: [(0, '13.676')] [2024-11-23 14:53:02,380][11184] Saving new best policy, reward=13.676! [2024-11-23 14:53:04,591][11197] Updated weights for policy 0, policy_version 400 (0.0014) [2024-11-23 14:53:07,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1642496. Throughput: 0: 921.5. Samples: 412000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-23 14:53:07,372][09965] Avg episode reward: [(0, '14.373')] [2024-11-23 14:53:07,395][11184] Saving new best policy, reward=14.373! [2024-11-23 14:53:12,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1662976. Throughput: 0: 901.0. Samples: 414432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-23 14:53:12,368][09965] Avg episode reward: [(0, '13.643')] [2024-11-23 14:53:15,622][11197] Updated weights for policy 0, policy_version 410 (0.0023) [2024-11-23 14:53:17,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1683456. Throughput: 0: 946.6. Samples: 421096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:53:17,372][09965] Avg episode reward: [(0, '14.017')] [2024-11-23 14:53:17,380][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000411_1683456.pth... [2024-11-23 14:53:17,523][11184] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000191_782336.pth [2024-11-23 14:53:22,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1699840. Throughput: 0: 941.0. 
Samples: 426272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:53:22,371][09965] Avg episode reward: [(0, '14.782')] [2024-11-23 14:53:22,373][11184] Saving new best policy, reward=14.782! [2024-11-23 14:53:27,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3776.6). Total num frames: 1716224. Throughput: 0: 913.3. Samples: 428326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:53:27,368][09965] Avg episode reward: [(0, '14.652')] [2024-11-23 14:53:27,902][11197] Updated weights for policy 0, policy_version 420 (0.0029) [2024-11-23 14:53:32,365][09965] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 1736704. Throughput: 0: 914.1. Samples: 434412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:53:32,372][09965] Avg episode reward: [(0, '14.828')] [2024-11-23 14:53:32,376][11184] Saving new best policy, reward=14.828! [2024-11-23 14:53:37,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1757184. Throughput: 0: 951.5. Samples: 440612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:53:37,369][09965] Avg episode reward: [(0, '14.451')] [2024-11-23 14:53:37,699][11197] Updated weights for policy 0, policy_version 430 (0.0017) [2024-11-23 14:53:42,365][09965] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1773568. Throughput: 0: 921.1. Samples: 442578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-23 14:53:42,370][09965] Avg episode reward: [(0, '13.798')] [2024-11-23 14:53:47,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 1789952. Throughput: 0: 889.4. Samples: 447754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:53:47,370][09965] Avg episode reward: [(0, '12.565')] [2024-11-23 14:53:49,325][11197] Updated weights for policy 0, policy_version 440 (0.0019) [2024-11-23 14:53:52,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). 
Total num frames: 1814528. Throughput: 0: 943.4. Samples: 454454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:53:52,368][09965] Avg episode reward: [(0, '13.681')] [2024-11-23 14:53:57,368][09965] Fps is (10 sec: 4094.8, 60 sec: 3686.2, 300 sec: 3762.7). Total num frames: 1830912. Throughput: 0: 952.1. Samples: 457280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:53:57,373][09965] Avg episode reward: [(0, '14.616')] [2024-11-23 14:54:01,183][11197] Updated weights for policy 0, policy_version 450 (0.0013) [2024-11-23 14:54:02,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1847296. Throughput: 0: 897.2. Samples: 461470. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-23 14:54:02,367][09965] Avg episode reward: [(0, '16.056')] [2024-11-23 14:54:02,370][11184] Saving new best policy, reward=16.056! [2024-11-23 14:54:07,365][09965] Fps is (10 sec: 3687.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1867776. Throughput: 0: 923.5. Samples: 467832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:54:07,370][09965] Avg episode reward: [(0, '16.597')] [2024-11-23 14:54:07,380][11184] Saving new best policy, reward=16.597! [2024-11-23 14:54:10,739][11197] Updated weights for policy 0, policy_version 460 (0.0018) [2024-11-23 14:54:12,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1888256. Throughput: 0: 953.0. Samples: 471212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:54:12,370][09965] Avg episode reward: [(0, '17.113')] [2024-11-23 14:54:12,374][11184] Saving new best policy, reward=17.113! [2024-11-23 14:54:17,365][09965] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 1900544. Throughput: 0: 915.1. Samples: 475590. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:54:17,367][09965] Avg episode reward: [(0, '17.818')] [2024-11-23 14:54:17,379][11184] Saving new best policy, reward=17.818! [2024-11-23 14:54:22,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1921024. Throughput: 0: 904.6. Samples: 481318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:54:22,370][09965] Avg episode reward: [(0, '18.437')] [2024-11-23 14:54:22,374][11184] Saving new best policy, reward=18.437! [2024-11-23 14:54:22,838][11197] Updated weights for policy 0, policy_version 470 (0.0029) [2024-11-23 14:54:27,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1941504. Throughput: 0: 930.6. Samples: 484456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:54:27,370][09965] Avg episode reward: [(0, '18.393')] [2024-11-23 14:54:32,367][09965] Fps is (10 sec: 3685.6, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 1957888. Throughput: 0: 933.8. Samples: 489776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:54:32,370][09965] Avg episode reward: [(0, '17.869')] [2024-11-23 14:54:34,893][11197] Updated weights for policy 0, policy_version 480 (0.0014) [2024-11-23 14:54:37,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1974272. Throughput: 0: 887.3. Samples: 494382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:54:37,367][09965] Avg episode reward: [(0, '17.642')] [2024-11-23 14:54:42,365][09965] Fps is (10 sec: 3687.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1994752. Throughput: 0: 895.6. Samples: 497578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:54:42,367][09965] Avg episode reward: [(0, '15.977')] [2024-11-23 14:54:44,608][11197] Updated weights for policy 0, policy_version 490 (0.0016) [2024-11-23 14:54:47,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). 
Total num frames: 2015232. Throughput: 0: 942.1. Samples: 503866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:54:47,370][09965] Avg episode reward: [(0, '14.736')] [2024-11-23 14:54:52,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.3). Total num frames: 2027520. Throughput: 0: 894.4. Samples: 508078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:54:52,373][09965] Avg episode reward: [(0, '15.022')] [2024-11-23 14:54:56,754][11197] Updated weights for policy 0, policy_version 500 (0.0026) [2024-11-23 14:54:57,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3707.2). Total num frames: 2048000. Throughput: 0: 887.6. Samples: 511154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-23 14:54:57,372][09965] Avg episode reward: [(0, '14.481')] [2024-11-23 14:55:02,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2068480. Throughput: 0: 937.3. Samples: 517768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-23 14:55:02,368][09965] Avg episode reward: [(0, '14.392')] [2024-11-23 14:55:07,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2084864. Throughput: 0: 910.4. Samples: 522284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-23 14:55:07,370][09965] Avg episode reward: [(0, '14.965')] [2024-11-23 14:55:08,162][11197] Updated weights for policy 0, policy_version 510 (0.0028) [2024-11-23 14:55:12,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 2101248. Throughput: 0: 889.2. Samples: 524472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:55:12,367][09965] Avg episode reward: [(0, '14.991')] [2024-11-23 14:55:17,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2125824. Throughput: 0: 922.9. Samples: 531304. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:55:17,372][09965] Avg episode reward: [(0, '16.112')] [2024-11-23 14:55:17,384][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000519_2125824.pth... [2024-11-23 14:55:17,524][11184] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth [2024-11-23 14:55:17,914][11197] Updated weights for policy 0, policy_version 520 (0.0019) [2024-11-23 14:55:22,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2142208. Throughput: 0: 947.8. Samples: 537032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:55:22,370][09965] Avg episode reward: [(0, '16.547')] [2024-11-23 14:55:27,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2158592. Throughput: 0: 921.6. Samples: 539050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:55:27,372][09965] Avg episode reward: [(0, '17.696')] [2024-11-23 14:55:29,912][11197] Updated weights for policy 0, policy_version 530 (0.0013) [2024-11-23 14:55:32,371][09965] Fps is (10 sec: 3684.1, 60 sec: 3686.1, 300 sec: 3693.3). Total num frames: 2179072. Throughput: 0: 915.5. Samples: 545070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:55:32,380][09965] Avg episode reward: [(0, '19.033')] [2024-11-23 14:55:32,385][11184] Saving new best policy, reward=19.033! [2024-11-23 14:55:37,365][09965] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 2203648. Throughput: 0: 961.7. Samples: 551354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:55:37,368][09965] Avg episode reward: [(0, '19.301')] [2024-11-23 14:55:37,375][11184] Saving new best policy, reward=19.301! [2024-11-23 14:55:40,576][11197] Updated weights for policy 0, policy_version 540 (0.0019) [2024-11-23 14:55:42,365][09965] Fps is (10 sec: 3688.7, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2215936. 
Throughput: 0: 936.3. Samples: 553288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-23 14:55:42,368][09965] Avg episode reward: [(0, '19.242')] [2024-11-23 14:55:47,365][09965] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2232320. Throughput: 0: 893.2. Samples: 557962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:55:47,369][09965] Avg episode reward: [(0, '20.285')] [2024-11-23 14:55:47,377][11184] Saving new best policy, reward=20.285! [2024-11-23 14:55:51,688][11197] Updated weights for policy 0, policy_version 550 (0.0026) [2024-11-23 14:55:52,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2252800. Throughput: 0: 937.8. Samples: 564484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-11-23 14:55:52,372][09965] Avg episode reward: [(0, '18.973')] [2024-11-23 14:55:57,366][09965] Fps is (10 sec: 4095.6, 60 sec: 3754.6, 300 sec: 3693.3). Total num frames: 2273280. Throughput: 0: 958.7. Samples: 567616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-23 14:55:57,372][09965] Avg episode reward: [(0, '17.345')] [2024-11-23 14:56:02,365][09965] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2285568. Throughput: 0: 898.1. Samples: 571720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:56:02,367][09965] Avg episode reward: [(0, '18.613')] [2024-11-23 14:56:03,986][11197] Updated weights for policy 0, policy_version 560 (0.0016) [2024-11-23 14:56:07,365][09965] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 2306048. Throughput: 0: 904.6. Samples: 577740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:56:07,372][09965] Avg episode reward: [(0, '18.542')] [2024-11-23 14:56:12,365][09965] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3693.4). Total num frames: 2326528. Throughput: 0: 934.0. Samples: 581078. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:56:12,368][09965] Avg episode reward: [(0, '18.716')] [2024-11-23 14:56:13,773][11197] Updated weights for policy 0, policy_version 570 (0.0016) [2024-11-23 14:56:17,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 2342912. Throughput: 0: 910.5. Samples: 586036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:56:17,369][09965] Avg episode reward: [(0, '18.709')] [2024-11-23 14:56:22,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2363392. Throughput: 0: 889.6. Samples: 591386. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-23 14:56:22,370][09965] Avg episode reward: [(0, '20.364')] [2024-11-23 14:56:22,376][11184] Saving new best policy, reward=20.364! [2024-11-23 14:56:25,196][11197] Updated weights for policy 0, policy_version 580 (0.0014) [2024-11-23 14:56:27,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2383872. Throughput: 0: 919.9. Samples: 594682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-23 14:56:27,368][09965] Avg episode reward: [(0, '19.445')] [2024-11-23 14:56:32,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3686.8, 300 sec: 3707.2). Total num frames: 2400256. Throughput: 0: 951.0. Samples: 600756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-23 14:56:32,367][09965] Avg episode reward: [(0, '18.648')] [2024-11-23 14:56:36,970][11197] Updated weights for policy 0, policy_version 590 (0.0015) [2024-11-23 14:56:37,366][09965] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3721.1). Total num frames: 2416640. Throughput: 0: 903.7. Samples: 605150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:56:37,372][09965] Avg episode reward: [(0, '18.402')] [2024-11-23 14:56:42,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2441216. Throughput: 0: 911.6. Samples: 608636. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-23 14:56:42,372][09965] Avg episode reward: [(0, '20.324')] [2024-11-23 14:56:45,866][11197] Updated weights for policy 0, policy_version 600 (0.0030) [2024-11-23 14:56:47,365][09965] Fps is (10 sec: 4505.9, 60 sec: 3822.9, 300 sec: 3707.3). Total num frames: 2461696. Throughput: 0: 973.8. Samples: 615542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:56:47,373][09965] Avg episode reward: [(0, '18.828')] [2024-11-23 14:56:52,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2473984. Throughput: 0: 939.7. Samples: 620028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-23 14:56:52,370][09965] Avg episode reward: [(0, '19.027')] [2024-11-23 14:56:57,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 2494464. Throughput: 0: 924.8. Samples: 622696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:56:57,368][09965] Avg episode reward: [(0, '20.086')] [2024-11-23 14:56:57,606][11197] Updated weights for policy 0, policy_version 610 (0.0014) [2024-11-23 14:57:02,365][09965] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 2519040. Throughput: 0: 969.1. Samples: 629646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:57:02,368][09965] Avg episode reward: [(0, '18.375')] [2024-11-23 14:57:07,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 2535424. Throughput: 0: 964.4. Samples: 634784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:57:07,368][09965] Avg episode reward: [(0, '18.368')] [2024-11-23 14:57:08,175][11197] Updated weights for policy 0, policy_version 620 (0.0020) [2024-11-23 14:57:12,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2551808. Throughput: 0: 939.3. Samples: 636950. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:57:12,367][09965] Avg episode reward: [(0, '20.525')] [2024-11-23 14:57:12,369][11184] Saving new best policy, reward=20.525! [2024-11-23 14:57:17,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 2572288. Throughput: 0: 946.0. Samples: 643324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:57:17,367][09965] Avg episode reward: [(0, '19.339')] [2024-11-23 14:57:17,379][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000628_2572288.pth... [2024-11-23 14:57:17,516][11184] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000411_1683456.pth [2024-11-23 14:57:18,439][11197] Updated weights for policy 0, policy_version 630 (0.0018) [2024-11-23 14:57:22,365][09965] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 2596864. Throughput: 0: 996.2. Samples: 649980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:57:22,371][09965] Avg episode reward: [(0, '19.624')] [2024-11-23 14:57:27,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2609152. Throughput: 0: 965.6. Samples: 652088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:57:27,371][09965] Avg episode reward: [(0, '19.664')] [2024-11-23 14:57:29,970][11197] Updated weights for policy 0, policy_version 640 (0.0036) [2024-11-23 14:57:32,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2629632. Throughput: 0: 933.9. Samples: 657566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:57:32,370][09965] Avg episode reward: [(0, '18.487')] [2024-11-23 14:57:37,365][09965] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 2654208. Throughput: 0: 987.6. Samples: 664468. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:57:37,372][09965] Avg episode reward: [(0, '18.328')] [2024-11-23 14:57:38,906][11197] Updated weights for policy 0, policy_version 650 (0.0021) [2024-11-23 14:57:42,366][09965] Fps is (10 sec: 4095.5, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2670592. Throughput: 0: 990.6. Samples: 667276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:57:42,369][09965] Avg episode reward: [(0, '18.829')] [2024-11-23 14:57:47,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2686976. Throughput: 0: 936.6. Samples: 671794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:57:47,368][09965] Avg episode reward: [(0, '18.713')] [2024-11-23 14:57:50,254][11197] Updated weights for policy 0, policy_version 660 (0.0023) [2024-11-23 14:57:52,365][09965] Fps is (10 sec: 4096.5, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 2711552. Throughput: 0: 980.0. Samples: 678886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:57:52,372][09965] Avg episode reward: [(0, '18.656')] [2024-11-23 14:57:57,365][09965] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 2732032. Throughput: 0: 1010.9. Samples: 682442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:57:57,375][09965] Avg episode reward: [(0, '17.393')] [2024-11-23 14:58:00,759][11197] Updated weights for policy 0, policy_version 670 (0.0022) [2024-11-23 14:58:02,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2748416. Throughput: 0: 972.1. Samples: 687070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:58:02,367][09965] Avg episode reward: [(0, '17.506')] [2024-11-23 14:58:07,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2768896. Throughput: 0: 955.6. Samples: 692984. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:58:07,367][09965] Avg episode reward: [(0, '16.718')] [2024-11-23 14:58:10,605][11197] Updated weights for policy 0, policy_version 680 (0.0028) [2024-11-23 14:58:12,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3748.9). Total num frames: 2789376. Throughput: 0: 988.1. Samples: 696552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:58:12,372][09965] Avg episode reward: [(0, '18.543')] [2024-11-23 14:58:17,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2805760. Throughput: 0: 988.3. Samples: 702038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:58:17,367][09965] Avg episode reward: [(0, '17.987')] [2024-11-23 14:58:22,058][11197] Updated weights for policy 0, policy_version 690 (0.0017) [2024-11-23 14:58:22,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2826240. Throughput: 0: 952.8. Samples: 707342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-23 14:58:22,372][09965] Avg episode reward: [(0, '17.301')] [2024-11-23 14:58:27,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 2846720. Throughput: 0: 967.4. Samples: 710810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-23 14:58:27,369][09965] Avg episode reward: [(0, '19.811')] [2024-11-23 14:58:31,168][11197] Updated weights for policy 0, policy_version 700 (0.0038) [2024-11-23 14:58:32,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 2867200. Throughput: 0: 1015.2. Samples: 717480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-23 14:58:32,371][09965] Avg episode reward: [(0, '19.180')] [2024-11-23 14:58:37,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2883584. Throughput: 0: 954.9. Samples: 721856. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:58:37,370][09965] Avg episode reward: [(0, '18.595')]
[2024-11-23 14:58:42,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3776.7). Total num frames: 2904064. Throughput: 0: 947.5. Samples: 725080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 14:58:42,369][09965] Avg episode reward: [(0, '19.623')]
[2024-11-23 14:58:42,522][11197] Updated weights for policy 0, policy_version 710 (0.0032)
[2024-11-23 14:58:47,365][09965] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3776.7). Total num frames: 2928640. Throughput: 0: 1001.2. Samples: 732126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 14:58:47,371][09965] Avg episode reward: [(0, '19.735')]
[2024-11-23 14:58:52,369][09965] Fps is (10 sec: 4094.4, 60 sec: 3891.0, 300 sec: 3776.6). Total num frames: 2945024. Throughput: 0: 980.9. Samples: 737130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 14:58:52,376][09965] Avg episode reward: [(0, '18.586')]
[2024-11-23 14:58:53,351][11197] Updated weights for policy 0, policy_version 720 (0.0036)
[2024-11-23 14:58:57,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2961408. Throughput: 0: 952.9. Samples: 739432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 14:58:57,371][09965] Avg episode reward: [(0, '20.376')]
[2024-11-23 14:59:02,365][09965] Fps is (10 sec: 4097.5, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 2985984. Throughput: 0: 988.3. Samples: 746510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:59:02,373][09965] Avg episode reward: [(0, '21.586')]
[2024-11-23 14:59:02,377][11184] Saving new best policy, reward=21.586!
[2024-11-23 14:59:02,796][11197] Updated weights for policy 0, policy_version 730 (0.0018)
[2024-11-23 14:59:07,376][09965] Fps is (10 sec: 4500.5, 60 sec: 3958.7, 300 sec: 3790.4). Total num frames: 3006464. Throughput: 0: 998.2. Samples: 752272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:59:07,382][09965] Avg episode reward: [(0, '21.697')]
[2024-11-23 14:59:07,392][11184] Saving new best policy, reward=21.697!
[2024-11-23 14:59:12,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3018752. Throughput: 0: 966.8. Samples: 754318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 14:59:12,368][09965] Avg episode reward: [(0, '21.502')]
[2024-11-23 14:59:14,577][11197] Updated weights for policy 0, policy_version 740 (0.0022)
[2024-11-23 14:59:17,365][09965] Fps is (10 sec: 3690.5, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3043328. Throughput: 0: 952.6. Samples: 760348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:59:17,371][09965] Avg episode reward: [(0, '22.920')]
[2024-11-23 14:59:17,381][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000743_3043328.pth...
[2024-11-23 14:59:17,493][11184] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000519_2125824.pth
[2024-11-23 14:59:17,514][11184] Saving new best policy, reward=22.920!
[2024-11-23 14:59:22,365][09965] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3063808. Throughput: 0: 1008.1. Samples: 767220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 14:59:22,371][09965] Avg episode reward: [(0, '22.253')]
[2024-11-23 14:59:23,905][11197] Updated weights for policy 0, policy_version 750 (0.0014)
[2024-11-23 14:59:27,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3080192. Throughput: 0: 988.5. Samples: 769564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:59:27,368][09965] Avg episode reward: [(0, '23.608')]
[2024-11-23 14:59:27,380][11184] Saving new best policy, reward=23.608!
[2024-11-23 14:59:32,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3096576. Throughput: 0: 939.4. Samples: 774398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 14:59:32,371][09965] Avg episode reward: [(0, '24.007')]
[2024-11-23 14:59:32,395][11184] Saving new best policy, reward=24.007!
[2024-11-23 14:59:35,180][11197] Updated weights for policy 0, policy_version 760 (0.0025)
[2024-11-23 14:59:37,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3121152. Throughput: 0: 979.8. Samples: 781216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 14:59:37,370][09965] Avg episode reward: [(0, '22.759')]
[2024-11-23 14:59:42,365][09965] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3141632. Throughput: 0: 1006.1. Samples: 784706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:59:42,369][09965] Avg episode reward: [(0, '22.532')]
[2024-11-23 14:59:46,244][11197] Updated weights for policy 0, policy_version 770 (0.0024)
[2024-11-23 14:59:47,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3153920. Throughput: 0: 944.3. Samples: 789002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 14:59:47,368][09965] Avg episode reward: [(0, '23.035')]
[2024-11-23 14:59:52,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3832.2). Total num frames: 3178496. Throughput: 0: 969.7. Samples: 795896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 14:59:52,368][09965] Avg episode reward: [(0, '22.075')]
[2024-11-23 14:59:55,200][11197] Updated weights for policy 0, policy_version 780 (0.0017)
[2024-11-23 14:59:57,365][09965] Fps is (10 sec: 4915.1, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 3203072. Throughput: 0: 1001.7. Samples: 799394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 14:59:57,367][09965] Avg episode reward: [(0, '23.779')]
[2024-11-23 15:00:02,366][09965] Fps is (10 sec: 3685.9, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3215360. Throughput: 0: 982.3. Samples: 804552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:00:02,370][09965] Avg episode reward: [(0, '23.244')]
[2024-11-23 15:00:06,621][11197] Updated weights for policy 0, policy_version 790 (0.0022)
[2024-11-23 15:00:07,365][09965] Fps is (10 sec: 3276.9, 60 sec: 3823.6, 300 sec: 3846.1). Total num frames: 3235840. Throughput: 0: 958.4. Samples: 810350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:00:07,367][09965] Avg episode reward: [(0, '24.348')]
[2024-11-23 15:00:07,383][11184] Saving new best policy, reward=24.348!
[2024-11-23 15:00:12,365][09965] Fps is (10 sec: 4506.2, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 3260416. Throughput: 0: 980.4. Samples: 813682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:00:12,367][09965] Avg episode reward: [(0, '24.413')]
[2024-11-23 15:00:12,374][11184] Saving new best policy, reward=24.413!
[2024-11-23 15:00:16,509][11197] Updated weights for policy 0, policy_version 800 (0.0021)
[2024-11-23 15:00:17,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3276800. Throughput: 0: 1006.0. Samples: 819668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 15:00:17,367][09965] Avg episode reward: [(0, '24.064')]
[2024-11-23 15:00:22,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3293184. Throughput: 0: 970.5. Samples: 824888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-23 15:00:22,367][09965] Avg episode reward: [(0, '21.831')]
[2024-11-23 15:00:26,760][11197] Updated weights for policy 0, policy_version 810 (0.0014)
[2024-11-23 15:00:27,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3317760. Throughput: 0: 971.2. Samples: 828408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 15:00:27,367][09965] Avg episode reward: [(0, '21.682')]
[2024-11-23 15:00:32,365][09965] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 3338240. Throughput: 0: 1025.5. Samples: 835148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:00:32,371][09965] Avg episode reward: [(0, '22.289')]
[2024-11-23 15:00:37,367][09965] Fps is (10 sec: 3685.6, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 3354624. Throughput: 0: 968.8. Samples: 839496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:00:37,370][09965] Avg episode reward: [(0, '20.795')]
[2024-11-23 15:00:38,017][11197] Updated weights for policy 0, policy_version 820 (0.0013)
[2024-11-23 15:00:42,365][09965] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3375104. Throughput: 0: 967.8. Samples: 842944. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 15:00:42,367][09965] Avg episode reward: [(0, '20.942')]
[2024-11-23 15:00:46,748][11197] Updated weights for policy 0, policy_version 830 (0.0013)
[2024-11-23 15:00:47,365][09965] Fps is (10 sec: 4506.6, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 3399680. Throughput: 0: 1009.6. Samples: 849984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:00:47,367][09965] Avg episode reward: [(0, '21.827')]
[2024-11-23 15:00:52,365][09965] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 3416064. Throughput: 0: 988.9. Samples: 854850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:00:52,373][09965] Avg episode reward: [(0, '21.031')]
[2024-11-23 15:00:57,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3436544. Throughput: 0: 972.3. Samples: 857434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 15:00:57,371][09965] Avg episode reward: [(0, '21.556')]
[2024-11-23 15:00:58,102][11197] Updated weights for policy 0, policy_version 840 (0.0028)
[2024-11-23 15:01:02,365][09965] Fps is (10 sec: 4505.7, 60 sec: 4096.1, 300 sec: 3915.5). Total num frames: 3461120. Throughput: 0: 998.7. Samples: 864608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:01:02,371][09965] Avg episode reward: [(0, '22.837')]
[2024-11-23 15:01:07,366][09965] Fps is (10 sec: 4095.8, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 3477504. Throughput: 0: 1003.8. Samples: 870058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:01:07,370][09965] Avg episode reward: [(0, '22.744')]
[2024-11-23 15:01:08,682][11197] Updated weights for policy 0, policy_version 850 (0.0012)
[2024-11-23 15:01:12,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3493888. Throughput: 0: 975.2. Samples: 872290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 15:01:12,367][09965] Avg episode reward: [(0, '21.912')]
[2024-11-23 15:01:17,365][09965] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 3518464. Throughput: 0: 975.1. Samples: 879026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:01:17,369][09965] Avg episode reward: [(0, '23.353')]
[2024-11-23 15:01:17,376][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000859_3518464.pth...
[2024-11-23 15:01:17,503][11184] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000628_2572288.pth
[2024-11-23 15:01:18,219][11197] Updated weights for policy 0, policy_version 860 (0.0018)
[2024-11-23 15:01:22,365][09965] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 3538944. Throughput: 0: 1022.6. Samples: 885510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 15:01:22,369][09965] Avg episode reward: [(0, '22.987')]
[2024-11-23 15:01:27,365][09965] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3551232. Throughput: 0: 995.1. Samples: 887724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 15:01:27,368][09965] Avg episode reward: [(0, '21.164')]
[2024-11-23 15:01:29,572][11197] Updated weights for policy 0, policy_version 870 (0.0016)
[2024-11-23 15:01:32,365][09965] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3575808. Throughput: 0: 968.3. Samples: 893558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:01:32,373][09965] Avg episode reward: [(0, '20.252')]
[2024-11-23 15:01:37,365][09965] Fps is (10 sec: 4915.4, 60 sec: 4096.2, 300 sec: 3929.4). Total num frames: 3600384. Throughput: 0: 1015.7. Samples: 900556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:01:37,367][09965] Avg episode reward: [(0, '20.781')]
[2024-11-23 15:01:38,195][11197] Updated weights for policy 0, policy_version 880 (0.0022)
[2024-11-23 15:01:42,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3612672. Throughput: 0: 1013.7. Samples: 903052. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 15:01:42,371][09965] Avg episode reward: [(0, '19.582')]
[2024-11-23 15:01:47,365][09965] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3633152. Throughput: 0: 967.5. Samples: 908148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:01:47,368][09965] Avg episode reward: [(0, '20.068')]
[2024-11-23 15:01:49,559][11197] Updated weights for policy 0, policy_version 890 (0.0019)
[2024-11-23 15:01:52,365][09965] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 3657728. Throughput: 0: 1003.7. Samples: 915224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 15:01:52,367][09965] Avg episode reward: [(0, '20.389')]
[2024-11-23 15:01:57,365][09965] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3674112. Throughput: 0: 1027.9. Samples: 918546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:01:57,368][09965] Avg episode reward: [(0, '20.954')]
[2024-11-23 15:02:00,203][11197] Updated weights for policy 0, policy_version 900 (0.0018)
[2024-11-23 15:02:02,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3690496. Throughput: 0: 979.6. Samples: 923110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 15:02:02,373][09965] Avg episode reward: [(0, '22.035')]
[2024-11-23 15:02:07,365][09965] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3715072. Throughput: 0: 979.2. Samples: 929576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:02:07,372][09965] Avg episode reward: [(0, '22.098')]
[2024-11-23 15:02:09,670][11197] Updated weights for policy 0, policy_version 910 (0.0012)
[2024-11-23 15:02:12,365][09965] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3735552. Throughput: 0: 1009.9. Samples: 933170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:02:12,372][09965] Avg episode reward: [(0, '22.956')]
[2024-11-23 15:02:17,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3751936. Throughput: 0: 989.2. Samples: 938070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:02:17,368][09965] Avg episode reward: [(0, '23.130')]
[2024-11-23 15:02:20,888][11197] Updated weights for policy 0, policy_version 920 (0.0023)
[2024-11-23 15:02:22,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3772416. Throughput: 0: 972.4. Samples: 944314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:02:22,368][09965] Avg episode reward: [(0, '24.398')]
[2024-11-23 15:02:27,365][09965] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3796992. Throughput: 0: 994.0. Samples: 947780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-23 15:02:27,367][09965] Avg episode reward: [(0, '24.713')]
[2024-11-23 15:02:27,379][11184] Saving new best policy, reward=24.713!
[2024-11-23 15:02:30,629][11197] Updated weights for policy 0, policy_version 930 (0.0029)
[2024-11-23 15:02:32,368][09965] Fps is (10 sec: 4094.6, 60 sec: 3959.2, 300 sec: 3929.3). Total num frames: 3813376. Throughput: 0: 1005.8. Samples: 953414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:02:32,373][09965] Avg episode reward: [(0, '24.657')]
[2024-11-23 15:02:37,365][09965] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 3829760. Throughput: 0: 962.4. Samples: 958532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:02:37,367][09965] Avg episode reward: [(0, '25.066')]
[2024-11-23 15:02:37,381][11184] Saving new best policy, reward=25.066!
[2024-11-23 15:02:41,307][11197] Updated weights for policy 0, policy_version 940 (0.0028)
[2024-11-23 15:02:42,365][09965] Fps is (10 sec: 4097.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3854336. Throughput: 0: 965.0. Samples: 961970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:02:42,367][09965] Avg episode reward: [(0, '25.717')]
[2024-11-23 15:02:42,371][11184] Saving new best policy, reward=25.717!
[2024-11-23 15:02:47,367][09965] Fps is (10 sec: 4504.7, 60 sec: 4027.6, 300 sec: 3943.2). Total num frames: 3874816. Throughput: 0: 1009.4. Samples: 968536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:02:47,375][09965] Avg episode reward: [(0, '26.308')]
[2024-11-23 15:02:47,392][11184] Saving new best policy, reward=26.308!
[2024-11-23 15:02:52,365][09965] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3887104. Throughput: 0: 964.3. Samples: 972970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-23 15:02:52,368][09965] Avg episode reward: [(0, '24.403')]
[2024-11-23 15:02:52,846][11197] Updated weights for policy 0, policy_version 950 (0.0030)
[2024-11-23 15:02:57,365][09965] Fps is (10 sec: 3687.2, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3911680. Throughput: 0: 963.9. Samples: 976546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-23 15:02:57,370][09965] Avg episode reward: [(0, '23.199')]
[2024-11-23 15:03:01,593][11197] Updated weights for policy 0, policy_version 960 (0.0017)
[2024-11-23 15:03:02,365][09965] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3932160. Throughput: 0: 1009.7. Samples: 983508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-23 15:03:02,372][09965] Avg episode reward: [(0, '23.173')]
[2024-11-23 15:03:07,368][09965] Fps is (10 sec: 3685.2, 60 sec: 3891.0, 300 sec: 3929.3). Total num frames: 3948544. Throughput: 0: 967.7. Samples: 987864. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-23 15:03:07,371][09965] Avg episode reward: [(0, '23.224')]
[2024-11-23 15:03:12,365][09965] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3969024. Throughput: 0: 951.6. Samples: 990600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-23 15:03:12,372][09965] Avg episode reward: [(0, '22.302')]
[2024-11-23 15:03:13,171][11197] Updated weights for policy 0, policy_version 970 (0.0024)
[2024-11-23 15:03:17,365][09965] Fps is (10 sec: 4507.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3993600. Throughput: 0: 984.0. Samples: 997692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-23 15:03:17,368][09965] Avg episode reward: [(0, '23.504')]
[2024-11-23 15:03:17,383][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000975_3993600.pth...
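The `Fps is (10 sec: ...)` figures above fit a simple reading: the change in `Total num frames` divided by the wall-clock time between two entries roughly ten seconds apart. The Python sketch below assumes that interpretation and recomputes two of the reported values. The interpretation comes from the logged numbers, not from the trainer's source, and `window_fps` is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp layout of the log prefix, e.g. "2024-11-23 14:59:07,376".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def window_fps(t_start: str, frames_start: int, t_end: str, frames_end: int) -> float:
    """Frames per second between two 'Fps is' entries.

    Assumption: the windowed FPS is the difference in 'Total num frames'
    divided by the elapsed wall-clock seconds between the two entries.
    """
    elapsed = (datetime.strptime(t_end, TS_FORMAT)
               - datetime.strptime(t_start, TS_FORMAT)).total_seconds()
    return (frames_end - frames_start) / elapsed


# Entries logged at 14:59:07,376 and 14:59:17,365 (elapsed 9.989 s).
fps_a = window_fps("2024-11-23 14:59:07,376", 3006464, "2024-11-23 14:59:17,365", 3043328)
# Entries logged at 14:58:42,365 and 14:58:52,369 (elapsed 10.004 s).
fps_b = window_fps("2024-11-23 14:58:42,365", 2904064, "2024-11-23 14:58:52,369", 2945024)
print(f"{fps_a:.1f} {fps_b:.1f}")  # 3690.5 4094.4
```

Both results match the `10 sec` values logged at 14:59:17,365 and 14:58:52,369. The 60 sec and 300 sec figures presumably apply the same arithmetic over longer windows, but this log does not show that directly.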
[2024-11-23 15:03:17,506][11184] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000743_3043328.pth
[2024-11-23 15:03:20,365][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-23 15:03:20,374][11184] Stopping Batcher_0...
[2024-11-23 15:03:20,374][11184] Loop batcher_evt_loop terminating...
[2024-11-23 15:03:20,377][09965] Component Batcher_0 stopped!
[2024-11-23 15:03:20,443][11197] Weights refcount: 2 0
[2024-11-23 15:03:20,446][09965] Component InferenceWorker_p0-w0 stopped!
[2024-11-23 15:03:20,446][11197] Stopping InferenceWorker_p0-w0...
[2024-11-23 15:03:20,451][11197] Loop inference_proc0-0_evt_loop terminating...
[2024-11-23 15:03:20,578][11184] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000859_3518464.pth
[2024-11-23 15:03:20,606][11184] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-23 15:03:20,853][11184] Stopping LearnerWorker_p0...
[2024-11-23 15:03:20,857][11184] Loop learner_proc0_evt_loop terminating...
[2024-11-23 15:03:20,859][09965] Component LearnerWorker_p0 stopped!
[2024-11-23 15:03:21,084][11203] Stopping RolloutWorker_w4...
[2024-11-23 15:03:21,084][09965] Component RolloutWorker_w4 stopped!
[2024-11-23 15:03:21,093][11199] Stopping RolloutWorker_w0...
[2024-11-23 15:03:21,094][11199] Loop rollout_proc0_evt_loop terminating...
[2024-11-23 15:03:21,096][09965] Component RolloutWorker_w0 stopped!
[2024-11-23 15:03:21,099][11200] Stopping RolloutWorker_w2...
[2024-11-23 15:03:21,102][11200] Loop rollout_proc2_evt_loop terminating...
[2024-11-23 15:03:21,106][09965] Component RolloutWorker_w2 stopped!
[2024-11-23 15:03:21,086][11203] Loop rollout_proc4_evt_loop terminating...
[2024-11-23 15:03:21,182][09965] Component RolloutWorker_w1 stopped!
[2024-11-23 15:03:21,189][11198] Stopping RolloutWorker_w1...
[2024-11-23 15:03:21,189][11198] Loop rollout_proc1_evt_loop terminating...
[2024-11-23 15:03:21,195][09965] Component RolloutWorker_w7 stopped!
[2024-11-23 15:03:21,199][11205] Stopping RolloutWorker_w7...
[2024-11-23 15:03:21,200][11204] Stopping RolloutWorker_w6...
[2024-11-23 15:03:21,202][11204] Loop rollout_proc6_evt_loop terminating...
[2024-11-23 15:03:21,202][09965] Component RolloutWorker_w6 stopped!
[2024-11-23 15:03:21,211][11205] Loop rollout_proc7_evt_loop terminating...
[2024-11-23 15:03:21,267][09965] Component RolloutWorker_w5 stopped!
[2024-11-23 15:03:21,278][11202] Stopping RolloutWorker_w5...
[2024-11-23 15:03:21,278][11202] Loop rollout_proc5_evt_loop terminating...
[2024-11-23 15:03:21,295][09965] Component RolloutWorker_w3 stopped!
[2024-11-23 15:03:21,303][09965] Waiting for process learner_proc0 to stop...
[2024-11-23 15:03:21,307][11201] Stopping RolloutWorker_w3...
[2024-11-23 15:03:21,308][11201] Loop rollout_proc3_evt_loop terminating...
[2024-11-23 15:03:22,843][09965] Waiting for process inference_proc0-0 to join...
[2024-11-23 15:03:23,398][09965] Waiting for process rollout_proc0 to join...
[2024-11-23 15:03:25,246][09965] Waiting for process rollout_proc1 to join...
[2024-11-23 15:03:25,249][09965] Waiting for process rollout_proc2 to join...
[2024-11-23 15:03:25,254][09965] Waiting for process rollout_proc3 to join...
[2024-11-23 15:03:25,259][09965] Waiting for process rollout_proc4 to join...
[2024-11-23 15:03:25,262][09965] Waiting for process rollout_proc5 to join...
[2024-11-23 15:03:25,265][09965] Waiting for process rollout_proc6 to join...
[2024-11-23 15:03:25,270][09965] Waiting for process rollout_proc7 to join...
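Every checkpoint saved or removed in this run is named `checkpoint_<9-digit number>_<number>.pth`, and the second number is always 4096 times the first. For example, 743 × 4096 = 3043328 and 978 × 4096 = 4005888, which matches the final `Collected {0: 4005888}`. Two points here are assumptions: reading the first number as the policy version and the second as total environment frames (suggested by the `policy_version` and `Total num frames` entries), and treating 4096 as a property of this run's configuration. The Python sketch below uses a hypothetical helper, `parse_checkpoint`, to decode the filenames logged above and check the ratio.

```python
import re
from pathlib import PurePosixPath
from typing import Tuple

# Assumed filename scheme, inferred from the Saving/Removing entries in this log:
# checkpoint_<9-digit policy version>_<total environment frames>.pth
CKPT_RE = re.compile(r"^checkpoint_(\d{9})_(\d+)\.pth$")


def parse_checkpoint(path: str) -> Tuple[int, int]:
    """Return (policy_version, env_frames) decoded from a checkpoint path."""
    match = CKPT_RE.match(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {path}")
    return int(match.group(1)), int(match.group(2))


# Checkpoint filenames copied from this log.
ckpt_dir = "/content/train_dir/default_experiment/checkpoint_p0/"
paths = [ckpt_dir + name for name in (
    "checkpoint_000000519_2125824.pth",
    "checkpoint_000000628_2572288.pth",
    "checkpoint_000000743_3043328.pth",
    "checkpoint_000000859_3518464.pth",
    "checkpoint_000000975_3993600.pth",
    "checkpoint_000000978_4005888.pth",
)]

decoded = [parse_checkpoint(p) for p in paths]
# Frames per policy version; a single-element set means the ratio is constant.
ratios = {frames // version for version, frames in decoded}
exact = all(frames % version == 0 for version, frames in decoded)
print(ratios, exact)  # {4096} True
```

A regex on the bare filename keeps the check independent of the directory layout. Whether 4096 holds in other runs depends on their configuration.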
[2024-11-23 15:03:25,273][09965] Batcher 0 profile tree view:
batching: 24.7007, releasing_batches: 0.0257
[2024-11-23 15:03:25,276][09965] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 470.2746
update_model: 8.2548
  weight_update: 0.0036
one_step: 0.0089
  handle_policy_step: 537.3748
    deserialize: 14.8619, stack: 3.0660, obs_to_device_normalize: 114.5881, forward: 266.2189, send_messages: 27.1771
    prepare_outputs: 83.4507
      to_cpu: 51.9592
[2024-11-23 15:03:25,279][09965] Learner 0 profile tree view:
misc: 0.0067, prepare_batch: 16.2860
train: 73.6337
  epoch_init: 0.0057, minibatch_init: 0.0125, losses_postprocess: 0.5680, kl_divergence: 0.5599, after_optimizer: 33.1125
  calculate_losses: 24.6548
    losses_init: 0.0042, forward_head: 1.8012, bptt_initial: 15.7465, tail: 1.0897, advantages_returns: 0.3084, losses: 3.3664
    bptt: 2.0411
      bptt_forward_core: 1.9533
  update: 14.0761
    clip: 1.4772
[2024-11-23 15:03:25,281][09965] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3217, enqueue_policy_requests: 115.4241, env_step: 822.9271, overhead: 13.0845, complete_rollouts: 7.0673
save_policy_outputs: 24.3955
  split_output_tensors: 8.5519
[2024-11-23 15:03:25,283][09965] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3091, enqueue_policy_requests: 123.3599, env_step: 816.0287, overhead: 13.6204, complete_rollouts: 6.3904
save_policy_outputs: 24.7830
  split_output_tensors: 8.8895
[2024-11-23 15:03:25,284][09965] Loop Runner_EvtLoop terminating...
[2024-11-23 15:03:25,286][09965] Runner profile tree view:
main_loop: 1082.8306
[2024-11-23 15:03:25,287][09965] Collected {0: 4005888}, FPS: 3699.5
[2024-11-23 15:14:09,078][09965] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-23 15:14:09,080][09965] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-23 15:14:09,082][09965] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-23 15:14:09,084][09965] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-23 15:14:09,086][09965] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-23 15:14:09,089][09965] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-23 15:14:09,090][09965] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-11-23 15:14:09,094][09965] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-23 15:14:09,095][09965] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-11-23 15:14:09,097][09965] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-11-23 15:14:09,098][09965] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-23 15:14:09,099][09965] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-23 15:14:09,100][09965] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-23 15:14:09,102][09965] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-23 15:14:09,103][09965] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-23 15:14:09,120][09965] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-23 15:14:09,122][09965] RunningMeanStd input shape: (3, 72, 128)
[2024-11-23 15:14:09,124][09965] RunningMeanStd input shape: (1,)
[2024-11-23 15:14:09,139][09965] ConvEncoder: input_channels=3
[2024-11-23 15:14:09,268][09965] Conv encoder output size: 512
[2024-11-23 15:14:09,270][09965] Policy head output size: 512
[2024-11-23 15:14:11,241][09965] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-23 15:14:12,109][09965] Num frames 100...
[2024-11-23 15:14:12,228][09965] Num frames 200...
[2024-11-23 15:14:12,352][09965] Num frames 300...
[2024-11-23 15:14:12,466][09965] Num frames 400...
[2024-11-23 15:14:12,588][09965] Num frames 500...
[2024-11-23 15:14:12,704][09965] Num frames 600...
[2024-11-23 15:14:12,805][09965] Avg episode rewards: #0: 11.400, true rewards: #0: 6.400
[2024-11-23 15:14:12,808][09965] Avg episode reward: 11.400, avg true_objective: 6.400
[2024-11-23 15:14:12,881][09965] Num frames 700...
[2024-11-23 15:14:13,001][09965] Num frames 800...
[2024-11-23 15:14:13,121][09965] Num frames 900...
[2024-11-23 15:14:13,241][09965] Num frames 1000...
[2024-11-23 15:14:13,366][09965] Num frames 1100...
[2024-11-23 15:14:13,483][09965] Num frames 1200...
[2024-11-23 15:14:13,614][09965] Num frames 1300...
[2024-11-23 15:14:13,677][09965] Avg episode rewards: #0: 11.525, true rewards: #0: 6.525
[2024-11-23 15:14:13,679][09965] Avg episode reward: 11.525, avg true_objective: 6.525
[2024-11-23 15:14:13,792][09965] Num frames 1400...
[2024-11-23 15:14:13,913][09965] Num frames 1500...
[2024-11-23 15:14:14,032][09965] Num frames 1600...
[2024-11-23 15:14:14,150][09965] Num frames 1700...
[2024-11-23 15:14:14,306][09965] Avg episode rewards: #0: 10.950, true rewards: #0: 5.950
[2024-11-23 15:14:14,308][09965] Avg episode reward: 10.950, avg true_objective: 5.950
[2024-11-23 15:14:14,327][09965] Num frames 1800...
[2024-11-23 15:14:14,445][09965] Num frames 1900...
[2024-11-23 15:14:14,565][09965] Num frames 2000...
[2024-11-23 15:14:14,690][09965] Num frames 2100...
[2024-11-23 15:14:14,813][09965] Num frames 2200...
[2024-11-23 15:14:14,932][09965] Num frames 2300...
[2024-11-23 15:14:15,052][09965] Num frames 2400...
[2024-11-23 15:14:15,175][09965] Num frames 2500...
[2024-11-23 15:14:15,259][09965] Avg episode rewards: #0: 12.303, true rewards: #0: 6.302
[2024-11-23 15:14:15,262][09965] Avg episode reward: 12.303, avg true_objective: 6.302
[2024-11-23 15:14:15,362][09965] Num frames 2600...
[2024-11-23 15:14:15,525][09965] Num frames 2700...
[2024-11-23 15:14:15,705][09965] Num frames 2800...
[2024-11-23 15:14:15,868][09965] Num frames 2900...
[2024-11-23 15:14:16,030][09965] Num frames 3000...
[2024-11-23 15:14:16,195][09965] Num frames 3100...
[2024-11-23 15:14:16,358][09965] Num frames 3200...
[2024-11-23 15:14:16,517][09965] Num frames 3300...
[2024-11-23 15:14:16,681][09965] Num frames 3400...
[2024-11-23 15:14:16,860][09965] Num frames 3500...
[2024-11-23 15:14:17,025][09965] Num frames 3600...
[2024-11-23 15:14:17,195][09965] Num frames 3700...
[2024-11-23 15:14:17,362][09965] Num frames 3800...
[2024-11-23 15:14:17,529][09965] Num frames 3900...
[2024-11-23 15:14:17,697][09965] Num frames 4000...
[2024-11-23 15:14:17,885][09965] Num frames 4100...
[2024-11-23 15:14:18,017][09965] Num frames 4200...
[2024-11-23 15:14:18,137][09965] Num frames 4300...
[2024-11-23 15:14:18,258][09965] Num frames 4400...
[2024-11-23 15:14:18,381][09965] Num frames 4500...
[2024-11-23 15:14:18,502][09965] Num frames 4600...
[2024-11-23 15:14:18,584][09965] Avg episode rewards: #0: 21.442, true rewards: #0: 9.242
[2024-11-23 15:14:18,586][09965] Avg episode reward: 21.442, avg true_objective: 9.242
[2024-11-23 15:14:18,683][09965] Num frames 4700...
[2024-11-23 15:14:18,808][09965] Num frames 4800...
[2024-11-23 15:14:18,935][09965] Num frames 4900...
[2024-11-23 15:14:19,055][09965] Num frames 5000...
[2024-11-23 15:14:19,174][09965] Num frames 5100...
[2024-11-23 15:14:19,299][09965] Num frames 5200...
[2024-11-23 15:14:19,423][09965] Num frames 5300...
[2024-11-23 15:14:19,542][09965] Num frames 5400...
[2024-11-23 15:14:19,660][09965] Num frames 5500...
[2024-11-23 15:14:19,782][09965] Num frames 5600...
[2024-11-23 15:14:19,919][09965] Avg episode rewards: #0: 21.765, true rewards: #0: 9.432
[2024-11-23 15:14:19,921][09965] Avg episode reward: 21.765, avg true_objective: 9.432
[2024-11-23 15:14:19,971][09965] Num frames 5700...
[2024-11-23 15:14:20,089][09965] Num frames 5800...
[2024-11-23 15:14:20,211][09965] Num frames 5900...
[2024-11-23 15:14:20,337][09965] Num frames 6000...
[2024-11-23 15:14:20,460][09965] Num frames 6100...
[2024-11-23 15:14:20,579][09965] Num frames 6200...
[2024-11-23 15:14:20,699][09965] Num frames 6300...
[2024-11-23 15:14:20,821][09965] Num frames 6400...
[2024-11-23 15:14:20,948][09965] Num frames 6500...
[2024-11-23 15:14:21,104][09965] Avg episode rewards: #0: 21.553, true rewards: #0: 9.410
[2024-11-23 15:14:21,105][09965] Avg episode reward: 21.553, avg true_objective: 9.410
[2024-11-23 15:14:21,125][09965] Num frames 6600...
[2024-11-23 15:14:21,246][09965] Num frames 6700...
[2024-11-23 15:14:21,374][09965] Num frames 6800...
[2024-11-23 15:14:21,500][09965] Num frames 6900...
[2024-11-23 15:14:21,558][09965] Avg episode rewards: #0: 19.501, true rewards: #0: 8.626
[2024-11-23 15:14:21,561][09965] Avg episode reward: 19.501, avg true_objective: 8.626
[2024-11-23 15:14:21,681][09965] Num frames 7000...
[2024-11-23 15:14:21,805][09965] Num frames 7100...
[2024-11-23 15:14:21,936][09965] Num frames 7200...
[2024-11-23 15:14:22,056][09965] Num frames 7300...
[2024-11-23 15:14:22,175][09965] Num frames 7400...
[2024-11-23 15:14:22,301][09965] Num frames 7500...
[2024-11-23 15:14:22,425][09965] Num frames 7600...
[2024-11-23 15:14:22,543][09965] Num frames 7700...
[2024-11-23 15:14:22,661][09965] Num frames 7800...
[2024-11-23 15:14:22,781][09965] Num frames 7900...
[2024-11-23 15:14:22,911][09965] Num frames 8000...
[2024-11-23 15:14:23,032][09965] Num frames 8100...
[2024-11-23 15:14:23,152][09965] Num frames 8200...
[2024-11-23 15:14:23,290][09965] Avg episode rewards: #0: 20.856, true rewards: #0: 9.189
[2024-11-23 15:14:23,291][09965] Avg episode reward: 20.856, avg true_objective: 9.189
[2024-11-23 15:14:23,329][09965] Num frames 8300...
[2024-11-23 15:14:23,446][09965] Num frames 8400...
[2024-11-23 15:14:23,570][09965] Num frames 8500...
[2024-11-23 15:14:23,689][09965] Num frames 8600...
[2024-11-23 15:14:23,812][09965] Num frames 8700...
[2024-11-23 15:14:23,934][09965] Num frames 8800...
[2024-11-23 15:14:24,059][09965] Num frames 8900...
[2024-11-23 15:14:24,179][09965] Num frames 9000...
[2024-11-23 15:14:24,303][09965] Num frames 9100...
[2024-11-23 15:14:24,422][09965] Num frames 9200...
[2024-11-23 15:14:24,518][09965] Avg episode rewards: #0: 21.034, true rewards: #0: 9.234
[2024-11-23 15:14:24,520][09965] Avg episode reward: 21.034, avg true_objective: 9.234
[2024-11-23 15:15:18,591][09965] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-11-23 15:15:55,304][09965] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-23 15:15:55,306][09965] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-23 15:15:55,307][09965] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-23 15:15:55,309][09965] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-23 15:15:55,311][09965] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-23 15:15:55,312][09965] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-23 15:15:55,314][09965] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-11-23 15:15:55,316][09965] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-23 15:15:55,317][09965] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-11-23 15:15:55,319][09965] Adding new argument 'hf_repository'='power-is-me/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-11-23 15:15:55,320][09965] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-23 15:15:55,322][09965] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-23 15:15:55,323][09965] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-23 15:15:55,324][09965] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-23 15:15:55,325][09965] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-23 15:15:55,338][09965] RunningMeanStd input shape: (3, 72, 128) [2024-11-23 15:15:55,340][09965] RunningMeanStd input shape: (1,) [2024-11-23 15:15:55,354][09965] ConvEncoder: input_channels=3 [2024-11-23 15:15:55,388][09965] Conv encoder output size: 512 [2024-11-23 15:15:55,390][09965] Policy head output size: 512 [2024-11-23 15:15:55,408][09965] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-23 15:15:55,884][09965] Num frames 100... [2024-11-23 15:15:56,003][09965] Num frames 200... [2024-11-23 15:15:56,122][09965] Num frames 300... [2024-11-23 15:15:56,239][09965] Num frames 400... [2024-11-23 15:15:56,408][09965] Num frames 500... [2024-11-23 15:15:56,574][09965] Num frames 600... [2024-11-23 15:15:56,735][09965] Num frames 700... [2024-11-23 15:15:56,907][09965] Num frames 800... [2024-11-23 15:15:57,065][09965] Num frames 900... [2024-11-23 15:15:57,223][09965] Num frames 1000... [2024-11-23 15:15:57,403][09965] Num frames 1100... [2024-11-23 15:15:57,568][09965] Num frames 1200... [2024-11-23 15:15:57,736][09965] Num frames 1300... [2024-11-23 15:15:57,923][09965] Num frames 1400... [2024-11-23 15:15:58,096][09965] Num frames 1500... [2024-11-23 15:15:58,286][09965] Num frames 1600... [2024-11-23 15:15:58,466][09965] Num frames 1700... 
[2024-11-23 15:15:58,571][09965] Avg episode rewards: #0: 44.280, true rewards: #0: 17.280
[2024-11-23 15:15:58,573][09965] Avg episode reward: 44.280, avg true_objective: 17.280
[2024-11-23 15:15:58,697][09965] Num frames 1800...
[2024-11-23 15:15:58,866][09965] Num frames 1900...
[2024-11-23 15:15:58,984][09965] Num frames 2000...
[2024-11-23 15:15:59,098][09965] Num frames 2100...
[2024-11-23 15:15:59,214][09965] Num frames 2200...
[2024-11-23 15:15:59,336][09965] Num frames 2300...
[2024-11-23 15:15:59,450][09965] Num frames 2400...
[2024-11-23 15:15:59,585][09965] Num frames 2500...
[2024-11-23 15:15:59,703][09965] Num frames 2600...
[2024-11-23 15:15:59,820][09965] Num frames 2700...
[2024-11-23 15:15:59,956][09965] Num frames 2800...
[2024-11-23 15:16:00,076][09965] Num frames 2900...
[2024-11-23 15:16:00,195][09965] Num frames 3000...
[2024-11-23 15:16:00,323][09965] Num frames 3100...
[2024-11-23 15:16:00,441][09965] Num frames 3200...
[2024-11-23 15:16:00,558][09965] Num frames 3300...
[2024-11-23 15:16:00,674][09965] Num frames 3400...
[2024-11-23 15:16:00,797][09965] Num frames 3500...
[2024-11-23 15:16:00,920][09965] Num frames 3600...
[2024-11-23 15:16:01,044][09965] Num frames 3700...
[2024-11-23 15:16:01,171][09965] Num frames 3800...
[2024-11-23 15:16:01,260][09965] Avg episode rewards: #0: 50.624, true rewards: #0: 19.125
[2024-11-23 15:16:01,261][09965] Avg episode reward: 50.624, avg true_objective: 19.125
[2024-11-23 15:16:01,357][09965] Num frames 3900...
[2024-11-23 15:16:01,473][09965] Num frames 4000...
[2024-11-23 15:16:01,593][09965] Num frames 4100...
[2024-11-23 15:16:01,718][09965] Num frames 4200...
[2024-11-23 15:16:01,834][09965] Num frames 4300...
[2024-11-23 15:16:01,966][09965] Num frames 4400...
[2024-11-23 15:16:02,112][09965] Avg episode rewards: #0: 37.920, true rewards: #0: 14.920
[2024-11-23 15:16:02,113][09965] Avg episode reward: 37.920, avg true_objective: 14.920
[2024-11-23 15:16:02,145][09965] Num frames 4500...
[2024-11-23 15:16:02,264][09965] Num frames 4600...
[2024-11-23 15:16:02,395][09965] Num frames 4700...
[2024-11-23 15:16:02,513][09965] Num frames 4800...
[2024-11-23 15:16:02,633][09965] Num frames 4900...
[2024-11-23 15:16:02,753][09965] Num frames 5000...
[2024-11-23 15:16:02,878][09965] Num frames 5100...
[2024-11-23 15:16:03,008][09965] Num frames 5200...
[2024-11-23 15:16:03,129][09965] Num frames 5300...
[2024-11-23 15:16:03,251][09965] Num frames 5400...
[2024-11-23 15:16:03,379][09965] Num frames 5500...
[2024-11-23 15:16:03,498][09965] Num frames 5600...
[2024-11-23 15:16:03,620][09965] Num frames 5700...
[2024-11-23 15:16:03,744][09965] Num frames 5800...
[2024-11-23 15:16:03,883][09965] Num frames 5900...
[2024-11-23 15:16:04,068][09965] Avg episode rewards: #0: 38.225, true rewards: #0: 14.975
[2024-11-23 15:16:04,071][09965] Avg episode reward: 38.225, avg true_objective: 14.975
[2024-11-23 15:16:04,085][09965] Num frames 6000...
[2024-11-23 15:16:04,205][09965] Num frames 6100...
[2024-11-23 15:16:04,333][09965] Num frames 6200...
[2024-11-23 15:16:04,454][09965] Num frames 6300...
[2024-11-23 15:16:04,572][09965] Num frames 6400...
[2024-11-23 15:16:04,698][09965] Num frames 6500...
[2024-11-23 15:16:04,825][09965] Num frames 6600...
[2024-11-23 15:16:04,943][09965] Num frames 6700...
[2024-11-23 15:16:05,068][09965] Num frames 6800...
[2024-11-23 15:16:05,189][09965] Num frames 6900...
[2024-11-23 15:16:05,313][09965] Num frames 7000...
[2024-11-23 15:16:05,387][09965] Avg episode rewards: #0: 35.228, true rewards: #0: 14.028
[2024-11-23 15:16:05,389][09965] Avg episode reward: 35.228, avg true_objective: 14.028
[2024-11-23 15:16:05,492][09965] Num frames 7100...
[2024-11-23 15:16:05,615][09965] Num frames 7200...
[2024-11-23 15:16:05,731][09965] Num frames 7300...
[2024-11-23 15:16:05,847][09965] Num frames 7400...
[2024-11-23 15:16:05,973][09965] Num frames 7500...
[2024-11-23 15:16:06,101][09965] Num frames 7600...
[2024-11-23 15:16:06,220][09965] Num frames 7700...
[2024-11-23 15:16:06,356][09965] Num frames 7800...
[2024-11-23 15:16:06,476][09965] Num frames 7900...
[2024-11-23 15:16:06,593][09965] Num frames 8000...
[2024-11-23 15:16:06,726][09965] Num frames 8100...
[2024-11-23 15:16:06,834][09965] Avg episode rewards: #0: 35.071, true rewards: #0: 13.572
[2024-11-23 15:16:06,836][09965] Avg episode reward: 35.071, avg true_objective: 13.572
[2024-11-23 15:16:06,907][09965] Num frames 8200...
[2024-11-23 15:16:07,027][09965] Num frames 8300...
[2024-11-23 15:16:07,161][09965] Num frames 8400...
[2024-11-23 15:16:07,283][09965] Num frames 8500...
[2024-11-23 15:16:07,407][09965] Num frames 8600...
[2024-11-23 15:16:07,525][09965] Num frames 8700...
[2024-11-23 15:16:07,646][09965] Num frames 8800...
[2024-11-23 15:16:07,780][09965] Num frames 8900...
[2024-11-23 15:16:07,904][09965] Num frames 9000...
[2024-11-23 15:16:08,024][09965] Num frames 9100...
[2024-11-23 15:16:08,158][09965] Num frames 9200...
[2024-11-23 15:16:08,282][09965] Num frames 9300...
[2024-11-23 15:16:08,414][09965] Num frames 9400...
[2024-11-23 15:16:08,533][09965] Num frames 9500...
[2024-11-23 15:16:08,654][09965] Num frames 9600...
[2024-11-23 15:16:08,777][09965] Num frames 9700...
[2024-11-23 15:16:08,927][09965] Num frames 9800...
[2024-11-23 15:16:09,094][09965] Num frames 9900...
[2024-11-23 15:16:09,298][09965] Avg episode rewards: #0: 36.971, true rewards: #0: 14.257
[2024-11-23 15:16:09,300][09965] Avg episode reward: 36.971, avg true_objective: 14.257
[2024-11-23 15:16:09,336][09965] Num frames 10000...
[2024-11-23 15:16:09,501][09965] Num frames 10100...
[2024-11-23 15:16:09,662][09965] Num frames 10200...
[2024-11-23 15:16:09,828][09965] Num frames 10300...
[2024-11-23 15:16:09,986][09965] Num frames 10400...
[2024-11-23 15:16:10,150][09965] Num frames 10500...
[2024-11-23 15:16:10,323][09965] Num frames 10600...
[2024-11-23 15:16:10,478][09965] Avg episode rewards: #0: 33.690, true rewards: #0: 13.315
[2024-11-23 15:16:10,480][09965] Avg episode reward: 33.690, avg true_objective: 13.315
[2024-11-23 15:16:10,575][09965] Num frames 10700...
[2024-11-23 15:16:10,746][09965] Num frames 10800...
[2024-11-23 15:16:10,912][09965] Num frames 10900...
[2024-11-23 15:16:11,083][09965] Avg episode rewards: #0: 30.519, true rewards: #0: 12.186
[2024-11-23 15:16:11,086][09965] Avg episode reward: 30.519, avg true_objective: 12.186
[2024-11-23 15:16:11,150][09965] Num frames 11000...
[2024-11-23 15:16:11,339][09965] Num frames 11100...
[2024-11-23 15:16:11,479][09965] Num frames 11200...
[2024-11-23 15:16:11,596][09965] Num frames 11300...
[2024-11-23 15:16:11,717][09965] Num frames 11400...
[2024-11-23 15:16:11,833][09965] Num frames 11500...
[2024-11-23 15:16:11,959][09965] Num frames 11600...
[2024-11-23 15:16:12,080][09965] Num frames 11700...
[2024-11-23 15:16:12,200][09965] Num frames 11800...
[2024-11-23 15:16:12,334][09965] Num frames 11900...
[2024-11-23 15:16:12,456][09965] Num frames 12000...
[2024-11-23 15:16:12,613][09965] Avg episode rewards: #0: 29.987, true rewards: #0: 12.087
[2024-11-23 15:16:12,614][09965] Avg episode reward: 29.987, avg true_objective: 12.087
[2024-11-23 15:17:21,733][09965] Replay video saved to /content/train_dir/default_experiment/replay.mp4!