[2024-12-21 13:04:47,087][02089] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-21 13:04:47,090][02089] Rollout worker 0 uses device cpu
[2024-12-21 13:04:47,094][02089] Rollout worker 1 uses device cpu
[2024-12-21 13:04:47,095][02089] Rollout worker 2 uses device cpu
[2024-12-21 13:04:47,097][02089] Rollout worker 3 uses device cpu
[2024-12-21 13:04:47,098][02089] Rollout worker 4 uses device cpu
[2024-12-21 13:04:47,099][02089] Rollout worker 5 uses device cpu
[2024-12-21 13:04:47,102][02089] Rollout worker 6 uses device cpu
[2024-12-21 13:04:47,103][02089] Rollout worker 7 uses device cpu
[2024-12-21 13:04:47,304][02089] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-21 13:04:47,308][02089] InferenceWorker_p0-w0: min num requests: 2
[2024-12-21 13:04:47,352][02089] Starting all processes...
[2024-12-21 13:04:47,355][02089] Starting process learner_proc0
[2024-12-21 13:04:47,418][02089] Starting all processes...
[2024-12-21 13:04:47,435][02089] Starting process inference_proc0-0
[2024-12-21 13:04:47,436][02089] Starting process rollout_proc0
[2024-12-21 13:04:47,436][02089] Starting process rollout_proc1
[2024-12-21 13:04:47,436][02089] Starting process rollout_proc2
[2024-12-21 13:04:47,436][02089] Starting process rollout_proc3
[2024-12-21 13:04:47,436][02089] Starting process rollout_proc4
[2024-12-21 13:04:47,436][02089] Starting process rollout_proc5
[2024-12-21 13:04:47,436][02089] Starting process rollout_proc6
[2024-12-21 13:04:47,436][02089] Starting process rollout_proc7
[2024-12-21 13:05:04,274][04429] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-21 13:05:04,284][04429] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-21 13:05:04,360][04429] Num visible devices: 1
[2024-12-21 13:05:04,414][04429] Starting seed is not provided
[2024-12-21 13:05:04,415][04429] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-21 13:05:04,416][04429] Initializing actor-critic model on device cuda:0
[2024-12-21 13:05:04,417][04429] RunningMeanStd input shape: (3, 72, 128)
[2024-12-21 13:05:04,420][04429] RunningMeanStd input shape: (1,)
[2024-12-21 13:05:04,528][04429] ConvEncoder: input_channels=3
[2024-12-21 13:05:05,174][04446] Worker 3 uses CPU cores [1]
[2024-12-21 13:05:05,180][04447] Worker 4 uses CPU cores [0]
[2024-12-21 13:05:05,226][04444] Worker 0 uses CPU cores [0]
[2024-12-21 13:05:05,350][04442] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-21 13:05:05,352][04442] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-21 13:05:05,362][04449] Worker 6 uses CPU cores [0]
[2024-12-21 13:05:05,383][04450] Worker 7 uses CPU cores [1]
[2024-12-21 13:05:05,409][04429] Conv encoder output size: 512
[2024-12-21 13:05:05,410][04429] Policy head output size: 512
[2024-12-21 13:05:05,418][04448] Worker 5 uses CPU cores [1]
[2024-12-21 13:05:05,417][04442] Num visible devices: 1
[2024-12-21 13:05:05,488][04443] Worker 1 uses CPU cores [1]
[2024-12-21 13:05:05,501][04445] Worker 2 uses CPU cores [0]
[2024-12-21 13:05:05,505][04429] Created Actor Critic model with architecture:
[2024-12-21 13:05:05,506][04429] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-21 13:05:05,917][04429] Using optimizer
[2024-12-21 13:05:07,293][02089] Heartbeat connected on Batcher_0
[2024-12-21 13:05:07,305][02089] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-21 13:05:07,317][02089] Heartbeat connected on RolloutWorker_w0
[2024-12-21 13:05:07,323][02089] Heartbeat connected on RolloutWorker_w1
[2024-12-21 13:05:07,329][02089] Heartbeat connected on RolloutWorker_w2
[2024-12-21 13:05:07,332][02089] Heartbeat connected on RolloutWorker_w3
[2024-12-21 13:05:07,337][02089] Heartbeat connected on RolloutWorker_w4
[2024-12-21 13:05:07,343][02089] Heartbeat connected on RolloutWorker_w5
[2024-12-21 13:05:07,347][02089] Heartbeat connected on RolloutWorker_w6
[2024-12-21 13:05:07,355][02089] Heartbeat connected on RolloutWorker_w7
[2024-12-21 13:05:09,432][04429] No checkpoints found
[2024-12-21 13:05:09,432][04429] Did not load from checkpoint, starting from scratch!
[2024-12-21 13:05:09,432][04429] Initialized policy 0 weights for model version 0
[2024-12-21 13:05:09,435][04429] LearnerWorker_p0 finished initialization!
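The module tree printed above can be reconstructed as a plain PyTorch sketch. The log confirms only the layer types and sizes: three Conv2d+ELU pairs, a Linear+ELU projection to 512, a GRU(512, 512) core, a 1-unit critic head, and a 5-unit action-logits head for a (3, 72, 128) observation. The conv channel counts, kernel sizes, and strides below are assumptions (Sample Factory's default VizDoom encoder), not something the log states.

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Approximate reconstruction of the printed model tree.

    ASSUMED: conv channels/kernels/strides (Sample Factory defaults).
    CONFIRMED by the log: layer types, 512-dim encoder/core, 5 actions.
    """
    def __init__(self, num_actions: int = 5):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # For a (3, 72, 128) input the conv stack yields (128, 3, 6) -> 2304.
        self.mlp_layers = nn.Sequential(nn.Linear(2304, 512), nn.ELU())
        self.core = nn.GRU(512, 512)
        self.critic_linear = nn.Linear(512, 1)
        self.distribution_linear = nn.Linear(512, num_actions)

    def forward(self, obs, rnn_state=None):
        x = self.conv_head(obs)            # (B, 128, 3, 6)
        x = self.mlp_layers(x.flatten(1))  # (B, 512)
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```

A forward pass on a batch of two observations produces action logits of shape (2, 5) and values of shape (2, 1), matching the "Policy head output size: 512" and 5-way action parameterization reported above.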
[2024-12-21 13:05:09,438][04429] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-21 13:05:09,436][02089] Heartbeat connected on LearnerWorker_p0
[2024-12-21 13:05:09,646][04442] RunningMeanStd input shape: (3, 72, 128)
[2024-12-21 13:05:09,648][04442] RunningMeanStd input shape: (1,)
[2024-12-21 13:05:09,661][04442] ConvEncoder: input_channels=3
[2024-12-21 13:05:09,766][04442] Conv encoder output size: 512
[2024-12-21 13:05:09,766][04442] Policy head output size: 512
[2024-12-21 13:05:09,818][02089] Inference worker 0-0 is ready!
[2024-12-21 13:05:09,822][02089] All inference workers are ready! Signal rollout workers to start!
[2024-12-21 13:05:10,021][04450] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-21 13:05:10,023][04448] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-21 13:05:10,018][04446] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-21 13:05:10,020][04443] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-21 13:05:10,018][04445] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-21 13:05:10,032][04447] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-21 13:05:10,035][04449] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-21 13:05:10,021][04444] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-21 13:05:11,576][04449] Decorrelating experience for 0 frames...
[2024-12-21 13:05:11,576][04446] Decorrelating experience for 0 frames...
[2024-12-21 13:05:11,578][04450] Decorrelating experience for 0 frames...
[2024-12-21 13:05:11,579][04444] Decorrelating experience for 0 frames...
[2024-12-21 13:05:11,581][04447] Decorrelating experience for 0 frames...
[2024-12-21 13:05:11,582][04443] Decorrelating experience for 0 frames...
[2024-12-21 13:05:11,651][02089] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-21 13:05:12,676][04443] Decorrelating experience for 32 frames...
[2024-12-21 13:05:12,678][04450] Decorrelating experience for 32 frames...
[2024-12-21 13:05:13,287][04449] Decorrelating experience for 32 frames...
[2024-12-21 13:05:13,291][04447] Decorrelating experience for 32 frames...
[2024-12-21 13:05:13,298][04445] Decorrelating experience for 0 frames...
[2024-12-21 13:05:13,324][04444] Decorrelating experience for 32 frames...
[2024-12-21 13:05:13,433][04450] Decorrelating experience for 64 frames...
[2024-12-21 13:05:15,035][04448] Decorrelating experience for 0 frames...
[2024-12-21 13:05:15,109][04446] Decorrelating experience for 32 frames...
[2024-12-21 13:05:15,169][04445] Decorrelating experience for 32 frames...
[2024-12-21 13:05:15,481][04450] Decorrelating experience for 96 frames...
[2024-12-21 13:05:15,658][04444] Decorrelating experience for 64 frames...
[2024-12-21 13:05:15,674][04449] Decorrelating experience for 64 frames...
[2024-12-21 13:05:16,651][02089] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-21 13:05:16,733][04447] Decorrelating experience for 64 frames...
[2024-12-21 13:05:17,164][04443] Decorrelating experience for 64 frames...
[2024-12-21 13:05:17,395][04448] Decorrelating experience for 32 frames...
[2024-12-21 13:05:17,460][04445] Decorrelating experience for 64 frames...
[2024-12-21 13:05:18,902][04446] Decorrelating experience for 64 frames...
[2024-12-21 13:05:19,467][04447] Decorrelating experience for 96 frames...
[2024-12-21 13:05:20,012][04443] Decorrelating experience for 96 frames...
[2024-12-21 13:05:20,079][04449] Decorrelating experience for 96 frames...
[2024-12-21 13:05:20,371][04445] Decorrelating experience for 96 frames...
[2024-12-21 13:05:20,930][04448] Decorrelating experience for 64 frames...
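The "Decorrelating experience" lines above show each rollout worker stepping through its environment in 32-frame increments (0, 32, 64, 96) before real collection begins, so the eight workers do not all start from synchronized episode states. A hypothetical sketch of such a warm-up phase (the exact mechanism and per-worker durations in Sample Factory may differ; `env_step` is a stand-in for one random-action environment step):

```python
import random

def decorrelate(env_step, total_frames: int = 96, report_every: int = 32):
    """Step the env with random actions before collection, reporting
    progress every `report_every` frames like the log lines above."""
    reports = []
    for frame in range(total_frames + 1):
        if frame % report_every == 0:
            reports.append(f"Decorrelating experience for {frame} frames...")
        if frame < total_frames:
            env_step(random.randrange(5))  # 5 actions, per the model's action head
    return reports
```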
[2024-12-21 13:05:21,135][04444] Decorrelating experience for 96 frames...
[2024-12-21 13:05:21,651][02089] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 7.2. Samples: 72. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-21 13:05:21,653][02089] Avg episode reward: [(0, '2.205')]
[2024-12-21 13:05:22,326][04446] Decorrelating experience for 96 frames...
[2024-12-21 13:05:24,377][04429] Signal inference workers to stop experience collection...
[2024-12-21 13:05:24,406][04442] InferenceWorker_p0-w0: stopping experience collection
[2024-12-21 13:05:24,440][04448] Decorrelating experience for 96 frames...
[2024-12-21 13:05:26,651][02089] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 169.5. Samples: 2542. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-21 13:05:26,653][02089] Avg episode reward: [(0, '2.565')]
[2024-12-21 13:05:27,175][04429] Signal inference workers to resume experience collection...
[2024-12-21 13:05:27,178][04442] InferenceWorker_p0-w0: resuming experience collection
[2024-12-21 13:05:31,651][02089] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 311.8. Samples: 6236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:05:31,655][02089] Avg episode reward: [(0, '3.579')]
[2024-12-21 13:05:36,651][02089] Fps is (10 sec: 3686.0, 60 sec: 1474.5, 300 sec: 1474.5). Total num frames: 36864. Throughput: 0: 331.4. Samples: 8286. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-21 13:05:36,654][02089] Avg episode reward: [(0, '3.765')]
[2024-12-21 13:05:37,420][04442] Updated weights for policy 0, policy_version 10 (0.0154)
[2024-12-21 13:05:41,651][02089] Fps is (10 sec: 3686.4, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 452.5. Samples: 13574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:05:41,657][02089] Avg episode reward: [(0, '4.365')]
[2024-12-21 13:05:46,651][02089] Fps is (10 sec: 4096.3, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 565.3. Samples: 19786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:05:46,657][02089] Avg episode reward: [(0, '4.539')]
[2024-12-21 13:05:47,047][04442] Updated weights for policy 0, policy_version 20 (0.0022)
[2024-12-21 13:05:51,651][02089] Fps is (10 sec: 3686.4, 60 sec: 2355.2, 300 sec: 2355.2). Total num frames: 94208. Throughput: 0: 569.0. Samples: 22762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-21 13:05:51,653][02089] Avg episode reward: [(0, '4.403')]
[2024-12-21 13:05:56,651][02089] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 110592. Throughput: 0: 601.8. Samples: 27080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-21 13:05:56,657][02089] Avg episode reward: [(0, '4.304')]
[2024-12-21 13:05:56,659][04429] Saving new best policy, reward=4.304!
[2024-12-21 13:05:58,480][04442] Updated weights for policy 0, policy_version 30 (0.0025)
[2024-12-21 13:06:01,651][02089] Fps is (10 sec: 4096.0, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 755.2. Samples: 33984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-21 13:06:01,656][02089] Avg episode reward: [(0, '4.341')]
[2024-12-21 13:06:01,663][04429] Saving new best policy, reward=4.341!
[2024-12-21 13:06:06,651][02089] Fps is (10 sec: 4505.6, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 155648. Throughput: 0: 831.3. Samples: 37482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:06:06,656][02089] Avg episode reward: [(0, '4.315')]
[2024-12-21 13:06:08,778][04442] Updated weights for policy 0, policy_version 40 (0.0019)
[2024-12-21 13:06:11,651][02089] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 879.8. Samples: 42134. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-21 13:06:11,659][02089] Avg episode reward: [(0, '4.315')]
[2024-12-21 13:06:16,650][02089] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 929.3. Samples: 48056. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-21 13:06:16,653][02089] Avg episode reward: [(0, '4.364')]
[2024-12-21 13:06:16,658][04429] Saving new best policy, reward=4.364!
[2024-12-21 13:06:18,927][04442] Updated weights for policy 0, policy_version 50 (0.0035)
[2024-12-21 13:06:21,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 961.7. Samples: 51562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-21 13:06:21,653][02089] Avg episode reward: [(0, '4.384')]
[2024-12-21 13:06:21,670][04429] Saving new best policy, reward=4.384!
[2024-12-21 13:06:26,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3058.3). Total num frames: 229376. Throughput: 0: 969.4. Samples: 57196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-21 13:06:26,653][02089] Avg episode reward: [(0, '4.423')]
[2024-12-21 13:06:26,655][04429] Saving new best policy, reward=4.423!
[2024-12-21 13:06:30,610][04442] Updated weights for policy 0, policy_version 60 (0.0024)
[2024-12-21 13:06:31,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3123.2). Total num frames: 249856. Throughput: 0: 942.5. Samples: 62200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:06:31,652][02089] Avg episode reward: [(0, '4.441')]
[2024-12-21 13:06:31,664][04429] Saving new best policy, reward=4.441!
[2024-12-21 13:06:36,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3180.4). Total num frames: 270336. Throughput: 0: 953.3. Samples: 65662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-21 13:06:36,653][02089] Avg episode reward: [(0, '4.496')]
[2024-12-21 13:06:36,659][04429] Saving new best policy, reward=4.496!
[2024-12-21 13:06:40,186][04442] Updated weights for policy 0, policy_version 70 (0.0021)
[2024-12-21 13:06:41,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3231.3). Total num frames: 290816. Throughput: 0: 997.8. Samples: 71980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:06:41,653][02089] Avg episode reward: [(0, '4.420')]
[2024-12-21 13:06:41,664][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth...
[2024-12-21 13:06:46,651][02089] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3190.6). Total num frames: 303104. Throughput: 0: 934.4. Samples: 76032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-21 13:06:46,657][02089] Avg episode reward: [(0, '4.457')]
[2024-12-21 13:06:51,568][04442] Updated weights for policy 0, policy_version 80 (0.0015)
[2024-12-21 13:06:51,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 327680. Throughput: 0: 926.6. Samples: 79178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:06:51,658][02089] Avg episode reward: [(0, '4.511')]
[2024-12-21 13:06:51,673][04429] Saving new best policy, reward=4.511!
[2024-12-21 13:06:56,652][02089] Fps is (10 sec: 4505.0, 60 sec: 3959.4, 300 sec: 3315.8). Total num frames: 348160. Throughput: 0: 975.0. Samples: 86010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:06:56,657][02089] Avg episode reward: [(0, '4.370')]
[2024-12-21 13:07:01,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 360448. Throughput: 0: 945.5. Samples: 90602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:07:01,653][02089] Avg episode reward: [(0, '4.417')]
[2024-12-21 13:07:04,735][04442] Updated weights for policy 0, policy_version 90 (0.0018)
[2024-12-21 13:07:06,654][02089] Fps is (10 sec: 2457.1, 60 sec: 3617.9, 300 sec: 3241.1). Total num frames: 372736. Throughput: 0: 904.8. Samples: 92280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:07:06,662][02089] Avg episode reward: [(0, '4.565')]
[2024-12-21 13:07:06,668][04429] Saving new best policy, reward=4.565!
[2024-12-21 13:07:11,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 393216. Throughput: 0: 881.6. Samples: 96868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:07:11,653][02089] Avg episode reward: [(0, '4.783')]
[2024-12-21 13:07:11,666][04429] Saving new best policy, reward=4.783!
[2024-12-21 13:07:15,168][04442] Updated weights for policy 0, policy_version 100 (0.0014)
[2024-12-21 13:07:16,651][02089] Fps is (10 sec: 4097.4, 60 sec: 3686.4, 300 sec: 3309.6). Total num frames: 413696. Throughput: 0: 919.1. Samples: 103558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-21 13:07:16,655][02089] Avg episode reward: [(0, '4.709')]
[2024-12-21 13:07:21,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 425984. Throughput: 0: 895.2. Samples: 105948. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-21 13:07:21,659][02089] Avg episode reward: [(0, '4.630')]
[2024-12-21 13:07:26,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3307.1). Total num frames: 446464. Throughput: 0: 856.2. Samples: 110508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-21 13:07:26,657][02089] Avg episode reward: [(0, '4.662')]
[2024-12-21 13:07:27,090][04442] Updated weights for policy 0, policy_version 110 (0.0034)
[2024-12-21 13:07:31,651][02089] Fps is (10 sec: 4505.5, 60 sec: 3686.4, 300 sec: 3364.6). Total num frames: 471040. Throughput: 0: 919.6. Samples: 117416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:07:31,654][02089] Avg episode reward: [(0, '4.481')]
[2024-12-21 13:07:36,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3361.5). Total num frames: 487424. Throughput: 0: 927.8. Samples: 120930. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-21 13:07:36,653][02089] Avg episode reward: [(0, '4.483')]
[2024-12-21 13:07:37,083][04442] Updated weights for policy 0, policy_version 120 (0.0021)
[2024-12-21 13:07:41,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3549.8, 300 sec: 3358.7). Total num frames: 503808. Throughput: 0: 868.3. Samples: 125082. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-21 13:07:41,653][02089] Avg episode reward: [(0, '4.726')]
[2024-12-21 13:07:46,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3382.5). Total num frames: 524288. Throughput: 0: 906.9. Samples: 131412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-21 13:07:46,653][02089] Avg episode reward: [(0, '4.780')]
[2024-12-21 13:07:47,852][04442] Updated weights for policy 0, policy_version 130 (0.0017)
[2024-12-21 13:07:51,651][02089] Fps is (10 sec: 4505.7, 60 sec: 3686.4, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 944.3. Samples: 134770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-21 13:07:51,653][02089] Avg episode reward: [(0, '4.650')]
[2024-12-21 13:07:56,651][02089] Fps is (10 sec: 3686.1, 60 sec: 3549.9, 300 sec: 3400.9). Total num frames: 561152. Throughput: 0: 957.4. Samples: 139952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-21 13:07:56,656][02089] Avg episode reward: [(0, '4.554')]
[2024-12-21 13:07:59,729][04442] Updated weights for policy 0, policy_version 140 (0.0034)
[2024-12-21 13:08:01,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3421.4). Total num frames: 581632. Throughput: 0: 926.0. Samples: 145226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:08:01,657][02089] Avg episode reward: [(0, '4.525')]
[2024-12-21 13:08:06,651][02089] Fps is (10 sec: 4096.4, 60 sec: 3823.1, 300 sec: 3440.6). Total num frames: 602112. Throughput: 0: 947.3. Samples: 148576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:08:06,653][02089] Avg episode reward: [(0, '4.576')]
[2024-12-21 13:08:08,521][04442] Updated weights for policy 0, policy_version 150 (0.0015)
[2024-12-21 13:08:11,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3458.8). Total num frames: 622592. Throughput: 0: 989.5. Samples: 155034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-21 13:08:11,655][02089] Avg episode reward: [(0, '4.414')]
[2024-12-21 13:08:16,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3431.8). Total num frames: 634880. Throughput: 0: 931.7. Samples: 159340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-21 13:08:16,654][02089] Avg episode reward: [(0, '4.510')]
[2024-12-21 13:08:20,302][04442] Updated weights for policy 0, policy_version 160 (0.0018)
[2024-12-21 13:08:21,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3470.8). Total num frames: 659456. Throughput: 0: 928.7. Samples: 162720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-21 13:08:21,655][02089] Avg episode reward: [(0, '4.551')]
[2024-12-21 13:08:26,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3486.9). Total num frames: 679936. Throughput: 0: 992.8. Samples: 169758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:08:26,656][02089] Avg episode reward: [(0, '4.719')]
[2024-12-21 13:08:31,109][04442] Updated weights for policy 0, policy_version 170 (0.0027)
[2024-12-21 13:08:31,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3481.6). Total num frames: 696320. Throughput: 0: 955.1. Samples: 174392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-21 13:08:31,656][02089] Avg episode reward: [(0, '4.789')]
[2024-12-21 13:08:31,666][04429] Saving new best policy, reward=4.789!
[2024-12-21 13:08:36,652][02089] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3496.6). Total num frames: 716800. Throughput: 0: 936.5. Samples: 176916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:08:36,657][02089] Avg episode reward: [(0, '4.597')]
[2024-12-21 13:08:40,735][04442] Updated weights for policy 0, policy_version 180 (0.0026)
[2024-12-21 13:08:41,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3530.4). Total num frames: 741376. Throughput: 0: 977.6. Samples: 183944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:08:41,656][02089] Avg episode reward: [(0, '4.829')]
[2024-12-21 13:08:41,668][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth...
[2024-12-21 13:08:41,800][04429] Saving new best policy, reward=4.829!
[2024-12-21 13:08:46,651][02089] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3524.5). Total num frames: 757760. Throughput: 0: 982.1. Samples: 189420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:08:46,656][02089] Avg episode reward: [(0, '5.042')]
[2024-12-21 13:08:46,659][04429] Saving new best policy, reward=5.042!
[2024-12-21 13:08:51,651][02089] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3500.2). Total num frames: 770048. Throughput: 0: 952.1. Samples: 191420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-21 13:08:51,658][02089] Avg episode reward: [(0, '4.987')]
[2024-12-21 13:08:52,565][04442] Updated weights for policy 0, policy_version 190 (0.0014)
[2024-12-21 13:08:56,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3531.7). Total num frames: 794624. Throughput: 0: 949.8. Samples: 197776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:08:56,658][02089] Avg episode reward: [(0, '5.158')]
[2024-12-21 13:08:56,661][04429] Saving new best policy, reward=5.158!
[2024-12-21 13:09:01,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3543.9). Total num frames: 815104. Throughput: 0: 1001.5. Samples: 204406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:09:01,653][02089] Avg episode reward: [(0, '5.051')]
[2024-12-21 13:09:01,977][04442] Updated weights for policy 0, policy_version 200 (0.0014)
[2024-12-21 13:09:06,651][02089] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3538.2). Total num frames: 831488. Throughput: 0: 973.0. Samples: 206504. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-21 13:09:06,656][02089] Avg episode reward: [(0, '5.111')]
[2024-12-21 13:09:11,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 851968. Throughput: 0: 938.9. Samples: 212008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:09:11,654][02089] Avg episode reward: [(0, '5.073')]
[2024-12-21 13:09:12,974][04442] Updated weights for policy 0, policy_version 210 (0.0041)
[2024-12-21 13:09:16,651][02089] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3577.7). Total num frames: 876544. Throughput: 0: 992.7. Samples: 219062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:09:16,653][02089] Avg episode reward: [(0, '5.167')]
[2024-12-21 13:09:16,657][04429] Saving new best policy, reward=5.167!
[2024-12-21 13:09:21,652][02089] Fps is (10 sec: 3685.9, 60 sec: 3822.8, 300 sec: 3555.3). Total num frames: 888832. Throughput: 0: 995.7. Samples: 221722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:09:21,659][02089] Avg episode reward: [(0, '5.343')]
[2024-12-21 13:09:21,667][04429] Saving new best policy, reward=5.343!
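The recurring "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" status lines report frame throughput averaged over three trailing windows, derived from the running "Total num frames" counter. A minimal sketch of that kind of windowed average over (timestamp, total_frames) samples (the exact aggregation Sample Factory uses may differ):

```python
def windowed_fps(history, window_sec):
    """Average FPS over the trailing `window_sec` seconds.

    `history` is a chronological list of (timestamp, total_frames) pairs,
    like the 5-second status reports in the log above."""
    t_now, frames_now = history[-1]
    # oldest sample that still falls inside the trailing window
    t_old, frames_old = next((t, f) for t, f in history if t_now - t <= window_sec)
    if t_now == t_old:
        return float("nan")  # not enough history yet
    return (frames_now - frames_old) / (t_now - t_old)
```

For example, with samples (0 s, 0 frames), (5 s, 0), (10 s, 0), (15 s, 20480), the 10-second window gives (20480 - 0) / (15 - 5) = 2048.0, matching the "10 sec: 2048.0" report at 13:05:31 after the first 20480 frames.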
[2024-12-21 13:09:24,804][04442] Updated weights for policy 0, policy_version 220 (0.0032)
[2024-12-21 13:09:26,650][02089] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3565.9). Total num frames: 909312. Throughput: 0: 933.2. Samples: 225938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:09:26,654][02089] Avg episode reward: [(0, '5.234')]
[2024-12-21 13:09:31,651][02089] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3576.1). Total num frames: 929792. Throughput: 0: 969.0. Samples: 233024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:09:31,655][02089] Avg episode reward: [(0, '5.326')]
[2024-12-21 13:09:33,555][04442] Updated weights for policy 0, policy_version 230 (0.0020)
[2024-12-21 13:09:36,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 1001.1. Samples: 236468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:09:36,657][02089] Avg episode reward: [(0, '5.220')]
[2024-12-21 13:09:41,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3580.2). Total num frames: 966656. Throughput: 0: 960.8. Samples: 241012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-21 13:09:41,653][02089] Avg episode reward: [(0, '5.122')]
[2024-12-21 13:09:45,068][04442] Updated weights for policy 0, policy_version 240 (0.0033)
[2024-12-21 13:09:46,650][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3589.6). Total num frames: 987136. Throughput: 0: 950.5. Samples: 247178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:09:46,653][02089] Avg episode reward: [(0, '4.885')]
[2024-12-21 13:09:51,652][02089] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 3613.2). Total num frames: 1011712. Throughput: 0: 977.7. Samples: 250500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-21 13:09:51,658][02089] Avg episode reward: [(0, '5.059')]
[2024-12-21 13:09:56,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3578.6). Total num frames: 1019904. Throughput: 0: 964.7. Samples: 255418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:09:56,664][02089] Avg episode reward: [(0, '5.079')]
[2024-12-21 13:09:56,928][04442] Updated weights for policy 0, policy_version 250 (0.0018)
[2024-12-21 13:10:01,651][02089] Fps is (10 sec: 2048.3, 60 sec: 3618.1, 300 sec: 3559.3). Total num frames: 1032192. Throughput: 0: 884.4. Samples: 258860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:10:01,653][02089] Avg episode reward: [(0, '5.231')]
[2024-12-21 13:10:06,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1052672. Throughput: 0: 881.9. Samples: 261406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-21 13:10:06,655][02089] Avg episode reward: [(0, '4.993')]
[2024-12-21 13:10:08,413][04442] Updated weights for policy 0, policy_version 260 (0.0025)
[2024-12-21 13:10:11,651][02089] Fps is (10 sec: 4505.8, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1077248. Throughput: 0: 941.7. Samples: 268314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-21 13:10:11,653][02089] Avg episode reward: [(0, '4.861')]
[2024-12-21 13:10:16,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1093632. Throughput: 0: 904.4. Samples: 273724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:10:16,656][02089] Avg episode reward: [(0, '4.945')]
[2024-12-21 13:10:20,275][04442] Updated weights for policy 0, policy_version 270 (0.0018)
[2024-12-21 13:10:21,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3762.8). Total num frames: 1110016. Throughput: 0: 873.0. Samples: 275754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:10:21,653][02089] Avg episode reward: [(0, '5.283')]
[2024-12-21 13:10:26,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1134592. Throughput: 0: 920.6. Samples: 282438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:10:26,655][02089] Avg episode reward: [(0, '5.577')]
[2024-12-21 13:10:26,658][04429] Saving new best policy, reward=5.577!
[2024-12-21 13:10:29,199][04442] Updated weights for policy 0, policy_version 280 (0.0031)
[2024-12-21 13:10:31,653][02089] Fps is (10 sec: 4504.7, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 1155072. Throughput: 0: 924.4. Samples: 288778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:10:31,659][02089] Avg episode reward: [(0, '5.511')]
[2024-12-21 13:10:36,653][02089] Fps is (10 sec: 3276.0, 60 sec: 3618.0, 300 sec: 3762.7). Total num frames: 1167360. Throughput: 0: 898.3. Samples: 290926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-21 13:10:36,655][02089] Avg episode reward: [(0, '5.379')]
[2024-12-21 13:10:40,781][04442] Updated weights for policy 0, policy_version 290 (0.0019)
[2024-12-21 13:10:41,651][02089] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1191936. Throughput: 0: 912.2. Samples: 296468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-21 13:10:41,655][02089] Avg episode reward: [(0, '5.203')]
[2024-12-21 13:10:41,664][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000291_1191936.pth...
[2024-12-21 13:10:41,793][04429] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth
[2024-12-21 13:10:46,651][02089] Fps is (10 sec: 4506.7, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1212416. Throughput: 0: 991.7. Samples: 303486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-21 13:10:46,656][02089] Avg episode reward: [(0, '5.729')]
[2024-12-21 13:10:46,660][04429] Saving new best policy, reward=5.729!
[2024-12-21 13:10:51,304][04442] Updated weights for policy 0, policy_version 300 (0.0033)
[2024-12-21 13:10:51,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3790.5). Total num frames: 1228800. Throughput: 0: 990.9. Samples: 305996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:10:51,655][02089] Avg episode reward: [(0, '6.153')]
[2024-12-21 13:10:51,666][04429] Saving new best policy, reward=6.153!
[2024-12-21 13:10:56,651][02089] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1245184. Throughput: 0: 933.4. Samples: 310318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-21 13:10:56,653][02089] Avg episode reward: [(0, '6.416')]
[2024-12-21 13:10:56,661][04429] Saving new best policy, reward=6.416!
[2024-12-21 13:11:01,336][04442] Updated weights for policy 0, policy_version 310 (0.0027)
[2024-12-21 13:11:01,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1269760. Throughput: 0: 967.9. Samples: 317278. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-21 13:11:01,658][02089] Avg episode reward: [(0, '6.091')]
[2024-12-21 13:11:06,651][02089] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 1290240. Throughput: 0: 1001.7. Samples: 320830. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-21 13:11:06,655][02089] Avg episode reward: [(0, '6.308')]
[2024-12-21 13:11:11,660][02089] Fps is (10 sec: 3273.8, 60 sec: 3754.1, 300 sec: 3762.7). Total num frames: 1302528. Throughput: 0: 955.7. Samples: 325454. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-21 13:11:11,663][02089] Avg episode reward: [(0, '6.546')]
[2024-12-21 13:11:11,676][04429] Saving new best policy, reward=6.546!
[2024-12-21 13:11:12,849][04442] Updated weights for policy 0, policy_version 320 (0.0035)
[2024-12-21 13:11:16,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1327104. Throughput: 0: 952.2. Samples: 331626. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-21 13:11:16,653][02089] Avg episode reward: [(0, '6.799')]
[2024-12-21 13:11:16,659][04429] Saving new best policy, reward=6.799!
[2024-12-21 13:11:21,651][02089] Fps is (10 sec: 4509.7, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 1347584. Throughput: 0: 979.4. Samples: 334996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:11:21,656][02089] Avg episode reward: [(0, '6.661')] [2024-12-21 13:11:22,124][04442] Updated weights for policy 0, policy_version 330 (0.0029) [2024-12-21 13:11:26,652][02089] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 1363968. Throughput: 0: 976.6. Samples: 340418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:11:26,658][02089] Avg episode reward: [(0, '6.705')] [2024-12-21 13:11:31,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3762.8). Total num frames: 1380352. Throughput: 0: 933.4. Samples: 345488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:11:31,655][02089] Avg episode reward: [(0, '6.799')] [2024-12-21 13:11:33,685][04442] Updated weights for policy 0, policy_version 340 (0.0027) [2024-12-21 13:11:36,651][02089] Fps is (10 sec: 4096.7, 60 sec: 3959.6, 300 sec: 3776.7). Total num frames: 1404928. Throughput: 0: 953.7. Samples: 348914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:11:36,655][02089] Avg episode reward: [(0, '6.699')] [2024-12-21 13:11:41,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1421312. Throughput: 0: 1006.4. Samples: 355606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:11:41,653][02089] Avg episode reward: [(0, '7.332')] [2024-12-21 13:11:41,664][04429] Saving new best policy, reward=7.332! [2024-12-21 13:11:44,732][04442] Updated weights for policy 0, policy_version 350 (0.0025) [2024-12-21 13:11:46,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1437696. Throughput: 0: 944.8. Samples: 359794. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:11:46,653][02089] Avg episode reward: [(0, '7.116')] [2024-12-21 13:11:51,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1458176. Throughput: 0: 937.9. Samples: 363036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:11:51,653][02089] Avg episode reward: [(0, '6.966')] [2024-12-21 13:11:54,340][04442] Updated weights for policy 0, policy_version 360 (0.0021) [2024-12-21 13:11:56,651][02089] Fps is (10 sec: 4505.2, 60 sec: 3959.4, 300 sec: 3804.4). Total num frames: 1482752. Throughput: 0: 986.1. Samples: 369820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:11:56,656][02089] Avg episode reward: [(0, '6.730')] [2024-12-21 13:12:01,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1499136. Throughput: 0: 956.7. Samples: 374678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:12:01,655][02089] Avg episode reward: [(0, '6.928')] [2024-12-21 13:12:05,759][04442] Updated weights for policy 0, policy_version 370 (0.0017) [2024-12-21 13:12:06,651][02089] Fps is (10 sec: 3686.7, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1519616. Throughput: 0: 935.2. Samples: 377078. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-21 13:12:06,653][02089] Avg episode reward: [(0, '7.321')] [2024-12-21 13:12:11,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3960.1, 300 sec: 3818.3). Total num frames: 1540096. Throughput: 0: 972.2. Samples: 384164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:12:11,653][02089] Avg episode reward: [(0, '7.786')] [2024-12-21 13:12:11,659][04429] Saving new best policy, reward=7.786! [2024-12-21 13:12:15,469][04442] Updated weights for policy 0, policy_version 380 (0.0032) [2024-12-21 13:12:16,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1556480. Throughput: 0: 991.2. Samples: 390094. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-21 13:12:16,656][02089] Avg episode reward: [(0, '8.341')] [2024-12-21 13:12:16,661][04429] Saving new best policy, reward=8.341! [2024-12-21 13:12:21,651][02089] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1572864. Throughput: 0: 960.6. Samples: 392140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:12:21,653][02089] Avg episode reward: [(0, '8.641')] [2024-12-21 13:12:21,660][04429] Saving new best policy, reward=8.641! [2024-12-21 13:12:26,395][04442] Updated weights for policy 0, policy_version 390 (0.0015) [2024-12-21 13:12:26,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3818.3). Total num frames: 1597440. Throughput: 0: 943.9. Samples: 398082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:12:26,653][02089] Avg episode reward: [(0, '8.420')] [2024-12-21 13:12:31,651][02089] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1617920. Throughput: 0: 1004.6. Samples: 405000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:12:31,661][02089] Avg episode reward: [(0, '8.340')] [2024-12-21 13:12:36,653][02089] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3832.2). Total num frames: 1634304. Throughput: 0: 980.7. Samples: 407170. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-21 13:12:36,658][02089] Avg episode reward: [(0, '8.705')] [2024-12-21 13:12:36,663][04429] Saving new best policy, reward=8.705! [2024-12-21 13:12:37,950][04442] Updated weights for policy 0, policy_version 400 (0.0023) [2024-12-21 13:12:41,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1654784. Throughput: 0: 940.7. Samples: 412150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:12:41,653][02089] Avg episode reward: [(0, '9.085')] [2024-12-21 13:12:41,659][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000404_1654784.pth... 
[2024-12-21 13:12:41,783][04429] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth [2024-12-21 13:12:41,798][04429] Saving new best policy, reward=9.085! [2024-12-21 13:12:46,651][02089] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1675264. Throughput: 0: 985.4. Samples: 419022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:12:46,654][02089] Avg episode reward: [(0, '9.015')] [2024-12-21 13:12:47,683][04442] Updated weights for policy 0, policy_version 410 (0.0030) [2024-12-21 13:12:51,651][02089] Fps is (10 sec: 3276.5, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1687552. Throughput: 0: 978.7. Samples: 421120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:12:51,659][02089] Avg episode reward: [(0, '8.740')] [2024-12-21 13:12:56,651][02089] Fps is (10 sec: 2457.6, 60 sec: 3618.2, 300 sec: 3790.5). Total num frames: 1699840. Throughput: 0: 897.2. Samples: 424538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:12:56,657][02089] Avg episode reward: [(0, '9.030')] [2024-12-21 13:13:01,554][04442] Updated weights for policy 0, policy_version 420 (0.0015) [2024-12-21 13:13:01,651][02089] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1720320. Throughput: 0: 881.7. Samples: 429770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:13:01,654][02089] Avg episode reward: [(0, '9.148')] [2024-12-21 13:13:01,661][04429] Saving new best policy, reward=9.148! [2024-12-21 13:13:06,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1740800. Throughput: 0: 914.6. Samples: 433298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:13:06,653][02089] Avg episode reward: [(0, '10.247')] [2024-12-21 13:13:06,657][04429] Saving new best policy, reward=10.247! [2024-12-21 13:13:11,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1757184. 
Throughput: 0: 920.6. Samples: 439508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:13:11,653][02089] Avg episode reward: [(0, '11.028')] [2024-12-21 13:13:11,661][04429] Saving new best policy, reward=11.028! [2024-12-21 13:13:11,994][04442] Updated weights for policy 0, policy_version 430 (0.0016) [2024-12-21 13:13:16,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 1773568. Throughput: 0: 863.6. Samples: 443864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:13:16,653][02089] Avg episode reward: [(0, '11.603')] [2024-12-21 13:13:16,676][04429] Saving new best policy, reward=11.603! [2024-12-21 13:13:21,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1798144. Throughput: 0: 891.9. Samples: 447304. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-21 13:13:21,654][02089] Avg episode reward: [(0, '11.603')] [2024-12-21 13:13:21,995][04442] Updated weights for policy 0, policy_version 440 (0.0026) [2024-12-21 13:13:26,656][02089] Fps is (10 sec: 4503.2, 60 sec: 3686.1, 300 sec: 3804.4). Total num frames: 1818624. Throughput: 0: 936.0. Samples: 454274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:13:26,658][02089] Avg episode reward: [(0, '12.364')] [2024-12-21 13:13:26,660][04429] Saving new best policy, reward=12.364! [2024-12-21 13:13:31,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3790.6). Total num frames: 1835008. Throughput: 0: 884.4. Samples: 458818. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-21 13:13:31,657][02089] Avg episode reward: [(0, '11.531')] [2024-12-21 13:13:33,473][04442] Updated weights for policy 0, policy_version 450 (0.0022) [2024-12-21 13:13:36,651][02089] Fps is (10 sec: 3688.3, 60 sec: 3686.5, 300 sec: 3776.7). Total num frames: 1855488. Throughput: 0: 898.4. Samples: 461548. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-21 13:13:36,658][02089] Avg episode reward: [(0, '11.508')] [2024-12-21 13:13:41,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1880064. Throughput: 0: 980.0. Samples: 468638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:13:41,653][02089] Avg episode reward: [(0, '10.928')] [2024-12-21 13:13:42,397][04442] Updated weights for policy 0, policy_version 460 (0.0024) [2024-12-21 13:13:46,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1896448. Throughput: 0: 989.0. Samples: 474276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:13:46,654][02089] Avg episode reward: [(0, '10.922')] [2024-12-21 13:13:51,651][02089] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1912832. Throughput: 0: 958.7. Samples: 476438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:13:51,658][02089] Avg episode reward: [(0, '10.991')] [2024-12-21 13:13:53,831][04442] Updated weights for policy 0, policy_version 470 (0.0025) [2024-12-21 13:13:56,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 1937408. Throughput: 0: 967.4. Samples: 483040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:13:56,655][02089] Avg episode reward: [(0, '10.941')] [2024-12-21 13:14:01,651][02089] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1957888. Throughput: 0: 1017.2. Samples: 489640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:14:01,658][02089] Avg episode reward: [(0, '11.616')] [2024-12-21 13:14:04,228][04442] Updated weights for policy 0, policy_version 480 (0.0023) [2024-12-21 13:14:06,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1970176. Throughput: 0: 986.9. Samples: 491716. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-21 13:14:06,658][02089] Avg episode reward: [(0, '12.185')] [2024-12-21 13:14:11,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 1994752. Throughput: 0: 954.5. Samples: 497222. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:14:11,653][02089] Avg episode reward: [(0, '13.573')] [2024-12-21 13:14:11,664][04429] Saving new best policy, reward=13.573! [2024-12-21 13:14:14,186][04442] Updated weights for policy 0, policy_version 490 (0.0021) [2024-12-21 13:14:16,651][02089] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 2015232. Throughput: 0: 1008.8. Samples: 504216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:14:16,653][02089] Avg episode reward: [(0, '14.389')] [2024-12-21 13:14:16,656][04429] Saving new best policy, reward=14.389! [2024-12-21 13:14:21,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2031616. Throughput: 0: 1004.0. Samples: 506730. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:14:21,657][02089] Avg episode reward: [(0, '14.509')] [2024-12-21 13:14:21,675][04429] Saving new best policy, reward=14.509! [2024-12-21 13:14:26,143][04442] Updated weights for policy 0, policy_version 500 (0.0035) [2024-12-21 13:14:26,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3823.3, 300 sec: 3790.5). Total num frames: 2048000. Throughput: 0: 940.0. Samples: 510936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:14:26,653][02089] Avg episode reward: [(0, '14.418')] [2024-12-21 13:14:31,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2072576. Throughput: 0: 970.5. Samples: 517948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-21 13:14:31,653][02089] Avg episode reward: [(0, '14.793')] [2024-12-21 13:14:31,661][04429] Saving new best policy, reward=14.793! 
[2024-12-21 13:14:35,395][04442] Updated weights for policy 0, policy_version 510 (0.0013) [2024-12-21 13:14:36,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2088960. Throughput: 0: 998.9. Samples: 521388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:14:36,655][02089] Avg episode reward: [(0, '15.020')] [2024-12-21 13:14:36,659][04429] Saving new best policy, reward=15.020! [2024-12-21 13:14:41,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2105344. Throughput: 0: 949.6. Samples: 525774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:14:41,657][02089] Avg episode reward: [(0, '15.257')] [2024-12-21 13:14:41,672][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000514_2105344.pth... [2024-12-21 13:14:41,800][04429] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000291_1191936.pth [2024-12-21 13:14:41,814][04429] Saving new best policy, reward=15.257! [2024-12-21 13:14:46,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2125824. Throughput: 0: 938.4. Samples: 531868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:14:46,653][02089] Avg episode reward: [(0, '14.796')] [2024-12-21 13:14:46,952][04442] Updated weights for policy 0, policy_version 520 (0.0021) [2024-12-21 13:14:51,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2150400. Throughput: 0: 969.2. Samples: 535328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:14:51,660][02089] Avg episode reward: [(0, '13.730')] [2024-12-21 13:14:56,650][02089] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2162688. Throughput: 0: 969.2. Samples: 540836. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:14:56,655][02089] Avg episode reward: [(0, '14.116')] [2024-12-21 13:14:58,411][04442] Updated weights for policy 0, policy_version 530 (0.0017) [2024-12-21 13:15:01,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2183168. Throughput: 0: 930.5. Samples: 546088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:15:01,653][02089] Avg episode reward: [(0, '14.397')] [2024-12-21 13:15:06,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2207744. Throughput: 0: 953.1. Samples: 549620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:15:06,656][02089] Avg episode reward: [(0, '14.834')] [2024-12-21 13:15:07,123][04442] Updated weights for policy 0, policy_version 540 (0.0015) [2024-12-21 13:15:11,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2228224. Throughput: 0: 1010.4. Samples: 556406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:15:11,653][02089] Avg episode reward: [(0, '15.354')] [2024-12-21 13:15:11,663][04429] Saving new best policy, reward=15.354! [2024-12-21 13:15:16,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2240512. Throughput: 0: 948.2. Samples: 560616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-21 13:15:16,653][02089] Avg episode reward: [(0, '14.526')] [2024-12-21 13:15:18,731][04442] Updated weights for policy 0, policy_version 550 (0.0022) [2024-12-21 13:15:21,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2265088. Throughput: 0: 944.0. Samples: 563870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:15:21,653][02089] Avg episode reward: [(0, '14.317')] [2024-12-21 13:15:26,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2285568. Throughput: 0: 1000.0. Samples: 570774. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:15:26,657][02089] Avg episode reward: [(0, '15.262')] [2024-12-21 13:15:28,436][04442] Updated weights for policy 0, policy_version 560 (0.0030) [2024-12-21 13:15:31,652][02089] Fps is (10 sec: 3685.9, 60 sec: 3822.8, 300 sec: 3846.1). Total num frames: 2301952. Throughput: 0: 969.8. Samples: 575512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:15:31,659][02089] Avg episode reward: [(0, '15.964')] [2024-12-21 13:15:31,669][04429] Saving new best policy, reward=15.964! [2024-12-21 13:15:36,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2318336. Throughput: 0: 944.4. Samples: 577828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-21 13:15:36,653][02089] Avg episode reward: [(0, '15.729')] [2024-12-21 13:15:39,505][04442] Updated weights for policy 0, policy_version 570 (0.0030) [2024-12-21 13:15:41,658][02089] Fps is (10 sec: 4093.6, 60 sec: 3959.0, 300 sec: 3832.1). Total num frames: 2342912. Throughput: 0: 973.8. Samples: 584664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:15:41,665][02089] Avg episode reward: [(0, '15.435')] [2024-12-21 13:15:46,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2355200. Throughput: 0: 958.5. Samples: 589220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:15:46,653][02089] Avg episode reward: [(0, '16.851')] [2024-12-21 13:15:46,659][04429] Saving new best policy, reward=16.851! [2024-12-21 13:15:51,651][02089] Fps is (10 sec: 2459.4, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2367488. Throughput: 0: 916.8. Samples: 590878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:15:51,654][02089] Avg episode reward: [(0, '15.926')] [2024-12-21 13:15:54,028][04442] Updated weights for policy 0, policy_version 580 (0.0019) [2024-12-21 13:15:56,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). 
Total num frames: 2387968. Throughput: 0: 870.2. Samples: 595564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:15:56,653][02089] Avg episode reward: [(0, '15.993')] [2024-12-21 13:16:01,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2408448. Throughput: 0: 933.5. Samples: 602624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:16:01,653][02089] Avg episode reward: [(0, '16.110')] [2024-12-21 13:16:02,817][04442] Updated weights for policy 0, policy_version 590 (0.0019) [2024-12-21 13:16:06,654][02089] Fps is (10 sec: 3685.2, 60 sec: 3617.9, 300 sec: 3804.5). Total num frames: 2424832. Throughput: 0: 931.8. Samples: 605806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:16:06,656][02089] Avg episode reward: [(0, '16.781')] [2024-12-21 13:16:11,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 2445312. Throughput: 0: 873.3. Samples: 610074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:16:11,658][02089] Avg episode reward: [(0, '16.586')] [2024-12-21 13:16:14,178][04442] Updated weights for policy 0, policy_version 600 (0.0025) [2024-12-21 13:16:16,651][02089] Fps is (10 sec: 4097.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2465792. Throughput: 0: 924.3. Samples: 617106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:16:16,657][02089] Avg episode reward: [(0, '17.190')] [2024-12-21 13:16:16,660][04429] Saving new best policy, reward=17.190! [2024-12-21 13:16:21,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2486272. Throughput: 0: 949.7. Samples: 620566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-21 13:16:21,657][02089] Avg episode reward: [(0, '17.535')] [2024-12-21 13:16:21,719][04429] Saving new best policy, reward=17.535! 
[2024-12-21 13:16:24,892][04442] Updated weights for policy 0, policy_version 610 (0.0028) [2024-12-21 13:16:26,652][02089] Fps is (10 sec: 3685.9, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2502656. Throughput: 0: 903.7. Samples: 625326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:16:26,658][02089] Avg episode reward: [(0, '18.857')] [2024-12-21 13:16:26,660][04429] Saving new best policy, reward=18.857! [2024-12-21 13:16:31,651][02089] Fps is (10 sec: 3686.3, 60 sec: 3686.5, 300 sec: 3790.5). Total num frames: 2523136. Throughput: 0: 930.6. Samples: 631096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:16:31,658][02089] Avg episode reward: [(0, '18.558')] [2024-12-21 13:16:34,492][04442] Updated weights for policy 0, policy_version 620 (0.0033) [2024-12-21 13:16:36,651][02089] Fps is (10 sec: 4506.2, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2547712. Throughput: 0: 971.8. Samples: 634608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:16:36,653][02089] Avg episode reward: [(0, '19.826')] [2024-12-21 13:16:36,658][04429] Saving new best policy, reward=19.826! [2024-12-21 13:16:41,653][02089] Fps is (10 sec: 4095.2, 60 sec: 3686.7, 300 sec: 3818.3). Total num frames: 2564096. Throughput: 0: 999.1. Samples: 640524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:16:41,661][02089] Avg episode reward: [(0, '19.759')] [2024-12-21 13:16:41,669][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000626_2564096.pth... [2024-12-21 13:16:41,829][04429] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000404_1654784.pth [2024-12-21 13:16:46,152][04442] Updated weights for policy 0, policy_version 630 (0.0023) [2024-12-21 13:16:46,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2580480. Throughput: 0: 947.3. Samples: 645252. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:16:46,653][02089] Avg episode reward: [(0, '19.322')] [2024-12-21 13:16:51,651][02089] Fps is (10 sec: 4096.9, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2605056. Throughput: 0: 955.5. Samples: 648800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:16:51,653][02089] Avg episode reward: [(0, '19.121')] [2024-12-21 13:16:55,038][04442] Updated weights for policy 0, policy_version 640 (0.0033) [2024-12-21 13:16:56,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2625536. Throughput: 0: 1015.3. Samples: 655762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:16:56,653][02089] Avg episode reward: [(0, '19.066')] [2024-12-21 13:17:01,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2637824. Throughput: 0: 953.6. Samples: 660020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:17:01,658][02089] Avg episode reward: [(0, '19.009')] [2024-12-21 13:17:06,386][04442] Updated weights for policy 0, policy_version 650 (0.0013) [2024-12-21 13:17:06,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3804.4). Total num frames: 2662400. Throughput: 0: 945.2. Samples: 663100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:17:06,656][02089] Avg episode reward: [(0, '19.652')] [2024-12-21 13:17:11,651][02089] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3818.3). Total num frames: 2682880. Throughput: 0: 996.0. Samples: 670146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:17:11,653][02089] Avg episode reward: [(0, '19.061')] [2024-12-21 13:17:16,651][02089] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2699264. Throughput: 0: 985.5. Samples: 675442. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:17:16,658][02089] Avg episode reward: [(0, '19.493')] [2024-12-21 13:17:17,349][04442] Updated weights for policy 0, policy_version 660 (0.0035) [2024-12-21 13:17:21,651][02089] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2719744. Throughput: 0: 955.3. Samples: 677596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:17:21,652][02089] Avg episode reward: [(0, '19.997')] [2024-12-21 13:17:21,669][04429] Saving new best policy, reward=19.997! [2024-12-21 13:17:26,651][02089] Fps is (10 sec: 4096.1, 60 sec: 3959.6, 300 sec: 3804.4). Total num frames: 2740224. Throughput: 0: 976.6. Samples: 684468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:17:26,660][02089] Avg episode reward: [(0, '20.870')] [2024-12-21 13:17:26,662][04429] Saving new best policy, reward=20.870! [2024-12-21 13:17:26,933][04442] Updated weights for policy 0, policy_version 670 (0.0023) [2024-12-21 13:17:31,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2760704. Throughput: 0: 1005.0. Samples: 690478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:17:31,655][02089] Avg episode reward: [(0, '21.093')] [2024-12-21 13:17:31,668][04429] Saving new best policy, reward=21.093! [2024-12-21 13:17:36,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2777088. Throughput: 0: 971.0. Samples: 692496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:17:36,655][02089] Avg episode reward: [(0, '21.764')] [2024-12-21 13:17:36,660][04429] Saving new best policy, reward=21.764! [2024-12-21 13:17:38,500][04442] Updated weights for policy 0, policy_version 680 (0.0023) [2024-12-21 13:17:41,651][02089] Fps is (10 sec: 3686.3, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 2797568. Throughput: 0: 947.0. Samples: 698378. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:17:41,659][02089] Avg episode reward: [(0, '21.110')] [2024-12-21 13:17:46,651][02089] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 2822144. Throughput: 0: 1011.0. Samples: 705514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-12-21 13:17:46,653][02089] Avg episode reward: [(0, '19.614')] [2024-12-21 13:17:47,958][04442] Updated weights for policy 0, policy_version 690 (0.0016) [2024-12-21 13:17:51,651][02089] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2834432. Throughput: 0: 995.6. Samples: 707900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:17:51,655][02089] Avg episode reward: [(0, '20.158')] [2024-12-21 13:17:56,651][02089] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2854912. Throughput: 0: 948.7. Samples: 712836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:17:56,654][02089] Avg episode reward: [(0, '20.088')] [2024-12-21 13:17:58,776][04442] Updated weights for policy 0, policy_version 700 (0.0016) [2024-12-21 13:18:01,651][02089] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2879488. Throughput: 0: 985.5. Samples: 719788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:18:01,656][02089] Avg episode reward: [(0, '20.800')] [2024-12-21 13:18:06,652][02089] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 2895872. Throughput: 0: 1011.4. Samples: 723112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:18:06,655][02089] Avg episode reward: [(0, '21.599')] [2024-12-21 13:18:10,295][04442] Updated weights for policy 0, policy_version 710 (0.0016) [2024-12-21 13:18:11,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 2912256. Throughput: 0: 952.4. Samples: 727326. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:18:11,653][02089] Avg episode reward: [(0, '23.324')] [2024-12-21 13:18:11,660][04429] Saving new best policy, reward=23.324! [2024-12-21 13:18:16,651][02089] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2936832. Throughput: 0: 964.4. Samples: 733878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:18:16,652][02089] Avg episode reward: [(0, '23.174')] [2024-12-21 13:18:19,275][04442] Updated weights for policy 0, policy_version 720 (0.0014) [2024-12-21 13:18:21,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2957312. Throughput: 0: 997.8. Samples: 737396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:18:21,656][02089] Avg episode reward: [(0, '24.064')] [2024-12-21 13:18:21,665][04429] Saving new best policy, reward=24.064! [2024-12-21 13:18:26,651][02089] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2969600. Throughput: 0: 979.2. Samples: 742440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:18:26,653][02089] Avg episode reward: [(0, '23.467')] [2024-12-21 13:18:30,879][04442] Updated weights for policy 0, policy_version 730 (0.0018) [2024-12-21 13:18:31,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2990080. Throughput: 0: 942.1. Samples: 747908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-21 13:18:31,657][02089] Avg episode reward: [(0, '23.563')] [2024-12-21 13:18:36,651][02089] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3014656. Throughput: 0: 968.0. Samples: 751460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:18:36,653][02089] Avg episode reward: [(0, '22.777')] [2024-12-21 13:18:41,652][02089] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3026944. Throughput: 0: 977.8. Samples: 756838. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:18:41,657][02089] Avg episode reward: [(0, '23.222')] [2024-12-21 13:18:41,665][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000739_3026944.pth... [2024-12-21 13:18:41,869][04429] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000514_2105344.pth [2024-12-21 13:18:42,359][04442] Updated weights for policy 0, policy_version 740 (0.0021) [2024-12-21 13:18:46,652][02089] Fps is (10 sec: 2457.3, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 3039232. Throughput: 0: 898.4. Samples: 760216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:18:46,654][02089] Avg episode reward: [(0, '22.428')] [2024-12-21 13:18:51,651][02089] Fps is (10 sec: 3277.1, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3059712. Throughput: 0: 872.0. Samples: 762352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:18:51,657][02089] Avg episode reward: [(0, '22.496')] [2024-12-21 13:18:54,002][04442] Updated weights for policy 0, policy_version 750 (0.0041) [2024-12-21 13:18:56,651][02089] Fps is (10 sec: 4096.6, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3080192. Throughput: 0: 935.9. Samples: 769440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:18:56,653][02089] Avg episode reward: [(0, '22.760')] [2024-12-21 13:19:01,653][02089] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3832.2). Total num frames: 3100672. Throughput: 0: 921.6. Samples: 775352. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-21 13:19:01,659][02089] Avg episode reward: [(0, '22.509')] [2024-12-21 13:19:05,392][04442] Updated weights for policy 0, policy_version 760 (0.0034) [2024-12-21 13:19:06,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3804.4). Total num frames: 3117056. Throughput: 0: 891.7. Samples: 777522. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:19:06,655][02089] Avg episode reward: [(0, '21.932')] [2024-12-21 13:19:11,651][02089] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3137536. Throughput: 0: 920.3. Samples: 783852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:19:11,662][02089] Avg episode reward: [(0, '22.111')] [2024-12-21 13:19:14,198][04442] Updated weights for policy 0, policy_version 770 (0.0018) [2024-12-21 13:19:16,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3162112. Throughput: 0: 956.8. Samples: 790966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-21 13:19:16,653][02089] Avg episode reward: [(0, '21.730')] [2024-12-21 13:19:21,656][02089] Fps is (10 sec: 3684.5, 60 sec: 3617.8, 300 sec: 3818.2). Total num frames: 3174400. Throughput: 0: 923.4. Samples: 793020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:19:21,659][02089] Avg episode reward: [(0, '21.948')] [2024-12-21 13:19:25,942][04442] Updated weights for policy 0, policy_version 780 (0.0039) [2024-12-21 13:19:26,652][02089] Fps is (10 sec: 3276.3, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 3194880. Throughput: 0: 916.6. Samples: 798084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:19:26,659][02089] Avg episode reward: [(0, '22.425')] [2024-12-21 13:19:31,651][02089] Fps is (10 sec: 4508.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3219456. Throughput: 0: 996.1. Samples: 805040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:19:31,656][02089] Avg episode reward: [(0, '22.355')] [2024-12-21 13:19:35,637][04442] Updated weights for policy 0, policy_version 790 (0.0023) [2024-12-21 13:19:36,651][02089] Fps is (10 sec: 4096.6, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3235840. Throughput: 0: 1019.2. Samples: 808214. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:19:36,656][02089] Avg episode reward: [(0, '23.016')] [2024-12-21 13:19:41,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3252224. Throughput: 0: 956.5. Samples: 812484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:19:41,652][02089] Avg episode reward: [(0, '23.147')] [2024-12-21 13:19:46,039][04442] Updated weights for policy 0, policy_version 800 (0.0030) [2024-12-21 13:19:46,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3818.3). Total num frames: 3276800. Throughput: 0: 980.3. Samples: 819462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:19:46,653][02089] Avg episode reward: [(0, '23.383')] [2024-12-21 13:19:51,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3297280. Throughput: 0: 1010.0. Samples: 822974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:19:51,658][02089] Avg episode reward: [(0, '23.072')] [2024-12-21 13:19:56,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3313664. Throughput: 0: 978.4. Samples: 827878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:19:56,656][02089] Avg episode reward: [(0, '22.422')] [2024-12-21 13:19:57,760][04442] Updated weights for policy 0, policy_version 810 (0.0021) [2024-12-21 13:20:01,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3818.3). Total num frames: 3334144. Throughput: 0: 948.9. Samples: 833666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:20:01,658][02089] Avg episode reward: [(0, '23.699')] [2024-12-21 13:20:06,299][04442] Updated weights for policy 0, policy_version 820 (0.0013) [2024-12-21 13:20:06,651][02089] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 3358720. Throughput: 0: 982.9. Samples: 837246. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:20:06,653][02089] Avg episode reward: [(0, '23.744')] [2024-12-21 13:20:11,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3375104. Throughput: 0: 1008.6. Samples: 843468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:20:11,658][02089] Avg episode reward: [(0, '24.413')] [2024-12-21 13:20:11,670][04429] Saving new best policy, reward=24.413! [2024-12-21 13:20:16,651][02089] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3391488. Throughput: 0: 957.5. Samples: 848130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:20:16,655][02089] Avg episode reward: [(0, '24.077')] [2024-12-21 13:20:18,005][04442] Updated weights for policy 0, policy_version 830 (0.0015) [2024-12-21 13:20:21,651][02089] Fps is (10 sec: 4096.0, 60 sec: 4028.1, 300 sec: 3832.2). Total num frames: 3416064. Throughput: 0: 964.2. Samples: 851602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:20:21,653][02089] Avg episode reward: [(0, '25.436')] [2024-12-21 13:20:21,668][04429] Saving new best policy, reward=25.436! [2024-12-21 13:20:26,651][02089] Fps is (10 sec: 4505.7, 60 sec: 4027.8, 300 sec: 3846.1). Total num frames: 3436544. Throughput: 0: 1026.5. Samples: 858678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:20:26,653][02089] Avg episode reward: [(0, '25.542')] [2024-12-21 13:20:26,657][04429] Saving new best policy, reward=25.542! [2024-12-21 13:20:27,937][04442] Updated weights for policy 0, policy_version 840 (0.0017) [2024-12-21 13:20:31,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3448832. Throughput: 0: 962.5. Samples: 862774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:20:31,656][02089] Avg episode reward: [(0, '24.694')] [2024-12-21 13:20:36,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3832.3). Total num frames: 3473408. 
Throughput: 0: 950.8. Samples: 865762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:20:36,655][02089] Avg episode reward: [(0, '24.985')] [2024-12-21 13:20:38,209][04442] Updated weights for policy 0, policy_version 850 (0.0022) [2024-12-21 13:20:41,651][02089] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3493888. Throughput: 0: 997.4. Samples: 872760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:20:41,653][02089] Avg episode reward: [(0, '27.069')] [2024-12-21 13:20:41,687][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000854_3497984.pth... [2024-12-21 13:20:41,829][04429] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000626_2564096.pth [2024-12-21 13:20:41,846][04429] Saving new best policy, reward=27.069! [2024-12-21 13:20:46,651][02089] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3510272. Throughput: 0: 985.4. Samples: 878008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:20:46,654][02089] Avg episode reward: [(0, '26.860')] [2024-12-21 13:20:49,876][04442] Updated weights for policy 0, policy_version 860 (0.0018) [2024-12-21 13:20:51,650][02089] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3530752. Throughput: 0: 951.7. Samples: 880072. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-21 13:20:51,652][02089] Avg episode reward: [(0, '25.681')] [2024-12-21 13:20:56,651][02089] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3551232. Throughput: 0: 964.5. Samples: 886872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:20:56,652][02089] Avg episode reward: [(0, '25.962')] [2024-12-21 13:20:58,649][04442] Updated weights for policy 0, policy_version 870 (0.0025) [2024-12-21 13:21:01,653][02089] Fps is (10 sec: 4095.0, 60 sec: 3959.3, 300 sec: 3887.7). Total num frames: 3571712. Throughput: 0: 1000.4. Samples: 893148. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:21:01,656][02089] Avg episode reward: [(0, '26.209')] [2024-12-21 13:21:06,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3584000. Throughput: 0: 969.9. Samples: 895246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:21:06,657][02089] Avg episode reward: [(0, '25.261')] [2024-12-21 13:21:10,414][04442] Updated weights for policy 0, policy_version 880 (0.0019) [2024-12-21 13:21:11,651][02089] Fps is (10 sec: 3687.2, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3608576. Throughput: 0: 940.7. Samples: 901008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:21:11,652][02089] Avg episode reward: [(0, '25.801')] [2024-12-21 13:21:16,651][02089] Fps is (10 sec: 4915.2, 60 sec: 4027.8, 300 sec: 3887.7). Total num frames: 3633152. Throughput: 0: 1007.3. Samples: 908102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:21:16,655][02089] Avg episode reward: [(0, '25.901')] [2024-12-21 13:21:20,858][04442] Updated weights for policy 0, policy_version 890 (0.0018) [2024-12-21 13:21:21,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 3645440. Throughput: 0: 997.3. Samples: 910640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:21:21,657][02089] Avg episode reward: [(0, '24.794')] [2024-12-21 13:21:26,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3665920. Throughput: 0: 946.3. Samples: 915342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:21:26,658][02089] Avg episode reward: [(0, '25.052')] [2024-12-21 13:21:30,845][04442] Updated weights for policy 0, policy_version 900 (0.0016) [2024-12-21 13:21:31,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3686400. Throughput: 0: 983.8. Samples: 922280. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:21:31,657][02089] Avg episode reward: [(0, '25.001')] [2024-12-21 13:21:36,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3702784. Throughput: 0: 1000.7. Samples: 925102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-21 13:21:36,657][02089] Avg episode reward: [(0, '25.418')] [2024-12-21 13:21:41,651][02089] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 3715072. Throughput: 0: 925.7. Samples: 928528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:21:41,656][02089] Avg episode reward: [(0, '24.349')] [2024-12-21 13:21:45,066][04442] Updated weights for policy 0, policy_version 910 (0.0031) [2024-12-21 13:21:46,651][02089] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3731456. Throughput: 0: 893.7. Samples: 933362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:21:46,655][02089] Avg episode reward: [(0, '24.855')] [2024-12-21 13:21:51,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3756032. Throughput: 0: 925.0. Samples: 936872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:21:51,652][02089] Avg episode reward: [(0, '25.519')] [2024-12-21 13:21:53,870][04442] Updated weights for policy 0, policy_version 920 (0.0015) [2024-12-21 13:21:56,651][02089] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3776512. Throughput: 0: 943.5. Samples: 943466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:21:56,659][02089] Avg episode reward: [(0, '25.111')] [2024-12-21 13:22:01,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3818.3). Total num frames: 3788800. Throughput: 0: 880.8. Samples: 947738. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:22:01,652][02089] Avg episode reward: [(0, '24.334')] [2024-12-21 13:22:05,433][04442] Updated weights for policy 0, policy_version 930 (0.0026) [2024-12-21 13:22:06,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3813376. Throughput: 0: 899.3. Samples: 951108. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-12-21 13:22:06,656][02089] Avg episode reward: [(0, '23.610')] [2024-12-21 13:22:11,651][02089] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3833856. Throughput: 0: 952.2. Samples: 958190. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-21 13:22:11,658][02089] Avg episode reward: [(0, '21.513')] [2024-12-21 13:22:16,012][04442] Updated weights for policy 0, policy_version 940 (0.0020) [2024-12-21 13:22:16,655][02089] Fps is (10 sec: 3684.8, 60 sec: 3617.9, 300 sec: 3832.1). Total num frames: 3850240. Throughput: 0: 907.0. Samples: 963100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:22:16,658][02089] Avg episode reward: [(0, '21.820')] [2024-12-21 13:22:21,651][02089] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3870720. Throughput: 0: 896.5. Samples: 965444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:22:21,652][02089] Avg episode reward: [(0, '23.100')] [2024-12-21 13:22:25,733][04442] Updated weights for policy 0, policy_version 950 (0.0017) [2024-12-21 13:22:26,651][02089] Fps is (10 sec: 4507.6, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3895296. Throughput: 0: 976.1. Samples: 972454. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-21 13:22:26,653][02089] Avg episode reward: [(0, '23.432')] [2024-12-21 13:22:31,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3911680. Throughput: 0: 998.9. Samples: 978314. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:22:31,656][02089] Avg episode reward: [(0, '24.270')] [2024-12-21 13:22:36,651][02089] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3928064. Throughput: 0: 967.5. Samples: 980410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-21 13:22:36,659][02089] Avg episode reward: [(0, '24.854')] [2024-12-21 13:22:37,245][04442] Updated weights for policy 0, policy_version 960 (0.0020) [2024-12-21 13:22:41,651][02089] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 3952640. Throughput: 0: 959.6. Samples: 986648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:22:41,653][02089] Avg episode reward: [(0, '26.388')] [2024-12-21 13:22:41,662][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000965_3952640.pth... [2024-12-21 13:22:41,784][04429] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000739_3026944.pth [2024-12-21 13:22:46,101][04442] Updated weights for policy 0, policy_version 970 (0.0036) [2024-12-21 13:22:46,651][02089] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3973120. Throughput: 0: 1019.8. Samples: 993628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-21 13:22:46,653][02089] Avg episode reward: [(0, '25.888')] [2024-12-21 13:22:51,655][02089] Fps is (10 sec: 3275.4, 60 sec: 3822.7, 300 sec: 3832.1). Total num frames: 3985408. Throughput: 0: 992.5. Samples: 995774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-21 13:22:51,657][02089] Avg episode reward: [(0, '25.927')] [2024-12-21 13:22:55,848][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-21 13:22:55,848][02089] Component Batcher_0 stopped! [2024-12-21 13:22:55,848][04429] Stopping Batcher_0... [2024-12-21 13:22:55,858][04429] Loop batcher_evt_loop terminating... 
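The checkpoint files saved and rotated above follow the pattern `checkpoint_{policy_version}_{frames}.pth`, and in this run the frame count is always `policy_version * 4096` (e.g. 978 × 4096 = 4,005,888; 739 × 4096 = 3,026,944). A minimal sketch of parsing that naming scheme, assuming the filenames are formatted as they appear in this log:

```python
import re

def parse_checkpoint_name(path):
    """Extract (policy_version, frame_count) from a checkpoint filename
    such as 'checkpoint_000000978_4005888.pth'."""
    m = re.search(r"checkpoint_(\d+)_(\d+)\.pth$", path)
    if m is None:
        raise ValueError(f"not a checkpoint path: {path!r}")
    return int(m.group(1)), int(m.group(2))

version, frames = parse_checkpoint_name(
    "/content/train_dir/default_experiment/checkpoint_p0/"
    "checkpoint_000000978_4005888.pth")
# In this run, each policy version corresponds to 4096 collected frames.
assert frames == version * 4096
```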
[2024-12-21 13:22:55,916][04442] Weights refcount: 2 0 [2024-12-21 13:22:55,920][04442] Stopping InferenceWorker_p0-w0... [2024-12-21 13:22:55,921][04442] Loop inference_proc0-0_evt_loop terminating... [2024-12-21 13:22:55,921][02089] Component InferenceWorker_p0-w0 stopped! [2024-12-21 13:22:55,980][04429] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000854_3497984.pth [2024-12-21 13:22:56,008][04429] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-21 13:22:56,190][02089] Component LearnerWorker_p0 stopped! [2024-12-21 13:22:56,192][04429] Stopping LearnerWorker_p0... [2024-12-21 13:22:56,193][04429] Loop learner_proc0_evt_loop terminating... [2024-12-21 13:22:56,230][04446] Stopping RolloutWorker_w3... [2024-12-21 13:22:56,230][02089] Component RolloutWorker_w3 stopped! [2024-12-21 13:22:56,232][04446] Loop rollout_proc3_evt_loop terminating... [2024-12-21 13:22:56,243][02089] Component RolloutWorker_w7 stopped! [2024-12-21 13:22:56,242][04450] Stopping RolloutWorker_w7... [2024-12-21 13:22:56,248][02089] Component RolloutWorker_w1 stopped! [2024-12-21 13:22:56,254][04443] Stopping RolloutWorker_w1... [2024-12-21 13:22:56,256][02089] Component RolloutWorker_w5 stopped! [2024-12-21 13:22:56,246][04450] Loop rollout_proc7_evt_loop terminating... [2024-12-21 13:22:56,260][04448] Stopping RolloutWorker_w5... [2024-12-21 13:22:56,255][04443] Loop rollout_proc1_evt_loop terminating... [2024-12-21 13:22:56,262][04448] Loop rollout_proc5_evt_loop terminating... [2024-12-21 13:22:56,272][04444] Stopping RolloutWorker_w0... [2024-12-21 13:22:56,272][02089] Component RolloutWorker_w0 stopped! [2024-12-21 13:22:56,280][04447] Stopping RolloutWorker_w4... [2024-12-21 13:22:56,283][04447] Loop rollout_proc4_evt_loop terminating... [2024-12-21 13:22:56,280][02089] Component RolloutWorker_w4 stopped! [2024-12-21 13:22:56,295][04444] Loop rollout_proc0_evt_loop terminating... 
[2024-12-21 13:22:56,313][04449] Stopping RolloutWorker_w6... [2024-12-21 13:22:56,316][04449] Loop rollout_proc6_evt_loop terminating... [2024-12-21 13:22:56,314][02089] Component RolloutWorker_w6 stopped! [2024-12-21 13:22:56,333][04445] Stopping RolloutWorker_w2... [2024-12-21 13:22:56,333][02089] Component RolloutWorker_w2 stopped! [2024-12-21 13:22:56,336][02089] Waiting for process learner_proc0 to stop... [2024-12-21 13:22:56,341][04445] Loop rollout_proc2_evt_loop terminating... [2024-12-21 13:22:57,891][02089] Waiting for process inference_proc0-0 to join... [2024-12-21 13:22:57,900][02089] Waiting for process rollout_proc0 to join... [2024-12-21 13:22:59,817][02089] Waiting for process rollout_proc1 to join... [2024-12-21 13:22:59,826][02089] Waiting for process rollout_proc2 to join... [2024-12-21 13:22:59,852][02089] Waiting for process rollout_proc3 to join... [2024-12-21 13:22:59,856][02089] Waiting for process rollout_proc4 to join... [2024-12-21 13:22:59,860][02089] Waiting for process rollout_proc5 to join... [2024-12-21 13:22:59,863][02089] Waiting for process rollout_proc6 to join... [2024-12-21 13:22:59,866][02089] Waiting for process rollout_proc7 to join... 
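The runner summary in this log reports 4,005,888 frames collected over a `main_loop` of 1092.5302 seconds; the overall FPS figure follows directly from those two numbers:

```python
# Figures taken from the runner summary in this log.
total_frames = 4_005_888        # "Collected {0: 4005888}"
main_loop_seconds = 1092.5302   # "main_loop: 1092.5302"

fps = total_frames / main_loop_seconds
print(f"{fps:.1f}")  # matches the logged "FPS: 3666.6"
```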
[2024-12-21 13:22:59,870][02089] Batcher 0 profile tree view:
batching: 26.4837, releasing_batches: 0.0294
[2024-12-21 13:22:59,872][02089] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 427.2956
update_model: 8.7427
  weight_update: 0.0050
one_step: 0.0025
  handle_policy_step: 584.2219
    deserialize: 14.7512, stack: 3.1527, obs_to_device_normalize: 123.8642, forward: 292.5321, send_messages: 28.8992
    prepare_outputs: 91.1658
      to_cpu: 55.2367
[2024-12-21 13:22:59,874][02089] Learner 0 profile tree view:
misc: 0.0048, prepare_batch: 14.0818
train: 74.1073
  epoch_init: 0.0134, minibatch_init: 0.0128, losses_postprocess: 0.6504, kl_divergence: 0.6554, after_optimizer: 34.2816
  calculate_losses: 25.9485
    losses_init: 0.0037, forward_head: 1.3813, bptt_initial: 16.9322, tail: 1.1103, advantages_returns: 0.2894, losses: 3.9151
    bptt: 1.9496
      bptt_forward_core: 1.8604
  update: 11.9060
    clip: 0.8906
[2024-12-21 13:22:59,877][02089] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3260, enqueue_policy_requests: 102.2030, env_step: 830.5280, overhead: 13.2464, complete_rollouts: 7.6569
save_policy_outputs: 20.7380
  split_output_tensors: 8.4482
[2024-12-21 13:22:59,879][02089] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2967, enqueue_policy_requests: 106.8665, env_step: 830.7750, overhead: 12.9086, complete_rollouts: 6.4772
save_policy_outputs: 20.5492
  split_output_tensors: 8.3398
[2024-12-21 13:22:59,880][02089] Loop Runner_EvtLoop terminating...
[2024-12-21 13:22:59,881][02089] Runner profile tree view:
main_loop: 1092.5302
[2024-12-21 13:22:59,883][02089] Collected {0: 4005888}, FPS: 3666.6
[2024-12-21 13:23:23,986][02089] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-21 13:23:23,987][02089] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-21 13:23:23,990][02089] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-21 13:23:23,992][02089] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-21 13:23:23,994][02089] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-21 13:23:23,996][02089] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-21 13:23:23,997][02089] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-12-21 13:23:23,998][02089] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-21 13:23:24,000][02089] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-12-21 13:23:24,001][02089] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-12-21 13:23:24,002][02089] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-21 13:23:24,003][02089] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-21 13:23:24,004][02089] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-21 13:23:24,005][02089] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-21 13:23:24,006][02089] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-21 13:23:24,041][02089] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-21 13:23:24,044][02089] RunningMeanStd input shape: (3, 72, 128) [2024-12-21 13:23:24,046][02089] RunningMeanStd input shape: (1,) [2024-12-21 13:23:24,061][02089] ConvEncoder: input_channels=3 [2024-12-21 13:23:24,166][02089] Conv encoder output size: 512 [2024-12-21 13:23:24,169][02089] Policy head output size: 512 [2024-12-21 13:23:24,452][02089] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-21 13:23:25,251][02089] Num frames 100... 
[2024-12-21 13:23:25,384][02089] Num frames 200... [2024-12-21 13:23:25,511][02089] Num frames 300... [2024-12-21 13:23:25,635][02089] Num frames 400... [2024-12-21 13:23:25,754][02089] Num frames 500... [2024-12-21 13:23:25,873][02089] Num frames 600... [2024-12-21 13:23:26,028][02089] Num frames 700... [2024-12-21 13:23:26,204][02089] Num frames 800... [2024-12-21 13:23:26,383][02089] Num frames 900... [2024-12-21 13:23:26,565][02089] Num frames 1000... [2024-12-21 13:23:26,727][02089] Num frames 1100... [2024-12-21 13:23:26,895][02089] Num frames 1200... [2024-12-21 13:23:27,060][02089] Num frames 1300... [2024-12-21 13:23:27,227][02089] Num frames 1400... [2024-12-21 13:23:27,398][02089] Num frames 1500... [2024-12-21 13:23:27,480][02089] Avg episode rewards: #0: 41.130, true rewards: #0: 15.130 [2024-12-21 13:23:27,482][02089] Avg episode reward: 41.130, avg true_objective: 15.130 [2024-12-21 13:23:27,644][02089] Num frames 1600... [2024-12-21 13:23:27,814][02089] Num frames 1700... [2024-12-21 13:23:27,987][02089] Num frames 1800... [2024-12-21 13:23:28,172][02089] Num frames 1900... [2024-12-21 13:23:28,353][02089] Num frames 2000... [2024-12-21 13:23:28,537][02089] Num frames 2100... [2024-12-21 13:23:28,663][02089] Num frames 2200... [2024-12-21 13:23:28,788][02089] Num frames 2300... [2024-12-21 13:23:28,910][02089] Num frames 2400... [2024-12-21 13:23:29,032][02089] Num frames 2500... [2024-12-21 13:23:29,152][02089] Num frames 2600... [2024-12-21 13:23:29,278][02089] Num frames 2700... [2024-12-21 13:23:29,402][02089] Num frames 2800... [2024-12-21 13:23:29,540][02089] Num frames 2900... [2024-12-21 13:23:29,668][02089] Num frames 3000... [2024-12-21 13:23:29,794][02089] Num frames 3100... [2024-12-21 13:23:29,924][02089] Num frames 3200... [2024-12-21 13:23:30,048][02089] Num frames 3300... [2024-12-21 13:23:30,172][02089] Num frames 3400... [2024-12-21 13:23:30,299][02089] Num frames 3500... [2024-12-21 13:23:30,424][02089] Num frames 3600... 
[2024-12-21 13:23:30,499][02089] Avg episode rewards: #0: 48.564, true rewards: #0: 18.065 [2024-12-21 13:23:30,501][02089] Avg episode reward: 48.564, avg true_objective: 18.065 [2024-12-21 13:23:30,622][02089] Num frames 3700... [2024-12-21 13:23:30,746][02089] Num frames 3800... [2024-12-21 13:23:30,868][02089] Num frames 3900... [2024-12-21 13:23:30,990][02089] Num frames 4000... [2024-12-21 13:23:31,136][02089] Num frames 4100... [2024-12-21 13:23:31,270][02089] Num frames 4200... [2024-12-21 13:23:31,396][02089] Num frames 4300... [2024-12-21 13:23:31,523][02089] Num frames 4400... [2024-12-21 13:23:31,658][02089] Num frames 4500... [2024-12-21 13:23:31,778][02089] Num frames 4600... [2024-12-21 13:23:31,911][02089] Avg episode rewards: #0: 39.203, true rewards: #0: 15.537 [2024-12-21 13:23:31,913][02089] Avg episode reward: 39.203, avg true_objective: 15.537 [2024-12-21 13:23:31,963][02089] Num frames 4700... [2024-12-21 13:23:32,083][02089] Num frames 4800... [2024-12-21 13:23:32,205][02089] Num frames 4900... [2024-12-21 13:23:32,331][02089] Num frames 5000... [2024-12-21 13:23:32,452][02089] Num frames 5100... [2024-12-21 13:23:32,593][02089] Num frames 5200... [2024-12-21 13:23:32,657][02089] Avg episode rewards: #0: 31.262, true rewards: #0: 13.012 [2024-12-21 13:23:32,659][02089] Avg episode reward: 31.262, avg true_objective: 13.012 [2024-12-21 13:23:32,773][02089] Num frames 5300... [2024-12-21 13:23:32,894][02089] Num frames 5400... [2024-12-21 13:23:33,014][02089] Num frames 5500... [2024-12-21 13:23:33,143][02089] Num frames 5600... [2024-12-21 13:23:33,267][02089] Num frames 5700... [2024-12-21 13:23:33,388][02089] Num frames 5800... [2024-12-21 13:23:33,511][02089] Num frames 5900... [2024-12-21 13:23:33,583][02089] Avg episode rewards: #0: 27.818, true rewards: #0: 11.818 [2024-12-21 13:23:33,585][02089] Avg episode reward: 27.818, avg true_objective: 11.818 [2024-12-21 13:23:33,701][02089] Num frames 6000... 
[2024-12-21 13:23:33,821][02089] Num frames 6100... [2024-12-21 13:23:33,941][02089] Num frames 6200... [2024-12-21 13:23:34,107][02089] Avg episode rewards: #0: 23.821, true rewards: #0: 10.488 [2024-12-21 13:23:34,109][02089] Avg episode reward: 23.821, avg true_objective: 10.488 [2024-12-21 13:23:34,120][02089] Num frames 6300... [2024-12-21 13:23:34,243][02089] Num frames 6400... [2024-12-21 13:23:34,364][02089] Num frames 6500... [2024-12-21 13:23:34,485][02089] Num frames 6600... [2024-12-21 13:23:34,616][02089] Num frames 6700... [2024-12-21 13:23:34,747][02089] Num frames 6800... [2024-12-21 13:23:34,868][02089] Num frames 6900... [2024-12-21 13:23:34,991][02089] Num frames 7000... [2024-12-21 13:23:35,116][02089] Num frames 7100... [2024-12-21 13:23:35,238][02089] Num frames 7200... [2024-12-21 13:23:35,359][02089] Num frames 7300... [2024-12-21 13:23:35,480][02089] Num frames 7400... [2024-12-21 13:23:35,606][02089] Num frames 7500... [2024-12-21 13:23:35,738][02089] Num frames 7600... [2024-12-21 13:23:35,855][02089] Num frames 7700... [2024-12-21 13:23:35,976][02089] Num frames 7800... [2024-12-21 13:23:36,099][02089] Num frames 7900... [2024-12-21 13:23:36,226][02089] Avg episode rewards: #0: 26.653, true rewards: #0: 11.367 [2024-12-21 13:23:36,228][02089] Avg episode reward: 26.653, avg true_objective: 11.367 [2024-12-21 13:23:36,283][02089] Num frames 8000... [2024-12-21 13:23:36,409][02089] Num frames 8100... [2024-12-21 13:23:36,535][02089] Num frames 8200... [2024-12-21 13:23:36,666][02089] Num frames 8300... [2024-12-21 13:23:36,795][02089] Num frames 8400... [2024-12-21 13:23:36,916][02089] Num frames 8500... [2024-12-21 13:23:37,043][02089] Num frames 8600... [2024-12-21 13:23:37,168][02089] Num frames 8700... [2024-12-21 13:23:37,292][02089] Num frames 8800... [2024-12-21 13:23:37,416][02089] Num frames 8900... [2024-12-21 13:23:37,543][02089] Num frames 9000... 
[2024-12-21 13:23:37,701][02089] Avg episode rewards: #0: 26.609, true rewards: #0: 11.359 [2024-12-21 13:23:37,703][02089] Avg episode reward: 26.609, avg true_objective: 11.359 [2024-12-21 13:23:37,727][02089] Num frames 9100... [2024-12-21 13:23:37,847][02089] Num frames 9200... [2024-12-21 13:23:37,963][02089] Num frames 9300... [2024-12-21 13:23:38,084][02089] Num frames 9400... [2024-12-21 13:23:38,213][02089] Avg episode rewards: #0: 24.401, true rewards: #0: 10.512 [2024-12-21 13:23:38,215][02089] Avg episode reward: 24.401, avg true_objective: 10.512 [2024-12-21 13:23:38,266][02089] Num frames 9500... [2024-12-21 13:23:38,390][02089] Num frames 9600... [2024-12-21 13:23:38,534][02089] Num frames 9700... [2024-12-21 13:23:38,714][02089] Num frames 9800... [2024-12-21 13:23:38,897][02089] Num frames 9900... [2024-12-21 13:23:39,064][02089] Num frames 10000... [2024-12-21 13:23:39,232][02089] Num frames 10100... [2024-12-21 13:23:39,398][02089] Num frames 10200... [2024-12-21 13:23:39,568][02089] Num frames 10300... [2024-12-21 13:23:39,737][02089] Num frames 10400... [2024-12-21 13:23:39,910][02089] Num frames 10500... [2024-12-21 13:23:40,088][02089] Num frames 10600... [2024-12-21 13:23:40,264][02089] Num frames 10700... [2024-12-21 13:23:40,446][02089] Num frames 10800... [2024-12-21 13:23:40,630][02089] Num frames 10900... [2024-12-21 13:23:40,691][02089] Avg episode rewards: #0: 25.701, true rewards: #0: 10.901 [2024-12-21 13:23:40,693][02089] Avg episode reward: 25.701, avg true_objective: 10.901 [2024-12-21 13:24:45,234][02089] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-12-21 13:30:11,250][02089] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-21 13:30:11,252][02089] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-21 13:30:11,254][02089] Adding new argument 'no_render'=True that is not in the saved config file! 
[2024-12-21 13:30:11,256][02089] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-21 13:30:11,258][02089] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-21 13:30:11,261][02089] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-21 13:30:11,263][02089] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-12-21 13:30:11,265][02089] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-21 13:30:11,266][02089] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-12-21 13:30:11,268][02089] Adding new argument 'hf_repository'='husseinmo/vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-12-21 13:30:11,269][02089] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-21 13:30:11,270][02089] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-21 13:30:11,271][02089] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-21 13:30:11,272][02089] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-21 13:30:11,273][02089] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-21 13:30:11,312][02089] RunningMeanStd input shape: (3, 72, 128)
[2024-12-21 13:30:11,314][02089] RunningMeanStd input shape: (1,)
[2024-12-21 13:30:11,333][02089] ConvEncoder: input_channels=3
[2024-12-21 13:30:11,395][02089] Conv encoder output size: 512
[2024-12-21 13:30:11,398][02089] Policy head output size: 512
[2024-12-21 13:30:11,438][02089] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-21 13:30:12,087][02089] Num frames 100...
[2024-12-21 13:30:12,263][02089] Num frames 200...
[2024-12-21 13:30:12,443][02089] Num frames 300...
[2024-12-21 13:30:12,584][02089] Num frames 400...
[2024-12-21 13:30:12,664][02089] Avg episode rewards: #0: 6.190, true rewards: #0: 4.190
[2024-12-21 13:30:12,665][02089] Avg episode reward: 6.190, avg true_objective: 4.190
[2024-12-21 13:30:12,770][02089] Num frames 500...
[2024-12-21 13:30:12,893][02089] Num frames 600...
[2024-12-21 13:30:13,016][02089] Num frames 700...
[2024-12-21 13:30:13,135][02089] Num frames 800...
[2024-12-21 13:30:13,262][02089] Num frames 900...
[2024-12-21 13:30:13,389][02089] Num frames 1000...
[2024-12-21 13:30:13,528][02089] Num frames 1100...
[2024-12-21 13:30:13,651][02089] Num frames 1200...
[2024-12-21 13:30:13,775][02089] Num frames 1300...
[2024-12-21 13:30:13,902][02089] Num frames 1400...
[2024-12-21 13:30:14,025][02089] Num frames 1500...
[2024-12-21 13:30:14,148][02089] Num frames 1600...
[2024-12-21 13:30:14,274][02089] Num frames 1700...
[2024-12-21 13:30:14,340][02089] Avg episode rewards: #0: 16.540, true rewards: #0: 8.540
[2024-12-21 13:30:14,341][02089] Avg episode reward: 16.540, avg true_objective: 8.540
[2024-12-21 13:30:14,458][02089] Num frames 1800...
[2024-12-21 13:30:14,597][02089] Num frames 1900...
[2024-12-21 13:30:14,717][02089] Num frames 2000...
[2024-12-21 13:30:14,837][02089] Num frames 2100...
[2024-12-21 13:30:14,958][02089] Num frames 2200...
[2024-12-21 13:30:15,080][02089] Num frames 2300...
[2024-12-21 13:30:15,205][02089] Num frames 2400...
[2024-12-21 13:30:15,328][02089] Num frames 2500...
[2024-12-21 13:30:15,457][02089] Num frames 2600...
[2024-12-21 13:30:15,595][02089] Num frames 2700...
[2024-12-21 13:30:15,647][02089] Avg episode rewards: #0: 18.333, true rewards: #0: 9.000
[2024-12-21 13:30:15,649][02089] Avg episode reward: 18.333, avg true_objective: 9.000
[2024-12-21 13:30:15,770][02089] Num frames 2800...
[2024-12-21 13:30:15,891][02089] Num frames 2900...
[2024-12-21 13:30:16,015][02089] Num frames 3000...
[2024-12-21 13:30:16,139][02089] Num frames 3100...
[2024-12-21 13:30:16,263][02089] Num frames 3200...
[2024-12-21 13:30:16,388][02089] Num frames 3300...
[2024-12-21 13:30:16,515][02089] Num frames 3400...
[2024-12-21 13:30:16,651][02089] Num frames 3500...
[2024-12-21 13:30:16,772][02089] Num frames 3600...
[2024-12-21 13:30:16,894][02089] Num frames 3700...
[2024-12-21 13:30:16,982][02089] Avg episode rewards: #0: 19.065, true rewards: #0: 9.315
[2024-12-21 13:30:16,983][02089] Avg episode reward: 19.065, avg true_objective: 9.315
[2024-12-21 13:30:17,074][02089] Num frames 3800...
[2024-12-21 13:30:17,199][02089] Num frames 3900...
[2024-12-21 13:30:17,322][02089] Num frames 4000...
[2024-12-21 13:30:17,445][02089] Num frames 4100...
[2024-12-21 13:30:17,574][02089] Num frames 4200...
[2024-12-21 13:30:17,836][02089] Num frames 4300...
[2024-12-21 13:30:17,999][02089] Num frames 4400...
[2024-12-21 13:30:18,128][02089] Num frames 4500...
[2024-12-21 13:30:18,263][02089] Num frames 4600...
[2024-12-21 13:30:18,389][02089] Num frames 4700...
[2024-12-21 13:30:18,697][02089] Num frames 4800...
[2024-12-21 13:30:18,842][02089] Num frames 4900...
[2024-12-21 13:30:18,968][02089] Num frames 5000...
[2024-12-21 13:30:19,093][02089] Num frames 5100...
[2024-12-21 13:30:19,178][02089] Avg episode rewards: #0: 22.448, true rewards: #0: 10.248
[2024-12-21 13:30:19,179][02089] Avg episode reward: 22.448, avg true_objective: 10.248
[2024-12-21 13:30:19,277][02089] Num frames 5200...
[2024-12-21 13:30:19,588][02089] Num frames 5300...
[2024-12-21 13:30:19,716][02089] Num frames 5400...
[2024-12-21 13:30:19,839][02089] Num frames 5500...
[2024-12-21 13:30:19,959][02089] Num frames 5600...
[2024-12-21 13:30:20,057][02089] Avg episode rewards: #0: 20.227, true rewards: #0: 9.393
[2024-12-21 13:30:20,058][02089] Avg episode reward: 20.227, avg true_objective: 9.393
[2024-12-21 13:30:20,141][02089] Num frames 5700...
[2024-12-21 13:30:20,269][02089] Num frames 5800...
[2024-12-21 13:30:20,404][02089] Num frames 5900...
[2024-12-21 13:30:20,533][02089] Num frames 6000...
[2024-12-21 13:30:20,660][02089] Num frames 6100...
[2024-12-21 13:30:20,787][02089] Num frames 6200...
[2024-12-21 13:30:20,906][02089] Num frames 6300...
[2024-12-21 13:30:21,034][02089] Num frames 6400...
[2024-12-21 13:30:21,161][02089] Num frames 6500...
[2024-12-21 13:30:21,290][02089] Num frames 6600...
[2024-12-21 13:30:21,417][02089] Num frames 6700...
[2024-12-21 13:30:21,550][02089] Num frames 6800...
[2024-12-21 13:30:21,683][02089] Num frames 6900...
[2024-12-21 13:30:21,820][02089] Num frames 7000...
[2024-12-21 13:30:21,945][02089] Num frames 7100...
[2024-12-21 13:30:22,068][02089] Num frames 7200...
[2024-12-21 13:30:22,192][02089] Num frames 7300...
[2024-12-21 13:30:22,325][02089] Num frames 7400...
[2024-12-21 13:30:22,449][02089] Num frames 7500...
[2024-12-21 13:30:22,628][02089] Num frames 7600...
[2024-12-21 13:30:22,813][02089] Num frames 7700...
[2024-12-21 13:30:22,932][02089] Avg episode rewards: #0: 25.766, true rewards: #0: 11.051
[2024-12-21 13:30:22,934][02089] Avg episode reward: 25.766, avg true_objective: 11.051
[2024-12-21 13:30:23,037][02089] Num frames 7800...
[2024-12-21 13:30:23,207][02089] Num frames 7900...
[2024-12-21 13:30:23,377][02089] Num frames 8000...
[2024-12-21 13:30:23,562][02089] Num frames 8100...
[2024-12-21 13:30:23,723][02089] Num frames 8200...
[2024-12-21 13:30:23,833][02089] Avg episode rewards: #0: 23.540, true rewards: #0: 10.290
[2024-12-21 13:30:23,835][02089] Avg episode reward: 23.540, avg true_objective: 10.290
[2024-12-21 13:30:23,952][02089] Num frames 8300...
[2024-12-21 13:30:24,126][02089] Num frames 8400...
[2024-12-21 13:30:24,297][02089] Num frames 8500...
[2024-12-21 13:30:24,476][02089] Num frames 8600...
[2024-12-21 13:30:24,658][02089] Num frames 8700...
[2024-12-21 13:30:24,832][02089] Num frames 8800...
[2024-12-21 13:30:25,008][02089] Num frames 8900...
[2024-12-21 13:30:25,167][02089] Num frames 9000...
[2024-12-21 13:30:25,219][02089] Avg episode rewards: #0: 22.555, true rewards: #0: 10.000
[2024-12-21 13:30:25,220][02089] Avg episode reward: 22.555, avg true_objective: 10.000
[2024-12-21 13:30:25,345][02089] Num frames 9100...
[2024-12-21 13:30:25,471][02089] Num frames 9200...
[2024-12-21 13:30:25,597][02089] Num frames 9300...
[2024-12-21 13:30:25,718][02089] Avg episode rewards: #0: 20.856, true rewards: #0: 9.356
[2024-12-21 13:30:25,720][02089] Avg episode reward: 20.856, avg true_objective: 9.356
[2024-12-21 13:31:20,323][02089] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-21 13:31:40,245][02089] The model has been pushed to https://huggingface.co/husseinmo/vizdoom_health_gathering_supreme
[2024-12-21 13:33:30,556][02089] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json
[2024-12-21 13:33:30,558][02089] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json
[2024-12-21 13:33:30,560][02089] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line
[2024-12-21 13:33:30,561][02089] Overriding arg 'train_dir' with value 'train_dir' passed from command line
[2024-12-21 13:33:30,563][02089] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-21 13:33:30,564][02089] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file!
[2024-12-21 13:33:30,566][02089] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file!
[2024-12-21 13:33:30,567][02089] Adding new argument 'env_gpu_observations'=True that is not in the saved config file!
[2024-12-21 13:33:30,569][02089] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-21 13:33:30,570][02089] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-21 13:33:30,572][02089] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-21 13:33:30,574][02089] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-21 13:33:30,575][02089] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-21 13:33:30,578][02089] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-21 13:33:30,580][02089] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-21 13:33:30,581][02089] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-21 13:33:30,582][02089] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-21 13:33:30,583][02089] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-21 13:33:30,587][02089] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-21 13:33:30,592][02089] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-21 13:33:30,594][02089] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-21 13:33:30,618][02089] RunningMeanStd input shape: (3, 72, 128)
[2024-12-21 13:33:30,619][02089] RunningMeanStd input shape: (1,)
[2024-12-21 13:33:30,632][02089] ConvEncoder: input_channels=3
[2024-12-21 13:33:30,679][02089] Conv encoder output size: 512
[2024-12-21 13:33:30,682][02089] Policy head output size: 512
[2024-12-21 13:33:30,704][02089] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth...
[2024-12-21 13:33:31,138][02089] Num frames 100...
[2024-12-21 13:33:31,263][02089] Num frames 200...
[2024-12-21 13:33:31,385][02089] Num frames 300...
[2024-12-21 13:33:31,524][02089] Num frames 400...
[2024-12-21 13:33:31,669][02089] Num frames 500...
[2024-12-21 13:33:31,790][02089] Num frames 600...
[2024-12-21 13:33:31,914][02089] Num frames 700...
[2024-12-21 13:33:32,040][02089] Num frames 800...
[2024-12-21 13:33:32,169][02089] Num frames 900...
[2024-12-21 13:33:32,295][02089] Num frames 1000...
[2024-12-21 13:33:32,424][02089] Num frames 1100...
[2024-12-21 13:33:32,567][02089] Num frames 1200...
[2024-12-21 13:33:32,697][02089] Num frames 1300...
[2024-12-21 13:33:32,824][02089] Num frames 1400...
[2024-12-21 13:33:32,946][02089] Num frames 1500...
[2024-12-21 13:33:33,071][02089] Num frames 1600...
[2024-12-21 13:33:33,196][02089] Num frames 1700...
[2024-12-21 13:33:33,327][02089] Num frames 1800...
[2024-12-21 13:33:33,451][02089] Num frames 1900...
[2024-12-21 13:33:33,594][02089] Num frames 2000...
[2024-12-21 13:33:33,726][02089] Num frames 2100...
[2024-12-21 13:33:33,778][02089] Avg episode rewards: #0: 60.998, true rewards: #0: 21.000
[2024-12-21 13:33:33,780][02089] Avg episode reward: 60.998, avg true_objective: 21.000
[2024-12-21 13:33:33,908][02089] Num frames 2200...
[2024-12-21 13:33:34,036][02089] Num frames 2300...
[2024-12-21 13:33:34,161][02089] Num frames 2400...
[2024-12-21 13:33:34,287][02089] Num frames 2500...
[2024-12-21 13:33:34,414][02089] Num frames 2600...
[2024-12-21 13:33:34,550][02089] Num frames 2700...
[2024-12-21 13:33:34,686][02089] Num frames 2800...
[2024-12-21 13:33:34,813][02089] Num frames 2900...
[2024-12-21 13:33:34,940][02089] Num frames 3000...
[2024-12-21 13:33:35,071][02089] Num frames 3100...
[2024-12-21 13:33:35,199][02089] Num frames 3200...
[2024-12-21 13:33:35,326][02089] Num frames 3300...
[2024-12-21 13:33:35,450][02089] Num frames 3400...
[2024-12-21 13:33:35,582][02089] Num frames 3500...
[2024-12-21 13:33:35,715][02089] Num frames 3600...
[2024-12-21 13:33:35,838][02089] Num frames 3700...
[2024-12-21 13:33:35,964][02089] Num frames 3800...
[2024-12-21 13:33:36,090][02089] Num frames 3900...
[2024-12-21 13:33:36,220][02089] Num frames 4000...
[2024-12-21 13:33:36,344][02089] Num frames 4100...
[2024-12-21 13:33:36,473][02089] Num frames 4200...
[2024-12-21 13:33:36,527][02089] Avg episode rewards: #0: 63.999, true rewards: #0: 21.000
[2024-12-21 13:33:36,529][02089] Avg episode reward: 63.999, avg true_objective: 21.000
[2024-12-21 13:33:36,674][02089] Num frames 4300...
[2024-12-21 13:33:36,796][02089] Num frames 4400...
[2024-12-21 13:33:36,920][02089] Num frames 4500...
[2024-12-21 13:33:37,114][02089] Num frames 4600...
[2024-12-21 13:33:37,293][02089] Num frames 4700...
[2024-12-21 13:33:37,464][02089] Num frames 4800...
[2024-12-21 13:33:37,669][02089] Num frames 4900...
[2024-12-21 13:33:37,847][02089] Num frames 5000...
[2024-12-21 13:33:38,013][02089] Num frames 5100...
[2024-12-21 13:33:38,186][02089] Num frames 5200...
[2024-12-21 13:33:38,377][02089] Num frames 5300...
[2024-12-21 13:33:38,576][02089] Num frames 5400...
[2024-12-21 13:33:38,772][02089] Num frames 5500...
[2024-12-21 13:33:38,977][02089] Num frames 5600...
[2024-12-21 13:33:39,156][02089] Num frames 5700...
[2024-12-21 13:33:39,335][02089] Num frames 5800...
[2024-12-21 13:33:39,524][02089] Num frames 5900...
[2024-12-21 13:33:39,709][02089] Num frames 6000...
[2024-12-21 13:33:39,840][02089] Num frames 6100...
[2024-12-21 13:33:39,963][02089] Num frames 6200...
[2024-12-21 13:33:40,095][02089] Num frames 6300...
[2024-12-21 13:33:40,148][02089] Avg episode rewards: #0: 62.332, true rewards: #0: 21.000
[2024-12-21 13:33:40,150][02089] Avg episode reward: 62.332, avg true_objective: 21.000
[2024-12-21 13:33:40,277][02089] Num frames 6400...
[2024-12-21 13:33:40,401][02089] Num frames 6500...
[2024-12-21 13:33:40,534][02089] Num frames 6600...
[2024-12-21 13:33:40,656][02089] Num frames 6700...
[2024-12-21 13:33:40,778][02089] Num frames 6800...
[2024-12-21 13:33:40,910][02089] Num frames 6900...
[2024-12-21 13:33:41,039][02089] Num frames 7000...
[2024-12-21 13:33:41,164][02089] Num frames 7100...
[2024-12-21 13:33:41,288][02089] Num frames 7200...
[2024-12-21 13:33:41,417][02089] Num frames 7300...
[2024-12-21 13:33:41,548][02089] Num frames 7400...
[2024-12-21 13:33:41,671][02089] Num frames 7500...
[2024-12-21 13:33:41,794][02089] Num frames 7600...
[2024-12-21 13:33:41,927][02089] Num frames 7700...
[2024-12-21 13:33:42,049][02089] Num frames 7800...
[2024-12-21 13:33:42,181][02089] Num frames 7900...
[2024-12-21 13:33:42,310][02089] Num frames 8000...
[2024-12-21 13:33:42,441][02089] Num frames 8100...
[2024-12-21 13:33:42,582][02089] Num frames 8200...
[2024-12-21 13:33:42,706][02089] Num frames 8300...
[2024-12-21 13:33:42,832][02089] Num frames 8400...
[2024-12-21 13:33:42,885][02089] Avg episode rewards: #0: 63.749, true rewards: #0: 21.000
[2024-12-21 13:33:42,887][02089] Avg episode reward: 63.749, avg true_objective: 21.000
[2024-12-21 13:33:43,015][02089] Num frames 8500...
[2024-12-21 13:33:43,143][02089] Num frames 8600...
[2024-12-21 13:33:43,264][02089] Num frames 8700...
[2024-12-21 13:33:43,386][02089] Num frames 8800...
[2024-12-21 13:33:43,519][02089] Num frames 8900...
[2024-12-21 13:33:43,654][02089] Num frames 9000...
[2024-12-21 13:33:43,778][02089] Num frames 9100...
[2024-12-21 13:33:43,911][02089] Num frames 9200...
[2024-12-21 13:33:44,036][02089] Num frames 9300...
[2024-12-21 13:33:44,166][02089] Num frames 9400...
[2024-12-21 13:33:44,296][02089] Num frames 9500...
[2024-12-21 13:33:44,423][02089] Num frames 9600...
[2024-12-21 13:33:44,509][02089] Avg episode rewards: #0: 57.843, true rewards: #0: 19.244
[2024-12-21 13:33:44,511][02089] Avg episode reward: 57.843, avg true_objective: 19.244
[2024-12-21 13:33:44,619][02089] Num frames 9700...
[2024-12-21 13:33:44,746][02089] Num frames 9800...
[2024-12-21 13:33:44,870][02089] Num frames 9900...
[2024-12-21 13:33:45,007][02089] Num frames 10000...
[2024-12-21 13:33:45,138][02089] Num frames 10100...
[2024-12-21 13:33:45,272][02089] Num frames 10200...
[2024-12-21 13:33:45,396][02089] Num frames 10300...
[2024-12-21 13:33:45,528][02089] Num frames 10400...
[2024-12-21 13:33:45,656][02089] Num frames 10500...
[2024-12-21 13:33:45,780][02089] Num frames 10600...
[2024-12-21 13:33:45,905][02089] Num frames 10700...
[2024-12-21 13:33:46,041][02089] Num frames 10800...
[2024-12-21 13:33:46,168][02089] Num frames 10900...
[2024-12-21 13:33:46,291][02089] Num frames 11000...
[2024-12-21 13:33:46,420][02089] Num frames 11100...
[2024-12-21 13:33:46,553][02089] Num frames 11200...
[2024-12-21 13:33:46,677][02089] Num frames 11300...
[2024-12-21 13:33:46,799][02089] Num frames 11400...
[2024-12-21 13:33:46,931][02089] Num frames 11500...
[2024-12-21 13:33:47,068][02089] Num frames 11600...
[2024-12-21 13:33:47,193][02089] Num frames 11700...
[2024-12-21 13:33:47,278][02089] Avg episode rewards: #0: 59.369, true rewards: #0: 19.537
[2024-12-21 13:33:47,279][02089] Avg episode reward: 59.369, avg true_objective: 19.537
[2024-12-21 13:33:47,383][02089] Num frames 11800...
[2024-12-21 13:33:47,522][02089] Num frames 11900...
[2024-12-21 13:33:47,651][02089] Num frames 12000...
[2024-12-21 13:33:47,774][02089] Num frames 12100...
[2024-12-21 13:33:47,901][02089] Num frames 12200...
[2024-12-21 13:33:48,035][02089] Num frames 12300...
[2024-12-21 13:33:48,160][02089] Num frames 12400...
[2024-12-21 13:33:48,292][02089] Num frames 12500...
[2024-12-21 13:33:48,419][02089] Num frames 12600...
[2024-12-21 13:33:48,567][02089] Num frames 12700...
[2024-12-21 13:33:48,694][02089] Num frames 12800...
[2024-12-21 13:33:48,821][02089] Num frames 12900...
[2024-12-21 13:33:48,946][02089] Num frames 13000...
[2024-12-21 13:33:49,081][02089] Num frames 13100...
[2024-12-21 13:33:49,215][02089] Num frames 13200...
[2024-12-21 13:33:49,342][02089] Num frames 13300...
[2024-12-21 13:33:49,422][02089] Avg episode rewards: #0: 57.738, true rewards: #0: 19.024
[2024-12-21 13:33:49,424][02089] Avg episode reward: 57.738, avg true_objective: 19.024
[2024-12-21 13:33:49,534][02089] Num frames 13400...
[2024-12-21 13:33:49,659][02089] Num frames 13500...
[2024-12-21 13:33:49,824][02089] Num frames 13600...
[2024-12-21 13:33:50,002][02089] Num frames 13700...
[2024-12-21 13:33:50,195][02089] Num frames 13800...
[2024-12-21 13:33:50,369][02089] Num frames 13900...
[2024-12-21 13:33:50,547][02089] Num frames 14000...
[2024-12-21 13:33:50,714][02089] Num frames 14100...
[2024-12-21 13:33:50,886][02089] Num frames 14200...
[2024-12-21 13:33:51,057][02089] Num frames 14300...
[2024-12-21 13:33:51,240][02089] Num frames 14400...
[2024-12-21 13:33:51,426][02089] Num frames 14500...
[2024-12-21 13:33:51,606][02089] Num frames 14600...
[2024-12-21 13:33:51,787][02089] Num frames 14700...
[2024-12-21 13:33:51,972][02089] Num frames 14800...
[2024-12-21 13:33:52,161][02089] Num frames 14900...
[2024-12-21 13:33:52,353][02089] Num frames 15000...
[2024-12-21 13:33:52,524][02089] Num frames 15100...
[2024-12-21 13:33:52,650][02089] Num frames 15200...
[2024-12-21 13:33:52,775][02089] Num frames 15300...
[2024-12-21 13:33:52,900][02089] Num frames 15400...
[2024-12-21 13:33:52,978][02089] Avg episode rewards: #0: 58.645, true rewards: #0: 19.271
[2024-12-21 13:33:52,981][02089] Avg episode reward: 58.645, avg true_objective: 19.271
[2024-12-21 13:33:53,087][02089] Num frames 15500...
[2024-12-21 13:33:53,223][02089] Num frames 15600...
[2024-12-21 13:33:53,358][02089] Num frames 15700...
[2024-12-21 13:33:53,495][02089] Num frames 15800...
[2024-12-21 13:33:53,629][02089] Num frames 15900...
[2024-12-21 13:33:53,762][02089] Num frames 16000...
[2024-12-21 13:33:53,884][02089] Num frames 16100...
[2024-12-21 13:33:54,012][02089] Num frames 16200...
[2024-12-21 13:33:54,138][02089] Num frames 16300...
[2024-12-21 13:33:54,280][02089] Num frames 16400...
[2024-12-21 13:33:54,410][02089] Num frames 16500...
[2024-12-21 13:33:54,548][02089] Num frames 16600...
[2024-12-21 13:33:54,676][02089] Num frames 16700...
[2024-12-21 13:33:54,804][02089] Num frames 16800...
[2024-12-21 13:33:54,929][02089] Num frames 16900...
[2024-12-21 13:33:55,062][02089] Num frames 17000...
[2024-12-21 13:33:55,201][02089] Num frames 17100...
[2024-12-21 13:33:55,337][02089] Num frames 17200...
[2024-12-21 13:33:55,463][02089] Num frames 17300...
[2024-12-21 13:33:55,599][02089] Num frames 17400...
[2024-12-21 13:33:55,725][02089] Num frames 17500...
[2024-12-21 13:33:55,806][02089] Avg episode rewards: #0: 59.240, true rewards: #0: 19.463
[2024-12-21 13:33:55,807][02089] Avg episode reward: 59.240, avg true_objective: 19.463
[2024-12-21 13:33:55,917][02089] Num frames 17600...
[2024-12-21 13:33:56,054][02089] Num frames 17700...
[2024-12-21 13:33:56,181][02089] Num frames 17800...
[2024-12-21 13:33:56,316][02089] Num frames 17900...
[2024-12-21 13:33:56,442][02089] Num frames 18000...
[2024-12-21 13:33:56,579][02089] Num frames 18100...
[2024-12-21 13:33:56,705][02089] Num frames 18200...
[2024-12-21 13:33:56,832][02089] Num frames 18300...
[2024-12-21 13:33:56,956][02089] Num frames 18400...
[2024-12-21 13:33:57,085][02089] Num frames 18500...
[2024-12-21 13:33:57,211][02089] Num frames 18600...
[2024-12-21 13:33:57,349][02089] Num frames 18700...
[2024-12-21 13:33:57,479][02089] Num frames 18800...
[2024-12-21 13:33:57,617][02089] Num frames 18900...
[2024-12-21 13:33:57,747][02089] Num frames 19000...
[2024-12-21 13:33:57,882][02089] Num frames 19100...
[2024-12-21 13:33:58,009][02089] Num frames 19200...
[2024-12-21 13:33:58,137][02089] Num frames 19300...
[2024-12-21 13:33:58,250][02089] Avg episode rewards: #0: 58.640, true rewards: #0: 19.341
[2024-12-21 13:33:58,253][02089] Avg episode reward: 58.640, avg true_objective: 19.341
[2024-12-21 13:35:53,937][02089] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4!