diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1307 @@ +[2024-08-11 11:07:14,139][00221] Saving configuration to /content/train_dir/default_experiment/config.json... +[2024-08-11 11:07:14,145][00221] Rollout worker 0 uses device cpu +[2024-08-11 11:07:14,146][00221] Rollout worker 1 uses device cpu +[2024-08-11 11:07:14,148][00221] Rollout worker 2 uses device cpu +[2024-08-11 11:07:14,149][00221] Rollout worker 3 uses device cpu +[2024-08-11 11:07:14,150][00221] Rollout worker 4 uses device cpu +[2024-08-11 11:07:14,151][00221] Rollout worker 5 uses device cpu +[2024-08-11 11:07:14,152][00221] Rollout worker 6 uses device cpu +[2024-08-11 11:07:14,153][00221] Rollout worker 7 uses device cpu +[2024-08-11 11:07:14,316][00221] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-11 11:07:14,318][00221] InferenceWorker_p0-w0: min num requests: 2 +[2024-08-11 11:07:14,350][00221] Starting all processes... +[2024-08-11 11:07:14,351][00221] Starting process learner_proc0 +[2024-08-11 11:07:15,657][00221] Starting all processes... +[2024-08-11 11:07:15,669][00221] Starting process inference_proc0-0 +[2024-08-11 11:07:15,670][00221] Starting process rollout_proc0 +[2024-08-11 11:07:15,670][00221] Starting process rollout_proc1 +[2024-08-11 11:07:15,670][00221] Starting process rollout_proc2 +[2024-08-11 11:07:15,670][00221] Starting process rollout_proc3 +[2024-08-11 11:07:15,670][00221] Starting process rollout_proc4 +[2024-08-11 11:07:15,670][00221] Starting process rollout_proc5 +[2024-08-11 11:07:15,670][00221] Starting process rollout_proc6 +[2024-08-11 11:07:15,670][00221] Starting process rollout_proc7 +[2024-08-11 11:07:30,549][02645] Worker 7 uses CPU cores [1] +[2024-08-11 11:07:30,556][02639] Worker 1 uses CPU cores [1] +[2024-08-11 11:07:30,596][02641] Worker 3 uses CPU cores [1] +[2024-08-11 11:07:30,685][02624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-11 11:07:30,685][02624] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-08-11 11:07:30,767][02624] Num visible devices: 1 +[2024-08-11 11:07:30,794][02624] Starting seed is not provided +[2024-08-11 11:07:30,794][02624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-11 11:07:30,795][02624] Initializing actor-critic model on device cuda:0 +[2024-08-11 11:07:30,796][02624] RunningMeanStd input shape: (3, 72, 128) +[2024-08-11 11:07:30,798][02624] RunningMeanStd input shape: (1,) +[2024-08-11 11:07:30,883][02624] ConvEncoder: input_channels=3 +[2024-08-11 11:07:30,886][02638] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-11 11:07:30,887][02638] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-08-11 11:07:31,007][02637] Worker 0 uses CPU cores [0] +[2024-08-11 11:07:31,011][02638] Num visible devices: 1 +[2024-08-11 11:07:31,084][02643] Worker 5 uses CPU cores [1] +[2024-08-11 11:07:31,098][02640] Worker 2 uses CPU cores [0] +[2024-08-11 11:07:31,158][02642] Worker 4 uses CPU cores [0] +[2024-08-11 11:07:31,169][02644] Worker 6 uses CPU cores [0] +[2024-08-11 11:07:31,319][02624] Conv encoder output size: 512 +[2024-08-11 11:07:31,320][02624] Policy head output size: 512 +[2024-08-11 11:07:31,388][02624] Created Actor Critic model with architecture: +[2024-08-11 11:07:31,388][02624] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + 
(running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-08-11 11:07:31,723][02624] Using optimizer +[2024-08-11 11:07:32,799][02624] No checkpoints found +[2024-08-11 11:07:32,799][02624] Did not load from checkpoint, starting from scratch! +[2024-08-11 11:07:32,800][02624] Initialized policy 0 weights for model version 0 +[2024-08-11 11:07:32,803][02624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-11 11:07:32,811][02624] LearnerWorker_p0 finished initialization! +[2024-08-11 11:07:33,020][02638] RunningMeanStd input shape: (3, 72, 128) +[2024-08-11 11:07:33,022][02638] RunningMeanStd input shape: (1,) +[2024-08-11 11:07:33,041][02638] ConvEncoder: input_channels=3 +[2024-08-11 11:07:33,206][02638] Conv encoder output size: 512 +[2024-08-11 11:07:33,206][02638] Policy head output size: 512 +[2024-08-11 11:07:33,296][00221] Inference worker 0-0 is ready! +[2024-08-11 11:07:33,299][00221] All inference workers are ready! Signal rollout workers to start! +[2024-08-11 11:07:33,645][02644] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-11 11:07:33,668][02642] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-11 11:07:33,674][02645] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-11 11:07:33,649][02640] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-11 11:07:33,696][02639] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-11 11:07:33,699][02641] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-11 11:07:33,706][02643] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-11 11:07:33,776][02637] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-11 11:07:34,310][00221] Heartbeat connected on Batcher_0 +[2024-08-11 11:07:34,312][00221] Heartbeat connected on LearnerWorker_p0 +[2024-08-11 11:07:34,363][00221] Heartbeat connected on InferenceWorker_p0-w0 +[2024-08-11 11:07:35,026][00221] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-08-11 11:07:35,542][02645] Decorrelating experience for 0 frames... +[2024-08-11 11:07:35,541][02639] Decorrelating experience for 0 frames... +[2024-08-11 11:07:35,622][02642] Decorrelating experience for 0 frames... +[2024-08-11 11:07:35,624][02640] Decorrelating experience for 0 frames... 
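
Editor's note: the ActorCriticSharedWeights printout above can be read back as a small PyTorch module. The sketch below mirrors only what the log actually reports (observation shape (3, 72, 128), a Conv2d/ELU x3 head feeding a Linear+ELU projection to the 512-dim "Conv encoder output size", a GRU(512, 512) core, a 1-unit critic head, and 5 action logits); the conv channel counts, kernel sizes, and strides are assumptions, since the TorchScript repr hides them.

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Minimal stand-in for the ActorCriticSharedWeights module printed in the log."""

    def __init__(self, obs_shape=(3, 72, 128), hidden=512, num_actions=5):
        super().__init__()
        # conv_head: Conv2d/ELU x3 as in the printout; kernels/strides are assumptions
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened conv output size from a dummy frame
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).flatten(1).shape[1]
        # mlp_layers: Linear + ELU projecting to the 512-dim encoder output
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)          # (core): GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)   # value head, out_features=1
        self.distribution_linear = nn.Linear(hidden, num_actions)  # 5 action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

# smoke test with the observation shape reported by RunningMeanStd: (3, 72, 128)
logits, value, state = ActorCriticSketch()(torch.zeros(4, 3, 72, 128))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```
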
+[2024-08-11 11:07:35,620][02644] Decorrelating experience for 0 frames...
+[2024-08-11 11:07:35,668][02637] Decorrelating experience for 0 frames...
+[2024-08-11 11:07:36,901][02643] Decorrelating experience for 0 frames...
+[2024-08-11 11:07:36,936][02645] Decorrelating experience for 32 frames...
+[2024-08-11 11:07:37,120][02644] Decorrelating experience for 32 frames...
+[2024-08-11 11:07:37,129][02642] Decorrelating experience for 32 frames...
+[2024-08-11 11:07:37,126][02640] Decorrelating experience for 32 frames...
+[2024-08-11 11:07:37,178][02641] Decorrelating experience for 0 frames...
+[2024-08-11 11:07:37,201][02637] Decorrelating experience for 32 frames...
+[2024-08-11 11:07:37,237][02639] Decorrelating experience for 32 frames...
+[2024-08-11 11:07:38,183][02643] Decorrelating experience for 32 frames...
+[2024-08-11 11:07:38,467][02641] Decorrelating experience for 32 frames...
+[2024-08-11 11:07:38,739][02644] Decorrelating experience for 64 frames...
+[2024-08-11 11:07:38,741][02640] Decorrelating experience for 64 frames...
+[2024-08-11 11:07:38,777][02642] Decorrelating experience for 64 frames...
+[2024-08-11 11:07:38,969][02639] Decorrelating experience for 64 frames...
+[2024-08-11 11:07:39,848][02645] Decorrelating experience for 64 frames...
+[2024-08-11 11:07:39,885][02643] Decorrelating experience for 64 frames...
+[2024-08-11 11:07:40,016][02640] Decorrelating experience for 96 frames...
+[2024-08-11 11:07:40,026][00221] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-08-11 11:07:40,057][02642] Decorrelating experience for 96 frames...
+[2024-08-11 11:07:40,157][02641] Decorrelating experience for 64 frames...
+[2024-08-11 11:07:40,219][02637] Decorrelating experience for 64 frames...
+[2024-08-11 11:07:40,291][00221] Heartbeat connected on RolloutWorker_w2
+[2024-08-11 11:07:40,342][00221] Heartbeat connected on RolloutWorker_w4
+[2024-08-11 11:07:41,520][02644] Decorrelating experience for 96 frames...
+[2024-08-11 11:07:41,609][02645] Decorrelating experience for 96 frames...
+[2024-08-11 11:07:41,800][00221] Heartbeat connected on RolloutWorker_w6
+[2024-08-11 11:07:42,012][00221] Heartbeat connected on RolloutWorker_w7
+[2024-08-11 11:07:42,138][02643] Decorrelating experience for 96 frames...
+[2024-08-11 11:07:42,197][02637] Decorrelating experience for 96 frames...
+[2024-08-11 11:07:42,328][00221] Heartbeat connected on RolloutWorker_w0
+[2024-08-11 11:07:42,336][02641] Decorrelating experience for 96 frames...
+[2024-08-11 11:07:42,472][00221] Heartbeat connected on RolloutWorker_w5
+[2024-08-11 11:07:42,682][00221] Heartbeat connected on RolloutWorker_w3
+[2024-08-11 11:07:43,487][02639] Decorrelating experience for 96 frames...
+[2024-08-11 11:07:44,022][00221] Heartbeat connected on RolloutWorker_w1
+[2024-08-11 11:07:45,026][00221] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 14.6. Samples: 146. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-08-11 11:07:45,029][00221] Avg episode reward: [(0, '1.887')]
+[2024-08-11 11:07:45,961][02624] Signal inference workers to stop experience collection...
+[2024-08-11 11:07:46,003][02638] InferenceWorker_p0-w0: stopping experience collection
+[2024-08-11 11:07:48,838][02624] Signal inference workers to resume experience collection...
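
Editor's note: the recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" status lines report windowed throughput computed from (timestamp, total frames) samples. The sketch below is an illustration of that bookkeeping, not Sample Factory's actual code; it reproduces the first nonzero report at 11:07:50, where the 10-second window gives 4096/10 = 409.6 FPS while the 60- and 300-second figures fall back to frames over total elapsed time (4096/15 ≈ 273.1).

```python
from collections import deque
import time

class FpsMeter:
    """Windowed frames-per-second meter; illustrative only."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames) pairs, oldest first

    def record(self, total_frames, now=None):
        self.samples.append((now if now is not None else time.time(), total_frames))

    def fps(self):
        now, frames = self.samples[-1]
        out = {}
        for w in self.windows:
            # oldest sample still inside the window; a short history caps the
            # effective window at elapsed time, matching the early log lines
            past = next(((t, f) for t, f in self.samples if now - t <= w),
                        self.samples[0])
            dt = now - past[0]
            out[w] = (frames - past[1]) / dt if dt > 0 else float("nan")
        return out

meter = FpsMeter()
meter.record(0, now=0.0)      # 11:07:35 -> 0 frames
meter.record(0, now=5.0)
meter.record(4096, now=15.0)  # 11:07:50 -> 4096 frames
print(meter.fps())  # ~{10: 409.6, 60: 273.1, 300: 273.1}, as in the log
```
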
+[2024-08-11 11:07:48,839][02638] InferenceWorker_p0-w0: resuming experience collection
+[2024-08-11 11:07:50,026][00221] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 176.5. Samples: 2648. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2024-08-11 11:07:50,029][00221] Avg episode reward: [(0, '2.365')]
+[2024-08-11 11:07:55,025][00221] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 379.5. Samples: 7590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:07:55,028][00221] Avg episode reward: [(0, '3.581')]
+[2024-08-11 11:07:57,903][02638] Updated weights for policy 0, policy_version 10 (0.0028)
+[2024-08-11 11:08:00,026][00221] Fps is (10 sec: 4505.6, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 411.3. Samples: 10282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:08:00,030][00221] Avg episode reward: [(0, '4.249')]
+[2024-08-11 11:08:05,028][00221] Fps is (10 sec: 3276.0, 60 sec: 2047.8, 300 sec: 2047.8). Total num frames: 61440. Throughput: 0: 519.8. Samples: 15594. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-11 11:08:05,030][00221] Avg episode reward: [(0, '4.344')]
+[2024-08-11 11:08:09,961][02638] Updated weights for policy 0, policy_version 20 (0.0030)
+[2024-08-11 11:08:10,026][00221] Fps is (10 sec: 3276.8, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 577.4. Samples: 20210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:08:10,030][00221] Avg episode reward: [(0, '4.291')]
+[2024-08-11 11:08:15,025][00221] Fps is (10 sec: 4097.0, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 593.7. Samples: 23748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:08:15,030][00221] Avg episode reward: [(0, '4.206')]
+[2024-08-11 11:08:15,033][02624] Saving new best policy, reward=4.206!
+[2024-08-11 11:08:19,696][02638] Updated weights for policy 0, policy_version 30 (0.0038)
+[2024-08-11 11:08:20,026][00221] Fps is (10 sec: 4095.7, 60 sec: 2730.6, 300 sec: 2730.6). Total num frames: 122880. Throughput: 0: 676.1. Samples: 30424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:08:20,031][00221] Avg episode reward: [(0, '4.498')]
+[2024-08-11 11:08:20,042][02624] Saving new best policy, reward=4.498!
+[2024-08-11 11:08:25,026][00221] Fps is (10 sec: 3276.8, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 767.1. Samples: 34518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:08:25,029][00221] Avg episode reward: [(0, '4.519')]
+[2024-08-11 11:08:25,035][02624] Saving new best policy, reward=4.519!
+[2024-08-11 11:08:30,026][00221] Fps is (10 sec: 3686.7, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 159744. Throughput: 0: 829.3. Samples: 37464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:08:30,028][00221] Avg episode reward: [(0, '4.483')]
+[2024-08-11 11:08:30,817][02638] Updated weights for policy 0, policy_version 40 (0.0031)
+[2024-08-11 11:08:35,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 180224. Throughput: 0: 928.4. Samples: 44428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:08:35,029][00221] Avg episode reward: [(0, '4.519')]
+[2024-08-11 11:08:40,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 927.6. Samples: 49330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:08:40,029][00221] Avg episode reward: [(0, '4.503')]
+[2024-08-11 11:08:42,608][02638] Updated weights for policy 0, policy_version 50 (0.0027)
+[2024-08-11 11:08:45,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 917.8. Samples: 51584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:08:45,032][00221] Avg episode reward: [(0, '4.420')]
+[2024-08-11 11:08:50,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3167.6). Total num frames: 237568. Throughput: 0: 955.4. Samples: 58586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:08:50,029][00221] Avg episode reward: [(0, '4.429')]
+[2024-08-11 11:08:51,502][02638] Updated weights for policy 0, policy_version 60 (0.0043)
+[2024-08-11 11:08:55,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3174.4). Total num frames: 253952. Throughput: 0: 986.8. Samples: 64618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:08:55,033][00221] Avg episode reward: [(0, '4.443')]
+[2024-08-11 11:09:00,026][00221] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3180.4). Total num frames: 270336. Throughput: 0: 955.5. Samples: 66746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:09:00,032][00221] Avg episode reward: [(0, '4.466')]
+[2024-08-11 11:09:02,932][02638] Updated weights for policy 0, policy_version 70 (0.0026)
+[2024-08-11 11:09:05,030][00221] Fps is (10 sec: 4094.3, 60 sec: 3891.1, 300 sec: 3276.7). Total num frames: 294912. Throughput: 0: 935.6. Samples: 72528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:09:05,031][00221] Avg episode reward: [(0, '4.518')]
+[2024-08-11 11:09:10,026][00221] Fps is (10 sec: 4915.4, 60 sec: 3959.5, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 1002.1. Samples: 79614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:09:10,028][00221] Avg episode reward: [(0, '4.654')]
+[2024-08-11 11:09:10,039][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth...
+[2024-08-11 11:09:10,256][02624] Saving new best policy, reward=4.654!
+[2024-08-11 11:09:13,094][02638] Updated weights for policy 0, policy_version 80 (0.0027)
+[2024-08-11 11:09:15,026][00221] Fps is (10 sec: 3687.9, 60 sec: 3822.9, 300 sec: 3317.8). Total num frames: 331776. Throughput: 0: 987.3. Samples: 81892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:09:15,036][00221] Avg episode reward: [(0, '4.569')]
+[2024-08-11 11:09:20,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3315.8). Total num frames: 348160. Throughput: 0: 934.1. Samples: 86462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:09:20,030][00221] Avg episode reward: [(0, '4.548')]
+[2024-08-11 11:09:23,620][02638] Updated weights for policy 0, policy_version 90 (0.0042)
+[2024-08-11 11:09:25,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3388.5). Total num frames: 372736. Throughput: 0: 982.0. Samples: 93522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:09:25,031][00221] Avg episode reward: [(0, '4.648')]
+[2024-08-11 11:09:30,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3419.3). Total num frames: 393216. Throughput: 0: 1008.4. Samples: 96964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:09:30,028][00221] Avg episode reward: [(0, '4.812')]
+[2024-08-11 11:09:30,034][02624] Saving new best policy, reward=4.812!
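
Editor's note: the learner's checkpoint names encode the policy version and the env-frame count; checkpoint_000000078_319488.pth above is version 78 at 319488 frames (78 × 4096), and a separate "Saving new best policy" save fires whenever the average episode reward improves. A hedged sketch of that bookkeeping follows; the file layout, dict keys, and "best.pth" name are assumptions, not Sample Factory's actual code.

```python
import os
import torch

def save_checkpoint(model, policy_version, env_steps, train_dir,
                    best_reward, avg_reward):
    """Illustrative sketch of the checkpointing pattern visible in the log."""
    ckpt_dir = os.path.join(train_dir, "checkpoint_p0")
    os.makedirs(ckpt_dir, exist_ok=True)
    # e.g. version 78 at 319488 frames -> checkpoint_000000078_319488.pth
    name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    torch.save({"model": model.state_dict()}, os.path.join(ckpt_dir, name))
    if avg_reward > best_reward:
        # mirrors 'Saving new best policy, reward=4.654!'
        print(f"Saving new best policy, reward={avg_reward:.3f}!")
        torch.save({"model": model.state_dict()},
                   os.path.join(ckpt_dir, "best.pth"))  # assumed name
        best_reward = avg_reward
    return best_reward
```
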
+[2024-08-11 11:09:35,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3379.2). Total num frames: 405504. Throughput: 0: 945.3. Samples: 101124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:09:35,030][00221] Avg episode reward: [(0, '5.051')]
+[2024-08-11 11:09:35,033][02624] Saving new best policy, reward=5.051!
+[2024-08-11 11:09:35,576][02638] Updated weights for policy 0, policy_version 100 (0.0043)
+[2024-08-11 11:09:40,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3440.6). Total num frames: 430080. Throughput: 0: 946.1. Samples: 107192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:09:40,028][00221] Avg episode reward: [(0, '4.925')]
+[2024-08-11 11:09:44,444][02638] Updated weights for policy 0, policy_version 110 (0.0016)
+[2024-08-11 11:09:45,026][00221] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3465.8). Total num frames: 450560. Throughput: 0: 977.8. Samples: 110746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:09:45,032][00221] Avg episode reward: [(0, '4.736')]
+[2024-08-11 11:09:50,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 970.8. Samples: 116208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:09:50,032][00221] Avg episode reward: [(0, '4.835')]
+[2024-08-11 11:09:55,026][00221] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3452.3). Total num frames: 483328. Throughput: 0: 927.9. Samples: 121370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:09:55,032][00221] Avg episode reward: [(0, '4.950')]
+[2024-08-11 11:09:56,096][02638] Updated weights for policy 0, policy_version 120 (0.0035)
+[2024-08-11 11:10:00,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3502.8). Total num frames: 507904. Throughput: 0: 956.9. Samples: 124954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:10:00,028][00221] Avg episode reward: [(0, '5.510')]
+[2024-08-11 11:10:00,039][02624] Saving new best policy, reward=5.510!
+[2024-08-11 11:10:05,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3495.3). Total num frames: 524288. Throughput: 0: 992.4. Samples: 131122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:10:05,031][00221] Avg episode reward: [(0, '5.280')]
+[2024-08-11 11:10:07,161][02638] Updated weights for policy 0, policy_version 130 (0.0022)
+[2024-08-11 11:10:10,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3488.2). Total num frames: 540672. Throughput: 0: 929.4. Samples: 135346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:10:10,028][00221] Avg episode reward: [(0, '5.245')]
+[2024-08-11 11:10:15,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3507.2). Total num frames: 561152. Throughput: 0: 925.2. Samples: 138600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:10:15,028][00221] Avg episode reward: [(0, '5.166')]
+[2024-08-11 11:10:16,873][02638] Updated weights for policy 0, policy_version 140 (0.0027)
+[2024-08-11 11:10:20,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 585728. Throughput: 0: 988.3. Samples: 145596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:10:20,028][00221] Avg episode reward: [(0, '5.137')]
+[2024-08-11 11:10:25,027][00221] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3517.7). Total num frames: 598016. Throughput: 0: 960.0. Samples: 150394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:10:25,036][00221] Avg episode reward: [(0, '5.073')]
+[2024-08-11 11:10:28,523][02638] Updated weights for policy 0, policy_version 150 (0.0034)
+[2024-08-11 11:10:30,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3534.3). Total num frames: 618496. Throughput: 0: 930.5. Samples: 152616. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-08-11 11:10:30,028][00221] Avg episode reward: [(0, '5.416')]
+[2024-08-11 11:10:35,026][00221] Fps is (10 sec: 4506.2, 60 sec: 3959.5, 300 sec: 3572.6). Total num frames: 643072. Throughput: 0: 958.7. Samples: 159348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:10:35,032][00221] Avg episode reward: [(0, '5.517')]
+[2024-08-11 11:10:35,036][02624] Saving new best policy, reward=5.517!
+[2024-08-11 11:10:38,186][02638] Updated weights for policy 0, policy_version 160 (0.0044)
+[2024-08-11 11:10:40,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3564.6). Total num frames: 659456. Throughput: 0: 975.2. Samples: 165252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:10:40,029][00221] Avg episode reward: [(0, '5.320')]
+[2024-08-11 11:10:45,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3535.5). Total num frames: 671744. Throughput: 0: 941.3. Samples: 167312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:10:45,032][00221] Avg episode reward: [(0, '5.314')]
+[2024-08-11 11:10:49,536][02638] Updated weights for policy 0, policy_version 170 (0.0024)
+[2024-08-11 11:10:50,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3570.9). Total num frames: 696320. Throughput: 0: 934.9. Samples: 173192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:10:50,028][00221] Avg episode reward: [(0, '5.312')]
+[2024-08-11 11:10:55,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3584.0). Total num frames: 716800. Throughput: 0: 992.7. Samples: 180018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:10:55,028][00221] Avg episode reward: [(0, '5.576')]
+[2024-08-11 11:10:55,031][02624] Saving new best policy, reward=5.576!
+[2024-08-11 11:11:00,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3576.5). Total num frames: 733184. Throughput: 0: 965.8. Samples: 182060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:11:00,028][00221] Avg episode reward: [(0, '5.473')]
+[2024-08-11 11:11:01,383][02638] Updated weights for policy 0, policy_version 180 (0.0013)
+[2024-08-11 11:11:05,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3569.4). Total num frames: 749568. Throughput: 0: 917.9. Samples: 186900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:11:05,029][00221] Avg episode reward: [(0, '5.703')]
+[2024-08-11 11:11:05,033][02624] Saving new best policy, reward=5.703!
+[2024-08-11 11:11:10,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3600.7). Total num frames: 774144. Throughput: 0: 958.1. Samples: 193508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:11:10,029][00221] Avg episode reward: [(0, '5.576')]
+[2024-08-11 11:11:10,054][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000189_774144.pth...
+[2024-08-11 11:11:10,812][02638] Updated weights for policy 0, policy_version 190 (0.0033)
+[2024-08-11 11:11:15,030][00221] Fps is (10 sec: 4094.2, 60 sec: 3822.7, 300 sec: 3593.2). Total num frames: 790528. Throughput: 0: 979.1. Samples: 196682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:11:15,032][00221] Avg episode reward: [(0, '5.626')]
+[2024-08-11 11:11:20,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3568.1). Total num frames: 802816. Throughput: 0: 917.8. Samples: 200648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:11:20,028][00221] Avg episode reward: [(0, '5.952')]
+[2024-08-11 11:11:20,036][02624] Saving new best policy, reward=5.952!
+[2024-08-11 11:11:23,302][02638] Updated weights for policy 0, policy_version 200 (0.0044)
+[2024-08-11 11:11:25,026][00221] Fps is (10 sec: 3278.2, 60 sec: 3754.7, 300 sec: 3579.5). Total num frames: 823296. Throughput: 0: 915.1. Samples: 206432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:11:25,033][00221] Avg episode reward: [(0, '6.141')]
+[2024-08-11 11:11:25,036][02624] Saving new best policy, reward=6.141!
+[2024-08-11 11:11:30,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3590.5). Total num frames: 843776. Throughput: 0: 939.6. Samples: 209592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:11:30,028][00221] Avg episode reward: [(0, '6.266')]
+[2024-08-11 11:11:30,037][02624] Saving new best policy, reward=6.266!
+[2024-08-11 11:11:35,014][02638] Updated weights for policy 0, policy_version 210 (0.0040)
+[2024-08-11 11:11:35,034][00221] Fps is (10 sec: 3683.3, 60 sec: 3617.6, 300 sec: 3583.9). Total num frames: 860160. Throughput: 0: 916.9. Samples: 214460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:11:35,036][00221] Avg episode reward: [(0, '6.211')]
+[2024-08-11 11:11:40,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3577.7). Total num frames: 876544. Throughput: 0: 874.5. Samples: 219372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:11:40,032][00221] Avg episode reward: [(0, '5.928')]
+[2024-08-11 11:11:45,026][00221] Fps is (10 sec: 3689.5, 60 sec: 3754.7, 300 sec: 3588.1). Total num frames: 897024. Throughput: 0: 903.5. Samples: 222718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:11:45,028][00221] Avg episode reward: [(0, '6.145')]
+[2024-08-11 11:11:45,065][02638] Updated weights for policy 0, policy_version 220 (0.0027)
+[2024-08-11 11:11:50,030][00221] Fps is (10 sec: 4094.3, 60 sec: 3686.1, 300 sec: 3598.0). Total num frames: 917504. Throughput: 0: 935.6. Samples: 229004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:11:50,040][00221] Avg episode reward: [(0, '6.251')]
+[2024-08-11 11:11:55,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3576.1). Total num frames: 929792. Throughput: 0: 876.3. Samples: 232940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:11:55,029][00221] Avg episode reward: [(0, '6.556')]
+[2024-08-11 11:11:55,032][02624] Saving new best policy, reward=6.556!
+[2024-08-11 11:11:57,385][02638] Updated weights for policy 0, policy_version 230 (0.0027)
+[2024-08-11 11:12:00,026][00221] Fps is (10 sec: 3278.1, 60 sec: 3618.1, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 875.2. Samples: 236062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:12:00,028][00221] Avg episode reward: [(0, '6.573')]
+[2024-08-11 11:12:00,041][02624] Saving new best policy, reward=6.573!
+[2024-08-11 11:12:05,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3595.4). Total num frames: 970752. Throughput: 0: 925.5. Samples: 242294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:12:05,031][00221] Avg episode reward: [(0, '6.943')]
+[2024-08-11 11:12:05,033][02624] Saving new best policy, reward=6.943!
+[2024-08-11 11:12:08,652][02638] Updated weights for policy 0, policy_version 240 (0.0014)
+[2024-08-11 11:12:10,027][00221] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3574.7). Total num frames: 983040. Throughput: 0: 897.0. Samples: 246798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:12:10,030][00221] Avg episode reward: [(0, '7.336')]
+[2024-08-11 11:12:10,045][02624] Saving new best policy, reward=7.336!
+[2024-08-11 11:12:15,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3584.0). Total num frames: 1003520. Throughput: 0: 876.9. Samples: 249054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:12:15,031][00221] Avg episode reward: [(0, '7.270')]
+[2024-08-11 11:12:18,836][02638] Updated weights for policy 0, policy_version 250 (0.0026)
+[2024-08-11 11:12:20,026][00221] Fps is (10 sec: 4506.3, 60 sec: 3754.7, 300 sec: 3607.4). Total num frames: 1028096. Throughput: 0: 924.2. Samples: 256042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:12:20,032][00221] Avg episode reward: [(0, '7.300')]
+[2024-08-11 11:12:25,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3601.7). Total num frames: 1044480. Throughput: 0: 943.6. Samples: 261834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:12:25,029][00221] Avg episode reward: [(0, '8.103')]
+[2024-08-11 11:12:25,034][02624] Saving new best policy, reward=8.103!
+[2024-08-11 11:12:30,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 1060864. Throughput: 0: 914.5. Samples: 263870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:12:30,029][00221] Avg episode reward: [(0, '8.483')]
+[2024-08-11 11:12:30,039][02624] Saving new best policy, reward=8.483!
+[2024-08-11 11:12:30,692][02638] Updated weights for policy 0, policy_version 260 (0.0020)
+[2024-08-11 11:12:35,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.9, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 908.6. Samples: 269888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:12:35,031][00221] Avg episode reward: [(0, '8.058')]
+[2024-08-11 11:12:39,572][02638] Updated weights for policy 0, policy_version 270 (0.0021)
+[2024-08-11 11:12:40,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 975.4. Samples: 276834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:12:40,032][00221] Avg episode reward: [(0, '7.896')]
+[2024-08-11 11:12:45,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1118208. Throughput: 0: 953.0. Samples: 278948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:12:45,028][00221] Avg episode reward: [(0, '7.974')]
+[2024-08-11 11:12:50,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3762.8). Total num frames: 1138688. Throughput: 0: 930.0. Samples: 284144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:12:50,032][00221] Avg episode reward: [(0, '8.498')]
+[2024-08-11 11:12:50,040][02624] Saving new best policy, reward=8.498!
+[2024-08-11 11:12:51,185][02638] Updated weights for policy 0, policy_version 280 (0.0017)
+[2024-08-11 11:12:55,025][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1163264. Throughput: 0: 979.9. Samples: 290892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:12:55,028][00221] Avg episode reward: [(0, '8.551')]
+[2024-08-11 11:12:55,030][02624] Saving new best policy, reward=8.551!
+[2024-08-11 11:13:00,026][00221] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 1179648. Throughput: 0: 998.2. Samples: 293972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:13:00,038][00221] Avg episode reward: [(0, '8.753')]
+[2024-08-11 11:13:00,051][02624] Saving new best policy, reward=8.753!
+[2024-08-11 11:13:02,861][02638] Updated weights for policy 0, policy_version 290 (0.0041)
+[2024-08-11 11:13:05,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1191936. Throughput: 0: 932.2. Samples: 297990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:13:05,027][00221] Avg episode reward: [(0, '9.329')]
+[2024-08-11 11:13:05,092][02624] Saving new best policy, reward=9.329!
+[2024-08-11 11:13:10,026][00221] Fps is (10 sec: 3686.5, 60 sec: 3891.3, 300 sec: 3776.6). Total num frames: 1216512. Throughput: 0: 953.7. Samples: 304750. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-08-11 11:13:10,028][00221] Avg episode reward: [(0, '9.344')]
+[2024-08-11 11:13:10,043][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000297_1216512.pth...
+[2024-08-11 11:13:10,195][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth
+[2024-08-11 11:13:10,218][02624] Saving new best policy, reward=9.344!
+[2024-08-11 11:13:12,035][02638] Updated weights for policy 0, policy_version 300 (0.0026)
+[2024-08-11 11:13:15,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1236992. Throughput: 0: 980.2. Samples: 307980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:13:15,035][00221] Avg episode reward: [(0, '9.644')]
+[2024-08-11 11:13:15,037][02624] Saving new best policy, reward=9.644!
+[2024-08-11 11:13:20,027][00221] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 1253376. Throughput: 0: 954.6. Samples: 312846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-08-11 11:13:20,033][00221] Avg episode reward: [(0, '9.587')]
+[2024-08-11 11:13:24,011][02638] Updated weights for policy 0, policy_version 310 (0.0026)
+[2024-08-11 11:13:25,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1269760. Throughput: 0: 923.5. Samples: 318392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:13:25,028][00221] Avg episode reward: [(0, '10.096')]
+[2024-08-11 11:13:25,033][02624] Saving new best policy, reward=10.096!
+[2024-08-11 11:13:30,026][00221] Fps is (10 sec: 4096.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1294336. Throughput: 0: 952.0. Samples: 321788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:13:30,028][00221] Avg episode reward: [(0, '10.197')]
+[2024-08-11 11:13:30,035][02624] Saving new best policy, reward=10.197!
+[2024-08-11 11:13:33,912][02638] Updated weights for policy 0, policy_version 320 (0.0021)
+[2024-08-11 11:13:35,026][00221] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1310720. Throughput: 0: 968.3. Samples: 327718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:13:35,030][00221] Avg episode reward: [(0, '9.887')]
+[2024-08-11 11:13:40,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1327104. Throughput: 0: 918.9. Samples: 332242. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:13:40,030][00221] Avg episode reward: [(0, '10.191')]
+[2024-08-11 11:13:44,529][02638] Updated weights for policy 0, policy_version 330 (0.0022)
+[2024-08-11 11:13:45,026][00221] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1351680. Throughput: 0: 929.7. Samples: 335806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:13:45,033][00221] Avg episode reward: [(0, '10.427')]
+[2024-08-11 11:13:45,036][02624] Saving new best policy, reward=10.427!
+[2024-08-11 11:13:50,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1372160. Throughput: 0: 994.2. Samples: 342730. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:13:50,032][00221] Avg episode reward: [(0, '11.068')]
+[2024-08-11 11:13:50,047][02624] Saving new best policy, reward=11.068!
+[2024-08-11 11:13:55,029][00221] Fps is (10 sec: 3275.7, 60 sec: 3686.2, 300 sec: 3776.6). Total num frames: 1384448. Throughput: 0: 941.0. Samples: 347100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:13:55,031][00221] Avg episode reward: [(0, '11.449')]
+[2024-08-11 11:13:55,041][02624] Saving new best policy, reward=11.449!
+[2024-08-11 11:13:56,588][02638] Updated weights for policy 0, policy_version 340 (0.0033)
+[2024-08-11 11:14:00,026][00221] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1409024. Throughput: 0: 928.5. Samples: 349764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:14:00,032][00221] Avg episode reward: [(0, '11.639')]
+[2024-08-11 11:14:00,043][02624] Saving new best policy, reward=11.639!
+[2024-08-11 11:14:05,026][00221] Fps is (10 sec: 4507.1, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 1429504. Throughput: 0: 972.4. Samples: 356602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:14:05,028][00221] Avg episode reward: [(0, '11.222')]
+[2024-08-11 11:14:05,499][02638] Updated weights for policy 0, policy_version 350 (0.0028)
+[2024-08-11 11:14:10,026][00221] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1445888. Throughput: 0: 969.3. Samples: 362012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:14:10,030][00221] Avg episode reward: [(0, '12.143')]
+[2024-08-11 11:14:10,037][02624] Saving new best policy, reward=12.143!
+[2024-08-11 11:14:15,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1462272. Throughput: 0: 941.0. Samples: 364134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:14:15,028][00221] Avg episode reward: [(0, '12.381')]
+[2024-08-11 11:14:15,030][02624] Saving new best policy, reward=12.381!
+[2024-08-11 11:14:17,058][02638] Updated weights for policy 0, policy_version 360 (0.0027)
+[2024-08-11 11:14:20,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3776.7). Total num frames: 1486848. Throughput: 0: 953.5. Samples: 370624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:14:20,029][00221] Avg episode reward: [(0, '12.402')]
+[2024-08-11 11:14:20,037][02624] Saving new best policy, reward=12.402!
+[2024-08-11 11:14:25,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1507328. Throughput: 0: 992.3. Samples: 376896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:14:25,029][00221] Avg episode reward: [(0, '13.962')]
+[2024-08-11 11:14:25,036][02624] Saving new best policy, reward=13.962!
+[2024-08-11 11:14:28,242][02638] Updated weights for policy 0, policy_version 370 (0.0034)
+[2024-08-11 11:14:30,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1519616. Throughput: 0: 955.7. Samples: 378812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:14:30,028][00221] Avg episode reward: [(0, '14.123')]
+[2024-08-11 11:14:30,043][02624] Saving new best policy, reward=14.123!
+[2024-08-11 11:14:35,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 1540096. Throughput: 0: 919.7. Samples: 384116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:14:35,031][00221] Avg episode reward: [(0, '13.593')]
+[2024-08-11 11:14:38,188][02638] Updated weights for policy 0, policy_version 380 (0.0042)
+[2024-08-11 11:14:40,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1564672. Throughput: 0: 976.1. Samples: 391020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:14:40,028][00221] Avg episode reward: [(0, '12.575')]
+[2024-08-11 11:14:45,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1576960. Throughput: 0: 974.2. Samples: 393602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:14:45,032][00221] Avg episode reward: [(0, '11.420')]
+[2024-08-11 11:14:49,858][02638] Updated weights for policy 0, policy_version 390 (0.0032)
+[2024-08-11 11:14:50,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 1597440. Throughput: 0: 921.5. Samples: 398070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:14:50,028][00221] Avg episode reward: [(0, '12.136')]
+[2024-08-11 11:14:55,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3762.8). Total num frames: 1617920. Throughput: 0: 954.1. Samples: 404948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:14:55,028][00221] Avg episode reward: [(0, '13.016')]
+[2024-08-11 11:14:59,575][02638] Updated weights for policy 0, policy_version 400 (0.0027)
+[2024-08-11 11:15:00,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 1638400. Throughput: 0: 981.6. Samples: 408308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:15:00,028][00221] Avg episode reward: [(0, '13.780')]
+[2024-08-11 11:15:05,028][00221] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3762.7). Total num frames: 1650688. Throughput: 0: 932.8. Samples: 412604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:15:05,030][00221] Avg episode reward: [(0, '14.839')]
+[2024-08-11 11:15:05,034][02624] Saving new best policy, reward=14.839!
+[2024-08-11 11:15:10,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1671168. Throughput: 0: 924.1. Samples: 418480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:15:10,027][00221] Avg episode reward: [(0, '14.709')]
+[2024-08-11 11:15:10,122][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000409_1675264.pth...
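
Editor's note: the .pth files saved above are ordinary torch.save archives, so they can be inspected offline. A minimal sketch, assuming nothing about the dict layout inside the checkpoint:

```python
import torch

# Inspect one of the checkpoints written by the learner. The path comes from
# the log; the contents are loaded generically since the exact dict layout
# is not shown in the log itself.
path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000409_1675264.pth"
ckpt = torch.load(path, map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. model weights, optimizer state, step counters
else:
    print(type(ckpt))
```
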
+[2024-08-11 11:15:10,252][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000189_774144.pth
+[2024-08-11 11:15:11,089][02638] Updated weights for policy 0, policy_version 410 (0.0040)
+[2024-08-11 11:15:15,028][00221] Fps is (10 sec: 4505.5, 60 sec: 3891.0, 300 sec: 3762.7). Total num frames: 1695744. Throughput: 0: 957.6. Samples: 421908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:15:15,030][00221] Avg episode reward: [(0, '14.394')]
+[2024-08-11 11:15:20,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1712128. Throughput: 0: 962.0. Samples: 427406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:15:20,032][00221] Avg episode reward: [(0, '14.244')]
+[2024-08-11 11:15:22,893][02638] Updated weights for policy 0, policy_version 420 (0.0049)
+[2024-08-11 11:15:25,026][00221] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1728512. Throughput: 0: 916.6. Samples: 432266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:15:25,030][00221] Avg episode reward: [(0, '13.350')]
+[2024-08-11 11:15:30,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1748992. Throughput: 0: 935.6. Samples: 435702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:15:30,033][00221] Avg episode reward: [(0, '13.590')]
+[2024-08-11 11:15:31,812][02638] Updated weights for policy 0, policy_version 430 (0.0030)
+[2024-08-11 11:15:35,026][00221] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1769472. Throughput: 0: 980.5. Samples: 442194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:15:35,029][00221] Avg episode reward: [(0, '14.427')]
+[2024-08-11 11:15:40,026][00221] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1781760. Throughput: 0: 919.3. Samples: 446316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:15:40,033][00221] Avg episode reward: [(0, '15.065')]
+[2024-08-11 11:15:40,042][02624] Saving new best policy, reward=15.065!
+[2024-08-11 11:15:43,822][02638] Updated weights for policy 0, policy_version 440 (0.0026)
+[2024-08-11 11:15:45,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1806336. Throughput: 0: 912.2. Samples: 449358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:15:45,028][00221] Avg episode reward: [(0, '16.714')]
+[2024-08-11 11:15:45,032][02624] Saving new best policy, reward=16.714!
+[2024-08-11 11:15:50,026][00221] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1826816. Throughput: 0: 969.1. Samples: 456210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:15:50,028][00221] Avg episode reward: [(0, '17.284')]
+[2024-08-11 11:15:50,034][02624] Saving new best policy, reward=17.284!
+[2024-08-11 11:15:54,743][02638] Updated weights for policy 0, policy_version 450 (0.0024)
+[2024-08-11 11:15:55,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1843200. Throughput: 0: 945.8. Samples: 461042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:15:55,028][00221] Avg episode reward: [(0, '17.711')]
+[2024-08-11 11:15:55,036][02624] Saving new best policy, reward=17.711!
+[2024-08-11 11:16:00,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1859584. Throughput: 0: 916.6. Samples: 463154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:16:00,031][00221] Avg episode reward: [(0, '17.663')]
+[2024-08-11 11:16:04,964][02638] Updated weights for policy 0, policy_version 460 (0.0025)
+[2024-08-11 11:16:05,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3762.8). Total num frames: 1884160. Throughput: 0: 941.4. Samples: 469768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:16:05,028][00221] Avg episode reward: [(0, '16.361')]
+[2024-08-11 11:16:10,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1900544. Throughput: 0: 965.4. Samples: 475708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:16:10,029][00221] Avg episode reward: [(0, '16.539')]
+[2024-08-11 11:16:15,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3776.7). Total num frames: 1916928. Throughput: 0: 933.9. Samples: 477728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:16:15,032][00221] Avg episode reward: [(0, '16.211')]
+[2024-08-11 11:16:16,754][02638] Updated weights for policy 0, policy_version 470 (0.0044)
+[2024-08-11 11:16:20,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 1937408. Throughput: 0: 922.3. Samples: 483696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:16:20,030][00221] Avg episode reward: [(0, '16.116')]
+[2024-08-11 11:16:25,029][00221] Fps is (10 sec: 4504.0, 60 sec: 3891.0, 300 sec: 3790.5). Total num frames: 1961984. Throughput: 0: 983.7. Samples: 490586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:16:25,035][00221] Avg episode reward: [(0, '15.525')]
+[2024-08-11 11:16:26,052][02638] Updated weights for policy 0, policy_version 480 (0.0033)
+[2024-08-11 11:16:30,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.8). Total num frames: 1974272. Throughput: 0: 964.8. Samples: 492772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:16:30,029][00221] Avg episode reward: [(0, '16.422')]
+[2024-08-11 11:16:35,026][00221] Fps is (10 sec: 2868.2, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1990656. Throughput: 0: 911.9. Samples: 497244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:16:35,028][00221] Avg episode reward: [(0, '18.527')]
+[2024-08-11 11:16:35,034][02624] Saving new best policy, reward=18.527!
+[2024-08-11 11:16:37,971][02638] Updated weights for policy 0, policy_version 490 (0.0037)
+[2024-08-11 11:16:40,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2015232. Throughput: 0: 946.3. Samples: 503624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:16:40,033][00221] Avg episode reward: [(0, '18.219')]
+[2024-08-11 11:16:45,026][00221] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2027520. Throughput: 0: 966.2. Samples: 506634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:16:45,032][00221] Avg episode reward: [(0, '18.134')]
+[2024-08-11 11:16:50,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2043904. Throughput: 0: 905.0. Samples: 510492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:16:50,031][00221] Avg episode reward: [(0, '18.496')]
+[2024-08-11 11:16:50,669][02638] Updated weights for policy 0, policy_version 500 (0.0048)
+[2024-08-11 11:16:55,026][00221] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2064384. Throughput: 0: 910.8. Samples: 516694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:16:55,032][00221] Avg episode reward: [(0, '17.020')]
+[2024-08-11 11:16:59,729][02638] Updated weights for policy 0, policy_version 510 (0.0025)
+[2024-08-11 11:17:00,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2088960. Throughput: 0: 943.2. Samples: 520170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:17:00,030][00221] Avg episode reward: [(0, '16.019')]
+[2024-08-11 11:17:05,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3790.6). Total num frames: 2101248. Throughput: 0: 922.4. Samples: 525206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:17:05,042][00221] Avg episode reward: [(0, '16.993')]
+[2024-08-11 11:17:10,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2121728. Throughput: 0: 883.1. Samples: 530322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:17:10,028][00221] Avg episode reward: [(0, '17.511')]
+[2024-08-11 11:17:10,043][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000518_2121728.pth...
+[2024-08-11 11:17:10,207][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000297_1216512.pth
+[2024-08-11 11:17:11,817][02638] Updated weights for policy 0, policy_version 520 (0.0028)
+[2024-08-11 11:17:15,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2142208. Throughput: 0: 908.1. Samples: 533638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:17:15,031][00221] Avg episode reward: [(0, '20.203')]
+[2024-08-11 11:17:15,034][02624] Saving new best policy, reward=20.203!
+[2024-08-11 11:17:20,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 2158592. Throughput: 0: 948.6. Samples: 539930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:17:20,031][00221] Avg episode reward: [(0, '20.424')]
+[2024-08-11 11:17:20,044][02624] Saving new best policy, reward=20.424!
+[2024-08-11 11:17:23,197][02638] Updated weights for policy 0, policy_version 530 (0.0018)
+[2024-08-11 11:17:25,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3776.7). Total num frames: 2174976. Throughput: 0: 897.3. Samples: 544002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:17:25,027][00221] Avg episode reward: [(0, '21.389')]
+[2024-08-11 11:17:25,033][02624] Saving new best policy, reward=21.389!
+[2024-08-11 11:17:30,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2195456. Throughput: 0: 900.3. Samples: 547146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:17:30,028][00221] Avg episode reward: [(0, '20.842')]
+[2024-08-11 11:17:32,892][02638] Updated weights for policy 0, policy_version 540 (0.0032)
+[2024-08-11 11:17:35,029][00221] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2220032. Throughput: 0: 968.0. Samples: 554050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:17:35,031][00221] Avg episode reward: [(0, '20.329')]
+[2024-08-11 11:17:40,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2232320. Throughput: 0: 931.2. Samples: 558598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:17:40,031][00221] Avg episode reward: [(0, '18.347')]
+[2024-08-11 11:17:44,730][02638] Updated weights for policy 0, policy_version 550 (0.0019)
+[2024-08-11 11:17:45,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2252800. Throughput: 0: 905.3. Samples: 560908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:17:45,028][00221] Avg episode reward: [(0, '18.289')]
+[2024-08-11 11:17:50,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 2277376. Throughput: 0: 950.2. Samples: 567966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:17:50,028][00221] Avg episode reward: [(0, '17.787')]
+[2024-08-11 11:17:54,442][02638] Updated weights for policy 0, policy_version 560 (0.0037)
+[2024-08-11 11:17:55,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2293760. Throughput: 0: 966.4. Samples: 573810. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:17:55,031][00221] Avg episode reward: [(0, '17.534')]
+[2024-08-11 11:18:00,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2310144. Throughput: 0: 938.9. Samples: 575890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:18:00,028][00221] Avg episode reward: [(0, '18.843')]
+[2024-08-11 11:18:05,029][00221] Fps is (10 sec: 3685.2, 60 sec: 3822.7, 300 sec: 3776.6). Total num frames: 2330624. Throughput: 0: 938.6. Samples: 582172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:18:05,031][00221] Avg episode reward: [(0, '18.744')]
+[2024-08-11 11:18:05,111][02638] Updated weights for policy 0, policy_version 570 (0.0045)
+[2024-08-11 11:18:10,025][00221] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2355200. Throughput: 0: 997.6. Samples: 588892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:18:10,031][00221] Avg episode reward: [(0, '18.280')]
+[2024-08-11 11:18:15,026][00221] Fps is (10 sec: 3687.6, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2367488. Throughput: 0: 974.4. Samples: 590992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:18:15,032][00221] Avg episode reward: [(0, '18.767')]
+[2024-08-11 11:18:16,946][02638] Updated weights for policy 0, policy_version 580 (0.0024)
+[2024-08-11 11:18:20,026][00221] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2387968. Throughput: 0: 934.9. Samples: 596120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:18:20,029][00221] Avg episode reward: [(0, '18.884')]
+[2024-08-11 11:18:25,026][00221] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3790.5). Total num frames: 2412544. Throughput: 0: 989.5. Samples: 603128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:18:25,033][00221] Avg episode reward: [(0, '18.700')]
+[2024-08-11 11:18:25,704][02638] Updated weights for policy 0, policy_version 590 (0.0041)
+[2024-08-11 11:18:30,026][00221] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2428928. Throughput: 0: 1004.6. Samples: 606114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:18:30,029][00221] Avg episode reward: [(0, '19.671')]
+[2024-08-11 11:18:35,026][00221] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2445312. Throughput: 0: 942.6. Samples: 610382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:18:35,032][00221] Avg episode reward: [(0, '19.638')]
+[2024-08-11 11:18:37,507][02638] Updated weights for policy 0, policy_version 600 (0.0024)
+[2024-08-11 11:18:40,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2465792. Throughput: 0: 961.2. Samples: 617062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:18:40,027][00221] Avg episode reward: [(0, '21.888')]
+[2024-08-11 11:18:40,036][02624] Saving new best policy, reward=21.888!
+[2024-08-11 11:18:45,028][00221] Fps is (10 sec: 4095.0, 60 sec: 3891.0, 300 sec: 3776.6). Total num frames: 2486272. Throughput: 0: 989.1. Samples: 620400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:18:45,034][00221] Avg episode reward: [(0, '21.658')]
+[2024-08-11 11:18:48,378][02638] Updated weights for policy 0, policy_version 610 (0.0031)
+[2024-08-11 11:18:50,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 2502656. Throughput: 0: 956.7. Samples: 625222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:18:50,027][00221] Avg episode reward: [(0, '22.797')]
+[2024-08-11 11:18:50,044][02624] Saving new best policy, reward=22.797!
+[2024-08-11 11:18:55,026][00221] Fps is (10 sec: 3687.3, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2523136. Throughput: 0: 933.5. Samples: 630898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:18:55,033][00221] Avg episode reward: [(0, '22.441')]
+[2024-08-11 11:18:58,269][02638] Updated weights for policy 0, policy_version 620 (0.0031)
+[2024-08-11 11:19:00,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 2547712. Throughput: 0: 964.5. Samples: 634396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:19:00,032][00221] Avg episode reward: [(0, '21.793')]
+[2024-08-11 11:19:05,027][00221] Fps is (10 sec: 3686.0, 60 sec: 3823.1, 300 sec: 3776.6). Total num frames: 2560000. Throughput: 0: 983.5. Samples: 640376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:19:05,031][00221] Avg episode reward: [(0, '21.292')]
+[2024-08-11 11:19:09,890][02638] Updated weights for policy 0, policy_version 630 (0.0044)
+[2024-08-11 11:19:10,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2580480. Throughput: 0: 927.8. Samples: 644880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:19:10,032][00221] Avg episode reward: [(0, '21.962')]
+[2024-08-11 11:19:10,044][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000630_2580480.pth...
+[2024-08-11 11:19:10,171][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000409_1675264.pth
+[2024-08-11 11:19:15,026][00221] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2600960. Throughput: 0: 938.1. Samples: 648328. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:19:15,031][00221] Avg episode reward: [(0, '21.234')]
+[2024-08-11 11:19:18,803][02638] Updated weights for policy 0, policy_version 640 (0.0038)
+[2024-08-11 11:19:20,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2621440. Throughput: 0: 997.6. Samples: 655276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:19:20,028][00221] Avg episode reward: [(0, '20.938')]
+[2024-08-11 11:19:25,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2637824. Throughput: 0: 945.3. Samples: 659602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:19:25,028][00221] Avg episode reward: [(0, '22.264')]
+[2024-08-11 11:19:30,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2658304. Throughput: 0: 931.9. Samples: 662332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:19:30,032][00221] Avg episode reward: [(0, '21.439')]
+[2024-08-11 11:19:30,586][02638] Updated weights for policy 0, policy_version 650 (0.0031)
+[2024-08-11 11:19:35,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 2682880. Throughput: 0: 984.2. Samples: 669510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:19:35,033][00221] Avg episode reward: [(0, '19.851')]
+[2024-08-11 11:19:40,028][00221] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3790.5). Total num frames: 2695168. Throughput: 0: 972.0. Samples: 674642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:19:40,030][00221] Avg episode reward: [(0, '19.685')]
+[2024-08-11 11:19:41,972][02638] Updated weights for policy 0, policy_version 660 (0.0046)
+[2024-08-11 11:19:45,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 2711552. Throughput: 0: 940.8. Samples: 676732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:19:45,027][00221] Avg episode reward: [(0, '18.727')]
+[2024-08-11 11:19:50,026][00221] Fps is (10 sec: 4097.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2736128. Throughput: 0: 953.8. Samples: 683294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:19:50,031][00221] Avg episode reward: [(0, '18.373')]
+[2024-08-11 11:19:51,339][02638] Updated weights for policy 0, policy_version 670 (0.0034)
+[2024-08-11 11:19:55,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2756608. Throughput: 0: 994.2. Samples: 689618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:19:55,031][00221] Avg episode reward: [(0, '18.635')]
+[2024-08-11 11:20:00,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 2768896. Throughput: 0: 963.8. Samples: 691698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:20:00,032][00221] Avg episode reward: [(0, '19.018')]
+[2024-08-11 11:20:02,983][02638] Updated weights for policy 0, policy_version 680 (0.0031)
+[2024-08-11 11:20:05,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 2793472. Throughput: 0: 931.8. Samples: 697206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:20:05,033][00221] Avg episode reward: [(0, '19.700')]
+[2024-08-11 11:20:10,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.6). Total num frames: 2813952. Throughput: 0: 985.4. Samples: 703946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-08-11 11:20:10,032][00221] Avg episode reward: [(0, '20.203')]
+[2024-08-11 11:20:13,090][02638] Updated weights for policy 0, policy_version 690 (0.0025)
+[2024-08-11 11:20:15,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2830336. Throughput: 0: 982.6. Samples: 706548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:20:15,030][00221] Avg episode reward: [(0, '21.855')]
+[2024-08-11 11:20:20,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2846720. Throughput: 0: 921.0. Samples: 710956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:20:20,033][00221] Avg episode reward: [(0, '22.166')]
+[2024-08-11 11:20:23,950][02638] Updated weights for policy 0, policy_version 700 (0.0041)
+[2024-08-11 11:20:25,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2871296. Throughput: 0: 958.9. Samples: 717788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:20:25,028][00221] Avg episode reward: [(0, '22.410')]
+[2024-08-11 11:20:30,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2887680. Throughput: 0: 988.7. Samples: 721224. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:20:30,028][00221] Avg episode reward: [(0, '23.317')]
+[2024-08-11 11:20:30,038][02624] Saving new best policy, reward=23.317!
+[2024-08-11 11:20:35,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2904064. Throughput: 0: 939.0. Samples: 725548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:20:35,031][00221] Avg episode reward: [(0, '22.615')]
+[2024-08-11 11:20:35,858][02638] Updated weights for policy 0, policy_version 710 (0.0031)
+[2024-08-11 11:20:40,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3790.5). Total num frames: 2924544. Throughput: 0: 929.0. Samples: 731422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:20:40,028][00221] Avg episode reward: [(0, '22.969')]
+[2024-08-11 11:20:44,954][02638] Updated weights for policy 0, policy_version 720 (0.0017)
+[2024-08-11 11:20:45,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2949120. Throughput: 0: 957.4. Samples: 734780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:20:45,029][00221] Avg episode reward: [(0, '22.177')]
+[2024-08-11 11:20:50,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2961408. Throughput: 0: 958.8. Samples: 740354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:20:50,032][00221] Avg episode reward: [(0, '22.404')]
+[2024-08-11 11:20:55,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2977792. Throughput: 0: 914.9. Samples: 745116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:20:55,028][00221] Avg episode reward: [(0, '21.720')]
+[2024-08-11 11:20:57,091][02638] Updated weights for policy 0, policy_version 730 (0.0032)
+[2024-08-11 11:21:00,026][00221] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3002368. Throughput: 0: 927.5. Samples: 748288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:21:00,033][00221] Avg episode reward: [(0, '21.466')]
+[2024-08-11 11:21:05,029][00221] Fps is (10 sec: 4094.7, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 3018752. Throughput: 0: 970.5. Samples: 754632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:21:05,032][00221] Avg episode reward: [(0, '19.561')]
+[2024-08-11 11:21:08,814][02638] Updated weights for policy 0, policy_version 740 (0.0020)
+[2024-08-11 11:21:10,026][00221] Fps is (10 sec: 2867.3, 60 sec: 3618.1, 300 sec: 3776.6). Total num frames: 3031040. Throughput: 0: 905.4. Samples: 758530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:21:10,028][00221] Avg episode reward: [(0, '22.074')]
+[2024-08-11 11:21:10,041][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000740_3031040.pth...
+[2024-08-11 11:21:10,218][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000518_2121728.pth
+[2024-08-11 11:21:15,026][00221] Fps is (10 sec: 3277.9, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3051520. Throughput: 0: 884.6. Samples: 761030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:21:15,028][00221] Avg episode reward: [(0, '21.431')]
+[2024-08-11 11:21:20,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3067904. Throughput: 0: 918.6. Samples: 766884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:21:20,028][00221] Avg episode reward: [(0, '22.100')]
+[2024-08-11 11:21:20,056][02638] Updated weights for policy 0, policy_version 750 (0.0022)
+[2024-08-11 11:21:25,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 3084288. Throughput: 0: 887.8. Samples: 771372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:21:25,029][00221] Avg episode reward: [(0, '22.699')]
+[2024-08-11 11:21:30,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3748.9). Total num frames: 3096576. Throughput: 0: 853.2. Samples: 773172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:21:30,029][00221] Avg episode reward: [(0, '21.897')]
+[2024-08-11 11:21:33,078][02638] Updated weights for policy 0, policy_version 760 (0.0020)
+[2024-08-11 11:21:35,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3121152. Throughput: 0: 861.0. Samples: 779100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:21:35,028][00221] Avg episode reward: [(0, '23.582')]
+[2024-08-11 11:21:35,032][02624] Saving new best policy, reward=23.582!
+[2024-08-11 11:21:40,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 3137536. Throughput: 0: 889.6. Samples: 785146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:21:40,028][00221] Avg episode reward: [(0, '22.574')]
+[2024-08-11 11:21:44,933][02638] Updated weights for policy 0, policy_version 770 (0.0028)
+[2024-08-11 11:21:45,031][00221] Fps is (10 sec: 3274.9, 60 sec: 3413.0, 300 sec: 3762.7). Total num frames: 3153920. Throughput: 0: 862.0. Samples: 787082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:21:45,034][00221] Avg episode reward: [(0, '23.081')]
+[2024-08-11 11:21:50,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 3174400. Throughput: 0: 838.4. Samples: 792356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:21:50,032][00221] Avg episode reward: [(0, '22.650')]
+[2024-08-11 11:21:54,600][02638] Updated weights for policy 0, policy_version 780 (0.0030)
+[2024-08-11 11:21:55,026][00221] Fps is (10 sec: 4098.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3194880. Throughput: 0: 899.2. Samples: 798994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:21:55,031][00221] Avg episode reward: [(0, '23.446')]
+[2024-08-11 11:22:00,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3762.8). Total num frames: 3211264. Throughput: 0: 902.6. Samples: 801648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:22:00,031][00221] Avg episode reward: [(0, '23.298')]
+[2024-08-11 11:22:05,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3748.9). Total num frames: 3227648. Throughput: 0: 868.4. Samples: 805960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:22:05,027][00221] Avg episode reward: [(0, '23.506')]
+[2024-08-11 11:22:06,404][02638] Updated weights for policy 0, policy_version 790 (0.0032)
+[2024-08-11 11:22:10,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3248128. Throughput: 0: 915.9. Samples: 812588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:22:10,028][00221] Avg episode reward: [(0, '23.944')]
+[2024-08-11 11:22:10,039][02624] Saving new best policy, reward=23.944!
+[2024-08-11 11:22:15,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3268608. Throughput: 0: 944.7. Samples: 815682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:22:15,028][00221] Avg episode reward: [(0, '25.160')]
+[2024-08-11 11:22:15,033][02624] Saving new best policy, reward=25.160!
+[2024-08-11 11:22:17,841][02638] Updated weights for policy 0, policy_version 800 (0.0033)
+[2024-08-11 11:22:20,029][00221] Fps is (10 sec: 3275.7, 60 sec: 3549.7, 300 sec: 3748.8). Total num frames: 3280896. Throughput: 0: 910.6. Samples: 820080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:22:20,031][00221] Avg episode reward: [(0, '25.180')]
+[2024-08-11 11:22:20,045][02624] Saving new best policy, reward=25.180!
+[2024-08-11 11:22:25,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3301376. Throughput: 0: 898.4. Samples: 825572. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:22:25,032][00221] Avg episode reward: [(0, '22.755')]
+[2024-08-11 11:22:28,292][02638] Updated weights for policy 0, policy_version 810 (0.0035)
+[2024-08-11 11:22:30,026][00221] Fps is (10 sec: 4507.1, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3325952. Throughput: 0: 930.3. Samples: 828938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:22:30,028][00221] Avg episode reward: [(0, '22.099')]
+[2024-08-11 11:22:35,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3338240. Throughput: 0: 934.9. Samples: 834426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:22:35,030][00221] Avg episode reward: [(0, '23.050')]
+[2024-08-11 11:22:40,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3354624. Throughput: 0: 889.2. Samples: 839008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:22:40,028][00221] Avg episode reward: [(0, '21.480')]
+[2024-08-11 11:22:40,507][02638] Updated weights for policy 0, policy_version 820 (0.0047)
+[2024-08-11 11:22:45,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.8, 300 sec: 3721.1). Total num frames: 3375104. Throughput: 0: 901.8. Samples: 842228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:22:45,032][00221] Avg episode reward: [(0, '19.371')]
+[2024-08-11 11:22:50,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3395584. Throughput: 0: 955.3. Samples: 848948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:22:50,028][00221] Avg episode reward: [(0, '20.863')]
+[2024-08-11 11:22:50,688][02638] Updated weights for policy 0, policy_version 830 (0.0026)
+[2024-08-11 11:22:55,027][00221] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3721.1). Total num frames: 3407872. Throughput: 0: 895.8. Samples: 852902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:22:55,031][00221] Avg episode reward: [(0, '21.220')]
+[2024-08-11 11:23:00,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3432448. Throughput: 0: 891.3. Samples: 855792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:23:00,032][00221] Avg episode reward: [(0, '20.061')]
+[2024-08-11 11:23:01,824][02638] Updated weights for policy 0, policy_version 840 (0.0037)
+[2024-08-11 11:23:05,026][00221] Fps is (10 sec: 4506.1, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3452928. Throughput: 0: 943.1. Samples: 862516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:23:05,031][00221] Avg episode reward: [(0, '21.955')]
+[2024-08-11 11:23:10,026][00221] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3469312. Throughput: 0: 933.4. Samples: 867574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:23:10,032][00221] Avg episode reward: [(0, '22.055')]
+[2024-08-11 11:23:10,045][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000847_3469312.pth...
+[2024-08-11 11:23:10,206][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000630_2580480.pth
+[2024-08-11 11:23:13,874][02638] Updated weights for policy 0, policy_version 850 (0.0016)
+[2024-08-11 11:23:15,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 3485696. Throughput: 0: 902.0. Samples: 869530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:23:15,032][00221] Avg episode reward: [(0, '22.322')]
+[2024-08-11 11:23:20,026][00221] Fps is (10 sec: 3686.5, 60 sec: 3754.9, 300 sec: 3707.2). Total num frames: 3506176. Throughput: 0: 925.4. Samples: 876068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:23:20,028][00221] Avg episode reward: [(0, '25.090')]
+[2024-08-11 11:23:22,905][02638] Updated weights for policy 0, policy_version 860 (0.0042)
+[2024-08-11 11:23:25,027][00221] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 3526656. Throughput: 0: 958.1. Samples: 882124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:23:25,029][00221] Avg episode reward: [(0, '25.432')]
+[2024-08-11 11:23:25,033][02624] Saving new best policy, reward=25.432!
+[2024-08-11 11:23:30,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 3538944. Throughput: 0: 931.6. Samples: 884152. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:23:30,029][00221] Avg episode reward: [(0, '26.744')]
+[2024-08-11 11:23:30,045][02624] Saving new best policy, reward=26.744!
+[2024-08-11 11:23:35,026][00221] Fps is (10 sec: 3277.2, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3559424. Throughput: 0: 894.0. Samples: 889176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:23:35,028][00221] Avg episode reward: [(0, '26.132')]
+[2024-08-11 11:23:35,571][02638] Updated weights for policy 0, policy_version 870 (0.0027)
+[2024-08-11 11:23:40,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 3579904. Throughput: 0: 952.1. Samples: 895744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:23:40,029][00221] Avg episode reward: [(0, '27.634')]
+[2024-08-11 11:23:40,044][02624] Saving new best policy, reward=27.634!
+[2024-08-11 11:23:45,029][00221] Fps is (10 sec: 3685.1, 60 sec: 3686.2, 300 sec: 3707.2). Total num frames: 3596288. Throughput: 0: 944.9. Samples: 898314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:23:45,031][00221] Avg episode reward: [(0, '26.240')]
+[2024-08-11 11:23:47,467][02638] Updated weights for policy 0, policy_version 880 (0.0026)
+[2024-08-11 11:23:50,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 3612672. Throughput: 0: 888.8. Samples: 902512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:23:50,030][00221] Avg episode reward: [(0, '25.763')]
+[2024-08-11 11:23:55,026][00221] Fps is (10 sec: 4097.3, 60 sec: 3823.0, 300 sec: 3693.3). Total num frames: 3637248. Throughput: 0: 923.2. Samples: 909118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:23:55,033][00221] Avg episode reward: [(0, '25.150')]
+[2024-08-11 11:23:56,670][02638] Updated weights for policy 0, policy_version 890 (0.0024)
+[2024-08-11 11:24:00,026][00221] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3653632. Throughput: 0: 956.8. Samples: 912586. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-08-11 11:24:00,032][00221] Avg episode reward: [(0, '26.175')]
+[2024-08-11 11:24:05,026][00221] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 3665920. Throughput: 0: 905.7. Samples: 916824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:24:05,032][00221] Avg episode reward: [(0, '26.069')]
+[2024-08-11 11:24:08,956][02638] Updated weights for policy 0, policy_version 900 (0.0036)
+[2024-08-11 11:24:10,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3690496. Throughput: 0: 899.4. Samples: 922596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:24:10,028][00221] Avg episode reward: [(0, '25.586')]
+[2024-08-11 11:24:15,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 3710976. Throughput: 0: 929.9. Samples: 925998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-08-11 11:24:15,034][00221] Avg episode reward: [(0, '26.683')]
+[2024-08-11 11:24:20,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 3723264. Throughput: 0: 934.2. Samples: 931216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-11 11:24:20,033][00221] Avg episode reward: [(0, '26.923')]
+[2024-08-11 11:24:20,185][02638] Updated weights for policy 0, policy_version 910 (0.0029)
+[2024-08-11 11:24:25,026][00221] Fps is (10 sec: 2867.1, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3739648. Throughput: 0: 888.5. Samples: 935726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:24:25,033][00221] Avg episode reward: [(0, '27.951')]
+[2024-08-11 11:24:25,125][02624] Saving new best policy, reward=27.951!
+[2024-08-11 11:24:30,026][00221] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 3764224. Throughput: 0: 901.6. Samples: 938884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:24:30,033][00221] Avg episode reward: [(0, '28.608')]
+[2024-08-11 11:24:30,044][02624] Saving new best policy, reward=28.608!
+[2024-08-11 11:24:30,747][02638] Updated weights for policy 0, policy_version 920 (0.0023)
+[2024-08-11 11:24:35,026][00221] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3780608. Throughput: 0: 950.2. Samples: 945272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:24:35,028][00221] Avg episode reward: [(0, '28.215')]
+[2024-08-11 11:24:40,026][00221] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3792896. Throughput: 0: 892.0. Samples: 949260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:24:40,034][00221] Avg episode reward: [(0, '29.099')]
+[2024-08-11 11:24:40,092][02624] Saving new best policy, reward=29.099!
+[2024-08-11 11:24:43,063][02638] Updated weights for policy 0, policy_version 930 (0.0028)
+[2024-08-11 11:24:45,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3665.6). Total num frames: 3817472. Throughput: 0: 876.8. Samples: 952044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:24:45,027][00221] Avg episode reward: [(0, '27.512')]
+[2024-08-11 11:24:50,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 3837952. Throughput: 0: 933.5. Samples: 958832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:24:50,032][00221] Avg episode reward: [(0, '26.756')]
+[2024-08-11 11:24:53,269][02638] Updated weights for policy 0, policy_version 940 (0.0018)
+[2024-08-11 11:24:55,030][00221] Fps is (10 sec: 3684.8, 60 sec: 3617.9, 300 sec: 3679.4). Total num frames: 3854336. Throughput: 0: 916.2. Samples: 963830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:24:55,032][00221] Avg episode reward: [(0, '26.608')]
+[2024-08-11 11:25:00,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3870720. Throughput: 0: 887.2. Samples: 965920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:25:00,033][00221] Avg episode reward: [(0, '24.298')]
+[2024-08-11 11:25:04,216][02638] Updated weights for policy 0, policy_version 950 (0.0019)
+[2024-08-11 11:25:05,026][00221] Fps is (10 sec: 3688.1, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 3891200. Throughput: 0: 914.5. Samples: 972370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:25:05,034][00221] Avg episode reward: [(0, '22.812')]
+[2024-08-11 11:25:10,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3911680. Throughput: 0: 952.8. Samples: 978600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:25:10,033][00221] Avg episode reward: [(0, '22.324')]
+[2024-08-11 11:25:10,044][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000955_3911680.pth...
+[2024-08-11 11:25:10,219][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000740_3031040.pth
+[2024-08-11 11:25:15,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 3923968. Throughput: 0: 925.5. Samples: 980532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-08-11 11:25:15,027][00221] Avg episode reward: [(0, '23.255')]
+[2024-08-11 11:25:16,408][02638] Updated weights for policy 0, policy_version 960 (0.0037)
+[2024-08-11 11:25:20,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3944448. Throughput: 0: 897.4. Samples: 985654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:25:20,032][00221] Avg episode reward: [(0, '24.524')]
+[2024-08-11 11:25:25,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3823.0, 300 sec: 3665.6). Total num frames: 3969024. Throughput: 0: 958.1. Samples: 992374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:25:25,028][00221] Avg episode reward: [(0, '24.152')]
+[2024-08-11 11:25:25,679][02638] Updated weights for policy 0, policy_version 970 (0.0017)
+[2024-08-11 11:25:30,028][00221] Fps is (10 sec: 3685.5, 60 sec: 3618.0, 300 sec: 3651.7). Total num frames: 3981312. Throughput: 0: 951.8. Samples: 994876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-08-11 11:25:30,036][00221] Avg episode reward: [(0, '25.939')]
+[2024-08-11 11:25:35,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3997696. Throughput: 0: 887.3. Samples: 998762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-08-11 11:25:35,032][00221] Avg episode reward: [(0, '24.894')]
+[2024-08-11 11:25:36,308][02624] Stopping Batcher_0...
+[2024-08-11 11:25:36,309][02624] Loop batcher_evt_loop terminating...
+[2024-08-11 11:25:36,309][00221] Component Batcher_0 stopped!
+[2024-08-11 11:25:36,311][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-08-11 11:25:36,395][00221] Component RolloutWorker_w1 stopped!
+[2024-08-11 11:25:36,400][02643] Stopping RolloutWorker_w5...
+[2024-08-11 11:25:36,402][00221] Component RolloutWorker_w5 stopped!
+[2024-08-11 11:25:36,410][02638] Weights refcount: 2 0
+[2024-08-11 11:25:36,395][02639] Stopping RolloutWorker_w1...
+[2024-08-11 11:25:36,403][02643] Loop rollout_proc5_evt_loop terminating...
+[2024-08-11 11:25:36,415][00221] Component RolloutWorker_w4 stopped!
+[2024-08-11 11:25:36,419][02642] Stopping RolloutWorker_w4...
+[2024-08-11 11:25:36,421][00221] Component InferenceWorker_p0-w0 stopped!
+[2024-08-11 11:25:36,413][02639] Loop rollout_proc1_evt_loop terminating...
+[2024-08-11 11:25:36,425][02638] Stopping InferenceWorker_p0-w0...
+[2024-08-11 11:25:36,426][02638] Loop inference_proc0-0_evt_loop terminating...
+[2024-08-11 11:25:36,433][02644] Stopping RolloutWorker_w6...
+[2024-08-11 11:25:36,433][00221] Component RolloutWorker_w6 stopped!
+[2024-08-11 11:25:36,434][00221] Component RolloutWorker_w7 stopped!
+[2024-08-11 11:25:36,434][02642] Loop rollout_proc4_evt_loop terminating...
+[2024-08-11 11:25:36,443][02644] Loop rollout_proc6_evt_loop terminating...
+[2024-08-11 11:25:36,433][02645] Stopping RolloutWorker_w7...
+[2024-08-11 11:25:36,449][00221] Component RolloutWorker_w2 stopped!
+[2024-08-11 11:25:36,455][00221] Component RolloutWorker_w0 stopped!
+[2024-08-11 11:25:36,449][02645] Loop rollout_proc7_evt_loop terminating...
+[2024-08-11 11:25:36,451][02637] Stopping RolloutWorker_w0...
+[2024-08-11 11:25:36,463][02637] Loop rollout_proc0_evt_loop terminating...
+[2024-08-11 11:25:36,449][02640] Stopping RolloutWorker_w2...
+[2024-08-11 11:25:36,469][02641] Stopping RolloutWorker_w3...
+[2024-08-11 11:25:36,469][00221] Component RolloutWorker_w3 stopped!
+[2024-08-11 11:25:36,472][02641] Loop rollout_proc3_evt_loop terminating...
+[2024-08-11 11:25:36,468][02640] Loop rollout_proc2_evt_loop terminating...
+[2024-08-11 11:25:36,497][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000847_3469312.pth
+[2024-08-11 11:25:36,513][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-08-11 11:25:36,690][00221] Component LearnerWorker_p0 stopped!
+[2024-08-11 11:25:36,690][02624] Stopping LearnerWorker_p0...
+[2024-08-11 11:25:36,693][02624] Loop learner_proc0_evt_loop terminating...
+[2024-08-11 11:25:36,693][00221] Waiting for process learner_proc0 to stop...
+[2024-08-11 11:25:38,211][00221] Waiting for process inference_proc0-0 to join...
+[2024-08-11 11:25:38,216][00221] Waiting for process rollout_proc0 to join...
+[2024-08-11 11:25:40,086][00221] Waiting for process rollout_proc1 to join...
+[2024-08-11 11:25:40,090][00221] Waiting for process rollout_proc2 to join...
+[2024-08-11 11:25:40,095][00221] Waiting for process rollout_proc3 to join...
+[2024-08-11 11:25:40,099][00221] Waiting for process rollout_proc4 to join...
+[2024-08-11 11:25:40,104][00221] Waiting for process rollout_proc5 to join...
+[2024-08-11 11:25:40,108][00221] Waiting for process rollout_proc6 to join...
+[2024-08-11 11:25:40,111][00221] Waiting for process rollout_proc7 to join...
+[2024-08-11 11:25:40,114][00221] Batcher 0 profile tree view:
+batching: 27.8949, releasing_batches: 0.0314
+[2024-08-11 11:25:40,118][00221] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0049
+ wait_policy_total: 411.5365
+update_model: 9.2979
+ weight_update: 0.0019
+one_step: 0.0032
+ handle_policy_step: 615.1092
+ deserialize: 16.0363, stack: 3.0893, obs_to_device_normalize: 124.3913, forward: 328.9067, send_messages: 30.8385
+ prepare_outputs: 82.2664
+ to_cpu: 47.2236
+[2024-08-11 11:25:40,120][00221] Learner 0 profile tree view:
+misc: 0.0073, prepare_batch: 14.9899
+train: 73.7830
+ epoch_init: 0.0057, minibatch_init: 0.0180, losses_postprocess: 0.6744, kl_divergence: 0.6177, after_optimizer: 34.0486
+ calculate_losses: 25.9707
+ losses_init: 0.0037, forward_head: 1.3265, bptt_initial: 16.9622, tail: 1.1766, advantages_returns: 0.2774, losses: 3.7369
+ bptt: 2.0988
+ bptt_forward_core: 2.0037
+ update: 11.7642
+ clip: 0.9665
+[2024-08-11 11:25:40,123][00221] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.3571, enqueue_policy_requests: 101.0353, env_step: 841.4191, overhead: 14.3779, complete_rollouts: 7.6473
+save_policy_outputs: 21.0938
+ split_output_tensors: 8.3709
+[2024-08-11 11:25:40,124][00221] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.3584, enqueue_policy_requests: 103.4353, env_step: 837.6421, overhead: 14.2160, complete_rollouts: 6.8723
+save_policy_outputs: 21.4510
+ split_output_tensors: 8.8589
+[2024-08-11 11:25:40,125][00221] Loop Runner_EvtLoop terminating...
+[2024-08-11 11:25:40,127][00221] Runner profile tree view:
+main_loop: 1105.7774
+[2024-08-11 11:25:40,129][00221] Collected {0: 4005888}, FPS: 3622.7
+[2024-08-11 11:25:40,585][00221] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-08-11 11:25:40,587][00221] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-08-11 11:25:40,590][00221] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-08-11 11:25:40,593][00221] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-08-11 11:25:40,594][00221] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-08-11 11:25:40,597][00221] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-08-11 11:25:40,598][00221] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-08-11 11:25:40,599][00221] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-08-11 11:25:40,601][00221] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-08-11 11:25:40,602][00221] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-08-11 11:25:40,603][00221] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-08-11 11:25:40,604][00221] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-08-11 11:25:40,605][00221] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-08-11 11:25:40,606][00221] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-08-11 11:25:40,607][00221] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-08-11 11:25:40,640][00221] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-08-11 11:25:40,644][00221] RunningMeanStd input shape: (3, 72, 128)
+[2024-08-11 11:25:40,646][00221] RunningMeanStd input shape: (1,)
+[2024-08-11 11:25:40,666][00221] ConvEncoder: input_channels=3
+[2024-08-11 11:25:40,772][00221] Conv encoder output size: 512
+[2024-08-11 11:25:40,774][00221] Policy head output size: 512
+[2024-08-11 11:25:40,965][00221] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-08-11 11:25:41,779][00221] Num frames 100...
+[2024-08-11 11:25:41,899][00221] Num frames 200...
+[2024-08-11 11:25:42,030][00221] Num frames 300...
+[2024-08-11 11:25:42,151][00221] Num frames 400...
+[2024-08-11 11:25:42,270][00221] Num frames 500...
+[2024-08-11 11:25:42,394][00221] Num frames 600...
+[2024-08-11 11:25:42,516][00221] Num frames 700...
+[2024-08-11 11:25:42,641][00221] Num frames 800...
+[2024-08-11 11:25:42,764][00221] Num frames 900...
+[2024-08-11 11:25:42,888][00221] Num frames 1000...
+[2024-08-11 11:25:43,019][00221] Num frames 1100...
+[2024-08-11 11:25:43,172][00221] Num frames 1200...
+[2024-08-11 11:25:43,347][00221] Num frames 1300...
+[2024-08-11 11:25:43,513][00221] Num frames 1400...
+[2024-08-11 11:25:43,680][00221] Num frames 1500...
+[2024-08-11 11:25:43,848][00221] Num frames 1600...
+[2024-08-11 11:25:44,016][00221] Num frames 1700...
+[2024-08-11 11:25:44,197][00221] Num frames 1800...
+[2024-08-11 11:25:44,368][00221] Num frames 1900...
+[2024-08-11 11:25:44,542][00221] Num frames 2000...
+[2024-08-11 11:25:44,714][00221] Num frames 2100...
+[2024-08-11 11:25:44,769][00221] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000
+[2024-08-11 11:25:44,771][00221] Avg episode reward: 58.999, avg true_objective: 21.000
+[2024-08-11 11:25:44,944][00221] Num frames 2200...
+[2024-08-11 11:25:45,122][00221] Num frames 2300...
+[2024-08-11 11:25:45,294][00221] Num frames 2400...
+[2024-08-11 11:25:45,470][00221] Num frames 2500...
+[2024-08-11 11:25:45,638][00221] Num frames 2600...
+[2024-08-11 11:25:45,757][00221] Num frames 2700...
+[2024-08-11 11:25:45,876][00221] Num frames 2800...
+[2024-08-11 11:25:45,994][00221] Num frames 2900...
+[2024-08-11 11:25:46,121][00221] Num frames 3000...
+[2024-08-11 11:25:46,250][00221] Num frames 3100...
+[2024-08-11 11:25:46,375][00221] Num frames 3200...
+[2024-08-11 11:25:46,500][00221] Num frames 3300...
+[2024-08-11 11:25:46,604][00221] Avg episode rewards: #0: 47.184, true rewards: #0: 16.685
+[2024-08-11 11:25:46,605][00221] Avg episode reward: 47.184, avg true_objective: 16.685
+[2024-08-11 11:25:46,681][00221] Num frames 3400...
+[2024-08-11 11:25:46,803][00221] Num frames 3500...
+[2024-08-11 11:25:46,921][00221] Num frames 3600...
+[2024-08-11 11:25:47,038][00221] Num frames 3700...
+[2024-08-11 11:25:47,173][00221] Num frames 3800...
+[2024-08-11 11:25:47,291][00221] Num frames 3900...
+[2024-08-11 11:25:47,412][00221] Num frames 4000...
+[2024-08-11 11:25:47,496][00221] Avg episode rewards: #0: 37.746, true rewards: #0: 13.413
+[2024-08-11 11:25:47,498][00221] Avg episode reward: 37.746, avg true_objective: 13.413
+[2024-08-11 11:25:47,601][00221] Num frames 4100...
+[2024-08-11 11:25:47,722][00221] Num frames 4200...
+[2024-08-11 11:25:47,844][00221] Num frames 4300...
+[2024-08-11 11:25:47,967][00221] Num frames 4400...
+[2024-08-11 11:25:48,085][00221] Num frames 4500...
+[2024-08-11 11:25:48,223][00221] Num frames 4600...
+[2024-08-11 11:25:48,342][00221] Num frames 4700...
+[2024-08-11 11:25:48,462][00221] Num frames 4800...
+[2024-08-11 11:25:48,587][00221] Num frames 4900...
+[2024-08-11 11:25:48,720][00221] Num frames 5000...
+[2024-08-11 11:25:48,846][00221] Num frames 5100...
+[2024-08-11 11:25:48,969][00221] Num frames 5200...
+[2024-08-11 11:25:49,090][00221] Num frames 5300...
+[2024-08-11 11:25:49,194][00221] Avg episode rewards: #0: 35.590, true rewards: #0: 13.340
+[2024-08-11 11:25:49,196][00221] Avg episode reward: 35.590, avg true_objective: 13.340
+[2024-08-11 11:25:49,276][00221] Num frames 5400...
+[2024-08-11 11:25:49,394][00221] Num frames 5500...
+[2024-08-11 11:25:49,520][00221] Num frames 5600...
+[2024-08-11 11:25:49,640][00221] Num frames 5700...
+[2024-08-11 11:25:49,768][00221] Num frames 5800...
+[2024-08-11 11:25:49,899][00221] Num frames 5900...
+[2024-08-11 11:25:50,029][00221] Num frames 6000...
+[2024-08-11 11:25:50,156][00221] Num frames 6100...
+[2024-08-11 11:25:50,286][00221] Num frames 6200...
+[2024-08-11 11:25:50,404][00221] Num frames 6300...
+[2024-08-11 11:25:50,529][00221] Num frames 6400...
+[2024-08-11 11:25:50,650][00221] Num frames 6500...
+[2024-08-11 11:25:50,776][00221] Num frames 6600...
+[2024-08-11 11:25:50,896][00221] Num frames 6700...
+[2024-08-11 11:25:51,020][00221] Num frames 6800...
+[2024-08-11 11:25:51,145][00221] Num frames 6900...
+[2024-08-11 11:25:51,278][00221] Num frames 7000...
+[2024-08-11 11:25:51,403][00221] Num frames 7100...
+[2024-08-11 11:25:51,505][00221] Avg episode rewards: #0: 36.672, true rewards: #0: 14.272
+[2024-08-11 11:25:51,507][00221] Avg episode reward: 36.672, avg true_objective: 14.272
+[2024-08-11 11:25:51,588][00221] Num frames 7200...
+[2024-08-11 11:25:51,707][00221] Num frames 7300...
+[2024-08-11 11:25:51,827][00221] Num frames 7400...
+[2024-08-11 11:25:51,948][00221] Num frames 7500...
+[2024-08-11 11:25:52,069][00221] Num frames 7600...
+[2024-08-11 11:25:52,198][00221] Num frames 7700...
+[2024-08-11 11:25:52,325][00221] Num frames 7800...
+[2024-08-11 11:25:52,444][00221] Num frames 7900...
+[2024-08-11 11:25:52,567][00221] Num frames 8000...
+[2024-08-11 11:25:52,687][00221] Num frames 8100...
+[2024-08-11 11:25:52,853][00221] Avg episode rewards: #0: 34.320, true rewards: #0: 13.653
+[2024-08-11 11:25:52,856][00221] Avg episode reward: 34.320, avg true_objective: 13.653
+[2024-08-11 11:25:52,869][00221] Num frames 8200...
+[2024-08-11 11:25:52,984][00221] Num frames 8300...
+[2024-08-11 11:25:53,110][00221] Num frames 8400...
+[2024-08-11 11:25:53,229][00221] Num frames 8500...
+[2024-08-11 11:25:53,359][00221] Num frames 8600...
+[2024-08-11 11:25:53,477][00221] Num frames 8700...
+[2024-08-11 11:25:53,598][00221] Num frames 8800...
+[2024-08-11 11:25:53,719][00221] Num frames 8900...
+[2024-08-11 11:25:53,842][00221] Num frames 9000...
+[2024-08-11 11:25:53,964][00221] Num frames 9100...
+[2024-08-11 11:25:54,049][00221] Avg episode rewards: #0: 32.317, true rewards: #0: 13.031
+[2024-08-11 11:25:54,050][00221] Avg episode reward: 32.317, avg true_objective: 13.031
+[2024-08-11 11:25:54,156][00221] Num frames 9200...
+[2024-08-11 11:25:54,278][00221] Num frames 9300...
+[2024-08-11 11:25:54,409][00221] Num frames 9400...
+[2024-08-11 11:25:54,533][00221] Num frames 9500...
+[2024-08-11 11:25:54,650][00221] Num frames 9600...
+[2024-08-11 11:25:54,770][00221] Num frames 9700...
+[2024-08-11 11:25:54,887][00221] Num frames 9800...
+[2024-08-11 11:25:55,009][00221] Num frames 9900...
+[2024-08-11 11:25:55,136][00221] Num frames 10000...
+[2024-08-11 11:25:55,258][00221] Num frames 10100...
+[2024-08-11 11:25:55,385][00221] Num frames 10200...
+[2024-08-11 11:25:55,507][00221] Num frames 10300...
+[2024-08-11 11:25:55,645][00221] Num frames 10400...
+[2024-08-11 11:25:55,821][00221] Num frames 10500...
+[2024-08-11 11:25:55,992][00221] Avg episode rewards: #0: 32.702, true rewards: #0: 13.202
+[2024-08-11 11:25:55,994][00221] Avg episode reward: 32.702, avg true_objective: 13.202
+[2024-08-11 11:25:56,066][00221] Num frames 10600...
+[2024-08-11 11:25:56,247][00221] Num frames 10700...
+[2024-08-11 11:25:56,420][00221] Num frames 10800...
+[2024-08-11 11:25:56,580][00221] Num frames 10900...
+[2024-08-11 11:25:56,742][00221] Num frames 11000...
+[2024-08-11 11:25:56,912][00221] Num frames 11100...
+[2024-08-11 11:25:57,084][00221] Num frames 11200...
+[2024-08-11 11:25:57,253][00221] Num frames 11300...
+[2024-08-11 11:25:57,435][00221] Num frames 11400...
+[2024-08-11 11:25:57,608][00221] Num frames 11500...
+[2024-08-11 11:25:57,807][00221] Avg episode rewards: #0: 31.762, true rewards: #0: 12.873
+[2024-08-11 11:25:57,809][00221] Avg episode reward: 31.762, avg true_objective: 12.873
+[2024-08-11 11:25:57,837][00221] Num frames 11600...
+[2024-08-11 11:25:58,020][00221] Num frames 11700...
+[2024-08-11 11:25:58,177][00221] Num frames 11800...
+[2024-08-11 11:25:58,303][00221] Num frames 11900...
+[2024-08-11 11:25:58,424][00221] Num frames 12000...
+[2024-08-11 11:25:58,551][00221] Num frames 12100...
+[2024-08-11 11:25:58,672][00221] Num frames 12200...
+[2024-08-11 11:25:58,796][00221] Avg episode rewards: #0: 30.058, true rewards: #0: 12.258
+[2024-08-11 11:25:58,798][00221] Avg episode reward: 30.058, avg true_objective: 12.258
+[2024-08-11 11:27:14,671][00221] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-08-11 11:29:31,052][00221] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-08-11 11:29:31,054][00221] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-08-11 11:29:31,056][00221] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-08-11 11:29:31,058][00221] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-08-11 11:29:31,059][00221] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-08-11 11:29:31,061][00221] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-08-11 11:29:31,063][00221] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-08-11 11:29:31,067][00221] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-08-11 11:29:31,069][00221] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-08-11 11:29:31,070][00221] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-08-11 11:29:31,071][00221] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-08-11 11:29:31,072][00221] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-08-11 11:29:31,073][00221] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-08-11 11:29:31,074][00221] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-08-11 11:29:31,075][00221] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-08-11 11:29:31,108][00221] RunningMeanStd input shape: (3, 72, 128)
+[2024-08-11 11:29:31,110][00221] RunningMeanStd input shape: (1,)
+[2024-08-11 11:29:31,126][00221] ConvEncoder: input_channels=3
+[2024-08-11 11:29:31,172][00221] Conv encoder output size: 512
+[2024-08-11 11:29:31,174][00221] Policy head output size: 512
+[2024-08-11 11:29:31,194][00221] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-08-11 11:29:31,617][00221] Num frames 100...
+[2024-08-11 11:29:31,747][00221] Num frames 200...
+[2024-08-11 11:29:31,876][00221] Num frames 300...
+[2024-08-11 11:29:31,998][00221] Num frames 400...
+[2024-08-11 11:29:32,122][00221] Num frames 500...
+[2024-08-11 11:29:32,246][00221] Num frames 600...
+[2024-08-11 11:29:32,369][00221] Num frames 700...
+[2024-08-11 11:29:32,493][00221] Num frames 800...
+[2024-08-11 11:29:32,618][00221] Num frames 900...
+[2024-08-11 11:29:32,747][00221] Num frames 1000...
+[2024-08-11 11:29:32,873][00221] Num frames 1100...
+[2024-08-11 11:29:32,992][00221] Num frames 1200...
+[2024-08-11 11:29:33,119][00221] Num frames 1300...
+[2024-08-11 11:29:33,278][00221] Num frames 1400...
+[2024-08-11 11:29:33,350][00221] Avg episode rewards: #0: 33.080, true rewards: #0: 14.080
+[2024-08-11 11:29:33,352][00221] Avg episode reward: 33.080, avg true_objective: 14.080
+[2024-08-11 11:29:33,509][00221] Num frames 1500...
+[2024-08-11 11:29:33,671][00221] Num frames 1600...
+[2024-08-11 11:29:33,849][00221] Num frames 1700...
+[2024-08-11 11:29:34,019][00221] Num frames 1800...
+[2024-08-11 11:29:34,196][00221] Num frames 1900...
+[2024-08-11 11:29:34,355][00221] Num frames 2000...
+[2024-08-11 11:29:34,519][00221] Num frames 2100...
+[2024-08-11 11:29:34,685][00221] Num frames 2200...
+[2024-08-11 11:29:34,864][00221] Num frames 2300...
+[2024-08-11 11:29:35,037][00221] Num frames 2400...
+[2024-08-11 11:29:35,219][00221] Num frames 2500...
+[2024-08-11 11:29:35,433][00221] Avg episode rewards: #0: 30.960, true rewards: #0: 12.960
+[2024-08-11 11:29:35,436][00221] Avg episode reward: 30.960, avg true_objective: 12.960
+[2024-08-11 11:29:35,454][00221] Num frames 2600...
+[2024-08-11 11:29:35,623][00221] Num frames 2700...
+[2024-08-11 11:29:35,774][00221] Num frames 2800...
+[2024-08-11 11:29:35,898][00221] Num frames 2900...
+[2024-08-11 11:29:36,018][00221] Num frames 3000...
+[2024-08-11 11:29:36,141][00221] Num frames 3100...
+[2024-08-11 11:29:36,261][00221] Num frames 3200...
+[2024-08-11 11:29:36,377][00221] Num frames 3300...
+[2024-08-11 11:29:36,505][00221] Avg episode rewards: #0: 25.533, true rewards: #0: 11.200
+[2024-08-11 11:29:36,507][00221] Avg episode reward: 25.533, avg true_objective: 11.200
+[2024-08-11 11:29:36,557][00221] Num frames 3400...
+[2024-08-11 11:29:36,682][00221] Num frames 3500...
+[2024-08-11 11:29:36,805][00221] Num frames 3600...
+[2024-08-11 11:29:36,935][00221] Num frames 3700...
+[2024-08-11 11:29:37,061][00221] Num frames 3800...
+[2024-08-11 11:29:37,193][00221] Num frames 3900...
+[2024-08-11 11:29:37,321][00221] Num frames 4000...
+[2024-08-11 11:29:37,441][00221] Num frames 4100...
+[2024-08-11 11:29:37,563][00221] Num frames 4200...
+[2024-08-11 11:29:37,681][00221] Num frames 4300...
+[2024-08-11 11:29:37,800][00221] Num frames 4400...
+[2024-08-11 11:29:37,933][00221] Num frames 4500...
+[2024-08-11 11:29:38,004][00221] Avg episode rewards: #0: 26.530, true rewards: #0: 11.280
+[2024-08-11 11:29:38,005][00221] Avg episode reward: 26.530, avg true_objective: 11.280
+[2024-08-11 11:29:38,117][00221] Num frames 4600...
+[2024-08-11 11:29:38,242][00221] Num frames 4700...
+[2024-08-11 11:29:38,372][00221] Num frames 4800...
+[2024-08-11 11:29:38,492][00221] Num frames 4900...
+[2024-08-11 11:29:38,615][00221] Num frames 5000...
+[2024-08-11 11:29:38,735][00221] Num frames 5100...
+[2024-08-11 11:29:38,864][00221] Num frames 5200...
+[2024-08-11 11:29:38,993][00221] Num frames 5300...
+[2024-08-11 11:29:39,123][00221] Num frames 5400...
+[2024-08-11 11:29:39,249][00221] Num frames 5500...
+[2024-08-11 11:29:39,374][00221] Num frames 5600...
+[2024-08-11 11:29:39,495][00221] Num frames 5700...
+[2024-08-11 11:29:39,622][00221] Avg episode rewards: #0: 27.520, true rewards: #0: 11.520
+[2024-08-11 11:29:39,626][00221] Avg episode reward: 27.520, avg true_objective: 11.520
+[2024-08-11 11:29:39,678][00221] Num frames 5800...
+[2024-08-11 11:29:39,798][00221] Num frames 5900...
+[2024-08-11 11:29:39,926][00221] Num frames 6000...
+[2024-08-11 11:29:40,059][00221] Num frames 6100...
+[2024-08-11 11:29:40,189][00221] Num frames 6200...
+[2024-08-11 11:29:40,317][00221] Num frames 6300...
+[2024-08-11 11:29:40,470][00221] Num frames 6400...
+[2024-08-11 11:29:40,611][00221] Avg episode rewards: #0: 26.115, true rewards: #0: 10.782
+[2024-08-11 11:29:40,614][00221] Avg episode reward: 26.115, avg true_objective: 10.782
+[2024-08-11 11:29:40,652][00221] Num frames 6500...
+[2024-08-11 11:29:40,773][00221] Num frames 6600...
+[2024-08-11 11:29:40,896][00221] Num frames 6700...
+[2024-08-11 11:29:41,026][00221] Num frames 6800...
+[2024-08-11 11:29:41,156][00221] Num frames 6900...
+[2024-08-11 11:29:41,277][00221] Num frames 7000...
+[2024-08-11 11:29:41,405][00221] Num frames 7100...
+[2024-08-11 11:29:41,527][00221] Num frames 7200...
+[2024-08-11 11:29:41,634][00221] Avg episode rewards: #0: 25.202, true rewards: #0: 10.344
+[2024-08-11 11:29:41,637][00221] Avg episode reward: 25.202, avg true_objective: 10.344
+[2024-08-11 11:29:41,709][00221] Num frames 7300...
+[2024-08-11 11:29:41,832][00221] Num frames 7400...
+[2024-08-11 11:29:41,950][00221] Num frames 7500...
+[2024-08-11 11:29:42,075][00221] Num frames 7600...
+[2024-08-11 11:29:42,206][00221] Num frames 7700...
+[2024-08-11 11:29:42,323][00221] Num frames 7800...
+[2024-08-11 11:29:42,443][00221] Num frames 7900...
+[2024-08-11 11:29:42,561][00221] Num frames 8000...
+[2024-08-11 11:29:42,684][00221] Num frames 8100...
+[2024-08-11 11:29:42,799][00221] Num frames 8200...
+[2024-08-11 11:29:42,923][00221] Num frames 8300...
+[2024-08-11 11:29:43,052][00221] Num frames 8400...
+[2024-08-11 11:29:43,180][00221] Num frames 8500...
+[2024-08-11 11:29:43,304][00221] Num frames 8600...
+[2024-08-11 11:29:43,425][00221] Num frames 8700...
+[2024-08-11 11:29:43,550][00221] Num frames 8800...
+[2024-08-11 11:29:43,673][00221] Num frames 8900...
+[2024-08-11 11:29:43,791][00221] Avg episode rewards: #0: 27.185, true rewards: #0: 11.185
+[2024-08-11 11:29:43,793][00221] Avg episode reward: 27.185, avg true_objective: 11.185
+[2024-08-11 11:29:43,861][00221] Num frames 9000...
+[2024-08-11 11:29:43,985][00221] Num frames 9100...
+[2024-08-11 11:29:44,122][00221] Num frames 9200...
+[2024-08-11 11:29:44,256][00221] Num frames 9300...
+[2024-08-11 11:29:44,383][00221] Num frames 9400...
+[2024-08-11 11:29:44,503][00221] Num frames 9500...
+[2024-08-11 11:29:44,628][00221] Num frames 9600...
+[2024-08-11 11:29:44,752][00221] Num frames 9700...
+[2024-08-11 11:29:44,877][00221] Num frames 9800...
+[2024-08-11 11:29:45,024][00221] Avg episode rewards: #0: 26.862, true rewards: #0: 10.973
+[2024-08-11 11:29:45,025][00221] Avg episode reward: 26.862, avg true_objective: 10.973
+[2024-08-11 11:29:45,068][00221] Num frames 9900...
+[2024-08-11 11:29:45,201][00221] Num frames 10000...
+[2024-08-11 11:29:45,322][00221] Num frames 10100...
+[2024-08-11 11:29:45,445][00221] Num frames 10200...
+[2024-08-11 11:29:45,565][00221] Num frames 10300...
+[2024-08-11 11:29:45,690][00221] Num frames 10400...
+[2024-08-11 11:29:45,860][00221] Num frames 10500...
+[2024-08-11 11:29:46,029][00221] Num frames 10600...
+[2024-08-11 11:29:46,105][00221] Avg episode rewards: #0: 25.712, true rewards: #0: 10.612
+[2024-08-11 11:29:46,107][00221] Avg episode reward: 25.712, avg true_objective: 10.612
+[2024-08-11 11:30:57,504][00221] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-08-11 11:31:06,931][00221] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-08-11 11:31:06,933][00221] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-08-11 11:31:06,934][00221] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-08-11 11:31:06,936][00221] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-08-11 11:31:06,938][00221] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-08-11 11:31:06,940][00221] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-08-11 11:31:06,941][00221] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-08-11 11:31:06,943][00221] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-08-11 11:31:06,944][00221] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-08-11 11:31:06,946][00221] Adding new argument 'hf_repository'='maavaneck/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-08-11 11:31:06,947][00221] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-08-11 11:31:06,949][00221] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-08-11 11:31:06,950][00221] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-08-11 11:31:06,951][00221] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-08-11 11:31:06,952][00221] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-08-11 11:31:06,994][00221] RunningMeanStd input shape: (3, 72, 128)
+[2024-08-11 11:31:06,997][00221] RunningMeanStd input shape: (1,)
+[2024-08-11 11:31:07,016][00221] ConvEncoder: input_channels=3
+[2024-08-11 11:31:07,059][00221] Conv encoder output size: 512
+[2024-08-11 11:31:07,061][00221] Policy head output size: 512
+[2024-08-11 11:31:07,079][00221] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-08-11 11:31:07,519][00221] Num frames 100...
+[2024-08-11 11:31:07,641][00221] Num frames 200...
+[2024-08-11 11:31:07,760][00221] Num frames 300...
+[2024-08-11 11:31:07,886][00221] Num frames 400...
+[2024-08-11 11:31:07,962][00221] Avg episode rewards: #0: 6.160, true rewards: #0: 4.160
+[2024-08-11 11:31:07,965][00221] Avg episode reward: 6.160, avg true_objective: 4.160
+[2024-08-11 11:31:08,074][00221] Num frames 500...
+[2024-08-11 11:31:08,205][00221] Num frames 600...
+[2024-08-11 11:31:08,325][00221] Num frames 700...
+[2024-08-11 11:31:08,447][00221] Num frames 800...
+[2024-08-11 11:31:08,570][00221] Num frames 900...
+[2024-08-11 11:31:08,735][00221] Avg episode rewards: #0: 8.460, true rewards: #0: 4.960
+[2024-08-11 11:31:08,736][00221] Avg episode reward: 8.460, avg true_objective: 4.960
+[2024-08-11 11:31:08,751][00221] Num frames 1000...
+[2024-08-11 11:31:08,873][00221] Num frames 1100...
+[2024-08-11 11:31:08,995][00221] Num frames 1200...
+[2024-08-11 11:31:09,132][00221] Num frames 1300...
+[2024-08-11 11:31:09,249][00221] Num frames 1400...
+[2024-08-11 11:31:09,375][00221] Num frames 1500...
+[2024-08-11 11:31:09,496][00221] Num frames 1600...
+[2024-08-11 11:31:09,614][00221] Num frames 1700...
+[2024-08-11 11:31:09,694][00221] Avg episode rewards: #0: 10.390, true rewards: #0: 5.723
+[2024-08-11 11:31:09,696][00221] Avg episode reward: 10.390, avg true_objective: 5.723
+[2024-08-11 11:31:09,796][00221] Num frames 1800...
+[2024-08-11 11:31:09,919][00221] Num frames 1900...
+[2024-08-11 11:31:10,040][00221] Num frames 2000...
+[2024-08-11 11:31:10,177][00221] Num frames 2100...
+[2024-08-11 11:31:10,306][00221] Num frames 2200...
+[2024-08-11 11:31:10,432][00221] Num frames 2300...
+[2024-08-11 11:31:10,557][00221] Num frames 2400...
+[2024-08-11 11:31:10,678][00221] Num frames 2500...
+[2024-08-11 11:31:10,809][00221] Num frames 2600...
+[2024-08-11 11:31:10,961][00221] Num frames 2700...
+[2024-08-11 11:31:11,084][00221] Num frames 2800...
+[2024-08-11 11:31:11,230][00221] Num frames 2900...
+[2024-08-11 11:31:11,356][00221] Num frames 3000...
+[2024-08-11 11:31:11,477][00221] Num frames 3100...
+[2024-08-11 11:31:11,599][00221] Num frames 3200...
+[2024-08-11 11:31:11,720][00221] Num frames 3300...
+[2024-08-11 11:31:11,851][00221] Num frames 3400...
+[2024-08-11 11:31:11,970][00221] Num frames 3500...
+[2024-08-11 11:31:12,093][00221] Num frames 3600...
+[2024-08-11 11:31:12,248][00221] Avg episode rewards: #0: 20.920, true rewards: #0: 9.170
+[2024-08-11 11:31:12,250][00221] Avg episode reward: 20.920, avg true_objective: 9.170
+[2024-08-11 11:31:12,292][00221] Num frames 3700...
+[2024-08-11 11:31:12,415][00221] Num frames 3800...
+[2024-08-11 11:31:12,531][00221] Num frames 3900...
+[2024-08-11 11:31:12,648][00221] Num frames 4000...
+[2024-08-11 11:31:12,770][00221] Num frames 4100...
+[2024-08-11 11:31:12,887][00221] Num frames 4200...
+[2024-08-11 11:31:12,998][00221] Avg episode rewards: #0: 18.488, true rewards: #0: 8.488
+[2024-08-11 11:31:12,999][00221] Avg episode reward: 18.488, avg true_objective: 8.488
+[2024-08-11 11:31:13,068][00221] Num frames 4300...
+[2024-08-11 11:31:13,204][00221] Num frames 4400...
+[2024-08-11 11:31:13,328][00221] Num frames 4500...
+[2024-08-11 11:31:13,448][00221] Num frames 4600...
+[2024-08-11 11:31:13,565][00221] Num frames 4700...
+[2024-08-11 11:31:13,686][00221] Num frames 4800...
+[2024-08-11 11:31:13,804][00221] Num frames 4900...
+[2024-08-11 11:31:13,925][00221] Num frames 5000...
+[2024-08-11 11:31:14,043][00221] Num frames 5100...
+[2024-08-11 11:31:14,195][00221] Avg episode rewards: #0: 18.787, true rewards: #0: 8.620
+[2024-08-11 11:31:14,199][00221] Avg episode reward: 18.787, avg true_objective: 8.620
+[2024-08-11 11:31:14,243][00221] Num frames 5200...
+[2024-08-11 11:31:14,364][00221] Num frames 5300...
+[2024-08-11 11:31:14,488][00221] Num frames 5400...
+[2024-08-11 11:31:14,609][00221] Num frames 5500...
+[2024-08-11 11:31:14,731][00221] Num frames 5600...
+[2024-08-11 11:31:14,855][00221] Num frames 5700...
+[2024-08-11 11:31:14,932][00221] Avg episode rewards: #0: 17.166, true rewards: #0: 8.166
+[2024-08-11 11:31:14,935][00221] Avg episode reward: 17.166, avg true_objective: 8.166
+[2024-08-11 11:31:15,042][00221] Num frames 5800...
+[2024-08-11 11:31:15,172][00221] Num frames 5900...
+[2024-08-11 11:31:15,303][00221] Num frames 6000...
+[2024-08-11 11:31:15,424][00221] Num frames 6100...
+[2024-08-11 11:31:15,545][00221] Num frames 6200...
+[2024-08-11 11:31:15,667][00221] Num frames 6300...
+[2024-08-11 11:31:15,788][00221] Num frames 6400...
+[2024-08-11 11:31:15,907][00221] Num frames 6500...
+[2024-08-11 11:31:16,036][00221] Num frames 6600...
+[2024-08-11 11:31:16,166][00221] Num frames 6700...
+[2024-08-11 11:31:16,304][00221] Num frames 6800...
+[2024-08-11 11:31:16,429][00221] Num frames 6900...
+[2024-08-11 11:31:16,550][00221] Num frames 7000...
+[2024-08-11 11:31:16,676][00221] Num frames 7100...
+[2024-08-11 11:31:16,849][00221] Num frames 7200...
+[2024-08-11 11:31:17,018][00221] Num frames 7300...
+[2024-08-11 11:31:17,201][00221] Num frames 7400...
+[2024-08-11 11:31:17,392][00221] Num frames 7500...
+[2024-08-11 11:31:17,559][00221] Num frames 7600...
+[2024-08-11 11:31:17,721][00221] Num frames 7700...
+[2024-08-11 11:31:17,881][00221] Num frames 7800...
+[2024-08-11 11:31:17,968][00221] Avg episode rewards: #0: 21.020, true rewards: #0: 9.770
+[2024-08-11 11:31:17,970][00221] Avg episode reward: 21.020, avg true_objective: 9.770
+[2024-08-11 11:31:18,119][00221] Num frames 7900...
+[2024-08-11 11:31:18,290][00221] Num frames 8000...
+[2024-08-11 11:31:18,474][00221] Num frames 8100...
+[2024-08-11 11:31:18,655][00221] Num frames 8200...
+[2024-08-11 11:31:18,829][00221] Num frames 8300...
+[2024-08-11 11:31:19,013][00221] Num frames 8400...
+[2024-08-11 11:31:19,212][00221] Num frames 8500...
+[2024-08-11 11:31:19,342][00221] Num frames 8600...
+[2024-08-11 11:31:19,471][00221] Num frames 8700...
+[2024-08-11 11:31:19,594][00221] Num frames 8800...
+[2024-08-11 11:31:19,715][00221] Num frames 8900...
+[2024-08-11 11:31:19,839][00221] Num frames 9000...
+[2024-08-11 11:31:19,966][00221] Num frames 9100...
+[2024-08-11 11:31:20,087][00221] Num frames 9200...
+[2024-08-11 11:31:20,215][00221] Avg episode rewards: #0: 23.062, true rewards: #0: 10.284
+[2024-08-11 11:31:20,218][00221] Avg episode reward: 23.062, avg true_objective: 10.284
+[2024-08-11 11:31:20,274][00221] Num frames 9300...
+[2024-08-11 11:31:20,403][00221] Num frames 9400...
+[2024-08-11 11:31:20,528][00221] Num frames 9500...
+[2024-08-11 11:31:20,652][00221] Num frames 9600...
+[2024-08-11 11:31:20,782][00221] Num frames 9700...
+[2024-08-11 11:31:20,906][00221] Num frames 9800...
+[2024-08-11 11:31:21,027][00221] Num frames 9900...
+[2024-08-11 11:31:21,157][00221] Num frames 10000...
+[2024-08-11 11:31:21,281][00221] Num frames 10100...
+[2024-08-11 11:31:21,407][00221] Num frames 10200...
+[2024-08-11 11:31:21,538][00221] Num frames 10300...
+[2024-08-11 11:31:21,659][00221] Num frames 10400...
+[2024-08-11 11:31:21,790][00221] Num frames 10500...
+[2024-08-11 11:31:21,921][00221] Num frames 10600...
+[2024-08-11 11:31:22,048][00221] Num frames 10700...
+[2024-08-11 11:31:22,174][00221] Num frames 10800...
+[2024-08-11 11:31:22,298][00221] Num frames 10900...
+[2024-08-11 11:31:22,419][00221] Num frames 11000...
+[2024-08-11 11:31:22,552][00221] Num frames 11100...
+[2024-08-11 11:31:22,675][00221] Num frames 11200...
+[2024-08-11 11:31:22,794][00221] Num frames 11300...
+[2024-08-11 11:31:22,921][00221] Avg episode rewards: #0: 26.856, true rewards: #0: 11.356
+[2024-08-11 11:31:22,923][00221] Avg episode reward: 26.856, avg true_objective: 11.356
+[2024-08-11 11:32:32,804][00221] Replay video saved to /content/train_dir/default_experiment/replay.mp4!