[2024-09-26 11:13:43,492][00517] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-26 11:13:43,496][00517] Rollout worker 0 uses device cpu
[2024-09-26 11:13:43,497][00517] Rollout worker 1 uses device cpu
[2024-09-26 11:13:43,499][00517] Rollout worker 2 uses device cpu
[2024-09-26 11:13:43,500][00517] Rollout worker 3 uses device cpu
[2024-09-26 11:13:43,504][00517] Rollout worker 4 uses device cpu
[2024-09-26 11:13:43,505][00517] Rollout worker 5 uses device cpu
[2024-09-26 11:13:43,506][00517] Rollout worker 6 uses device cpu
[2024-09-26 11:13:43,508][00517] Rollout worker 7 uses device cpu
[2024-09-26 11:13:43,663][00517] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-26 11:13:43,664][00517] InferenceWorker_p0-w0: min num requests: 2
[2024-09-26 11:13:43,696][00517] Starting all processes...
[2024-09-26 11:13:43,697][00517] Starting process learner_proc0
[2024-09-26 11:13:44,414][00517] Starting all processes...
[2024-09-26 11:13:44,423][00517] Starting process inference_proc0-0
[2024-09-26 11:13:44,424][00517] Starting process rollout_proc0
[2024-09-26 11:13:44,427][00517] Starting process rollout_proc1
[2024-09-26 11:13:44,427][00517] Starting process rollout_proc2
[2024-09-26 11:13:44,427][00517] Starting process rollout_proc3
[2024-09-26 11:13:44,427][00517] Starting process rollout_proc4
[2024-09-26 11:13:44,427][00517] Starting process rollout_proc5
[2024-09-26 11:13:44,427][00517] Starting process rollout_proc6
[2024-09-26 11:13:44,427][00517] Starting process rollout_proc7
[2024-09-26 11:13:59,359][03841] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-26 11:13:59,359][03841] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
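
Note: each GPU-using process pins itself to its assigned device by exporting CUDA_VISIBLE_DEVICES before CUDA is initialized, which is why "cuda:0" inside the process "actually maps to" the chosen physical GPU. A minimal sketch of the pattern (illustrative, not the framework's exact code):

import os

# Must happen before the first CUDA call in this process; once the CUDA
# context exists, changes to CUDA_VISIBLE_DEVICES are ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch  # imported only after the env var is set

device = torch.device("cuda:0")  # now refers to physical GPU 0
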
[2024-09-26 11:13:59,474][03841] Num visible devices: 1
[2024-09-26 11:13:59,532][03841] Starting seed is not provided
[2024-09-26 11:13:59,535][03841] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-26 11:13:59,535][03841] Initializing actor-critic model on device cuda:0
[2024-09-26 11:13:59,536][03841] RunningMeanStd input shape: (3, 72, 128)
[2024-09-26 11:13:59,539][03841] RunningMeanStd input shape: (1,)
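
Note: the two RunningMeanStd modules track running statistics for the image observations (shape (3, 72, 128)) and the scalar returns (shape (1,)) so both can be normalized online. A minimal sketch of the underlying idea, using the standard parallel (Chan et al.) mean/variance update; the class and names are illustrative, not Sample Factory's actual API:

import numpy as np

class RunningMeanStd:
    """Running mean/variance of a stream of batches, for input normalization."""
    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps  # avoids division by zero before the first update

    def update(self, batch):
        # batch has shape (batch_size, *shape)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Chan et al. parallel variance combination
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean = self.mean + delta * batch_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)
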
[2024-09-26 11:13:59,707][03841] ConvEncoder: input_channels=3
[2024-09-26 11:13:59,904][03857] Worker 2 uses CPU cores [0]
[2024-09-26 11:14:00,371][03858] Worker 3 uses CPU cores [1]
[2024-09-26 11:14:00,377][03854] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-26 11:14:00,386][03854] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-26 11:14:00,480][03855] Worker 0 uses CPU cores [0]
[2024-09-26 11:14:00,529][03854] Num visible devices: 1
[2024-09-26 11:14:00,710][03859] Worker 4 uses CPU cores [0]
[2024-09-26 11:14:00,762][03856] Worker 1 uses CPU cores [1]
[2024-09-26 11:14:00,784][03860] Worker 5 uses CPU cores [1]
[2024-09-26 11:14:00,855][03861] Worker 7 uses CPU cores [1]
[2024-09-26 11:14:00,911][03841] Conv encoder output size: 512
[2024-09-26 11:14:00,911][03841] Policy head output size: 512
[2024-09-26 11:14:00,992][03862] Worker 6 uses CPU cores [0]
[2024-09-26 11:14:01,008][03841] Created Actor Critic model with architecture:
[2024-09-26 11:14:01,008][03841] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
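
Note: the printed module tree corresponds roughly to the PyTorch model sketched below. This is a hedged reconstruction: the conv filter sizes are Sample Factory's defaults for VizDoom-style inputs (chosen because they reproduce the reported 512-dim encoder output for 3x72x128 observations), not values read from this log.

import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, num_actions=5, rnn_size=512):
        super().__init__()
        # Conv head: 3x72x128 -> 32x17x31 -> 64x7x14 -> 128x3x6 (2304 flat)
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        self.mlp_layers = nn.Sequential(nn.Linear(128 * 3 * 6, 512), nn.ELU())
        self.core = nn.GRU(512, rnn_size)            # ModelCoreRNN
        self.critic_linear = nn.Linear(rnn_size, 1)  # value head
        self.distribution_linear = nn.Linear(rnn_size, num_actions)  # 5 actions

    def forward(self, obs, rnn_state):
        # obs: (batch, 3, 72, 128) normalized frames; rnn_state: (1, batch, 512)
        x = self.conv_head(obs).flatten(1)
        x = self.mlp_layers(x)
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
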
[2024-09-26 11:14:01,462][03841] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-09-26 11:14:02,408][03841] No checkpoints found
[2024-09-26 11:14:02,409][03841] Did not load from checkpoint, starting from scratch!
[2024-09-26 11:14:02,409][03841] Initialized policy 0 weights for model version 0
[2024-09-26 11:14:02,414][03841] LearnerWorker_p0 finished initialization!
[2024-09-26 11:14:02,416][03841] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-26 11:14:02,654][03854] RunningMeanStd input shape: (3, 72, 128)
[2024-09-26 11:14:02,655][03854] RunningMeanStd input shape: (1,)
[2024-09-26 11:14:02,674][03854] ConvEncoder: input_channels=3
[2024-09-26 11:14:02,837][03854] Conv encoder output size: 512
[2024-09-26 11:14:02,838][03854] Policy head output size: 512
[2024-09-26 11:14:02,936][00517] Inference worker 0-0 is ready!
[2024-09-26 11:14:02,939][00517] All inference workers are ready! Signal rollout workers to start!
[2024-09-26 11:14:03,177][03862] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-26 11:14:03,176][03855] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-26 11:14:03,177][03859] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-26 11:14:03,183][03857] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-26 11:14:03,250][03861] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-26 11:14:03,248][03858] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-26 11:14:03,250][03856] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-26 11:14:03,258][03860] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-26 11:14:03,515][00517] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
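
Note: the recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report environment-frame throughput over three sliding windows; at startup the windows hold at most one sample, hence the nan values above. A minimal sketch of one way such a readout can be computed (illustrative, not the framework's actual code):

import time
from collections import deque

class FpsMeter:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames) pairs

    def record(self, total_frames):
        now = time.time()
        self.samples.append((now, total_frames))
        # keep only what the largest window needs
        while self.samples and now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self, window):
        now = time.time()
        recent = [(t, f) for t, f in self.samples if now - t <= window]
        if len(recent) < 2:
            return float("nan")  # not enough data yet, as in the line above
        (t0, f0), (t1, f1) = recent[0], recent[-1]
        return (f1 - f0) / (t1 - t0) if t1 > t0 else float("nan")
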
[2024-09-26 11:14:03,655][00517] Heartbeat connected on Batcher_0
[2024-09-26 11:14:03,659][00517] Heartbeat connected on LearnerWorker_p0
[2024-09-26 11:14:03,704][00517] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-26 11:14:04,271][03856] Decorrelating experience for 0 frames...
[2024-09-26 11:14:04,271][03860] Decorrelating experience for 0 frames...
[2024-09-26 11:14:04,675][03860] Decorrelating experience for 32 frames...
[2024-09-26 11:14:04,859][03857] Decorrelating experience for 0 frames...
[2024-09-26 11:14:04,855][03855] Decorrelating experience for 0 frames...
[2024-09-26 11:14:04,852][03859] Decorrelating experience for 0 frames...
[2024-09-26 11:14:04,861][03862] Decorrelating experience for 0 frames...
[2024-09-26 11:14:05,738][03860] Decorrelating experience for 64 frames...
[2024-09-26 11:14:06,047][03855] Decorrelating experience for 32 frames...
[2024-09-26 11:14:06,042][03861] Decorrelating experience for 0 frames...
[2024-09-26 11:14:06,052][03862] Decorrelating experience for 32 frames...
[2024-09-26 11:14:06,058][03856] Decorrelating experience for 32 frames...
[2024-09-26 11:14:06,057][03857] Decorrelating experience for 32 frames...
[2024-09-26 11:14:06,977][03860] Decorrelating experience for 96 frames...
[2024-09-26 11:14:07,157][03859] Decorrelating experience for 32 frames...
[2024-09-26 11:14:07,188][00517] Heartbeat connected on RolloutWorker_w5
[2024-09-26 11:14:07,255][03861] Decorrelating experience for 32 frames...
[2024-09-26 11:14:07,773][03856] Decorrelating experience for 64 frames...
[2024-09-26 11:14:07,836][03855] Decorrelating experience for 64 frames...
[2024-09-26 11:14:07,838][03857] Decorrelating experience for 64 frames...
[2024-09-26 11:14:08,438][03862] Decorrelating experience for 64 frames...
[2024-09-26 11:14:08,516][00517] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-26 11:14:08,805][03858] Decorrelating experience for 0 frames...
[2024-09-26 11:14:08,819][03859] Decorrelating experience for 64 frames...
[2024-09-26 11:14:09,217][03855] Decorrelating experience for 96 frames...
[2024-09-26 11:14:09,233][03861] Decorrelating experience for 64 frames...
[2024-09-26 11:14:09,509][00517] Heartbeat connected on RolloutWorker_w0
[2024-09-26 11:14:10,539][03858] Decorrelating experience for 32 frames...
[2024-09-26 11:14:10,545][03856] Decorrelating experience for 96 frames...
[2024-09-26 11:14:10,734][03862] Decorrelating experience for 96 frames...
[2024-09-26 11:14:10,964][00517] Heartbeat connected on RolloutWorker_w1
[2024-09-26 11:14:10,989][00517] Heartbeat connected on RolloutWorker_w6
[2024-09-26 11:14:11,295][03857] Decorrelating experience for 96 frames...
[2024-09-26 11:14:11,327][03859] Decorrelating experience for 96 frames...
[2024-09-26 11:14:11,628][03861] Decorrelating experience for 96 frames...
[2024-09-26 11:14:11,750][00517] Heartbeat connected on RolloutWorker_w2
[2024-09-26 11:14:11,778][00517] Heartbeat connected on RolloutWorker_w4
[2024-09-26 11:14:11,882][00517] Heartbeat connected on RolloutWorker_w7
[2024-09-26 11:14:13,519][00517] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 129.4. Samples: 1294. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-26 11:14:13,522][00517] Avg episode reward: [(0, '3.323')]
[2024-09-26 11:14:13,722][03841] Signal inference workers to stop experience collection...
[2024-09-26 11:14:13,749][03854] InferenceWorker_p0-w0: stopping experience collection
[2024-09-26 11:14:13,869][03858] Decorrelating experience for 64 frames...
[2024-09-26 11:14:15,242][03858] Decorrelating experience for 96 frames...
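
Note: the "Decorrelating experience for N frames" messages come from rollout workers warming up their environment copies by different numbers of frames (0, 32, 64, 96 here) before real collection starts, so the parallel environments are not all at the same point in an episode. A minimal sketch of the idea for a gymnasium-style env (illustrative only; the real mechanism operates per env split inside each worker):

def decorrelate_experience(env, split_index, frames_per_split=32):
    """Warm up one env copy by a split-dependent number of random-action
    frames so parallel envs begin collection at different episode phases."""
    warmup = split_index * frames_per_split  # 0, 32, 64, 96, ...
    obs, info = env.reset()
    for _ in range(warmup):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, info = env.reset()
    return obs
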
[2024-09-26 11:14:15,421][00517] Heartbeat connected on RolloutWorker_w3
[2024-09-26 11:14:17,711][03841] Signal inference workers to resume experience collection...
[2024-09-26 11:14:17,713][03854] InferenceWorker_p0-w0: resuming experience collection
[2024-09-26 11:14:18,515][00517] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 159.7. Samples: 2396. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-26 11:14:18,519][00517] Avg episode reward: [(0, '3.126')]
[2024-09-26 11:14:23,515][00517] Fps is (10 sec: 2868.1, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 268.8. Samples: 5376. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-26 11:14:23,518][00517] Avg episode reward: [(0, '3.758')]
[2024-09-26 11:14:25,550][03854] Updated weights for policy 0, policy_version 10 (0.0023)
[2024-09-26 11:14:28,517][00517] Fps is (10 sec: 4914.4, 60 sec: 2129.8, 300 sec: 2129.8). Total num frames: 53248. Throughput: 0: 491.9. Samples: 12298. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:14:28,519][00517] Avg episode reward: [(0, '4.349')]
[2024-09-26 11:14:33,516][00517] Fps is (10 sec: 3686.3, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 65536. Throughput: 0: 566.1. Samples: 16984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:14:33,525][00517] Avg episode reward: [(0, '4.335')]
[2024-09-26 11:14:37,973][03854] Updated weights for policy 0, policy_version 20 (0.0032)
[2024-09-26 11:14:38,515][00517] Fps is (10 sec: 2867.7, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 543.9. Samples: 19036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:14:38,518][00517] Avg episode reward: [(0, '4.241')]
[2024-09-26 11:14:43,516][00517] Fps is (10 sec: 4095.9, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 646.4. Samples: 25858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:14:43,519][00517] Avg episode reward: [(0, '4.150')]
[2024-09-26 11:14:43,526][03841] Saving new best policy, reward=4.150!
[2024-09-26 11:14:46,936][03854] Updated weights for policy 0, policy_version 30 (0.0033)
[2024-09-26 11:14:48,515][00517] Fps is (10 sec: 4505.6, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 712.7. Samples: 32072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:14:48,521][00517] Avg episode reward: [(0, '4.447')]
[2024-09-26 11:14:48,541][03841] Saving new best policy, reward=4.447!
[2024-09-26 11:14:53,515][00517] Fps is (10 sec: 3276.9, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 757.1. Samples: 34068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:14:53,518][00517] Avg episode reward: [(0, '4.467')]
[2024-09-26 11:14:53,522][03841] Saving new best policy, reward=4.467!
[2024-09-26 11:14:58,203][03854] Updated weights for policy 0, policy_version 40 (0.0027)
[2024-09-26 11:14:58,515][00517] Fps is (10 sec: 3686.4, 60 sec: 2978.9, 300 sec: 2978.9). Total num frames: 163840. Throughput: 0: 860.2. Samples: 40000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:14:58,521][00517] Avg episode reward: [(0, '4.516')]
[2024-09-26 11:14:58,533][03841] Saving new best policy, reward=4.516!
[2024-09-26 11:15:03,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 992.4. Samples: 47052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:15:03,522][00517] Avg episode reward: [(0, '4.453')]
[2024-09-26 11:15:08,519][00517] Fps is (10 sec: 3685.0, 60 sec: 3344.9, 300 sec: 3087.6). Total num frames: 200704. Throughput: 0: 975.4. Samples: 49272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:15:08,522][00517] Avg episode reward: [(0, '4.376')]
[2024-09-26 11:15:09,325][03854] Updated weights for policy 0, policy_version 50 (0.0052)
[2024-09-26 11:15:13,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3159.8). Total num frames: 221184. Throughput: 0: 930.9. Samples: 54188. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:15:13,518][00517] Avg episode reward: [(0, '4.197')]
[2024-09-26 11:15:18,515][00517] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 983.3. Samples: 61234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:15:18,517][00517] Avg episode reward: [(0, '4.534')]
[2024-09-26 11:15:18,596][03854] Updated weights for policy 0, policy_version 60 (0.0026)
[2024-09-26 11:15:18,603][03841] Saving new best policy, reward=4.534!
[2024-09-26 11:15:23,516][00517] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 262144. Throughput: 0: 1009.6. Samples: 64468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:15:23,519][00517] Avg episode reward: [(0, '4.777')]
[2024-09-26 11:15:23,525][03841] Saving new best policy, reward=4.777!
[2024-09-26 11:15:28,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3276.8). Total num frames: 278528. Throughput: 0: 953.7. Samples: 68776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:15:28,518][00517] Avg episode reward: [(0, '4.498')]
[2024-09-26 11:15:30,200][03854] Updated weights for policy 0, policy_version 70 (0.0022)
[2024-09-26 11:15:33,515][00517] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 964.0. Samples: 75450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:15:33,521][00517] Avg episode reward: [(0, '4.374')]
[2024-09-26 11:15:38,518][00517] Fps is (10 sec: 4094.8, 60 sec: 3959.3, 300 sec: 3362.9). Total num frames: 319488. Throughput: 0: 995.0. Samples: 78844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:15:38,521][00517] Avg episode reward: [(0, '4.399')]
[2024-09-26 11:15:38,527][03841] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth...
[2024-09-26 11:15:40,404][03854] Updated weights for policy 0, policy_version 80 (0.0014)
[2024-09-26 11:15:43,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3358.7). Total num frames: 335872. Throughput: 0: 969.9. Samples: 83644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:15:43,521][00517] Avg episode reward: [(0, '4.431')]
[2024-09-26 11:15:48,515][00517] Fps is (10 sec: 3687.5, 60 sec: 3822.9, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 945.1. Samples: 89582. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:15:48,518][00517] Avg episode reward: [(0, '4.384')]
[2024-09-26 11:15:50,545][03854] Updated weights for policy 0, policy_version 90 (0.0049)
[2024-09-26 11:15:53,516][00517] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3463.0). Total num frames: 380928. Throughput: 0: 975.2. Samples: 93152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:15:53,521][00517] Avg episode reward: [(0, '4.525')]
[2024-09-26 11:15:58,518][00517] Fps is (10 sec: 4095.0, 60 sec: 3891.0, 300 sec: 3454.8). Total num frames: 397312. Throughput: 0: 994.8. Samples: 98956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:15:58,520][00517] Avg episode reward: [(0, '4.698')]
[2024-09-26 11:16:01,960][03854] Updated weights for policy 0, policy_version 100 (0.0028)
[2024-09-26 11:16:03,515][00517] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3447.5). Total num frames: 413696. Throughput: 0: 950.8. Samples: 104020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-26 11:16:03,523][00517] Avg episode reward: [(0, '4.603')]
[2024-09-26 11:16:08,515][00517] Fps is (10 sec: 4097.0, 60 sec: 3959.7, 300 sec: 3506.2). Total num frames: 438272. Throughput: 0: 953.1. Samples: 107356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:16:08,523][00517] Avg episode reward: [(0, '4.553')]
[2024-09-26 11:16:11,040][03854] Updated weights for policy 0, policy_version 110 (0.0042)
[2024-09-26 11:16:13,516][00517] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3497.4). Total num frames: 454656. Throughput: 0: 1009.7. Samples: 114212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:16:13,520][00517] Avg episode reward: [(0, '4.597')]
[2024-09-26 11:16:18,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3489.2). Total num frames: 471040. Throughput: 0: 954.1. Samples: 118384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-26 11:16:18,518][00517] Avg episode reward: [(0, '4.601')]
[2024-09-26 11:16:22,583][03854] Updated weights for policy 0, policy_version 120 (0.0018)
[2024-09-26 11:16:23,515][00517] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3540.1). Total num frames: 495616. Throughput: 0: 951.8. Samples: 121670. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:16:23,521][00517] Avg episode reward: [(0, '4.317')]
[2024-09-26 11:16:28,515][00517] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3587.5). Total num frames: 520192. Throughput: 0: 1002.8. Samples: 128770. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-26 11:16:28,521][00517] Avg episode reward: [(0, '4.532')]
[2024-09-26 11:16:32,920][03854] Updated weights for policy 0, policy_version 130 (0.0034)
[2024-09-26 11:16:33,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3549.9). Total num frames: 532480. Throughput: 0: 979.1. Samples: 133642. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-26 11:16:33,517][00517] Avg episode reward: [(0, '4.481')]
[2024-09-26 11:16:38,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3567.5). Total num frames: 552960. Throughput: 0: 950.9. Samples: 135944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:16:38,518][00517] Avg episode reward: [(0, '4.446')]
[2024-09-26 11:16:42,910][03854] Updated weights for policy 0, policy_version 140 (0.0024)
[2024-09-26 11:16:43,516][00517] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3584.0). Total num frames: 573440. Throughput: 0: 976.4. Samples: 142894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:16:43,518][00517] Avg episode reward: [(0, '4.520')]
[2024-09-26 11:16:48,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3599.5). Total num frames: 593920. Throughput: 0: 996.1. Samples: 148844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:16:48,518][00517] Avg episode reward: [(0, '4.472')]
[2024-09-26 11:16:53,515][00517] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3590.0). Total num frames: 610304. Throughput: 0: 967.2. Samples: 150882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:16:53,518][00517] Avg episode reward: [(0, '4.421')]
[2024-09-26 11:16:54,387][03854] Updated weights for policy 0, policy_version 150 (0.0031)
[2024-09-26 11:16:58,516][00517] Fps is (10 sec: 3686.3, 60 sec: 3891.3, 300 sec: 3604.5). Total num frames: 630784. Throughput: 0: 953.8. Samples: 157134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:16:58,518][00517] Avg episode reward: [(0, '4.541')]
[2024-09-26 11:17:03,346][03854] Updated weights for policy 0, policy_version 160 (0.0028)
[2024-09-26 11:17:03,515][00517] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3640.9). Total num frames: 655360. Throughput: 0: 1015.0. Samples: 164058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:17:03,519][00517] Avg episode reward: [(0, '4.872')]
[2024-09-26 11:17:03,524][03841] Saving new best policy, reward=4.872!
[2024-09-26 11:17:08,519][00517] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3608.8). Total num frames: 667648. Throughput: 0: 986.3. Samples: 166056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:17:08,522][00517] Avg episode reward: [(0, '4.775')]
[2024-09-26 11:17:13,516][00517] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3621.7). Total num frames: 688128. Throughput: 0: 943.1. Samples: 171210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:17:13,522][00517] Avg episode reward: [(0, '4.406')]
[2024-09-26 11:17:15,029][03854] Updated weights for policy 0, policy_version 170 (0.0028)
[2024-09-26 11:17:18,515][00517] Fps is (10 sec: 4507.3, 60 sec: 4027.7, 300 sec: 3654.9). Total num frames: 712704. Throughput: 0: 992.0. Samples: 178282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:17:18,518][00517] Avg episode reward: [(0, '4.395')]
[2024-09-26 11:17:23,515][00517] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3645.4). Total num frames: 729088. Throughput: 0: 1009.3. Samples: 181362. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-26 11:17:23,521][00517] Avg episode reward: [(0, '4.750')]
[2024-09-26 11:17:25,871][03854] Updated weights for policy 0, policy_version 180 (0.0037)
[2024-09-26 11:17:28,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3636.4). Total num frames: 745472. Throughput: 0: 949.8. Samples: 185636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:17:28,519][00517] Avg episode reward: [(0, '4.885')]
[2024-09-26 11:17:28,528][03841] Saving new best policy, reward=4.885!
[2024-09-26 11:17:33,516][00517] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3666.9). Total num frames: 770048. Throughput: 0: 970.1. Samples: 192498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:17:33,522][00517] Avg episode reward: [(0, '4.874')]
[2024-09-26 11:17:35,178][03854] Updated weights for policy 0, policy_version 190 (0.0045)
[2024-09-26 11:17:38,519][00517] Fps is (10 sec: 4504.0, 60 sec: 3959.2, 300 sec: 3676.8). Total num frames: 790528. Throughput: 0: 1003.1. Samples: 196026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:17:38,521][00517] Avg episode reward: [(0, '4.985')]
[2024-09-26 11:17:38,534][03841] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000193_790528.pth...
[2024-09-26 11:17:38,697][03841] Saving new best policy, reward=4.985!
[2024-09-26 11:17:43,515][00517] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3630.5). Total num frames: 798720. Throughput: 0: 953.0. Samples: 200020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-26 11:17:43,517][00517] Avg episode reward: [(0, '5.021')]
[2024-09-26 11:17:43,522][03841] Saving new best policy, reward=5.021!
[2024-09-26 11:17:48,515][00517] Fps is (10 sec: 2048.7, 60 sec: 3618.1, 300 sec: 3604.5). Total num frames: 811008. Throughput: 0: 875.0. Samples: 203432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:17:48,517][00517] Avg episode reward: [(0, '5.097')]
[2024-09-26 11:17:48,529][03841] Saving new best policy, reward=5.097!
[2024-09-26 11:17:49,745][03854] Updated weights for policy 0, policy_version 200 (0.0043)
[2024-09-26 11:17:53,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3633.0). Total num frames: 835584. Throughput: 0: 904.7. Samples: 206766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-26 11:17:53,518][00517] Avg episode reward: [(0, '4.934')]
[2024-09-26 11:17:58,516][00517] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3642.8). Total num frames: 856064. Throughput: 0: 947.0. Samples: 213826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:17:58,520][00517] Avg episode reward: [(0, '4.824')]
[2024-09-26 11:17:59,067][03854] Updated weights for policy 0, policy_version 210 (0.0020)
[2024-09-26 11:18:03,517][00517] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3635.2). Total num frames: 872448. Throughput: 0: 890.5. Samples: 218354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:18:03,520][00517] Avg episode reward: [(0, '4.899')]
[2024-09-26 11:18:08,518][00517] Fps is (10 sec: 3685.4, 60 sec: 3754.7, 300 sec: 3644.6). Total num frames: 892928. Throughput: 0: 889.4. Samples: 221386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:18:08,523][00517] Avg episode reward: [(0, '4.737')]
[2024-09-26 11:18:09,971][03854] Updated weights for policy 0, policy_version 220 (0.0028)
[2024-09-26 11:18:13,515][00517] Fps is (10 sec: 4506.4, 60 sec: 3822.9, 300 sec: 3670.0). Total num frames: 917504. Throughput: 0: 949.8. Samples: 228376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:18:13,518][00517] Avg episode reward: [(0, '4.687')]
[2024-09-26 11:18:18,516][00517] Fps is (10 sec: 4097.1, 60 sec: 3686.4, 300 sec: 3662.3). Total num frames: 933888. Throughput: 0: 917.1. Samples: 233766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:18:18,521][00517] Avg episode reward: [(0, '4.709')]
[2024-09-26 11:18:21,257][03854] Updated weights for policy 0, policy_version 230 (0.0025)
[2024-09-26 11:18:23,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3654.9). Total num frames: 950272. Throughput: 0: 886.0. Samples: 235892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:18:23,522][00517] Avg episode reward: [(0, '5.055')]
[2024-09-26 11:18:28,516][00517] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3678.7). Total num frames: 974848. Throughput: 0: 953.5. Samples: 242926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:18:28,522][00517] Avg episode reward: [(0, '5.125')]
[2024-09-26 11:18:28,533][03841] Saving new best policy, reward=5.125!
[2024-09-26 11:18:29,888][03854] Updated weights for policy 0, policy_version 240 (0.0014)
[2024-09-26 11:18:33,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3686.4). Total num frames: 995328. Throughput: 0: 1016.0. Samples: 249154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:18:33,518][00517] Avg episode reward: [(0, '5.242')]
[2024-09-26 11:18:33,524][03841] Saving new best policy, reward=5.242!
[2024-09-26 11:18:38,515][00517] Fps is (10 sec: 3276.9, 60 sec: 3618.3, 300 sec: 3664.1). Total num frames: 1007616. Throughput: 0: 987.0. Samples: 251182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:18:38,518][00517] Avg episode reward: [(0, '5.134')]
[2024-09-26 11:18:41,487][03854] Updated weights for policy 0, policy_version 250 (0.0040)
[2024-09-26 11:18:43,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3686.4). Total num frames: 1032192. Throughput: 0: 964.5. Samples: 257228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:18:43,517][00517] Avg episode reward: [(0, '4.974')]
[2024-09-26 11:18:48,515][00517] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3708.0). Total num frames: 1056768. Throughput: 0: 1022.8. Samples: 264376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:18:48,525][00517] Avg episode reward: [(0, '5.351')]
[2024-09-26 11:18:48,537][03841] Saving new best policy, reward=5.351!
[2024-09-26 11:18:51,285][03854] Updated weights for policy 0, policy_version 260 (0.0022)
[2024-09-26 11:18:53,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3686.4). Total num frames: 1069056. Throughput: 0: 1005.2. Samples: 266618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:18:53,520][00517] Avg episode reward: [(0, '5.484')]
[2024-09-26 11:18:53,526][03841] Saving new best policy, reward=5.484!
[2024-09-26 11:18:58,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 962.2. Samples: 271676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:18:58,522][00517] Avg episode reward: [(0, '5.349')]
[2024-09-26 11:19:01,531][03854] Updated weights for policy 0, policy_version 270 (0.0025)
[2024-09-26 11:19:03,515][00517] Fps is (10 sec: 4505.6, 60 sec: 4027.9, 300 sec: 3776.7). Total num frames: 1114112. Throughput: 0: 1000.0. Samples: 278768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:19:03,518][00517] Avg episode reward: [(0, '5.305')]
[2024-09-26 11:19:08,517][00517] Fps is (10 sec: 4095.5, 60 sec: 3959.6, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 1023.3. Samples: 281942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:19:08,525][00517] Avg episode reward: [(0, '5.369')]
[2024-09-26 11:19:13,140][03854] Updated weights for policy 0, policy_version 280 (0.0030)
[2024-09-26 11:19:13,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1146880. Throughput: 0: 960.9. Samples: 286166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:19:13,518][00517] Avg episode reward: [(0, '5.487')]
[2024-09-26 11:19:13,520][03841] Saving new best policy, reward=5.487!
[2024-09-26 11:19:18,515][00517] Fps is (10 sec: 4096.5, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1171456. Throughput: 0: 973.2. Samples: 292948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:19:18,522][00517] Avg episode reward: [(0, '5.832')]
[2024-09-26 11:19:18,535][03841] Saving new best policy, reward=5.832!
[2024-09-26 11:19:21,947][03854] Updated weights for policy 0, policy_version 290 (0.0030)
[2024-09-26 11:19:23,518][00517] Fps is (10 sec: 4504.3, 60 sec: 4027.5, 300 sec: 3859.9). Total num frames: 1191936. Throughput: 0: 1005.4. Samples: 296426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:19:23,525][00517] Avg episode reward: [(0, '6.107')]
[2024-09-26 11:19:23,534][03841] Saving new best policy, reward=6.107!
[2024-09-26 11:19:28,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1204224. Throughput: 0: 976.7. Samples: 301178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:19:28,518][00517] Avg episode reward: [(0, '6.209')]
[2024-09-26 11:19:28,526][03841] Saving new best policy, reward=6.209!
[2024-09-26 11:19:33,515][00517] Fps is (10 sec: 3277.7, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1224704. Throughput: 0: 948.9. Samples: 307076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:19:33,522][00517] Avg episode reward: [(0, '6.300')]
[2024-09-26 11:19:33,536][03841] Saving new best policy, reward=6.300!
[2024-09-26 11:19:33,545][03854] Updated weights for policy 0, policy_version 300 (0.0017)
[2024-09-26 11:19:38,515][00517] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 1249280. Throughput: 0: 975.8. Samples: 310528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:19:38,518][00517] Avg episode reward: [(0, '6.690')]
[2024-09-26 11:19:38,528][03841] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000305_1249280.pth...
[2024-09-26 11:19:38,652][03841] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth
[2024-09-26 11:19:38,666][03841] Saving new best policy, reward=6.690!
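
Note: checkpoints are written periodically under checkpoint_p0/ and the oldest is pruned so only the most recent few remain (here checkpoint_000000305_1249280.pth replaces checkpoint_000000078_319488.pth), while best-policy snapshots are tracked separately by reward. A minimal sketch of keep-last-N rotation (illustrative names, not Sample Factory's actual code):

import torch
from pathlib import Path

def save_checkpoint(model, ckpt_dir, policy_version, env_frames, keep_last=2):
    """Write checkpoint_<version>_<frames>.pth, then prune older files,
    keeping only the most recent keep_last checkpoints."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    torch.save(model.state_dict(), path)
    # zero-padded version numbers make lexicographic order == age order
    for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_last]:
        old.unlink()
    return path
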
[2024-09-26 11:19:43,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1265664. Throughput: 0: 984.6. Samples: 315982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:19:43,522][00517] Avg episode reward: [(0, '6.842')]
[2024-09-26 11:19:43,527][03841] Saving new best policy, reward=6.842!
[2024-09-26 11:19:44,931][03854] Updated weights for policy 0, policy_version 310 (0.0021)
[2024-09-26 11:19:48,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 1282048. Throughput: 0: 939.5. Samples: 321046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:19:48,518][00517] Avg episode reward: [(0, '7.628')]
[2024-09-26 11:19:48,526][03841] Saving new best policy, reward=7.628!
[2024-09-26 11:19:53,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1306624. Throughput: 0: 944.0. Samples: 324422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:19:53,518][00517] Avg episode reward: [(0, '8.186')]
[2024-09-26 11:19:53,525][03841] Saving new best policy, reward=8.186!
[2024-09-26 11:19:54,367][03854] Updated weights for policy 0, policy_version 320 (0.0022)
[2024-09-26 11:19:58,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1323008. Throughput: 0: 997.4. Samples: 331050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:19:58,518][00517] Avg episode reward: [(0, '8.131')]
[2024-09-26 11:20:03,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 1339392. Throughput: 0: 941.4. Samples: 335310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:20:03,518][00517] Avg episode reward: [(0, '7.350')]
[2024-09-26 11:20:05,691][03854] Updated weights for policy 0, policy_version 330 (0.0043)
[2024-09-26 11:20:08,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 1363968. Throughput: 0: 940.0. Samples: 338722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:20:08,517][00517] Avg episode reward: [(0, '7.398')]
[2024-09-26 11:20:13,519][00517] Fps is (10 sec: 4503.9, 60 sec: 3959.2, 300 sec: 3873.8). Total num frames: 1384448. Throughput: 0: 985.9. Samples: 345546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:20:13,522][00517] Avg episode reward: [(0, '7.766')]
[2024-09-26 11:20:15,745][03854] Updated weights for policy 0, policy_version 340 (0.0023)
[2024-09-26 11:20:18,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1396736. Throughput: 0: 960.4. Samples: 350294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:20:18,518][00517] Avg episode reward: [(0, '7.979')]
[2024-09-26 11:20:23,515][00517] Fps is (10 sec: 3687.8, 60 sec: 3823.1, 300 sec: 3873.8). Total num frames: 1421312. Throughput: 0: 943.2. Samples: 352970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:20:23,522][00517] Avg episode reward: [(0, '8.200')]
[2024-09-26 11:20:23,525][03841] Saving new best policy, reward=8.200!
[2024-09-26 11:20:26,080][03854] Updated weights for policy 0, policy_version 350 (0.0043)
[2024-09-26 11:20:28,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1441792. Throughput: 0: 978.1. Samples: 359996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:20:28,523][00517] Avg episode reward: [(0, '7.708')]
[2024-09-26 11:20:33,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1458176. Throughput: 0: 990.2. Samples: 365604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:20:33,520][00517] Avg episode reward: [(0, '7.251')]
[2024-09-26 11:20:37,473][03854] Updated weights for policy 0, policy_version 360 (0.0027)
[2024-09-26 11:20:38,517][00517] Fps is (10 sec: 3685.8, 60 sec: 3822.8, 300 sec: 3873.8). Total num frames: 1478656. Throughput: 0: 962.9. Samples: 367752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-26 11:20:38,520][00517] Avg episode reward: [(0, '7.181')]
[2024-09-26 11:20:43,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1499136. Throughput: 0: 962.0. Samples: 374338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:20:43,522][00517] Avg episode reward: [(0, '7.999')]
[2024-09-26 11:20:46,322][03854] Updated weights for policy 0, policy_version 370 (0.0030)
[2024-09-26 11:20:48,515][00517] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1519616. Throughput: 0: 1013.5. Samples: 380918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:20:48,520][00517] Avg episode reward: [(0, '7.812')]
[2024-09-26 11:20:53,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1536000. Throughput: 0: 984.5. Samples: 383024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:20:53,518][00517] Avg episode reward: [(0, '7.787')]
[2024-09-26 11:20:57,540][03854] Updated weights for policy 0, policy_version 380 (0.0032)
[2024-09-26 11:20:58,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1560576. Throughput: 0: 965.5. Samples: 388992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:20:58,518][00517] Avg episode reward: [(0, '8.061')]
[2024-09-26 11:21:03,515][00517] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 1581056. Throughput: 0: 1017.6. Samples: 396088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:21:03,518][00517] Avg episode reward: [(0, '9.020')]
[2024-09-26 11:21:03,581][03841] Saving new best policy, reward=9.020!
[2024-09-26 11:21:08,054][03854] Updated weights for policy 0, policy_version 390 (0.0022)
[2024-09-26 11:21:08,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1597440. Throughput: 0: 1009.5. Samples: 398398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:21:08,522][00517] Avg episode reward: [(0, '9.141')]
[2024-09-26 11:21:08,531][03841] Saving new best policy, reward=9.141!
[2024-09-26 11:21:13,516][00517] Fps is (10 sec: 3276.7, 60 sec: 3823.2, 300 sec: 3873.8). Total num frames: 1613824. Throughput: 0: 960.3. Samples: 403208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:21:13,518][00517] Avg episode reward: [(0, '9.309')]
[2024-09-26 11:21:13,580][03841] Saving new best policy, reward=9.309!
[2024-09-26 11:21:17,873][03854] Updated weights for policy 0, policy_version 400 (0.0031)
[2024-09-26 11:21:18,515][00517] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 1638400. Throughput: 0: 991.6. Samples: 410226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:21:18,517][00517] Avg episode reward: [(0, '9.670')]
[2024-09-26 11:21:18,528][03841] Saving new best policy, reward=9.670!
[2024-09-26 11:21:23,515][00517] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1658880. Throughput: 0: 1016.9. Samples: 413510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:21:23,520][00517] Avg episode reward: [(0, '9.749')]
[2024-09-26 11:21:23,524][03841] Saving new best policy, reward=9.749!
[2024-09-26 11:21:28,518][00517] Fps is (10 sec: 3685.5, 60 sec: 3891.0, 300 sec: 3873.8). Total num frames: 1675264. Throughput: 0: 964.9. Samples: 417760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:21:28,523][00517] Avg episode reward: [(0, '10.786')]
[2024-09-26 11:21:28,535][03841] Saving new best policy, reward=10.786!
[2024-09-26 11:21:29,384][03854] Updated weights for policy 0, policy_version 410 (0.0054)
[2024-09-26 11:21:33,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1695744. Throughput: 0: 972.9. Samples: 424700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:21:33,518][00517] Avg episode reward: [(0, '11.393')]
[2024-09-26 11:21:33,522][03841] Saving new best policy, reward=11.393!
[2024-09-26 11:21:38,518][00517] Fps is (10 sec: 4506.6, 60 sec: 4027.8, 300 sec: 3887.7). Total num frames: 1720320. Throughput: 0: 1003.9. Samples: 428198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:21:38,522][00517] Avg episode reward: [(0, '12.013')]
[2024-09-26 11:21:38,532][03841] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000420_1720320.pth...
[2024-09-26 11:21:38,544][03854] Updated weights for policy 0, policy_version 420 (0.0040)
[2024-09-26 11:21:38,751][03841] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000193_790528.pth
[2024-09-26 11:21:38,765][03841] Saving new best policy, reward=12.013!
[2024-09-26 11:21:43,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1732608. Throughput: 0: 972.8. Samples: 432770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:21:43,518][00517] Avg episode reward: [(0, '11.385')]
[2024-09-26 11:21:48,515][00517] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1753088. Throughput: 0: 950.0. Samples: 438836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:21:48,517][00517] Avg episode reward: [(0, '10.968')]
[2024-09-26 11:21:49,780][03854] Updated weights for policy 0, policy_version 430 (0.0048)
[2024-09-26 11:21:53,515][00517] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 1777664. Throughput: 0: 977.0. Samples: 442364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:21:53,517][00517] Avg episode reward: [(0, '10.295')]
[2024-09-26 11:21:58,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1794048. Throughput: 0: 999.7. Samples: 448196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:21:58,520][00517] Avg episode reward: [(0, '10.438')]
[2024-09-26 11:22:00,898][03854] Updated weights for policy 0, policy_version 440 (0.0019)
[2024-09-26 11:22:03,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 1810432. Throughput: 0: 958.5. Samples: 453360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:22:03,520][00517] Avg episode reward: [(0, '10.157')]
[2024-09-26 11:22:08,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1835008. Throughput: 0: 966.4. Samples: 456998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:22:08,518][00517] Avg episode reward: [(0, '10.659')]
[2024-09-26 11:22:09,814][03854] Updated weights for policy 0, policy_version 450 (0.0027)
[2024-09-26 11:22:13,515][00517] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3873.8). Total num frames: 1855488. Throughput: 0: 1016.9. Samples: 463520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:22:13,523][00517] Avg episode reward: [(0, '11.082')]
[2024-09-26 11:22:18,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1867776. Throughput: 0: 957.0. Samples: 467764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:22:18,518][00517] Avg episode reward: [(0, '11.228')]
[2024-09-26 11:22:21,229][03854] Updated weights for policy 0, policy_version 460 (0.0037)
[2024-09-26 11:22:23,518][00517] Fps is (10 sec: 3685.3, 60 sec: 3891.0, 300 sec: 3887.7). Total num frames: 1892352. Throughput: 0: 959.5. Samples: 471376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:22:23,520][00517] Avg episode reward: [(0, '11.458')]
[2024-09-26 11:22:28,516][00517] Fps is (10 sec: 3686.1, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 1904640. Throughput: 0: 972.1. Samples: 476514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:22:28,518][00517] Avg episode reward: [(0, '12.031')]
[2024-09-26 11:22:28,534][03841] Saving new best policy, reward=12.031!
[2024-09-26 11:22:33,515][00517] Fps is (10 sec: 2458.3, 60 sec: 3686.4, 300 sec: 3818.4). Total num frames: 1916928. Throughput: 0: 922.0. Samples: 480328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:22:33,518][00517] Avg episode reward: [(0, '11.815')]
[2024-09-26 11:22:35,036][03854] Updated weights for policy 0, policy_version 470 (0.0022)
[2024-09-26 11:22:38,515][00517] Fps is (10 sec: 3277.1, 60 sec: 3618.2, 300 sec: 3860.0). Total num frames: 1937408. Throughput: 0: 895.4. Samples: 482656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:22:38,518][00517] Avg episode reward: [(0, '12.539')]
[2024-09-26 11:22:38,532][03841] Saving new best policy, reward=12.539!
[2024-09-26 11:22:43,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 1961984. Throughput: 0: 916.0. Samples: 489418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:22:43,518][00517] Avg episode reward: [(0, '13.056')]
[2024-09-26 11:22:43,520][03841] Saving new best policy, reward=13.056!
[2024-09-26 11:22:44,275][03854] Updated weights for policy 0, policy_version 480 (0.0022)
[2024-09-26 11:22:48,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 1978368. Throughput: 0: 934.8. Samples: 495426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:22:48,518][00517] Avg episode reward: [(0, '13.572')]
[2024-09-26 11:22:48,527][03841] Saving new best policy, reward=13.572!
[2024-09-26 11:22:53,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 1994752. Throughput: 0: 899.8. Samples: 497488. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:22:53,518][00517] Avg episode reward: [(0, '14.349')]
[2024-09-26 11:22:53,520][03841] Saving new best policy, reward=14.349!
[2024-09-26 11:22:55,652][03854] Updated weights for policy 0, policy_version 490 (0.0025)
[2024-09-26 11:22:58,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3887.8). Total num frames: 2019328. Throughput: 0: 897.5. Samples: 503906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:22:58,520][00517] Avg episode reward: [(0, '14.365')]
[2024-09-26 11:22:58,533][03841] Saving new best policy, reward=14.365!
[2024-09-26 11:23:03,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3887.8). Total num frames: 2039808. Throughput: 0: 956.4. Samples: 510804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:23:03,518][00517] Avg episode reward: [(0, '13.961')]
[2024-09-26 11:23:05,543][03854] Updated weights for policy 0, policy_version 500 (0.0022)
[2024-09-26 11:23:08,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 2056192. Throughput: 0: 921.5. Samples: 512842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:23:08,518][00517] Avg episode reward: [(0, '13.984')]
[2024-09-26 11:23:13,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 2076672. Throughput: 0: 927.2. Samples: 518236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:23:13,521][00517] Avg episode reward: [(0, '13.380')]
[2024-09-26 11:23:15,798][03854] Updated weights for policy 0, policy_version 510 (0.0029)
[2024-09-26 11:23:18,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2101248. Throughput: 0: 999.4. Samples: 525302. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:23:18,523][00517] Avg episode reward: [(0, '14.234')]
[2024-09-26 11:23:23,516][00517] Fps is (10 sec: 3686.1, 60 sec: 3686.5, 300 sec: 3860.0). Total num frames: 2113536. Throughput: 0: 1010.8. Samples: 528144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-26 11:23:23,524][00517] Avg episode reward: [(0, '15.467')]
[2024-09-26 11:23:23,527][03841] Saving new best policy, reward=15.467!
[2024-09-26 11:23:27,452][03854] Updated weights for policy 0, policy_version 520 (0.0024)
[2024-09-26 11:23:28,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 2134016. Throughput: 0: 960.0. Samples: 532618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:23:28,521][00517] Avg episode reward: [(0, '16.163')]
[2024-09-26 11:23:28,539][03841] Saving new best policy, reward=16.163!
[2024-09-26 11:23:33,515][00517] Fps is (10 sec: 4096.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2154496. Throughput: 0: 979.8. Samples: 539518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:23:33,518][00517] Avg episode reward: [(0, '17.582')]
[2024-09-26 11:23:33,526][03841] Saving new best policy, reward=17.582!
[2024-09-26 11:23:36,274][03854] Updated weights for policy 0, policy_version 530 (0.0020)
[2024-09-26 11:23:38,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2174976. Throughput: 0: 1010.4. Samples: 542956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:23:38,517][00517] Avg episode reward: [(0, '17.512')]
[2024-09-26 11:23:38,534][03841] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000531_2174976.pth...
[2024-09-26 11:23:38,702][03841] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000305_1249280.pth
[2024-09-26 11:23:43,516][00517] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2191360. Throughput: 0: 963.8. Samples: 547276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:23:43,523][00517] Avg episode reward: [(0, '15.899')]
[2024-09-26 11:23:47,905][03854] Updated weights for policy 0, policy_version 540 (0.0039)
[2024-09-26 11:23:48,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2211840. Throughput: 0: 951.8. Samples: 553634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:23:48,522][00517] Avg episode reward: [(0, '15.994')]
[2024-09-26 11:23:53,515][00517] Fps is (10 sec: 4506.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2236416. Throughput: 0: 985.2. Samples: 557174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:23:53,519][00517] Avg episode reward: [(0, '16.205')]
[2024-09-26 11:23:58,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2248704. Throughput: 0: 986.4. Samples: 562622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:23:58,520][00517] Avg episode reward: [(0, '15.889')]
[2024-09-26 11:23:58,731][03854] Updated weights for policy 0, policy_version 550 (0.0027)
[2024-09-26 11:24:03,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2269184. Throughput: 0: 949.4. Samples: 568026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:24:03,520][00517] Avg episode reward: [(0, '16.458')]
[2024-09-26 11:24:08,118][03854] Updated weights for policy 0, policy_version 560 (0.0027)
[2024-09-26 11:24:08,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2293760. Throughput: 0: 965.7. Samples: 571600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:24:08,522][00517] Avg episode reward: [(0, '17.245')]
[2024-09-26 11:24:13,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2310144. Throughput: 0: 1006.1. Samples: 577894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:24:13,519][00517] Avg episode reward: [(0, '17.189')]
[2024-09-26 11:24:18,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2326528. Throughput: 0: 949.7. Samples: 582256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:24:18,522][00517] Avg episode reward: [(0, '17.011')]
[2024-09-26 11:24:19,746][03854] Updated weights for policy 0, policy_version 570 (0.0023)
[2024-09-26 11:24:23,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2351104. Throughput: 0: 952.2. Samples: 585804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:24:23,518][00517] Avg episode reward: [(0, '18.455')]
[2024-09-26 11:24:23,521][03841] Saving new best policy, reward=18.455!
[2024-09-26 11:24:28,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2371584. Throughput: 0: 1014.1. Samples: 592908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:24:28,518][00517] Avg episode reward: [(0, '19.112')]
[2024-09-26 11:24:28,524][03841] Saving new best policy, reward=19.112!
[2024-09-26 11:24:28,798][03854] Updated weights for policy 0, policy_version 580 (0.0018)
[2024-09-26 11:24:33,519][00517] Fps is (10 sec: 3685.0, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 2387968. Throughput: 0: 974.1. Samples: 597472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:24:33,526][00517] Avg episode reward: [(0, '19.015')]
[2024-09-26 11:24:38,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2408448. Throughput: 0: 955.7. Samples: 600182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:24:38,518][00517] Avg episode reward: [(0, '21.336')]
[2024-09-26 11:24:38,530][03841] Saving new best policy, reward=21.336!
[2024-09-26 11:24:39,816][03854] Updated weights for policy 0, policy_version 590 (0.0022)
[2024-09-26 11:24:43,515][00517] Fps is (10 sec: 4507.3, 60 sec: 4027.8, 300 sec: 3901.6). Total num frames: 2433024. Throughput: 0: 988.8. Samples: 607116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:24:43,522][00517] Avg episode reward: [(0, '21.780')]
[2024-09-26 11:24:43,524][03841] Saving new best policy, reward=21.780!
[2024-09-26 11:24:48,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2445312. Throughput: 0: 986.4. Samples: 612416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:24:48,519][00517] Avg episode reward: [(0, '20.842')]
[2024-09-26 11:24:51,667][03854] Updated weights for policy 0, policy_version 600 (0.0041)
[2024-09-26 11:24:53,520][00517] Fps is (10 sec: 3275.4, 60 sec: 3822.7, 300 sec: 3873.8). Total num frames: 2465792. Throughput: 0: 954.0. Samples: 614532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:24:53,529][00517] Avg episode reward: [(0, '21.986')]
[2024-09-26 11:24:53,531][03841] Saving new best policy, reward=21.986!
[2024-09-26 11:24:58,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2486272. Throughput: 0: 960.1. Samples: 621098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:24:58,518][00517] Avg episode reward: [(0, '21.871')]
[2024-09-26 11:25:00,504][03854] Updated weights for policy 0, policy_version 610 (0.0025)
[2024-09-26 11:25:03,516][00517] Fps is (10 sec: 4097.5, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 2506752. Throughput: 0: 1007.4. Samples: 627590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:25:03,524][00517] Avg episode reward: [(0, '21.203')]
[2024-09-26 11:25:08,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2523136. Throughput: 0: 976.2. Samples: 629734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:25:08,520][00517] Avg episode reward: [(0, '20.545')]
[2024-09-26 11:25:11,851][03854] Updated weights for policy 0, policy_version 620 (0.0022)
[2024-09-26 11:25:13,515][00517] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2543616. Throughput: 0: 948.5. Samples: 635590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:25:13,520][00517] Avg episode reward: [(0, '19.532')]
[2024-09-26 11:25:18,517][00517] Fps is (10 sec: 4505.0, 60 sec: 4027.6, 300 sec: 3887.7). Total num frames: 2568192. Throughput: 0: 1005.6. Samples: 642722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:25:18,519][00517] Avg episode reward: [(0, '19.592')]
[2024-09-26 11:25:22,011][03854] Updated weights for policy 0, policy_version 630 (0.0029)
[2024-09-26 11:25:23,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2584576. Throughput: 0: 997.8. Samples: 645084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:25:23,518][00517] Avg episode reward: [(0, '18.777')]
[2024-09-26 11:25:28,515][00517] Fps is (10 sec: 3686.9, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2605056. Throughput: 0: 957.7. Samples: 650214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:25:28,523][00517] Avg episode reward: [(0, '20.074')]
[2024-09-26 11:25:31,692][03854] Updated weights for policy 0, policy_version 640 (0.0050)
[2024-09-26 11:25:33,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3887.8). Total num frames: 2625536. Throughput: 0: 997.5. Samples: 657302. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:25:33,517][00517] Avg episode reward: [(0, '20.265')]
[2024-09-26 11:25:38,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2646016. Throughput: 0: 1024.8. Samples: 660646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:25:38,518][00517] Avg episode reward: [(0, '18.597')]
[2024-09-26 11:25:38,530][03841] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000646_2646016.pth...
[2024-09-26 11:25:38,679][03841] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000420_1720320.pth
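The save/remove pair above is a rolling-retention scheme: the learner writes the newest checkpoint, then deletes the oldest so only the most recent few remain on disk (two, in this run, alongside the separately saved best-policy file). A sketch of that pattern, assuming zero-padded filenames that sort chronologically as they do here; illustrative only, not Sample Factory's actual code:

```python
from pathlib import Path

def prune_checkpoints(ckpt_dir, keep=2):
    """Delete all but the `keep` newest checkpoint_*.pth files (illustrative sketch)."""
    # Zero-padded version numbers make lexicographic order == chronological order.
    ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
    for old in ckpts[:-keep]:
        print(f"Removing {old}")
        old.unlink()
```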
[2024-09-26 11:25:43,124][03854] Updated weights for policy 0, policy_version 650 (0.0035)
[2024-09-26 11:25:43,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2662400. Throughput: 0: 973.0. Samples: 664882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-26 11:25:43,518][00517] Avg episode reward: [(0, '18.120')]
[2024-09-26 11:25:48,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2682880. Throughput: 0: 980.1. Samples: 671696. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:25:48,519][00517] Avg episode reward: [(0, '19.534')]
[2024-09-26 11:25:51,948][03854] Updated weights for policy 0, policy_version 660 (0.0031)
[2024-09-26 11:25:53,515][00517] Fps is (10 sec: 4505.6, 60 sec: 4028.0, 300 sec: 3887.7). Total num frames: 2707456. Throughput: 0: 1013.8. Samples: 675354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:25:53,522][00517] Avg episode reward: [(0, '17.929')]
[2024-09-26 11:25:58,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2719744. Throughput: 0: 995.3. Samples: 680380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:25:58,518][00517] Avg episode reward: [(0, '18.389')]
[2024-09-26 11:26:03,168][03854] Updated weights for policy 0, policy_version 670 (0.0032)
[2024-09-26 11:26:03,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2744320. Throughput: 0: 971.5. Samples: 686438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-26 11:26:03,518][00517] Avg episode reward: [(0, '19.920')]
[2024-09-26 11:26:08,515][00517] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 2768896. Throughput: 0: 997.2. Samples: 689956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:26:08,518][00517] Avg episode reward: [(0, '20.820')]
[2024-09-26 11:26:13,510][03854] Updated weights for policy 0, policy_version 680 (0.0019)
[2024-09-26 11:26:13,515][00517] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2785280. Throughput: 0: 1011.5. Samples: 695732. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:26:13,522][00517] Avg episode reward: [(0, '20.424')]
[2024-09-26 11:26:18,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 2801664. Throughput: 0: 964.9. Samples: 700722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:26:18,518][00517] Avg episode reward: [(0, '20.238')]
[2024-09-26 11:26:23,384][03854] Updated weights for policy 0, policy_version 690 (0.0037)
[2024-09-26 11:26:23,515][00517] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2826240. Throughput: 0: 969.5. Samples: 704272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:26:23,521][00517] Avg episode reward: [(0, '20.411')]
[2024-09-26 11:26:28,517][00517] Fps is (10 sec: 4504.9, 60 sec: 4027.6, 300 sec: 3901.6). Total num frames: 2846720. Throughput: 0: 1031.3. Samples: 711294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:26:28,519][00517] Avg episode reward: [(0, '17.964')]
[2024-09-26 11:26:33,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2859008. Throughput: 0: 975.0. Samples: 715570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:26:33,520][00517] Avg episode reward: [(0, '18.534')]
[2024-09-26 11:26:34,734][03854] Updated weights for policy 0, policy_version 700 (0.0025)
[2024-09-26 11:26:38,515][00517] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2883584. Throughput: 0: 964.9. Samples: 718774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:26:38,518][00517] Avg episode reward: [(0, '18.422')]
[2024-09-26 11:26:43,515][00517] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2904064. Throughput: 0: 1005.7. Samples: 725636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:26:43,522][00517] Avg episode reward: [(0, '18.508')]
[2024-09-26 11:26:43,991][03854] Updated weights for policy 0, policy_version 710 (0.0016)
[2024-09-26 11:26:48,518][00517] Fps is (10 sec: 3275.9, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 2916352. Throughput: 0: 970.3. Samples: 730104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:26:48,521][00517] Avg episode reward: [(0, '19.327')]
[2024-09-26 11:26:53,518][00517] Fps is (10 sec: 2456.9, 60 sec: 3686.2, 300 sec: 3846.0). Total num frames: 2928640. Throughput: 0: 927.5. Samples: 731694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:26:53,521][00517] Avg episode reward: [(0, '19.352')]
[2024-09-26 11:26:58,388][03854] Updated weights for policy 0, policy_version 720 (0.0041)
[2024-09-26 11:26:58,515][00517] Fps is (10 sec: 3277.7, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2949120. Throughput: 0: 901.1. Samples: 736280. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:26:58,520][00517] Avg episode reward: [(0, '20.233')]
[2024-09-26 11:27:03,515][00517] Fps is (10 sec: 3687.5, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2965504. Throughput: 0: 931.2. Samples: 742624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:27:03,520][00517] Avg episode reward: [(0, '21.181')]
[2024-09-26 11:27:08,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3818.3). Total num frames: 2981888. Throughput: 0: 896.3. Samples: 744606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:27:08,522][00517] Avg episode reward: [(0, '20.844')]
[2024-09-26 11:27:10,245][03854] Updated weights for policy 0, policy_version 730 (0.0049)
[2024-09-26 11:27:13,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 3002368. Throughput: 0: 861.1. Samples: 750040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:27:13,523][00517] Avg episode reward: [(0, '20.740')]
[2024-09-26 11:27:18,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3026944. Throughput: 0: 915.2. Samples: 756752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:27:18,518][00517] Avg episode reward: [(0, '20.492')]
[2024-09-26 11:27:19,554][03854] Updated weights for policy 0, policy_version 740 (0.0033)
[2024-09-26 11:27:23,516][00517] Fps is (10 sec: 3686.0, 60 sec: 3549.8, 300 sec: 3846.1). Total num frames: 3039232. Throughput: 0: 898.9. Samples: 759224. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:27:23,519][00517] Avg episode reward: [(0, '20.726')]
[2024-09-26 11:27:28,515][00517] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3860.0). Total num frames: 3055616. Throughput: 0: 849.3. Samples: 763854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:27:28,521][00517] Avg episode reward: [(0, '20.643')]
[2024-09-26 11:27:31,102][03854] Updated weights for policy 0, policy_version 750 (0.0038)
[2024-09-26 11:27:33,515][00517] Fps is (10 sec: 4096.4, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 3080192. Throughput: 0: 900.7. Samples: 770632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:27:33,522][00517] Avg episode reward: [(0, '20.176')]
[2024-09-26 11:27:38,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3846.1). Total num frames: 3096576. Throughput: 0: 940.9. Samples: 774032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:27:38,518][00517] Avg episode reward: [(0, '20.027')]
[2024-09-26 11:27:38,531][03841] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000756_3096576.pth...
[2024-09-26 11:27:38,730][03841] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000531_2174976.pth
[2024-09-26 11:27:43,031][03854] Updated weights for policy 0, policy_version 760 (0.0021)
[2024-09-26 11:27:43,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3846.1). Total num frames: 3112960. Throughput: 0: 928.8. Samples: 778076. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-26 11:27:43,518][00517] Avg episode reward: [(0, '20.192')]
[2024-09-26 11:27:48,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3873.8). Total num frames: 3137536. Throughput: 0: 928.8. Samples: 784420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:27:48,518][00517] Avg episode reward: [(0, '19.651')]
[2024-09-26 11:27:52,011][03854] Updated weights for policy 0, policy_version 770 (0.0027)
[2024-09-26 11:27:53,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3823.1, 300 sec: 3860.0). Total num frames: 3158016. Throughput: 0: 960.0. Samples: 787806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:27:53,524][00517] Avg episode reward: [(0, '20.912')]
[2024-09-26 11:27:58,516][00517] Fps is (10 sec: 3276.5, 60 sec: 3686.3, 300 sec: 3832.2). Total num frames: 3170304. Throughput: 0: 948.0. Samples: 792702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:27:58,518][00517] Avg episode reward: [(0, '21.130')]
[2024-09-26 11:28:03,516][00517] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 3190784. Throughput: 0: 923.2. Samples: 798298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:28:03,522][00517] Avg episode reward: [(0, '21.960')]
[2024-09-26 11:28:03,792][03854] Updated weights for policy 0, policy_version 780 (0.0033)
[2024-09-26 11:28:08,515][00517] Fps is (10 sec: 4506.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3215360. Throughput: 0: 946.4. Samples: 801812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:28:08,518][00517] Avg episode reward: [(0, '21.546')]
[2024-09-26 11:28:13,516][00517] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3231744. Throughput: 0: 978.3. Samples: 807878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:28:13,518][00517] Avg episode reward: [(0, '20.355')]
[2024-09-26 11:28:14,487][03854] Updated weights for policy 0, policy_version 790 (0.0025)
[2024-09-26 11:28:18,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 3248128. Throughput: 0: 935.3. Samples: 812720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:28:18,519][00517] Avg episode reward: [(0, '19.722')]
[2024-09-26 11:28:23,515][00517] Fps is (10 sec: 4096.1, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 3272704. Throughput: 0: 937.0. Samples: 816196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:28:23,518][00517] Avg episode reward: [(0, '18.094')]
[2024-09-26 11:28:24,059][03854] Updated weights for policy 0, policy_version 800 (0.0025)
[2024-09-26 11:28:28,515][00517] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3293184. Throughput: 0: 1004.1. Samples: 823260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:28:28,523][00517] Avg episode reward: [(0, '17.712')]
[2024-09-26 11:28:33,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3305472. Throughput: 0: 956.8. Samples: 827474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:28:33,518][00517] Avg episode reward: [(0, '18.402')]
[2024-09-26 11:28:35,413][03854] Updated weights for policy 0, policy_version 810 (0.0031)
[2024-09-26 11:28:38,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3330048. Throughput: 0: 953.2. Samples: 830700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:28:38,522][00517] Avg episode reward: [(0, '19.548')]
[2024-09-26 11:28:43,517][00517] Fps is (10 sec: 4914.6, 60 sec: 4027.6, 300 sec: 3873.8). Total num frames: 3354624. Throughput: 0: 997.2. Samples: 837578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:28:43,523][00517] Avg episode reward: [(0, '19.403')]
[2024-09-26 11:28:44,697][03854] Updated weights for policy 0, policy_version 820 (0.0021)
[2024-09-26 11:28:48,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3366912. Throughput: 0: 982.0. Samples: 842488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:28:48,523][00517] Avg episode reward: [(0, '20.339')]
[2024-09-26 11:28:53,515][00517] Fps is (10 sec: 3277.2, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3387392. Throughput: 0: 954.0. Samples: 844742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:28:53,522][00517] Avg episode reward: [(0, '20.446')]
[2024-09-26 11:28:55,774][03854] Updated weights for policy 0, policy_version 830 (0.0030)
[2024-09-26 11:28:58,515][00517] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3873.8). Total num frames: 3411968. Throughput: 0: 977.7. Samples: 851874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:28:58,523][00517] Avg episode reward: [(0, '21.496')]
[2024-09-26 11:29:03,519][00517] Fps is (10 sec: 4094.4, 60 sec: 3959.2, 300 sec: 3846.0). Total num frames: 3428352. Throughput: 0: 1006.0. Samples: 857992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:29:03,523][00517] Avg episode reward: [(0, '22.125')]
[2024-09-26 11:29:03,529][03841] Saving new best policy, reward=22.125!
[2024-09-26 11:29:06,931][03854] Updated weights for policy 0, policy_version 840 (0.0022)
[2024-09-26 11:29:08,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3444736. Throughput: 0: 973.6. Samples: 860010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-26 11:29:08,521][00517] Avg episode reward: [(0, '23.028')]
[2024-09-26 11:29:08,528][03841] Saving new best policy, reward=23.028!
[2024-09-26 11:29:13,515][00517] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3469312. Throughput: 0: 955.2. Samples: 866244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:29:13,521][00517] Avg episode reward: [(0, '25.026')]
[2024-09-26 11:29:13,528][03841] Saving new best policy, reward=25.026!
[2024-09-26 11:29:16,150][03854] Updated weights for policy 0, policy_version 850 (0.0028)
[2024-09-26 11:29:18,516][00517] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3489792. Throughput: 0: 1009.9. Samples: 872920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:29:18,518][00517] Avg episode reward: [(0, '24.468')]
[2024-09-26 11:29:23,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3502080. Throughput: 0: 983.0. Samples: 874936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:29:23,518][00517] Avg episode reward: [(0, '24.007')]
[2024-09-26 11:29:27,559][03854] Updated weights for policy 0, policy_version 860 (0.0021)
[2024-09-26 11:29:28,515][00517] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3526656. Throughput: 0: 954.6. Samples: 880536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:29:28,518][00517] Avg episode reward: [(0, '23.209')]
[2024-09-26 11:29:33,515][00517] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3547136. Throughput: 0: 1003.6. Samples: 887648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:29:33,518][00517] Avg episode reward: [(0, '22.967')]
[2024-09-26 11:29:37,232][03854] Updated weights for policy 0, policy_version 870 (0.0033)
[2024-09-26 11:29:38,516][00517] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3563520. Throughput: 0: 1014.8. Samples: 890408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:29:38,520][00517] Avg episode reward: [(0, '21.254')]
[2024-09-26 11:29:38,532][03841] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000870_3563520.pth...
[2024-09-26 11:29:38,690][03841] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000646_2646016.pth
[2024-09-26 11:29:43,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3584000. Throughput: 0: 957.5. Samples: 894962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:29:43,519][00517] Avg episode reward: [(0, '21.489')]
[2024-09-26 11:29:47,731][03854] Updated weights for policy 0, policy_version 880 (0.0031)
[2024-09-26 11:29:48,515][00517] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3604480. Throughput: 0: 977.3. Samples: 901968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:29:48,518][00517] Avg episode reward: [(0, '21.672')]
[2024-09-26 11:29:53,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3624960. Throughput: 0: 1009.6. Samples: 905442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:29:53,525][00517] Avg episode reward: [(0, '20.363')]
[2024-09-26 11:29:58,516][00517] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3641344. Throughput: 0: 968.6. Samples: 909832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:29:58,522][00517] Avg episode reward: [(0, '19.756')]
[2024-09-26 11:29:59,169][03854] Updated weights for policy 0, policy_version 890 (0.0034)
[2024-09-26 11:30:03,515][00517] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3873.8). Total num frames: 3665920. Throughput: 0: 966.5. Samples: 916414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:30:03,517][00517] Avg episode reward: [(0, '20.404')]
[2024-09-26 11:30:07,842][03854] Updated weights for policy 0, policy_version 900 (0.0025)
[2024-09-26 11:30:08,515][00517] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 3686400. Throughput: 0: 1000.8. Samples: 919970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:30:08,518][00517] Avg episode reward: [(0, '20.355')]
[2024-09-26 11:30:13,517][00517] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 3702784. Throughput: 0: 992.7. Samples: 925208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:30:13,524][00517] Avg episode reward: [(0, '20.711')]
[2024-09-26 11:30:18,515][00517] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3723264. Throughput: 0: 957.3. Samples: 930728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:30:18,522][00517] Avg episode reward: [(0, '21.261')]
[2024-09-26 11:30:19,289][03854] Updated weights for policy 0, policy_version 910 (0.0019)
[2024-09-26 11:30:23,515][00517] Fps is (10 sec: 4096.7, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3743744. Throughput: 0: 974.4. Samples: 934254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:30:23,522][00517] Avg episode reward: [(0, '22.163')]
[2024-09-26 11:30:28,516][00517] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3760128. Throughput: 0: 1012.6. Samples: 940530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-26 11:30:28,521][00517] Avg episode reward: [(0, '21.150')]
[2024-09-26 11:30:29,992][03854] Updated weights for policy 0, policy_version 920 (0.0036)
[2024-09-26 11:30:33,518][00517] Fps is (10 sec: 3685.4, 60 sec: 3891.0, 300 sec: 3846.0). Total num frames: 3780608. Throughput: 0: 959.9. Samples: 945168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:30:33,525][00517] Avg episode reward: [(0, '21.893')]
[2024-09-26 11:30:38,515][00517] Fps is (10 sec: 4505.7, 60 sec: 4027.8, 300 sec: 3873.8). Total num frames: 3805184. Throughput: 0: 962.2. Samples: 948740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:30:38,522][00517] Avg episode reward: [(0, '22.339')]
[2024-09-26 11:30:39,480][03854] Updated weights for policy 0, policy_version 930 (0.0026)
[2024-09-26 11:30:43,517][00517] Fps is (10 sec: 4506.1, 60 sec: 4027.6, 300 sec: 3873.8). Total num frames: 3825664. Throughput: 0: 1023.2. Samples: 955878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-26 11:30:43,521][00517] Avg episode reward: [(0, '21.494')]
[2024-09-26 11:30:48,517][00517] Fps is (10 sec: 3276.3, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 3837952. Throughput: 0: 974.3. Samples: 960260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:30:48,519][00517] Avg episode reward: [(0, '22.437')]
[2024-09-26 11:30:50,877][03854] Updated weights for policy 0, policy_version 940 (0.0029)
[2024-09-26 11:30:53,516][00517] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3862528. Throughput: 0: 960.4. Samples: 963190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:30:53,522][00517] Avg episode reward: [(0, '22.891')]
[2024-09-26 11:30:58,515][00517] Fps is (10 sec: 4506.4, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3883008. Throughput: 0: 1003.1. Samples: 970344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-26 11:30:58,518][00517] Avg episode reward: [(0, '20.714')]
[2024-09-26 11:30:59,724][03854] Updated weights for policy 0, policy_version 950 (0.0023)
[2024-09-26 11:31:03,515][00517] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3899392. Throughput: 0: 998.4. Samples: 975656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:31:03,518][00517] Avg episode reward: [(0, '21.527')]
[2024-09-26 11:31:08,517][00517] Fps is (10 sec: 3276.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3915776. Throughput: 0: 966.9. Samples: 977764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:31:08,522][00517] Avg episode reward: [(0, '22.415')]
[2024-09-26 11:31:12,985][03854] Updated weights for policy 0, policy_version 960 (0.0018)
[2024-09-26 11:31:13,515][00517] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 3932160. Throughput: 0: 941.5. Samples: 982898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-26 11:31:13,518][00517] Avg episode reward: [(0, '22.468')]
[2024-09-26 11:31:18,517][00517] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 3948544. Throughput: 0: 944.5. Samples: 987668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:31:18,521][00517] Avg episode reward: [(0, '21.481')]
[2024-09-26 11:31:23,515][00517] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 3960832. Throughput: 0: 911.6. Samples: 989760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:31:23,518][00517] Avg episode reward: [(0, '22.446')]
[2024-09-26 11:31:25,236][03854] Updated weights for policy 0, policy_version 970 (0.0016)
[2024-09-26 11:31:28,515][00517] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3985408. Throughput: 0: 890.3. Samples: 995940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-26 11:31:28,521][00517] Avg episode reward: [(0, '22.230')]
[2024-09-26 11:31:32,158][00517] Component Batcher_0 stopped!
[2024-09-26 11:31:32,158][03841] Stopping Batcher_0...
[2024-09-26 11:31:32,161][03841] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-26 11:31:32,162][03841] Loop batcher_evt_loop terminating...
[2024-09-26 11:31:32,223][03854] Weights refcount: 2 0
[2024-09-26 11:31:32,231][03854] Stopping InferenceWorker_p0-w0...
[2024-09-26 11:31:32,232][03854] Loop inference_proc0-0_evt_loop terminating...
[2024-09-26 11:31:32,232][00517] Component InferenceWorker_p0-w0 stopped!
[2024-09-26 11:31:32,286][03841] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000756_3096576.pth
[2024-09-26 11:31:32,299][03841] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-26 11:31:32,489][03841] Stopping LearnerWorker_p0...
[2024-09-26 11:31:32,494][03841] Loop learner_proc0_evt_loop terminating...
[2024-09-26 11:31:32,491][00517] Component LearnerWorker_p0 stopped!
[2024-09-26 11:31:32,532][00517] Component RolloutWorker_w4 stopped!
[2024-09-26 11:31:32,536][03859] Stopping RolloutWorker_w4...
[2024-09-26 11:31:32,539][03859] Loop rollout_proc4_evt_loop terminating...
[2024-09-26 11:31:32,545][00517] Component RolloutWorker_w0 stopped!
[2024-09-26 11:31:32,549][03855] Stopping RolloutWorker_w0...
[2024-09-26 11:31:32,554][03855] Loop rollout_proc0_evt_loop terminating...
[2024-09-26 11:31:32,557][00517] Component RolloutWorker_w2 stopped!
[2024-09-26 11:31:32,562][03857] Stopping RolloutWorker_w2...
[2024-09-26 11:31:32,565][00517] Component RolloutWorker_w6 stopped!
[2024-09-26 11:31:32,574][03862] Stopping RolloutWorker_w6...
[2024-09-26 11:31:32,574][03862] Loop rollout_proc6_evt_loop terminating...
[2024-09-26 11:31:32,563][03857] Loop rollout_proc2_evt_loop terminating...
[2024-09-26 11:31:32,643][03860] Stopping RolloutWorker_w5...
[2024-09-26 11:31:32,647][03860] Loop rollout_proc5_evt_loop terminating...
[2024-09-26 11:31:32,643][00517] Component RolloutWorker_w5 stopped!
[2024-09-26 11:31:32,679][03856] Stopping RolloutWorker_w1...
[2024-09-26 11:31:32,680][03856] Loop rollout_proc1_evt_loop terminating...
[2024-09-26 11:31:32,679][00517] Component RolloutWorker_w1 stopped!
[2024-09-26 11:31:32,686][03861] Stopping RolloutWorker_w7...
[2024-09-26 11:31:32,687][00517] Component RolloutWorker_w7 stopped!
[2024-09-26 11:31:32,708][03861] Loop rollout_proc7_evt_loop terminating...
[2024-09-26 11:31:32,805][00517] Component RolloutWorker_w3 stopped!
[2024-09-26 11:31:32,811][03858] Stopping RolloutWorker_w3...
[2024-09-26 11:31:32,810][00517] Waiting for process learner_proc0 to stop...
[2024-09-26 11:31:32,815][03858] Loop rollout_proc3_evt_loop terminating...
[2024-09-26 11:31:34,496][00517] Waiting for process inference_proc0-0 to join...
[2024-09-26 11:31:34,499][00517] Waiting for process rollout_proc0 to join...
[2024-09-26 11:31:37,492][00517] Waiting for process rollout_proc1 to join...
[2024-09-26 11:31:37,495][00517] Waiting for process rollout_proc2 to join...
[2024-09-26 11:31:37,500][00517] Waiting for process rollout_proc3 to join...
[2024-09-26 11:31:37,504][00517] Waiting for process rollout_proc4 to join...
[2024-09-26 11:31:37,508][00517] Waiting for process rollout_proc5 to join...
[2024-09-26 11:31:37,511][00517] Waiting for process rollout_proc6 to join...
[2024-09-26 11:31:37,514][00517] Waiting for process rollout_proc7 to join...
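The shutdown above is the usual two-phase teardown for a multiprocess runner: every component first leaves its event loop in response to a stop signal, and only afterwards does the parent join each child process. A minimal sketch of the pattern using the standard library (illustrative, not Sample Factory's implementation):

```python
import multiprocessing as mp

def component(stop_evt, name):
    stop_evt.wait()  # stand-in for the component's event loop
    print(f"Loop {name}_evt_loop terminating...")

if __name__ == "__main__":
    stop = mp.Event()
    procs = [mp.Process(target=component, args=(stop, f"rollout_proc{i}"))
             for i in range(8)]
    for p in procs:
        p.start()
    stop.set()            # phase 1: signal every component to stop
    for p in procs:       # phase 2: wait for each child process to exit
        print(f"Waiting for process {p.name} to join...")
        p.join()
```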
[2024-09-26 11:31:37,520][00517] Batcher 0 profile tree view:
batching: 26.5373, releasing_batches: 0.0301
[2024-09-26 11:31:37,522][00517] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0008
wait_policy_total: 405.4145
update_model: 8.9785
weight_update: 0.0032
one_step: 0.0024
handle_policy_step: 588.0112
deserialize: 14.8474, stack: 3.1325, obs_to_device_normalize: 118.9407, forward: 310.9287, send_messages: 29.3429
prepare_outputs: 80.9291
to_cpu: 46.8123
[2024-09-26 11:31:37,523][00517] Learner 0 profile tree view:
misc: 0.0051, prepare_batch: 13.8913
train: 73.6695
epoch_init: 0.0096, minibatch_init: 0.0060, losses_postprocess: 0.6073, kl_divergence: 0.6351, after_optimizer: 33.3925
calculate_losses: 26.3247
losses_init: 0.0036, forward_head: 1.1924, bptt_initial: 17.8098, tail: 1.1476, advantages_returns: 0.2580, losses: 3.6333
bptt: 1.8914
bptt_forward_core: 1.7950
update: 12.0592
clip: 0.9005
[2024-09-26 11:31:37,525][00517] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3975, enqueue_policy_requests: 96.1732, env_step: 820.3472, overhead: 12.9690, complete_rollouts: 7.3084
save_policy_outputs: 20.4517
split_output_tensors: 8.2379
[2024-09-26 11:31:37,526][00517] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3228, enqueue_policy_requests: 97.5704, env_step: 814.1027, overhead: 13.1301, complete_rollouts: 6.8764
save_policy_outputs: 20.6057
split_output_tensors: 8.0254
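The profile views above are indented trees of cumulative seconds per stage, with child stages nested under their parents (e.g. `calculate_losses` under `train`). A small illustrative sketch that renders timings collected into a nested dict in the same shape:

```python
def print_profile(node, indent=0):
    """Render {stage: seconds | {child: ...}} as an indented profile tree."""
    for name, value in node.items():
        if isinstance(value, dict):
            print(" " * indent + f"{name}:")
            print_profile(value, indent + 2)
        else:
            print(" " * indent + f"{name}: {value:.4f}")

print_profile({"train": {"calculate_losses": 26.3247, "update": 12.0592}})
```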
[2024-09-26 11:31:37,528][00517] Loop Runner_EvtLoop terminating...
[2024-09-26 11:31:37,530][00517] Runner profile tree view:
main_loop: 1073.8340
[2024-09-26 11:31:37,531][00517] Collected {0: 4005888}, FPS: 3730.5
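The final summary is self-consistent: 4,005,888 frames collected over the 1073.8340 s main loop works out to 4005888 / 1073.8340 ≈ 3730.5 FPS, exactly the figure reported. As a one-line check:

```python
print(f"{4005888 / 1073.8340:.1f}")  # -> 3730.5, matching the logged FPS
```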
[2024-09-26 11:33:34,689][00517] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-26 11:33:34,691][00517] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-26 11:33:34,694][00517] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-26 11:33:34,696][00517] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-26 11:33:34,698][00517] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-26 11:33:34,700][00517] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-26 11:33:34,702][00517] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-26 11:33:34,703][00517] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-26 11:33:34,705][00517] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-26 11:33:34,706][00517] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-26 11:33:34,707][00517] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-26 11:33:34,709][00517] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-26 11:33:34,710][00517] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-26 11:33:34,711][00517] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-26 11:33:34,712][00517] Using frameskip 1 and render_action_repeat=4 for evaluation
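The evaluation script above reconciles the saved training config with command-line arguments: keys already present are overridden, and keys missing from the saved file are added with a warning, exactly as logged. A hedged sketch of that merge logic (illustrative, not the actual Sample Factory code):

```python
def merge_config(saved, cli_args):
    """Apply CLI args on top of a saved config, logging overrides and additions."""
    merged = dict(saved)
    for key, value in cli_args.items():
        if key in merged:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        merged[key] = value
    return merged

cfg = merge_config({"num_workers": 8}, {"num_workers": 1, "no_render": True})
```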
[2024-09-26 11:33:34,744][00517] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-26 11:33:34,747][00517] RunningMeanStd input shape: (3, 72, 128)
[2024-09-26 11:33:34,749][00517] RunningMeanStd input shape: (1,)
[2024-09-26 11:33:34,765][00517] ConvEncoder: input_channels=3
[2024-09-26 11:33:34,929][00517] Conv encoder output size: 512
[2024-09-26 11:33:34,934][00517] Policy head output size: 512
[2024-09-26 11:33:35,356][00517] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-26 11:33:36,859][00517] Num frames 100...
[2024-09-26 11:33:37,133][00517] Num frames 200...
[2024-09-26 11:33:37,432][00517] Num frames 300...
[2024-09-26 11:33:37,691][00517] Num frames 400...
[2024-09-26 11:33:38,010][00517] Num frames 500...
[2024-09-26 11:33:38,299][00517] Num frames 600...
[2024-09-26 11:33:38,510][00517] Num frames 700...
[2024-09-26 11:33:38,719][00517] Num frames 800...
[2024-09-26 11:33:38,841][00517] Num frames 900...
[2024-09-26 11:33:38,934][00517] Avg episode rewards: #0: 16.280, true rewards: #0: 9.280
[2024-09-26 11:33:38,936][00517] Avg episode reward: 16.280, avg true_objective: 9.280
[2024-09-26 11:33:39,029][00517] Num frames 1000...
[2024-09-26 11:33:39,154][00517] Num frames 1100...
[2024-09-26 11:33:39,301][00517] Num frames 1200...
[2024-09-26 11:33:39,425][00517] Num frames 1300...
[2024-09-26 11:33:39,550][00517] Num frames 1400...
[2024-09-26 11:33:39,672][00517] Num frames 1500...
[2024-09-26 11:33:39,794][00517] Num frames 1600...
[2024-09-26 11:33:39,921][00517] Num frames 1700...
[2024-09-26 11:33:40,046][00517] Num frames 1800...
[2024-09-26 11:33:40,178][00517] Num frames 1900...
[2024-09-26 11:33:40,315][00517] Num frames 2000...
[2024-09-26 11:33:40,444][00517] Num frames 2100...
[2024-09-26 11:33:40,516][00517] Avg episode rewards: #0: 21.560, true rewards: #0: 10.560
[2024-09-26 11:33:40,519][00517] Avg episode reward: 21.560, avg true_objective: 10.560
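Each "Avg episode rewards" line is a running mean over the episodes finished so far (the "true rewards" column tracks the raw environment objective rather than the shaped training reward, hence the different scale). Individual episode returns can be recovered by inverting the mean update: with episode 1 at 16.280 and the two-episode average at 21.560, episode 2 scored 2 * 21.560 - 16.280 = 26.840. A sketch of the update and its inverse:

```python
def running_mean(prev_mean, n, new_value):
    """Mean over n episodes after folding in the n-th value."""
    return prev_mean + (new_value - prev_mean) / n

def nth_value(prev_mean, new_mean, n):
    """Invert the running-mean update to recover the n-th episode's return."""
    return n * new_mean - (n - 1) * prev_mean

print(nth_value(16.280, 21.560, 2))  # -> 26.84, episode 2's return
```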
[2024-09-26 11:33:40,628][00517] Num frames 2200...
[2024-09-26 11:33:40,751][00517] Num frames 2300...
[2024-09-26 11:33:40,875][00517] Num frames 2400...
[2024-09-26 11:33:41,004][00517] Num frames 2500...
[2024-09-26 11:33:41,127][00517] Num frames 2600...
[2024-09-26 11:33:41,273][00517] Num frames 2700...
[2024-09-26 11:33:41,396][00517] Num frames 2800...
[2024-09-26 11:33:41,545][00517] Num frames 2900...
[2024-09-26 11:33:41,715][00517] Num frames 3000...
[2024-09-26 11:33:41,881][00517] Num frames 3100...
[2024-09-26 11:33:42,050][00517] Num frames 3200...
[2024-09-26 11:33:42,222][00517] Num frames 3300...
[2024-09-26 11:33:42,400][00517] Num frames 3400...
[2024-09-26 11:33:42,567][00517] Num frames 3500...
[2024-09-26 11:33:42,733][00517] Num frames 3600...
[2024-09-26 11:33:42,907][00517] Num frames 3700...
[2024-09-26 11:33:43,134][00517] Avg episode rewards: #0: 30.647, true rewards: #0: 12.647
[2024-09-26 11:33:43,136][00517] Avg episode reward: 30.647, avg true_objective: 12.647
[2024-09-26 11:33:43,153][00517] Num frames 3800...
[2024-09-26 11:33:43,343][00517] Num frames 3900...
[2024-09-26 11:33:43,521][00517] Num frames 4000...
[2024-09-26 11:33:43,694][00517] Num frames 4100...
[2024-09-26 11:33:43,898][00517] Num frames 4200...
[2024-09-26 11:33:44,049][00517] Num frames 4300...
[2024-09-26 11:33:44,175][00517] Num frames 4400...
[2024-09-26 11:33:44,307][00517] Num frames 4500...
[2024-09-26 11:33:44,440][00517] Num frames 4600...
[2024-09-26 11:33:44,566][00517] Num frames 4700...
[2024-09-26 11:33:44,692][00517] Num frames 4800...
[2024-09-26 11:33:44,817][00517] Num frames 4900...
[2024-09-26 11:33:44,944][00517] Num frames 5000...
[2024-09-26 11:33:45,069][00517] Num frames 5100...
[2024-09-26 11:33:45,197][00517] Num frames 5200...
[2024-09-26 11:33:45,327][00517] Num frames 5300...
[2024-09-26 11:33:45,460][00517] Num frames 5400...
[2024-09-26 11:33:45,587][00517] Num frames 5500...
[2024-09-26 11:33:45,653][00517] Avg episode rewards: #0: 35.270, true rewards: #0: 13.770
[2024-09-26 11:33:45,655][00517] Avg episode reward: 35.270, avg true_objective: 13.770
[2024-09-26 11:33:45,782][00517] Num frames 5600...
[2024-09-26 11:33:45,907][00517] Num frames 5700...
[2024-09-26 11:33:46,046][00517] Num frames 5800...
[2024-09-26 11:33:46,191][00517] Num frames 5900...
[2024-09-26 11:33:46,327][00517] Num frames 6000...
[2024-09-26 11:33:46,458][00517] Num frames 6100...
[2024-09-26 11:33:46,584][00517] Num frames 6200...
[2024-09-26 11:33:46,655][00517] Avg episode rewards: #0: 30.824, true rewards: #0: 12.424
[2024-09-26 11:33:46,657][00517] Avg episode reward: 30.824, avg true_objective: 12.424
[2024-09-26 11:33:46,768][00517] Num frames 6300...
[2024-09-26 11:33:46,892][00517] Num frames 6400...
[2024-09-26 11:33:47,020][00517] Num frames 6500...
[2024-09-26 11:33:47,145][00517] Num frames 6600...
[2024-09-26 11:33:47,278][00517] Num frames 6700...
[2024-09-26 11:33:47,401][00517] Num frames 6800...
[2024-09-26 11:33:47,535][00517] Num frames 6900...
[2024-09-26 11:33:47,657][00517] Num frames 7000...
[2024-09-26 11:33:47,781][00517] Num frames 7100...
[2024-09-26 11:33:47,950][00517] Avg episode rewards: #0: 29.158, true rewards: #0: 11.992
[2024-09-26 11:33:47,951][00517] Avg episode reward: 29.158, avg true_objective: 11.992
[2024-09-26 11:33:47,961][00517] Num frames 7200...
[2024-09-26 11:33:48,080][00517] Num frames 7300...
[2024-09-26 11:33:48,205][00517] Num frames 7400...
[2024-09-26 11:33:48,332][00517] Num frames 7500...
[2024-09-26 11:33:48,458][00517] Num frames 7600...
[2024-09-26 11:33:48,589][00517] Num frames 7700...
[2024-09-26 11:33:48,694][00517] Avg episode rewards: #0: 26.056, true rewards: #0: 11.056
[2024-09-26 11:33:48,695][00517] Avg episode reward: 26.056, avg true_objective: 11.056
[2024-09-26 11:33:48,772][00517] Num frames 7800...
[2024-09-26 11:33:48,892][00517] Num frames 7900...
[2024-09-26 11:33:49,020][00517] Num frames 8000...
[2024-09-26 11:33:49,145][00517] Num frames 8100...
[2024-09-26 11:33:49,291][00517] Num frames 8200...
[2024-09-26 11:33:49,428][00517] Num frames 8300...
[2024-09-26 11:33:49,571][00517] Num frames 8400...
[2024-09-26 11:33:49,692][00517] Num frames 8500...
[2024-09-26 11:33:49,809][00517] Num frames 8600...
[2024-09-26 11:33:49,934][00517] Num frames 8700...
[2024-09-26 11:33:50,053][00517] Num frames 8800...
[2024-09-26 11:33:50,176][00517] Num frames 8900...
[2024-09-26 11:33:50,304][00517] Num frames 9000...
[2024-09-26 11:33:50,424][00517] Num frames 9100...
[2024-09-26 11:33:50,563][00517] Num frames 9200...
[2024-09-26 11:33:50,686][00517] Num frames 9300...
[2024-09-26 11:33:50,813][00517] Num frames 9400...
[2024-09-26 11:33:50,874][00517] Avg episode rewards: #0: 28.004, true rewards: #0: 11.754
[2024-09-26 11:33:50,875][00517] Avg episode reward: 28.004, avg true_objective: 11.754
[2024-09-26 11:33:50,995][00517] Num frames 9500...
[2024-09-26 11:33:51,120][00517] Num frames 9600...
[2024-09-26 11:33:51,252][00517] Num frames 9700...
[2024-09-26 11:33:51,373][00517] Num frames 9800...
[2024-09-26 11:33:51,495][00517] Num frames 9900...
[2024-09-26 11:33:51,626][00517] Num frames 10000...
[2024-09-26 11:33:51,752][00517] Num frames 10100...
[2024-09-26 11:33:51,873][00517] Num frames 10200...
[2024-09-26 11:33:52,006][00517] Num frames 10300...
[2024-09-26 11:33:52,132][00517] Num frames 10400...
[2024-09-26 11:33:52,265][00517] Num frames 10500...
[2024-09-26 11:33:52,390][00517] Num frames 10600...
[2024-09-26 11:33:52,512][00517] Num frames 10700...
[2024-09-26 11:33:52,645][00517] Num frames 10800...
[2024-09-26 11:33:52,766][00517] Num frames 10900...
[2024-09-26 11:33:52,893][00517] Num frames 11000...
[2024-09-26 11:33:53,017][00517] Num frames 11100...
[2024-09-26 11:33:53,141][00517] Num frames 11200...
[2024-09-26 11:33:53,273][00517] Num frames 11300...
[2024-09-26 11:33:53,397][00517] Num frames 11400...
[2024-09-26 11:33:53,518][00517] Num frames 11500...
[2024-09-26 11:33:53,579][00517] Avg episode rewards: #0: 31.892, true rewards: #0: 12.781
[2024-09-26 11:33:53,581][00517] Avg episode reward: 31.892, avg true_objective: 12.781
[2024-09-26 11:33:53,708][00517] Num frames 11600...
[2024-09-26 11:33:53,831][00517] Num frames 11700...
[2024-09-26 11:33:53,956][00517] Num frames 11800...
[2024-09-26 11:33:54,127][00517] Num frames 11900...
[2024-09-26 11:33:54,329][00517] Avg episode rewards: #0: 29.383, true rewards: #0: 11.983
[2024-09-26 11:33:54,331][00517] Avg episode reward: 29.383, avg true_objective: 11.983
[2024-09-26 11:35:03,439][00517] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
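The replay is encoded only after all ten evaluation episodes finish, which is why the save lands more than a minute after the last frame counter. A hedged sketch of writing accumulated frames to MP4 with imageio (an assumed dependency here; the writer the script actually uses may differ):

```python
import numpy as np
import imageio  # assumed dependency; MP4 output also needs imageio-ffmpeg

def save_replay(frames, path, fps=35):
    """Encode a list of HxWx3 uint8 frames as an MP4 (illustrative sketch)."""
    imageio.mimwrite(path, frames, fps=fps)  # 35 fps matches ViZDoom's tick rate

# Dummy frames at the native 160x120 Doom render resolution noted above.
frames = [np.zeros((120, 160, 3), dtype=np.uint8) for _ in range(35)]
save_replay(frames, "/tmp/replay.mp4")
print("Replay video saved to /tmp/replay.mp4!")
```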
[2024-09-26 11:38:30,337][00517] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-26 11:38:30,338][00517] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-26 11:38:30,340][00517] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-26 11:38:30,342][00517] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-26 11:38:30,344][00517] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-26 11:38:30,346][00517] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-26 11:38:30,347][00517] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-26 11:38:30,349][00517] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-26 11:38:30,350][00517] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-26 11:38:30,351][00517] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-26 11:38:30,352][00517] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-26 11:38:30,355][00517] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-26 11:38:30,360][00517] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-26 11:38:30,361][00517] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-26 11:38:30,362][00517] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-26 11:38:30,386][00517] RunningMeanStd input shape: (3, 72, 128)
[2024-09-26 11:38:30,389][00517] RunningMeanStd input shape: (1,)
[2024-09-26 11:38:30,400][00517] ConvEncoder: input_channels=3
[2024-09-26 11:38:30,437][00517] Conv encoder output size: 512
[2024-09-26 11:38:30,438][00517] Policy head output size: 512
[2024-09-26 11:38:30,457][00517] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-26 11:38:30,884][00517] Num frames 100...
[2024-09-26 11:38:31,007][00517] Num frames 200...
[2024-09-26 11:38:31,129][00517] Num frames 300...
[2024-09-26 11:38:31,258][00517] Num frames 400...
[2024-09-26 11:38:31,389][00517] Num frames 500...
[2024-09-26 11:38:31,511][00517] Num frames 600...
[2024-09-26 11:38:31,637][00517] Num frames 700...
[2024-09-26 11:38:31,761][00517] Num frames 800...
[2024-09-26 11:38:31,855][00517] Avg episode rewards: #0: 16.320, true rewards: #0: 8.320
[2024-09-26 11:38:31,856][00517] Avg episode reward: 16.320, avg true_objective: 8.320
[2024-09-26 11:38:31,948][00517] Num frames 900...
[2024-09-26 11:38:32,068][00517] Num frames 1000...
[2024-09-26 11:38:32,186][00517] Num frames 1100...
[2024-09-26 11:38:32,326][00517] Num frames 1200...
[2024-09-26 11:38:32,402][00517] Avg episode rewards: #0: 11.080, true rewards: #0: 6.080
[2024-09-26 11:38:32,405][00517] Avg episode reward: 11.080, avg true_objective: 6.080
[2024-09-26 11:38:32,511][00517] Num frames 1300...
[2024-09-26 11:38:32,630][00517] Num frames 1400...
[2024-09-26 11:38:32,752][00517] Num frames 1500...
[2024-09-26 11:38:32,874][00517] Num frames 1600...
[2024-09-26 11:38:33,002][00517] Num frames 1700...
[2024-09-26 11:38:33,124][00517] Num frames 1800...
[2024-09-26 11:38:33,209][00517] Avg episode rewards: #0: 10.413, true rewards: #0: 6.080
[2024-09-26 11:38:33,211][00517] Avg episode reward: 10.413, avg true_objective: 6.080
[2024-09-26 11:38:33,310][00517] Num frames 1900...
[2024-09-26 11:38:33,433][00517] Num frames 2000...
[2024-09-26 11:38:33,551][00517] Num frames 2100...
[2024-09-26 11:38:33,670][00517] Num frames 2200...
[2024-09-26 11:38:33,796][00517] Num frames 2300...
[2024-09-26 11:38:33,918][00517] Num frames 2400...
[2024-09-26 11:38:34,095][00517] Avg episode rewards: #0: 10.490, true rewards: #0: 6.240
[2024-09-26 11:38:34,097][00517] Avg episode reward: 10.490, avg true_objective: 6.240
[2024-09-26 11:38:34,107][00517] Num frames 2500...
[2024-09-26 11:38:34,233][00517] Num frames 2600...
[2024-09-26 11:38:34,360][00517] Num frames 2700...
[2024-09-26 11:38:34,479][00517] Num frames 2800...
[2024-09-26 11:38:34,597][00517] Num frames 2900...
[2024-09-26 11:38:34,723][00517] Num frames 3000...
[2024-09-26 11:38:34,842][00517] Num frames 3100...
[2024-09-26 11:38:34,964][00517] Num frames 3200...
[2024-09-26 11:38:35,100][00517] Num frames 3300...
[2024-09-26 11:38:35,227][00517] Num frames 3400...
[2024-09-26 11:38:35,352][00517] Num frames 3500...
[2024-09-26 11:38:35,473][00517] Num frames 3600...
[2024-09-26 11:38:35,595][00517] Num frames 3700...
[2024-09-26 11:38:35,719][00517] Num frames 3800...
[2024-09-26 11:38:35,840][00517] Num frames 3900...
[2024-09-26 11:38:41,935][00517] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-26 11:38:41,937][00517] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-26 11:38:41,939][00517] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-26 11:38:41,942][00517] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-26 11:38:41,943][00517] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-26 11:38:41,944][00517] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-26 11:38:41,946][00517] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-26 11:38:41,947][00517] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-26 11:38:41,948][00517] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-26 11:38:41,950][00517] Adding new argument 'hf_repository'='Dorian-T/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-26 11:38:41,951][00517] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-26 11:38:41,952][00517] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-26 11:38:41,954][00517] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-26 11:38:41,955][00517] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-26 11:38:41,956][00517] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-26 11:38:41,990][00517] RunningMeanStd input shape: (3, 72, 128)
[2024-09-26 11:38:41,991][00517] RunningMeanStd input shape: (1,)
[2024-09-26 11:38:42,003][00517] ConvEncoder: input_channels=3
[2024-09-26 11:38:42,040][00517] Conv encoder output size: 512
[2024-09-26 11:38:42,042][00517] Policy head output size: 512
[2024-09-26 11:38:42,062][00517] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-26 11:38:42,500][00517] Num frames 100...
[2024-09-26 11:38:42,623][00517] Num frames 200...
[2024-09-26 11:38:42,746][00517] Num frames 300...
[2024-09-26 11:38:42,875][00517] Num frames 400...
[2024-09-26 11:38:42,997][00517] Num frames 500...
[2024-09-26 11:38:43,121][00517] Num frames 600...
[2024-09-26 11:38:43,256][00517] Num frames 700...
[2024-09-26 11:38:43,378][00517] Num frames 800...
[2024-09-26 11:38:43,499][00517] Num frames 900...
[2024-09-26 11:38:43,626][00517] Num frames 1000...
[2024-09-26 11:38:43,755][00517] Num frames 1100...
[2024-09-26 11:38:43,917][00517] Avg episode rewards: #0: 31.870, true rewards: #0: 11.870
[2024-09-26 11:38:43,918][00517] Avg episode reward: 31.870, avg true_objective: 11.870
[2024-09-26 11:38:43,939][00517] Num frames 1200...
[2024-09-26 11:38:44,060][00517] Num frames 1300...
[2024-09-26 11:38:44,188][00517] Num frames 1400...
[2024-09-26 11:38:44,335][00517] Num frames 1500...
[2024-09-26 11:38:44,460][00517] Num frames 1600...
[2024-09-26 11:38:44,585][00517] Num frames 1700...
[2024-09-26 11:38:44,756][00517] Avg episode rewards: #0: 20.475, true rewards: #0: 8.975
[2024-09-26 11:38:44,759][00517] Avg episode reward: 20.475, avg true_objective: 8.975
[2024-09-26 11:38:44,769][00517] Num frames 1800...
[2024-09-26 11:38:44,892][00517] Num frames 1900...
[2024-09-26 11:38:45,015][00517] Num frames 2000...
[2024-09-26 11:38:45,140][00517] Num frames 2100...
[2024-09-26 11:38:45,276][00517] Num frames 2200...
[2024-09-26 11:38:45,401][00517] Num frames 2300...
[2024-09-26 11:38:45,524][00517] Num frames 2400...
[2024-09-26 11:38:45,647][00517] Num frames 2500...
[2024-09-26 11:38:45,772][00517] Num frames 2600...
[2024-09-26 11:38:45,893][00517] Num frames 2700...
[2024-09-26 11:38:45,968][00517] Avg episode rewards: #0: 21.380, true rewards: #0: 9.047
[2024-09-26 11:38:45,969][00517] Avg episode reward: 21.380, avg true_objective: 9.047
[2024-09-26 11:38:46,075][00517] Num frames 2800...
[2024-09-26 11:38:46,197][00517] Num frames 2900...
[2024-09-26 11:38:46,341][00517] Num frames 3000...
[2024-09-26 11:38:46,466][00517] Num frames 3100...
[2024-09-26 11:38:46,591][00517] Num frames 3200...
[2024-09-26 11:38:46,714][00517] Num frames 3300...
[2024-09-26 11:38:46,838][00517] Num frames 3400...
[2024-09-26 11:38:46,963][00517] Num frames 3500...
[2024-09-26 11:38:47,087][00517] Num frames 3600...
[2024-09-26 11:38:47,220][00517] Num frames 3700...
[2024-09-26 11:38:47,385][00517] Num frames 3800...
[2024-09-26 11:38:47,510][00517] Num frames 3900...
[2024-09-26 11:38:47,634][00517] Num frames 4000...
[2024-09-26 11:38:47,759][00517] Num frames 4100...
[2024-09-26 11:38:47,883][00517] Num frames 4200...
[2024-09-26 11:38:48,008][00517] Num frames 4300...
[2024-09-26 11:38:48,133][00517] Num frames 4400...
[2024-09-26 11:38:48,265][00517] Num frames 4500...
[2024-09-26 11:38:48,397][00517] Num frames 4600...
[2024-09-26 11:38:48,519][00517] Num frames 4700...
[2024-09-26 11:38:48,667][00517] Avg episode rewards: #0: 30.180, true rewards: #0: 11.930
[2024-09-26 11:38:48,669][00517] Avg episode reward: 30.180, avg true_objective: 11.930
[2024-09-26 11:38:48,706][00517] Num frames 4800...
[2024-09-26 11:38:48,830][00517] Num frames 4900...
[2024-09-26 11:38:49,005][00517] Num frames 5000...
[2024-09-26 11:38:49,179][00517] Num frames 5100...
[2024-09-26 11:38:49,364][00517] Num frames 5200...
[2024-09-26 11:38:49,541][00517] Num frames 5300...
[2024-09-26 11:38:49,709][00517] Num frames 5400...
[2024-09-26 11:38:49,875][00517] Num frames 5500...
[2024-09-26 11:38:50,038][00517] Num frames 5600...
[2024-09-26 11:38:50,214][00517] Num frames 5700...
[2024-09-26 11:38:50,404][00517] Num frames 5800...
[2024-09-26 11:38:50,576][00517] Num frames 5900...
[2024-09-26 11:38:50,755][00517] Num frames 6000...
[2024-09-26 11:38:50,943][00517] Num frames 6100...
[2024-09-26 11:38:51,120][00517] Num frames 6200...
[2024-09-26 11:38:51,309][00517] Num frames 6300...
[2024-09-26 11:38:51,463][00517] Num frames 6400...
[2024-09-26 11:38:51,591][00517] Num frames 6500...
[2024-09-26 11:38:51,717][00517] Num frames 6600...
[2024-09-26 11:38:51,847][00517] Num frames 6700...
[2024-09-26 11:38:51,977][00517] Num frames 6800...
[2024-09-26 11:38:52,123][00517] Avg episode rewards: #0: 35.144, true rewards: #0: 13.744
[2024-09-26 11:38:52,124][00517] Avg episode reward: 35.144, avg true_objective: 13.744
[2024-09-26 11:38:52,164][00517] Num frames 6900...
[2024-09-26 11:38:52,303][00517] Num frames 7000...
[2024-09-26 11:38:52,430][00517] Num frames 7100...
[2024-09-26 11:38:52,563][00517] Num frames 7200...
[2024-09-26 11:38:52,687][00517] Num frames 7300...
[2024-09-26 11:38:52,812][00517] Num frames 7400...
[2024-09-26 11:38:52,940][00517] Num frames 7500...
[2024-09-26 11:38:53,064][00517] Num frames 7600...
[2024-09-26 11:38:53,194][00517] Num frames 7700...
[2024-09-26 11:38:53,326][00517] Num frames 7800...
[2024-09-26 11:38:53,453][00517] Num frames 7900...
[2024-09-26 11:38:53,584][00517] Num frames 8000...
[2024-09-26 11:38:53,711][00517] Num frames 8100...
[2024-09-26 11:38:53,836][00517] Num frames 8200...
[2024-09-26 11:38:53,965][00517] Num frames 8300...
[2024-09-26 11:38:54,117][00517] Avg episode rewards: #0: 34.460, true rewards: #0: 13.960
[2024-09-26 11:38:54,118][00517] Avg episode reward: 34.460, avg true_objective: 13.960
[2024-09-26 11:38:54,154][00517] Num frames 8400...
[2024-09-26 11:38:54,290][00517] Num frames 8500...
[2024-09-26 11:38:54,414][00517] Num frames 8600...
[2024-09-26 11:38:54,549][00517] Num frames 8700...
[2024-09-26 11:38:54,723][00517] Avg episode rewards: #0: 30.703, true rewards: #0: 12.560
[2024-09-26 11:38:54,725][00517] Avg episode reward: 30.703, avg true_objective: 12.560
[2024-09-26 11:38:54,740][00517] Num frames 8800...
[2024-09-26 11:38:54,867][00517] Num frames 8900...
[2024-09-26 11:38:54,996][00517] Num frames 9000...
[2024-09-26 11:38:55,122][00517] Num frames 9100...
[2024-09-26 11:38:55,257][00517] Num frames 9200...
[2024-09-26 11:38:55,379][00517] Num frames 9300...
[2024-09-26 11:38:55,506][00517] Num frames 9400...
[2024-09-26 11:38:55,640][00517] Num frames 9500...
[2024-09-26 11:38:55,774][00517] Num frames 9600...
[2024-09-26 11:38:55,901][00517] Avg episode rewards: #0: 28.695, true rewards: #0: 12.070
[2024-09-26 11:38:55,902][00517] Avg episode reward: 28.695, avg true_objective: 12.070
[2024-09-26 11:38:55,962][00517] Num frames 9700...
[2024-09-26 11:38:56,091][00517] Num frames 9800...
[2024-09-26 11:38:56,220][00517] Num frames 9900...
[2024-09-26 11:38:56,351][00517] Num frames 10000...
[2024-09-26 11:38:56,478][00517] Num frames 10100...
[2024-09-26 11:38:56,613][00517] Num frames 10200...
[2024-09-26 11:38:56,737][00517] Num frames 10300...
[2024-09-26 11:38:56,865][00517] Num frames 10400...
[2024-09-26 11:38:56,996][00517] Num frames 10500...
[2024-09-26 11:38:57,122][00517] Num frames 10600...
[2024-09-26 11:38:57,255][00517] Num frames 10700...
[2024-09-26 11:38:57,381][00517] Num frames 10800...
[2024-09-26 11:38:57,507][00517] Num frames 10900...
[2024-09-26 11:38:57,643][00517] Num frames 11000...
[2024-09-26 11:38:57,769][00517] Num frames 11100...
[2024-09-26 11:38:57,901][00517] Num frames 11200...
[2024-09-26 11:38:58,030][00517] Num frames 11300...
[2024-09-26 11:38:58,161][00517] Num frames 11400...
[2024-09-26 11:38:58,293][00517] Num frames 11500...
[2024-09-26 11:38:58,416][00517] Num frames 11600...
[2024-09-26 11:38:58,544][00517] Num frames 11700...
[2024-09-26 11:38:58,676][00517] Avg episode rewards: #0: 32.506, true rewards: #0: 13.062
[2024-09-26 11:38:58,678][00517] Avg episode reward: 32.506, avg true_objective: 13.062
[2024-09-26 11:38:58,743][00517] Num frames 11800...
[2024-09-26 11:38:58,867][00517] Num frames 11900...
[2024-09-26 11:38:58,995][00517] Num frames 12000...
[2024-09-26 11:38:59,122][00517] Num frames 12100...
[2024-09-26 11:38:59,257][00517] Num frames 12200...
[2024-09-26 11:38:59,384][00517] Num frames 12300...
[2024-09-26 11:38:59,507][00517] Num frames 12400...
[2024-09-26 11:38:59,648][00517] Avg episode rewards: #0: 30.661, true rewards: #0: 12.461
[2024-09-26 11:38:59,650][00517] Avg episode reward: 30.661, avg true_objective: 12.461
[2024-09-26 11:40:11,924][00517] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-26 11:42:47,077][00517] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-26 11:42:47,079][00517] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-26 11:42:47,082][00517] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-26 11:42:47,083][00517] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-26 11:42:47,087][00517] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-26 11:42:47,088][00517] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-26 11:42:47,090][00517] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-26 11:42:47,093][00517] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-26 11:42:47,094][00517] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-26 11:42:47,095][00517] Adding new argument 'hf_repository'='Dorian-T/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-26 11:42:47,096][00517] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-26 11:42:47,097][00517] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-26 11:42:47,098][00517] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-26 11:42:47,099][00517] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-26 11:42:47,100][00517] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-26 11:42:47,147][00517] RunningMeanStd input shape: (3, 72, 128)
[2024-09-26 11:42:47,149][00517] RunningMeanStd input shape: (1,)
[2024-09-26 11:42:47,172][00517] ConvEncoder: input_channels=3
[2024-09-26 11:42:47,233][00517] Conv encoder output size: 512
[2024-09-26 11:42:47,236][00517] Policy head output size: 512
[2024-09-26 11:42:47,270][00517] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-26 11:42:47,905][00517] Num frames 100...
[2024-09-26 11:42:48,074][00517] Num frames 200...
[2024-09-26 11:42:48,239][00517] Num frames 300...
[2024-09-26 11:42:48,422][00517] Num frames 400...
[2024-09-26 11:42:48,598][00517] Num frames 500...
[2024-09-26 11:42:48,781][00517] Num frames 600...
[2024-09-26 11:42:48,962][00517] Num frames 700...
[2024-09-26 11:42:49,139][00517] Num frames 800...
[2024-09-26 11:42:49,316][00517] Num frames 900...
[2024-09-26 11:42:49,493][00517] Num frames 1000...
[2024-09-26 11:42:49,668][00517] Num frames 1100...
[2024-09-26 11:42:49,795][00517] Num frames 1200...
[2024-09-26 11:42:49,919][00517] Num frames 1300...
[2024-09-26 11:42:50,051][00517] Num frames 1400...
[2024-09-26 11:42:50,175][00517] Num frames 1500...
[2024-09-26 11:42:50,304][00517] Num frames 1600...
[2024-09-26 11:42:50,373][00517] Avg episode rewards: #0: 39.090, true rewards: #0: 16.090
[2024-09-26 11:42:50,374][00517] Avg episode reward: 39.090, avg true_objective: 16.090
[2024-09-26 11:42:50,486][00517] Num frames 1700...
[2024-09-26 11:42:50,607][00517] Num frames 1800...
[2024-09-26 11:42:50,734][00517] Num frames 1900...
[2024-09-26 11:42:50,904][00517] Avg episode rewards: #0: 21.965, true rewards: #0: 9.965
[2024-09-26 11:42:50,905][00517] Avg episode reward: 21.965, avg true_objective: 9.965
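Two notes on reading these stat lines: '#0' is the policy index (matching policy_index=0 above), and 'true_objective' tracks the environment's raw objective rather than the training reward. The printed figures behave as running averages over the episodes evaluated so far, which is why they jump after each episode: episode 1 scored 39.090, the average after episode 2 is 21.965, so episode 2 itself scored roughly 2 × 21.965 − 39.090 = 4.840, consistent with its short ~300-frame run. A one-line check:

    # Back out episode 2's reward from the two running averages logged above.
    ep1, avg2 = 39.090, 21.965
    print(round(2 * avg2 - ep1, 3))  # -> 4.84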
[2024-09-26 11:42:50,917][00517] Num frames 2000...
[2024-09-26 11:42:51,050][00517] Num frames 2100...
[2024-09-26 11:42:51,175][00517] Num frames 2200...
[2024-09-26 11:42:51,307][00517] Num frames 2300...
[2024-09-26 11:42:51,427][00517] Num frames 2400...
[2024-09-26 11:42:51,551][00517] Num frames 2500...
[2024-09-26 11:42:51,673][00517] Num frames 2600...
[2024-09-26 11:42:51,732][00517] Avg episode rewards: #0: 19.003, true rewards: #0: 8.670
[2024-09-26 11:42:51,733][00517] Avg episode reward: 19.003, avg true_objective: 8.670
[2024-09-26 11:42:51,860][00517] Num frames 2700...
[2024-09-26 11:42:51,984][00517] Num frames 2800...
[2024-09-26 11:42:52,120][00517] Num frames 2900...
[2024-09-26 11:42:52,250][00517] Num frames 3000...
[2024-09-26 11:42:52,373][00517] Num frames 3100...
[2024-09-26 11:42:52,499][00517] Num frames 3200...
[2024-09-26 11:42:52,621][00517] Num frames 3300...
[2024-09-26 11:42:52,746][00517] Num frames 3400...
[2024-09-26 11:42:52,870][00517] Num frames 3500...
[2024-09-26 11:42:53,054][00517] Avg episode rewards: #0: 19.982, true rewards: #0: 8.982
[2024-09-26 11:42:53,056][00517] Avg episode reward: 19.982, avg true_objective: 8.982
[2024-09-26 11:42:53,067][00517] Num frames 3600...
[2024-09-26 11:42:53,188][00517] Num frames 3700...
[2024-09-26 11:42:53,320][00517] Num frames 3800...
[2024-09-26 11:42:53,441][00517] Num frames 3900...
[2024-09-26 11:42:53,566][00517] Num frames 4000...
[2024-09-26 11:42:53,689][00517] Num frames 4100...
[2024-09-26 11:42:53,816][00517] Num frames 4200...
[2024-09-26 11:42:53,942][00517] Num frames 4300...
[2024-09-26 11:42:54,118][00517] Avg episode rewards: #0: 19.386, true rewards: #0: 8.786
[2024-09-26 11:42:54,120][00517] Avg episode reward: 19.386, avg true_objective: 8.786
[2024-09-26 11:42:54,134][00517] Num frames 4400...
[2024-09-26 11:42:54,263][00517] Num frames 4500...
[2024-09-26 11:42:54,386][00517] Num frames 4600...
[2024-09-26 11:42:54,513][00517] Num frames 4700...
[2024-09-26 11:42:54,635][00517] Num frames 4800...
[2024-09-26 11:42:54,762][00517] Num frames 4900...
[2024-09-26 11:42:54,885][00517] Num frames 5000...
[2024-09-26 11:42:55,011][00517] Num frames 5100...
[2024-09-26 11:42:55,147][00517] Num frames 5200...
[2024-09-26 11:42:55,276][00517] Num frames 5300...
[2024-09-26 11:42:55,402][00517] Num frames 5400...
[2024-09-26 11:42:55,526][00517] Num frames 5500...
[2024-09-26 11:42:55,650][00517] Num frames 5600...
[2024-09-26 11:42:55,772][00517] Num frames 5700...
[2024-09-26 11:42:55,898][00517] Num frames 5800...
[2024-09-26 11:42:56,020][00517] Num frames 5900...
[2024-09-26 11:42:56,159][00517] Num frames 6000...
[2024-09-26 11:42:56,291][00517] Num frames 6100...
[2024-09-26 11:42:56,414][00517] Avg episode rewards: #0: 23.588, true rewards: #0: 10.255
[2024-09-26 11:42:56,415][00517] Avg episode reward: 23.588, avg true_objective: 10.255
[2024-09-26 11:42:56,481][00517] Num frames 6200...
[2024-09-26 11:42:56,606][00517] Num frames 6300...
[2024-09-26 11:42:56,756][00517] Avg episode rewards: #0: 20.681, true rewards: #0: 9.110
[2024-09-26 11:42:56,758][00517] Avg episode reward: 20.681, avg true_objective: 9.110
[2024-09-26 11:42:56,790][00517] Num frames 6400...
[2024-09-26 11:42:56,914][00517] Num frames 6500...
[2024-09-26 11:42:57,041][00517] Num frames 6600...
[2024-09-26 11:42:57,173][00517] Num frames 6700...
[2024-09-26 11:42:57,305][00517] Num frames 6800...
[2024-09-26 11:42:57,430][00517] Num frames 6900...
[2024-09-26 11:42:57,553][00517] Num frames 7000...
[2024-09-26 11:42:57,678][00517] Num frames 7100...
[2024-09-26 11:42:57,803][00517] Num frames 7200...
[2024-09-26 11:42:57,927][00517] Num frames 7300...
[2024-09-26 11:42:58,051][00517] Num frames 7400...
[2024-09-26 11:42:58,187][00517] Num frames 7500...
[2024-09-26 11:42:58,319][00517] Num frames 7600...
[2024-09-26 11:42:58,383][00517] Avg episode rewards: #0: 21.631, true rewards: #0: 9.506
[2024-09-26 11:42:58,385][00517] Avg episode reward: 21.631, avg true_objective: 9.506
[2024-09-26 11:42:58,506][00517] Num frames 7700...
[2024-09-26 11:42:58,633][00517] Num frames 7800...
[2024-09-26 11:42:58,759][00517] Num frames 7900...
[2024-09-26 11:42:58,888][00517] Avg episode rewards: #0: 19.841, true rewards: #0: 8.841
[2024-09-26 11:42:58,890][00517] Avg episode reward: 19.841, avg true_objective: 8.841
[2024-09-26 11:42:58,947][00517] Num frames 8000...
[2024-09-26 11:42:59,073][00517] Num frames 8100...
[2024-09-26 11:42:59,205][00517] Num frames 8200...
[2024-09-26 11:42:59,339][00517] Num frames 8300...
[2024-09-26 11:42:59,463][00517] Num frames 8400...
[2024-09-26 11:42:59,589][00517] Num frames 8500...
[2024-09-26 11:42:59,687][00517] Avg episode rewards: #0: 18.833, true rewards: #0: 8.533
[2024-09-26 11:42:59,689][00517] Avg episode reward: 18.833, avg true_objective: 8.533
[2024-09-26 11:43:48,365][00517] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-26 11:45:43,993][00517] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-26 11:45:43,995][00517] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-26 11:45:43,996][00517] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-26 11:45:43,998][00517] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-26 11:45:44,000][00517] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-26 11:45:44,002][00517] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-26 11:45:44,003][00517] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-26 11:45:44,005][00517] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-26 11:45:44,007][00517] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-26 11:45:44,009][00517] Adding new argument 'hf_repository'='Dorian-T/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-26 11:45:44,010][00517] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-26 11:45:44,011][00517] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-26 11:45:44,012][00517] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-26 11:45:44,015][00517] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-26 11:45:44,016][00517] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-26 11:45:44,050][00517] RunningMeanStd input shape: (3, 72, 128)
[2024-09-26 11:45:44,052][00517] RunningMeanStd input shape: (1,)
[2024-09-26 11:45:44,066][00517] ConvEncoder: input_channels=3
[2024-09-26 11:45:44,102][00517] Conv encoder output size: 512
[2024-09-26 11:45:44,104][00517] Policy head output size: 512
[2024-09-26 11:45:44,126][00517] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-26 11:45:44,581][00517] Num frames 100...
[2024-09-26 11:45:44,700][00517] Num frames 200...
[2024-09-26 11:45:44,824][00517] Num frames 300...
[2024-09-26 11:45:44,954][00517] Num frames 400...
[2024-09-26 11:45:45,077][00517] Num frames 500...
[2024-09-26 11:45:45,222][00517] Num frames 600...
[2024-09-26 11:45:45,346][00517] Num frames 700...
[2024-09-26 11:45:45,445][00517] Avg episode rewards: #0: 14.360, true rewards: #0: 7.360
[2024-09-26 11:45:45,447][00517] Avg episode reward: 14.360, avg true_objective: 7.360
[2024-09-26 11:45:45,530][00517] Num frames 800...
[2024-09-26 11:45:45,650][00517] Num frames 900...
[2024-09-26 11:45:45,777][00517] Num frames 1000...
[2024-09-26 11:45:45,897][00517] Num frames 1100...
[2024-09-26 11:45:46,035][00517] Num frames 1200...
[2024-09-26 11:45:46,209][00517] Num frames 1300...
[2024-09-26 11:45:46,390][00517] Num frames 1400...
[2024-09-26 11:45:46,559][00517] Num frames 1500...
[2024-09-26 11:45:46,725][00517] Num frames 1600...
[2024-09-26 11:45:46,898][00517] Num frames 1700...
[2024-09-26 11:45:47,070][00517] Num frames 1800...
[2024-09-26 11:45:47,287][00517] Avg episode rewards: #0: 21.440, true rewards: #0: 9.440
[2024-09-26 11:45:47,289][00517] Avg episode reward: 21.440, avg true_objective: 9.440
[2024-09-26 11:45:47,316][00517] Num frames 1900...
[2024-09-26 11:45:47,490][00517] Num frames 2000...
[2024-09-26 11:45:47,663][00517] Num frames 2100...
[2024-09-26 11:45:47,839][00517] Num frames 2200...
[2024-09-26 11:45:48,019][00517] Num frames 2300...
[2024-09-26 11:45:48,195][00517] Num frames 2400...
[2024-09-26 11:45:48,381][00517] Num frames 2500...
[2024-09-26 11:45:48,549][00517] Num frames 2600...
[2024-09-26 11:45:48,671][00517] Num frames 2700...
[2024-09-26 11:45:48,797][00517] Num frames 2800...
[2024-09-26 11:45:48,920][00517] Num frames 2900...
[2024-09-26 11:45:49,040][00517] Num frames 3000...
[2024-09-26 11:45:49,199][00517] Num frames 3100...
[2024-09-26 11:45:49,338][00517] Num frames 3200...
[2024-09-26 11:45:49,463][00517] Num frames 3300...
[2024-09-26 11:45:49,585][00517] Num frames 3400...
[2024-09-26 11:45:49,709][00517] Num frames 3500...
[2024-09-26 11:45:49,831][00517] Num frames 3600...
[2024-09-26 11:45:49,956][00517] Num frames 3700...
[2024-09-26 11:45:50,072][00517] Avg episode rewards: #0: 28.147, true rewards: #0: 12.480
[2024-09-26 11:45:50,073][00517] Avg episode reward: 28.147, avg true_objective: 12.480
[2024-09-26 11:45:50,144][00517] Num frames 3800...
[2024-09-26 11:45:50,269][00517] Num frames 3900...
[2024-09-26 11:45:50,493][00517] Num frames 4000...
[2024-09-26 11:45:50,710][00517] Num frames 4100...
[2024-09-26 11:45:50,890][00517] Num frames 4200...
[2024-09-26 11:45:51,072][00517] Num frames 4300...
[2024-09-26 11:45:51,229][00517] Avg episode rewards: #0: 23.300, true rewards: #0: 10.800
[2024-09-26 11:45:51,234][00517] Avg episode reward: 23.300, avg true_objective: 10.800
[2024-09-26 11:45:51,463][00517] Num frames 4400...
[2024-09-26 11:45:51,656][00517] Num frames 4500...
[2024-09-26 11:45:51,785][00517] Num frames 4600...
[2024-09-26 11:45:51,906][00517] Num frames 4700...
[2024-09-26 11:45:52,029][00517] Num frames 4800...
[2024-09-26 11:45:52,155][00517] Num frames 4900...
[2024-09-26 11:45:52,284][00517] Num frames 5000...
[2024-09-26 11:45:52,418][00517] Num frames 5100...
[2024-09-26 11:45:52,542][00517] Num frames 5200...
[2024-09-26 11:45:52,667][00517] Num frames 5300...
[2024-09-26 11:45:52,791][00517] Num frames 5400...
[2024-09-26 11:45:52,913][00517] Num frames 5500...
[2024-09-26 11:45:53,045][00517] Num frames 5600...
[2024-09-26 11:45:53,170][00517] Num frames 5700...
[2024-09-26 11:45:53,299][00517] Num frames 5800...
[2024-09-26 11:45:53,425][00517] Num frames 5900...
[2024-09-26 11:45:53,561][00517] Num frames 6000...
[2024-09-26 11:45:53,684][00517] Num frames 6100...
[2024-09-26 11:45:53,812][00517] Num frames 6200...
[2024-09-26 11:45:53,940][00517] Num frames 6300...
[2024-09-26 11:45:54,072][00517] Num frames 6400...
[2024-09-26 11:45:54,125][00517] Avg episode rewards: #0: 29.600, true rewards: #0: 12.800
[2024-09-26 11:45:54,126][00517] Avg episode reward: 29.600, avg true_objective: 12.800
[2024-09-26 11:45:54,255][00517] Num frames 6500...
[2024-09-26 11:45:54,379][00517] Num frames 6600...
[2024-09-26 11:45:54,511][00517] Num frames 6700...
[2024-09-26 11:45:54,633][00517] Num frames 6800...
[2024-09-26 11:45:54,756][00517] Num frames 6900...
[2024-09-26 11:45:54,877][00517] Num frames 7000...
[2024-09-26 11:45:55,004][00517] Num frames 7100...
[2024-09-26 11:45:55,127][00517] Num frames 7200...
[2024-09-26 11:45:55,259][00517] Num frames 7300...
[2024-09-26 11:45:55,380][00517] Num frames 7400...
[2024-09-26 11:45:55,514][00517] Num frames 7500...
[2024-09-26 11:45:55,637][00517] Num frames 7600...
[2024-09-26 11:45:55,807][00517] Avg episode rewards: #0: 29.491, true rewards: #0: 12.825
[2024-09-26 11:45:55,809][00517] Avg episode reward: 29.491, avg true_objective: 12.825
[2024-09-26 11:45:55,819][00517] Num frames 7700...
[2024-09-26 11:45:55,942][00517] Num frames 7800...
[2024-09-26 11:45:56,067][00517] Num frames 7900...
[2024-09-26 11:45:56,191][00517] Num frames 8000...
[2024-09-26 11:45:56,323][00517] Num frames 8100...
[2024-09-26 11:45:56,450][00517] Num frames 8200...
[2024-09-26 11:45:56,583][00517] Num frames 8300...
[2024-09-26 11:45:56,709][00517] Num frames 8400...
[2024-09-26 11:45:56,835][00517] Num frames 8500...
[2024-09-26 11:45:56,965][00517] Num frames 8600...
[2024-09-26 11:45:57,088][00517] Num frames 8700...
[2024-09-26 11:45:57,213][00517] Num frames 8800...
[2024-09-26 11:45:57,401][00517] Avg episode rewards: #0: 29.278, true rewards: #0: 12.707
[2024-09-26 11:45:57,403][00517] Avg episode reward: 29.278, avg true_objective: 12.707
[2024-09-26 11:45:57,413][00517] Num frames 8900...
[2024-09-26 11:45:57,547][00517] Num frames 9000...
[2024-09-26 11:45:57,670][00517] Num frames 9100...
[2024-09-26 11:45:57,797][00517] Num frames 9200...
[2024-09-26 11:45:57,917][00517] Num frames 9300...
[2024-09-26 11:45:58,026][00517] Avg episode rewards: #0: 26.304, true rewards: #0: 11.679
[2024-09-26 11:45:58,028][00517] Avg episode reward: 26.304, avg true_objective: 11.679
[2024-09-26 11:45:58,106][00517] Num frames 9400...
[2024-09-26 11:45:58,234][00517] Num frames 9500...
[2024-09-26 11:45:58,361][00517] Num frames 9600...
[2024-09-26 11:45:58,485][00517] Num frames 9700...
[2024-09-26 11:45:58,669][00517] Num frames 9800...
[2024-09-26 11:45:58,837][00517] Num frames 9900...
[2024-09-26 11:45:59,009][00517] Num frames 10000...
[2024-09-26 11:45:59,177][00517] Num frames 10100...
[2024-09-26 11:45:59,345][00517] Num frames 10200...
[2024-09-26 11:45:59,414][00517] Avg episode rewards: #0: 25.117, true rewards: #0: 11.339
[2024-09-26 11:45:59,419][00517] Avg episode reward: 25.117, avg true_objective: 11.339
[2024-09-26 11:45:59,577][00517] Num frames 10300...
[2024-09-26 11:45:59,755][00517] Num frames 10400...
[2024-09-26 11:45:59,946][00517] Num frames 10500...
[2024-09-26 11:46:00,115][00517] Num frames 10600...
[2024-09-26 11:46:00,301][00517] Num frames 10700...
[2024-09-26 11:46:00,476][00517] Num frames 10800...
[2024-09-26 11:46:00,652][00517] Num frames 10900...
[2024-09-26 11:46:00,833][00517] Num frames 11000...
[2024-09-26 11:46:00,996][00517] Num frames 11100...
[2024-09-26 11:46:01,150][00517] Avg episode rewards: #0: 24.572, true rewards: #0: 11.172
[2024-09-26 11:46:01,152][00517] Avg episode reward: 24.572, avg true_objective: 11.172
[2024-09-26 11:47:03,358][00517] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
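With the push_to_hub run finished, the experiment directory (checkpoints, config, replay.mp4) can be pulled back down from the Hub. A sketch using sample-factory's Hub loader (the module path and the -r/-d flags follow sample-factory's Hub integration docs; treat them as assumptions here):

    # Sketch only: fetch the pushed experiment back into a local train_dir.
    import subprocess

    subprocess.run(
        [
            "python", "-m", "sample_factory.huggingface.load_from_hub",
            "-r", "Dorian-T/rl_course_vizdoom_health_gathering_supreme",
            "-d", "./train_dir",
        ],
        check=True,
    )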