[2024-09-19 19:07:32,462][00186] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-19 19:07:32,465][00186] Rollout worker 0 uses device cpu
[2024-09-19 19:07:32,466][00186] Rollout worker 1 uses device cpu
[2024-09-19 19:07:32,467][00186] Rollout worker 2 uses device cpu
[2024-09-19 19:07:32,468][00186] Rollout worker 3 uses device cpu
[2024-09-19 19:07:32,469][00186] Rollout worker 4 uses device cpu
[2024-09-19 19:07:32,470][00186] Rollout worker 5 uses device cpu
[2024-09-19 19:07:32,471][00186] Rollout worker 6 uses device cpu
[2024-09-19 19:07:32,472][00186] Rollout worker 7 uses device cpu
[2024-09-19 19:07:32,636][00186] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-19 19:07:32,639][00186] InferenceWorker_p0-w0: min num requests: 2
[2024-09-19 19:07:32,672][00186] Starting all processes...
[2024-09-19 19:07:32,674][00186] Starting process learner_proc0
[2024-09-19 19:07:33,442][00186] Starting all processes...
[2024-09-19 19:07:33,450][00186] Starting process inference_proc0-0
[2024-09-19 19:07:33,451][00186] Starting process rollout_proc0
[2024-09-19 19:07:33,454][00186] Starting process rollout_proc1
[2024-09-19 19:07:33,461][00186] Starting process rollout_proc2
[2024-09-19 19:07:33,461][00186] Starting process rollout_proc3
[2024-09-19 19:07:33,461][00186] Starting process rollout_proc4
[2024-09-19 19:07:33,461][00186] Starting process rollout_proc5
[2024-09-19 19:07:33,464][00186] Starting process rollout_proc6
[2024-09-19 19:07:33,464][00186] Starting process rollout_proc7
[2024-09-19 19:07:48,645][02174] Worker 2 uses CPU cores [0]
[2024-09-19 19:07:48,886][02159] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-19 19:07:48,899][02159] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-19 19:07:49,016][02159] Num visible devices: 1
[2024-09-19 19:07:49,055][02159] Starting seed is not provided
[2024-09-19 19:07:49,056][02159] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-19 19:07:49,056][02159] Initializing actor-critic model on device cuda:0
[2024-09-19 19:07:49,057][02159] RunningMeanStd input shape: (3, 72, 128)
[2024-09-19 19:07:49,061][02159] RunningMeanStd input shape: (1,)
[2024-09-19 19:07:49,207][02159] ConvEncoder: input_channels=3
[2024-09-19 19:07:49,298][02173] Worker 0 uses CPU cores [0]
[2024-09-19 19:07:49,688][02177] Worker 3 uses CPU cores [1]
[2024-09-19 19:07:49,722][02182] Worker 7 uses CPU cores [1]
[2024-09-19 19:07:49,752][02172] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-19 19:07:49,752][02172] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-19 19:07:49,855][02172] Num visible devices: 1
[2024-09-19 19:07:49,857][02180] Worker 5 uses CPU cores [1]
[2024-09-19 19:07:49,956][02179] Worker 4 uses CPU cores [0]
[2024-09-19 19:07:49,964][02178] Worker 1 uses CPU cores [1]
[2024-09-19 19:07:49,982][02181] Worker 6 uses CPU cores [0]
[2024-09-19 19:07:50,022][02159] Conv encoder output size: 512
[2024-09-19 19:07:50,022][02159] Policy head output size: 512
[2024-09-19 19:07:50,080][02159] Created Actor Critic model with architecture:
[2024-09-19 19:07:50,080][02159] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-19 19:07:50,447][02159] Using optimizer
[2024-09-19 19:07:51,192][02159] No checkpoints found
[2024-09-19 19:07:51,192][02159] Did not load from checkpoint, starting from scratch!
[2024-09-19 19:07:51,193][02159] Initialized policy 0 weights for model version 0
[2024-09-19 19:07:51,197][02159] LearnerWorker_p0 finished initialization!
[2024-09-19 19:07:51,198][02159] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-19 19:07:51,388][02172] RunningMeanStd input shape: (3, 72, 128)
[2024-09-19 19:07:51,389][02172] RunningMeanStd input shape: (1,)
[2024-09-19 19:07:51,404][02172] ConvEncoder: input_channels=3
[2024-09-19 19:07:51,505][02172] Conv encoder output size: 512
[2024-09-19 19:07:51,506][02172] Policy head output size: 512
[2024-09-19 19:07:51,556][00186] Inference worker 0-0 is ready!
[2024-09-19 19:07:51,559][00186] All inference workers are ready! Signal rollout workers to start!
[2024-09-19 19:07:51,769][02178] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-19 19:07:51,771][02180] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-19 19:07:51,773][02182] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-19 19:07:51,772][02177] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-19 19:07:51,776][02179] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-19 19:07:51,773][02174] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-19 19:07:51,767][02173] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-19 19:07:51,775][02181] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-19 19:07:52,628][00186] Heartbeat connected on Batcher_0
[2024-09-19 19:07:52,637][00186] Heartbeat connected on LearnerWorker_p0
[2024-09-19 19:07:52,680][00186] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-19 19:07:53,230][02173] Decorrelating experience for 0 frames...
[2024-09-19 19:07:53,231][02181] Decorrelating experience for 0 frames...
[2024-09-19 19:07:53,228][02174] Decorrelating experience for 0 frames...
[2024-09-19 19:07:53,628][02177] Decorrelating experience for 0 frames...
[2024-09-19 19:07:53,636][02178] Decorrelating experience for 0 frames...
[2024-09-19 19:07:53,637][02182] Decorrelating experience for 0 frames...
[2024-09-19 19:07:53,640][02180] Decorrelating experience for 0 frames...
[2024-09-19 19:07:54,487][02180] Decorrelating experience for 32 frames...
[2024-09-19 19:07:55,039][02181] Decorrelating experience for 32 frames...
[2024-09-19 19:07:55,042][02173] Decorrelating experience for 32 frames...
[2024-09-19 19:07:55,047][02174] Decorrelating experience for 32 frames...
[2024-09-19 19:07:55,214][00186] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-19 19:07:55,459][02179] Decorrelating experience for 0 frames...
[2024-09-19 19:07:56,110][02177] Decorrelating experience for 32 frames...
[2024-09-19 19:07:56,473][02178] Decorrelating experience for 32 frames...
[2024-09-19 19:07:56,879][02180] Decorrelating experience for 64 frames...
[2024-09-19 19:07:57,519][02179] Decorrelating experience for 32 frames...
[2024-09-19 19:07:57,560][02174] Decorrelating experience for 64 frames...
[2024-09-19 19:07:57,570][02173] Decorrelating experience for 64 frames...
[2024-09-19 19:07:58,011][02180] Decorrelating experience for 96 frames...
[2024-09-19 19:07:58,367][00186] Heartbeat connected on RolloutWorker_w5
[2024-09-19 19:07:58,432][02181] Decorrelating experience for 64 frames...
[2024-09-19 19:07:59,352][02173] Decorrelating experience for 96 frames...
[2024-09-19 19:07:59,361][02174] Decorrelating experience for 96 frames...
[2024-09-19 19:07:59,440][02177] Decorrelating experience for 64 frames...
[2024-09-19 19:07:59,534][02178] Decorrelating experience for 64 frames...
[2024-09-19 19:07:59,645][00186] Heartbeat connected on RolloutWorker_w2
[2024-09-19 19:07:59,654][00186] Heartbeat connected on RolloutWorker_w0
[2024-09-19 19:08:00,008][02182] Decorrelating experience for 32 frames...
[2024-09-19 19:08:00,214][00186] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-19 19:08:00,547][02177] Decorrelating experience for 96 frames...
[2024-09-19 19:08:00,756][02179] Decorrelating experience for 64 frames...
[2024-09-19 19:08:00,794][00186] Heartbeat connected on RolloutWorker_w3
[2024-09-19 19:08:00,825][02181] Decorrelating experience for 96 frames...
[2024-09-19 19:08:01,061][00186] Heartbeat connected on RolloutWorker_w6
[2024-09-19 19:08:01,910][02182] Decorrelating experience for 64 frames...
[2024-09-19 19:08:02,689][02178] Decorrelating experience for 96 frames...
[2024-09-19 19:08:03,213][00186] Heartbeat connected on RolloutWorker_w1
[2024-09-19 19:08:04,062][02159] Signal inference workers to stop experience collection...
[2024-09-19 19:08:04,071][02172] InferenceWorker_p0-w0: stopping experience collection
[2024-09-19 19:08:04,131][02179] Decorrelating experience for 96 frames...
[2024-09-19 19:08:04,186][02182] Decorrelating experience for 96 frames...
[2024-09-19 19:08:04,215][00186] Heartbeat connected on RolloutWorker_w4
[2024-09-19 19:08:04,274][00186] Heartbeat connected on RolloutWorker_w7
[2024-09-19 19:08:05,215][00186] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 200.2. Samples: 2002. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-19 19:08:05,221][00186] Avg episode reward: [(0, '2.564')]
[2024-09-19 19:08:08,924][02159] Signal inference workers to resume experience collection...
[2024-09-19 19:08:08,925][02172] InferenceWorker_p0-w0: resuming experience collection
[2024-09-19 19:08:10,214][00186] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 8192. Throughput: 0: 167.2. Samples: 2508. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-19 19:08:10,216][00186] Avg episode reward: [(0, '2.738')]
[2024-09-19 19:08:15,214][00186] Fps is (10 sec: 2048.1, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 224.4. Samples: 4488. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-19 19:08:15,219][00186] Avg episode reward: [(0, '3.446')]
[2024-09-19 19:08:19,112][02172] Updated weights for policy 0, policy_version 10 (0.0203)
[2024-09-19 19:08:20,216][00186] Fps is (10 sec: 3685.7, 60 sec: 1802.1, 300 sec: 1802.1). Total num frames: 45056. Throughput: 0: 401.4. Samples: 10036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:08:20,221][00186] Avg episode reward: [(0, '4.135')]
[2024-09-19 19:08:25,214][00186] Fps is (10 sec: 4505.6, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 65536. Throughput: 0: 547.0. Samples: 16410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:08:25,220][00186] Avg episode reward: [(0, '4.477')]
[2024-09-19 19:08:30,214][00186] Fps is (10 sec: 3277.4, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 547.8. Samples: 19172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:08:30,217][00186] Avg episode reward: [(0, '4.435')]
[2024-09-19 19:08:30,295][02172] Updated weights for policy 0, policy_version 20 (0.0027)
[2024-09-19 19:08:35,217][00186] Fps is (10 sec: 2456.9, 60 sec: 2252.6, 300 sec: 2252.6). Total num frames: 90112. Throughput: 0: 570.8. Samples: 22832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:08:35,219][00186] Avg episode reward: [(0, '4.195')]
[2024-09-19 19:08:40,214][00186] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 110592. Throughput: 0: 616.1. Samples: 27724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:08:40,217][00186] Avg episode reward: [(0, '4.276')]
[2024-09-19 19:08:40,219][02159] Saving new best policy, reward=4.276!
[2024-09-19 19:08:42,895][02172] Updated weights for policy 0, policy_version 30 (0.0034)
[2024-09-19 19:08:45,214][00186] Fps is (10 sec: 3687.5, 60 sec: 2539.5, 300 sec: 2539.5). Total num frames: 126976. Throughput: 0: 690.5. Samples: 31074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-19 19:08:45,222][00186] Avg episode reward: [(0, '4.586')]
[2024-09-19 19:08:45,232][02159] Saving new best policy, reward=4.586!
[2024-09-19 19:08:50,217][00186] Fps is (10 sec: 3275.9, 60 sec: 2606.4, 300 sec: 2606.4). Total num frames: 143360. Throughput: 0: 745.6. Samples: 35554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-19 19:08:50,219][00186] Avg episode reward: [(0, '4.489')]
[2024-09-19 19:08:54,870][02172] Updated weights for policy 0, policy_version 40 (0.0027)
[2024-09-19 19:08:55,214][00186] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 163840. Throughput: 0: 858.0. Samples: 41118. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-19 19:08:55,219][00186] Avg episode reward: [(0, '4.417')]
[2024-09-19 19:09:00,214][00186] Fps is (10 sec: 4097.1, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 184320. Throughput: 0: 888.7. Samples: 44480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:09:00,218][00186] Avg episode reward: [(0, '4.298')]
[2024-09-19 19:09:05,214][00186] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 899.1. Samples: 50496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-19 19:09:05,219][00186] Avg episode reward: [(0, '4.243')]
[2024-09-19 19:09:05,374][02172] Updated weights for policy 0, policy_version 50 (0.0043)
[2024-09-19 19:09:10,214][00186] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 2949.1). Total num frames: 221184. Throughput: 0: 863.3. Samples: 55258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:09:10,219][00186] Avg episode reward: [(0, '4.400')]
[2024-09-19 19:09:15,214][00186] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3020.8). Total num frames: 241664. Throughput: 0: 882.2. Samples: 58870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:09:15,219][00186] Avg episode reward: [(0, '4.632')]
[2024-09-19 19:09:15,228][02159] Saving new best policy, reward=4.632!
[2024-09-19 19:09:15,487][02172] Updated weights for policy 0, policy_version 60 (0.0022)
[2024-09-19 19:09:20,214][00186] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3084.0). Total num frames: 262144. Throughput: 0: 958.1. Samples: 65944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-19 19:09:20,224][00186] Avg episode reward: [(0, '4.463')]
[2024-09-19 19:09:25,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3094.8). Total num frames: 278528. Throughput: 0: 944.0. Samples: 70202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-19 19:09:25,216][00186] Avg episode reward: [(0, '4.533')]
[2024-09-19 19:09:25,225][02159] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth...
[2024-09-19 19:09:27,069][02172] Updated weights for policy 0, policy_version 70 (0.0032)
[2024-09-19 19:09:30,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 935.0. Samples: 73150. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-19 19:09:30,218][00186] Avg episode reward: [(0, '4.551')]
[2024-09-19 19:09:35,214][00186] Fps is (10 sec: 4505.6, 60 sec: 3891.4, 300 sec: 3235.8). Total num frames: 323584. Throughput: 0: 993.3. Samples: 80248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-19 19:09:35,218][00186] Avg episode reward: [(0, '4.602')]
[2024-09-19 19:09:35,435][02172] Updated weights for policy 0, policy_version 80 (0.0027)
[2024-09-19 19:09:40,220][00186] Fps is (10 sec: 4093.4, 60 sec: 3822.5, 300 sec: 3237.6). Total num frames: 339968. Throughput: 0: 986.3. Samples: 85506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-19 19:09:40,223][00186] Avg episode reward: [(0, '4.519')]
[2024-09-19 19:09:45,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 360448. Throughput: 0: 961.1. Samples: 87730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:09:45,218][00186] Avg episode reward: [(0, '4.381')]
[2024-09-19 19:09:46,785][02172] Updated weights for policy 0, policy_version 90 (0.0027)
[2024-09-19 19:09:50,214][00186] Fps is (10 sec: 4098.6, 60 sec: 3959.7, 300 sec: 3312.4). Total num frames: 380928. Throughput: 0: 985.7. Samples: 94854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:09:50,217][00186] Avg episode reward: [(0, '4.566')]
[2024-09-19 19:09:55,214][00186] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3345.1). Total num frames: 401408. Throughput: 0: 1017.0. Samples: 101022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-19 19:09:55,217][00186] Avg episode reward: [(0, '4.699')]
[2024-09-19 19:09:55,226][02159] Saving new best policy, reward=4.699!
[2024-09-19 19:09:57,434][02172] Updated weights for policy 0, policy_version 100 (0.0027)
[2024-09-19 19:10:00,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3342.3). Total num frames: 417792. Throughput: 0: 982.2. Samples: 103068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:10:00,217][00186] Avg episode reward: [(0, '4.795')]
[2024-09-19 19:10:00,225][02159] Saving new best policy, reward=4.795!
[2024-09-19 19:10:05,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3371.3). Total num frames: 438272. Throughput: 0: 959.6. Samples: 109128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:10:05,217][00186] Avg episode reward: [(0, '4.988')]
[2024-09-19 19:10:05,226][02159] Saving new best policy, reward=4.988!
[2024-09-19 19:10:07,075][02172] Updated weights for policy 0, policy_version 110 (0.0024)
[2024-09-19 19:10:10,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3428.5). Total num frames: 462848. Throughput: 0: 1023.6. Samples: 116264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:10:10,218][00186] Avg episode reward: [(0, '4.954')]
[2024-09-19 19:10:15,216][00186] Fps is (10 sec: 4095.2, 60 sec: 3959.3, 300 sec: 3423.0). Total num frames: 479232. Throughput: 0: 1009.5. Samples: 118578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-19 19:10:15,219][00186] Avg episode reward: [(0, '4.966')]
[2024-09-19 19:10:18,292][02172] Updated weights for policy 0, policy_version 120 (0.0040)
[2024-09-19 19:10:20,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3446.3). Total num frames: 499712. Throughput: 0: 968.8. Samples: 123844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:10:20,216][00186] Avg episode reward: [(0, '5.000')]
[2024-09-19 19:10:20,223][02159] Saving new best policy, reward=5.000!
[2024-09-19 19:10:25,215][00186] Fps is (10 sec: 4096.4, 60 sec: 4027.7, 300 sec: 3467.9). Total num frames: 520192. Throughput: 0: 1007.3. Samples: 130828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:10:25,221][00186] Avg episode reward: [(0, '4.821')]
[2024-09-19 19:10:26,884][02172] Updated weights for policy 0, policy_version 130 (0.0035)
[2024-09-19 19:10:30,214][00186] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3488.2). Total num frames: 540672. Throughput: 0: 1029.0. Samples: 134034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-19 19:10:30,216][00186] Avg episode reward: [(0, '4.993')]
[2024-09-19 19:10:35,214][00186] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3481.6). Total num frames: 557056. Throughput: 0: 968.1. Samples: 138418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-19 19:10:35,216][00186] Avg episode reward: [(0, '5.352')]
[2024-09-19 19:10:35,228][02159] Saving new best policy, reward=5.352!
[2024-09-19 19:10:38,190][02172] Updated weights for policy 0, policy_version 140 (0.0026)
[2024-09-19 19:10:40,214][00186] Fps is (10 sec: 4096.0, 60 sec: 4028.2, 300 sec: 3525.0). Total num frames: 581632. Throughput: 0: 988.1. Samples: 145486. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-19 19:10:40,218][00186] Avg episode reward: [(0, '5.486')]
[2024-09-19 19:10:40,222][02159] Saving new best policy, reward=5.486!
[2024-09-19 19:10:45,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3541.8). Total num frames: 602112. Throughput: 0: 1021.2. Samples: 149022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:10:45,218][00186] Avg episode reward: [(0, '5.146')]
[2024-09-19 19:10:48,951][02172] Updated weights for policy 0, policy_version 150 (0.0034)
[2024-09-19 19:10:50,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3510.9). Total num frames: 614400. Throughput: 0: 995.2. Samples: 153912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:10:50,216][00186] Avg episode reward: [(0, '5.153')]
[2024-09-19 19:10:55,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 638976. Throughput: 0: 975.7. Samples: 160170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:10:55,216][00186] Avg episode reward: [(0, '5.585')]
[2024-09-19 19:10:55,228][02159] Saving new best policy, reward=5.585!
[2024-09-19 19:10:58,289][02172] Updated weights for policy 0, policy_version 160 (0.0043)
[2024-09-19 19:11:00,214][00186] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3586.8). Total num frames: 663552. Throughput: 0: 1001.1. Samples: 163624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:11:00,216][00186] Avg episode reward: [(0, '5.777')]
[2024-09-19 19:11:00,224][02159] Saving new best policy, reward=5.777!
[2024-09-19 19:11:05,221][00186] Fps is (10 sec: 3683.8, 60 sec: 3959.0, 300 sec: 3556.9). Total num frames: 675840. Throughput: 0: 1011.4. Samples: 169364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-19 19:11:05,229][00186] Avg episode reward: [(0, '5.935')]
[2024-09-19 19:11:05,346][02159] Saving new best policy, reward=5.935!
[2024-09-19 19:11:09,490][02172] Updated weights for policy 0, policy_version 170 (0.0022)
[2024-09-19 19:11:10,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3570.9). Total num frames: 696320. Throughput: 0: 975.8. Samples: 174740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:11:10,220][00186] Avg episode reward: [(0, '6.040')]
[2024-09-19 19:11:10,223][02159] Saving new best policy, reward=6.040!
[2024-09-19 19:11:15,214][00186] Fps is (10 sec: 4508.7, 60 sec: 4027.9, 300 sec: 3604.5). Total num frames: 720896. Throughput: 0: 983.7. Samples: 178302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:11:15,219][00186] Avg episode reward: [(0, '5.893')]
[2024-09-19 19:11:18,529][02172] Updated weights for policy 0, policy_version 180 (0.0013)
[2024-09-19 19:11:20,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3616.5). Total num frames: 741376. Throughput: 0: 1031.0. Samples: 184812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:11:20,217][00186] Avg episode reward: [(0, '6.043')]
[2024-09-19 19:11:20,220][02159] Saving new best policy, reward=6.043!
[2024-09-19 19:11:25,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3608.4). Total num frames: 757760. Throughput: 0: 973.0. Samples: 189270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:11:25,217][00186] Avg episode reward: [(0, '6.058')]
[2024-09-19 19:11:25,229][02159] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000185_757760.pth...
[2024-09-19 19:11:25,346][02159] Saving new best policy, reward=6.058!
[2024-09-19 19:11:29,709][02172] Updated weights for policy 0, policy_version 190 (0.0015)
[2024-09-19 19:11:30,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3619.7). Total num frames: 778240. Throughput: 0: 968.5. Samples: 192604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:11:30,221][00186] Avg episode reward: [(0, '6.176')]
[2024-09-19 19:11:30,224][02159] Saving new best policy, reward=6.176!
[2024-09-19 19:11:35,214][00186] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3630.5). Total num frames: 798720. Throughput: 0: 1020.8. Samples: 199848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:11:35,219][00186] Avg episode reward: [(0, '6.458')]
[2024-09-19 19:11:35,229][02159] Saving new best policy, reward=6.458!
[2024-09-19 19:11:40,216][00186] Fps is (10 sec: 3685.7, 60 sec: 3891.1, 300 sec: 3622.7). Total num frames: 815104. Throughput: 0: 984.4. Samples: 204470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-19 19:11:40,218][00186] Avg episode reward: [(0, '6.313')]
[2024-09-19 19:11:40,968][02172] Updated weights for policy 0, policy_version 200 (0.0033)
[2024-09-19 19:11:45,218][00186] Fps is (10 sec: 4094.6, 60 sec: 3959.2, 300 sec: 3650.7). Total num frames: 839680. Throughput: 0: 974.9. Samples: 207496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-19 19:11:45,222][00186] Avg episode reward: [(0, '6.128')]
[2024-09-19 19:11:49,363][02172] Updated weights for policy 0, policy_version 210 (0.0033)
[2024-09-19 19:11:50,214][00186] Fps is (10 sec: 4916.0, 60 sec: 4164.2, 300 sec: 3677.7). Total num frames: 864256. Throughput: 0: 1008.8. Samples: 214752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-19 19:11:50,217][00186] Avg episode reward: [(0, '6.050')]
[2024-09-19 19:11:55,216][00186] Fps is (10 sec: 3687.0, 60 sec: 3959.3, 300 sec: 3652.2). Total num frames: 876544. Throughput: 0: 1011.5. Samples: 220260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-19 19:11:55,223][00186] Avg episode reward: [(0, '5.845')]
[2024-09-19 19:12:00,214][00186] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3661.3). Total num frames: 897024. Throughput: 0: 981.9. Samples: 222486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:12:00,216][00186] Avg episode reward: [(0, '5.934')]
[2024-09-19 19:12:00,555][02172] Updated weights for policy 0, policy_version 220 (0.0021)
[2024-09-19 19:12:05,214][00186] Fps is (10 sec: 4506.4, 60 sec: 4096.5, 300 sec: 3686.4). Total num frames: 921600. Throughput: 0: 992.5. Samples: 229476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:12:05,217][00186] Avg episode reward: [(0, '6.404')]
[2024-09-19 19:12:09,793][02172] Updated weights for policy 0, policy_version 230 (0.0023)
[2024-09-19 19:12:10,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3694.4). Total num frames: 942080. Throughput: 0: 1040.2. Samples: 236080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-19 19:12:10,217][00186] Avg episode reward: [(0, '6.745')]
[2024-09-19 19:12:10,220][02159] Saving new best policy, reward=6.745!
[2024-09-19 19:12:15,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3670.6). Total num frames: 954368. Throughput: 0: 1012.8. Samples: 238180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-19 19:12:15,216][00186] Avg episode reward: [(0, '6.581')]
[2024-09-19 19:12:20,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3694.1). Total num frames: 978944. Throughput: 0: 985.6. Samples: 244202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-19 19:12:20,220][00186] Avg episode reward: [(0, '6.307')]
[2024-09-19 19:12:20,417][02172] Updated weights for policy 0, policy_version 240 (0.0018)
[2024-09-19 19:12:25,214][00186] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3716.7). Total num frames: 1003520. Throughput: 0: 1041.8. Samples: 251350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:12:25,216][00186] Avg episode reward: [(0, '7.174')]
[2024-09-19 19:12:25,227][02159] Saving new best policy, reward=7.174!
[2024-09-19 19:12:30,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3693.8). Total num frames: 1015808. Throughput: 0: 1024.7. Samples: 253602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:12:30,218][00186] Avg episode reward: [(0, '7.337')]
[2024-09-19 19:12:30,227][02159] Saving new best policy, reward=7.337!
[2024-09-19 19:12:31,675][02172] Updated weights for policy 0, policy_version 250 (0.0018)
[2024-09-19 19:12:35,214][00186] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3715.7). Total num frames: 1040384. Throughput: 0: 978.1. Samples: 258768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-19 19:12:35,217][00186] Avg episode reward: [(0, '7.691')]
[2024-09-19 19:12:35,224][02159] Saving new best policy, reward=7.691!
[2024-09-19 19:12:40,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3722.3). Total num frames: 1060864. Throughput: 0: 1013.3. Samples: 265856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:12:40,217][00186] Avg episode reward: [(0, '8.538')]
[2024-09-19 19:12:40,219][02159] Saving new best policy, reward=8.538!
[2024-09-19 19:12:40,450][02172] Updated weights for policy 0, policy_version 260 (0.0019)
[2024-09-19 19:12:45,214][00186] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 3728.8). Total num frames: 1081344. Throughput: 0: 1036.3. Samples: 269120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-19 19:12:45,218][00186] Avg episode reward: [(0, '9.040')]
[2024-09-19 19:12:45,229][02159] Saving new best policy, reward=9.040!
[2024-09-19 19:12:50,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 978.1. Samples: 273490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-19 19:12:50,218][00186] Avg episode reward: [(0, '9.057')]
[2024-09-19 19:12:50,223][02159] Saving new best policy, reward=9.057!
[2024-09-19 19:12:51,824][02172] Updated weights for policy 0, policy_version 270 (0.0028)
[2024-09-19 19:12:55,214][00186] Fps is (10 sec: 3686.4, 60 sec: 4027.9, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 983.0. Samples: 280314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:12:55,216][00186] Avg episode reward: [(0, '8.683')]
[2024-09-19 19:13:00,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3873.9). Total num frames: 1142784. Throughput: 0: 1015.5. Samples: 283878. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-19 19:13:00,216][00186] Avg episode reward: [(0, '8.616')]
[2024-09-19 19:13:01,156][02172] Updated weights for policy 0, policy_version 280 (0.0027)
[2024-09-19 19:13:05,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1155072. Throughput: 0: 993.2. Samples: 288898. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-19 19:13:05,221][00186] Avg episode reward: [(0, '9.998')]
[2024-09-19 19:13:05,235][02159] Saving new best policy, reward=9.998!
[2024-09-19 19:13:10,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1179648. Throughput: 0: 968.1. Samples: 294916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:13:10,221][00186] Avg episode reward: [(0, '10.355')]
[2024-09-19 19:13:10,225][02159] Saving new best policy, reward=10.355!
[2024-09-19 19:13:11,877][02172] Updated weights for policy 0, policy_version 290 (0.0028)
[2024-09-19 19:13:15,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 1200128. Throughput: 0: 997.3. Samples: 298480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:13:15,221][00186] Avg episode reward: [(0, '9.182')]
[2024-09-19 19:13:20,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1216512. Throughput: 0: 1015.0. Samples: 304444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:13:20,220][00186] Avg episode reward: [(0, '9.067')]
[2024-09-19 19:13:23,057][02172] Updated weights for policy 0, policy_version 300 (0.0048)
[2024-09-19 19:13:25,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1236992. Throughput: 0: 970.6. Samples: 309534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-19 19:13:25,221][00186] Avg episode reward: [(0, '9.069')]
[2024-09-19 19:13:25,233][02159] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth...
[2024-09-19 19:13:25,361][02159] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth
[2024-09-19 19:13:30,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.1). Total num frames: 1261568. Throughput: 0: 973.0. Samples: 312906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-19 19:13:30,219][00186] Avg episode reward: [(0, '9.290')]
[2024-09-19 19:13:31,757][02172] Updated weights for policy 0, policy_version 310 (0.0022)
[2024-09-19 19:13:35,217][00186] Fps is (10 sec: 4504.3, 60 sec: 4027.5, 300 sec: 3971.0). Total num frames: 1282048. Throughput: 0: 1029.8. Samples: 319832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-19 19:13:35,223][00186] Avg episode reward: [(0, '10.001')]
[2024-09-19 19:13:40,215][00186] Fps is (10 sec: 2867.0, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 1290240. Throughput: 0: 957.7. Samples: 323412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-19 19:13:40,217][00186] Avg episode reward: [(0, '10.219')]
[2024-09-19 19:13:45,214][00186] Fps is (10 sec: 2458.3, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 1306624. Throughput: 0: 917.4. Samples: 325160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-19 19:13:45,219][00186] Avg episode reward: [(0, '9.781')]
[2024-09-19 19:13:45,695][02172] Updated weights for policy 0, policy_version 320 (0.0051)
[2024-09-19 19:13:50,214][00186] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1331200. Throughput: 0: 948.8. Samples: 331594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-19 19:13:50,219][00186] Avg episode reward: [(0, '10.099')]
[2024-09-19 19:13:55,063][02172] Updated weights for policy 0, policy_version 330 (0.0031)
[2024-09-19 19:13:55,215][00186] Fps is (10 sec: 4505.2, 60 sec: 3891.1, 300 sec: 3957.1). Total num frames: 1351680. Throughput: 0: 959.0. Samples: 338070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-19 19:13:55,222][00186] Avg episode reward: [(0, '10.407')]
[2024-09-19 19:13:55,237][02159] Saving new best policy, reward=10.407!
[2024-09-19 19:14:00,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3943.3). Total num frames: 1363968. Throughput: 0: 924.3. Samples: 340072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-19 19:14:00,217][00186] Avg episode reward: [(0, '11.127')]
[2024-09-19 19:14:00,221][02159] Saving new best policy, reward=11.127!
[2024-09-19 19:14:05,214][00186] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1388544. Throughput: 0: 927.8. Samples: 346196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-19 19:14:05,216][00186] Avg episode reward: [(0, '10.737')]
[2024-09-19 19:14:05,698][02172] Updated weights for policy 0, policy_version 340 (0.0016)
[2024-09-19 19:14:10,217][00186] Fps is (10 sec: 4913.8, 60 sec: 3891.0, 300 sec: 3971.0). Total num frames: 1413120. Throughput: 0: 975.3. Samples: 353426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-19 19:14:10,221][00186] Avg episode reward: [(0, '10.613')]
[2024-09-19 19:14:15,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 1425408. Throughput: 0: 951.4. Samples: 355718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-19 19:14:15,221][00186] Avg episode reward: [(0, '10.565')]
[2024-09-19 19:14:16,944][02172] Updated weights for policy 0, policy_version 350 (0.0017)
[2024-09-19 19:14:20,214][00186] Fps is (10 sec: 3277.7, 60 sec: 3822.9, 300 sec: 3957.2).
Total num frames: 1445888. Throughput: 0: 913.6. Samples: 360940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:14:20,220][00186] Avg episode reward: [(0, '10.766')] [2024-09-19 19:14:25,214][00186] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1470464. Throughput: 0: 995.3. Samples: 368198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:14:25,223][00186] Avg episode reward: [(0, '11.265')] [2024-09-19 19:14:25,239][02159] Saving new best policy, reward=11.265! [2024-09-19 19:14:25,749][02172] Updated weights for policy 0, policy_version 360 (0.0027) [2024-09-19 19:14:30,214][00186] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 1486848. Throughput: 0: 1019.9. Samples: 371054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:14:30,222][00186] Avg episode reward: [(0, '11.589')] [2024-09-19 19:14:30,224][02159] Saving new best policy, reward=11.589! [2024-09-19 19:14:35,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3943.4). Total num frames: 1503232. Throughput: 0: 971.9. Samples: 375330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:14:35,219][00186] Avg episode reward: [(0, '12.251')] [2024-09-19 19:14:35,232][02159] Saving new best policy, reward=12.251! [2024-09-19 19:14:37,267][02172] Updated weights for policy 0, policy_version 370 (0.0022) [2024-09-19 19:14:40,214][00186] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1527808. Throughput: 0: 985.9. Samples: 382436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:14:40,216][00186] Avg episode reward: [(0, '12.083')] [2024-09-19 19:14:45,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1548288. Throughput: 0: 1020.8. Samples: 386006. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:14:45,216][00186] Avg episode reward: [(0, '12.322')] [2024-09-19 19:14:45,231][02159] Saving new best policy, reward=12.322! [2024-09-19 19:14:47,142][02172] Updated weights for policy 0, policy_version 380 (0.0030) [2024-09-19 19:14:50,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1564672. Throughput: 0: 987.4. Samples: 390630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:14:50,223][00186] Avg episode reward: [(0, '12.406')] [2024-09-19 19:14:50,225][02159] Saving new best policy, reward=12.406! [2024-09-19 19:14:55,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 1585152. Throughput: 0: 966.7. Samples: 396924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-19 19:14:55,221][00186] Avg episode reward: [(0, '12.763')] [2024-09-19 19:14:55,229][02159] Saving new best policy, reward=12.763! [2024-09-19 19:14:57,477][02172] Updated weights for policy 0, policy_version 390 (0.0034) [2024-09-19 19:15:00,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 1609728. Throughput: 0: 989.2. Samples: 400232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-19 19:15:00,222][00186] Avg episode reward: [(0, '14.596')] [2024-09-19 19:15:00,227][02159] Saving new best policy, reward=14.596! [2024-09-19 19:15:05,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1622016. Throughput: 0: 997.2. Samples: 405812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:15:05,219][00186] Avg episode reward: [(0, '13.911')] [2024-09-19 19:15:08,728][02172] Updated weights for policy 0, policy_version 400 (0.0035) [2024-09-19 19:15:10,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3943.3). Total num frames: 1642496. Throughput: 0: 958.0. Samples: 411310. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:15:10,219][00186] Avg episode reward: [(0, '14.226')] [2024-09-19 19:15:15,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1667072. Throughput: 0: 975.5. Samples: 414950. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:15:15,220][00186] Avg episode reward: [(0, '14.216')] [2024-09-19 19:15:16,974][02172] Updated weights for policy 0, policy_version 410 (0.0036) [2024-09-19 19:15:20,215][00186] Fps is (10 sec: 4505.3, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1687552. Throughput: 0: 1026.9. Samples: 421540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:15:20,219][00186] Avg episode reward: [(0, '13.865')] [2024-09-19 19:15:25,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1703936. Throughput: 0: 971.3. Samples: 426146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:15:25,219][00186] Avg episode reward: [(0, '13.871')] [2024-09-19 19:15:25,228][02159] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000416_1703936.pth... [2024-09-19 19:15:25,345][02159] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000185_757760.pth [2024-09-19 19:15:28,402][02172] Updated weights for policy 0, policy_version 420 (0.0029) [2024-09-19 19:15:30,214][00186] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1728512. Throughput: 0: 966.1. Samples: 429480. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:15:30,218][00186] Avg episode reward: [(0, '14.165')] [2024-09-19 19:15:35,214][00186] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1748992. Throughput: 0: 1023.7. Samples: 436696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:15:35,218][00186] Avg episode reward: [(0, '14.597')] [2024-09-19 19:15:35,234][02159] Saving new best policy, reward=14.597! 
[2024-09-19 19:15:39,071][02172] Updated weights for policy 0, policy_version 430 (0.0030) [2024-09-19 19:15:40,218][00186] Fps is (10 sec: 3275.5, 60 sec: 3890.9, 300 sec: 3929.3). Total num frames: 1761280. Throughput: 0: 986.1. Samples: 441304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-19 19:15:40,220][00186] Avg episode reward: [(0, '14.119')] [2024-09-19 19:15:45,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1785856. Throughput: 0: 976.1. Samples: 444158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:15:45,219][00186] Avg episode reward: [(0, '14.454')] [2024-09-19 19:15:48,305][02172] Updated weights for policy 0, policy_version 440 (0.0026) [2024-09-19 19:15:50,214][00186] Fps is (10 sec: 4917.2, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 1810432. Throughput: 0: 1013.7. Samples: 451428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:15:50,217][00186] Avg episode reward: [(0, '14.066')] [2024-09-19 19:15:55,214][00186] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1826816. Throughput: 0: 1014.4. Samples: 456960. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:15:55,217][00186] Avg episode reward: [(0, '14.566')] [2024-09-19 19:15:59,680][02172] Updated weights for policy 0, policy_version 450 (0.0018) [2024-09-19 19:16:00,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1843200. Throughput: 0: 980.1. Samples: 459054. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-19 19:16:00,217][00186] Avg episode reward: [(0, '15.856')] [2024-09-19 19:16:00,219][02159] Saving new best policy, reward=15.856! [2024-09-19 19:16:05,214][00186] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 1867776. Throughput: 0: 986.4. Samples: 465928. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:16:05,222][00186] Avg episode reward: [(0, '16.253')] [2024-09-19 19:16:05,232][02159] Saving new best policy, reward=16.253! [2024-09-19 19:16:08,296][02172] Updated weights for policy 0, policy_version 460 (0.0025) [2024-09-19 19:16:10,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1888256. Throughput: 0: 1028.4. Samples: 472422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:16:10,216][00186] Avg episode reward: [(0, '16.580')] [2024-09-19 19:16:10,221][02159] Saving new best policy, reward=16.580! [2024-09-19 19:16:15,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1900544. Throughput: 0: 1000.4. Samples: 474498. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:16:15,223][00186] Avg episode reward: [(0, '16.771')] [2024-09-19 19:16:15,243][02159] Saving new best policy, reward=16.771! [2024-09-19 19:16:19,698][02172] Updated weights for policy 0, policy_version 470 (0.0027) [2024-09-19 19:16:20,214][00186] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3957.1). Total num frames: 1925120. Throughput: 0: 970.8. Samples: 480382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-19 19:16:20,221][00186] Avg episode reward: [(0, '18.799')] [2024-09-19 19:16:20,225][02159] Saving new best policy, reward=18.799! [2024-09-19 19:16:25,214][00186] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 1949696. Throughput: 0: 1027.6. Samples: 487542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:16:25,222][00186] Avg episode reward: [(0, '18.221')] [2024-09-19 19:16:30,214][00186] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1961984. Throughput: 0: 1015.5. Samples: 489856. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:16:30,219][00186] Avg episode reward: [(0, '18.538')] [2024-09-19 19:16:30,761][02172] Updated weights for policy 0, policy_version 480 (0.0030) [2024-09-19 19:16:35,214][00186] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1982464. Throughput: 0: 963.9. Samples: 494804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:16:35,217][00186] Avg episode reward: [(0, '17.429')] [2024-09-19 19:16:39,856][02172] Updated weights for policy 0, policy_version 490 (0.0037) [2024-09-19 19:16:40,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.3, 300 sec: 3957.2). Total num frames: 2007040. Throughput: 0: 1001.9. Samples: 502044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-19 19:16:40,216][00186] Avg episode reward: [(0, '15.537')] [2024-09-19 19:16:45,216][00186] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3929.4). Total num frames: 2023424. Throughput: 0: 1031.0. Samples: 505452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-19 19:16:45,218][00186] Avg episode reward: [(0, '16.553')] [2024-09-19 19:16:50,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2043904. Throughput: 0: 975.2. Samples: 509814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:16:50,217][00186] Avg episode reward: [(0, '17.229')] [2024-09-19 19:16:51,057][02172] Updated weights for policy 0, policy_version 500 (0.0055) [2024-09-19 19:16:55,214][00186] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2064384. Throughput: 0: 986.7. Samples: 516822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:16:55,217][00186] Avg episode reward: [(0, '18.447')] [2024-09-19 19:17:00,216][00186] Fps is (10 sec: 4095.4, 60 sec: 4027.6, 300 sec: 3943.2). Total num frames: 2084864. Throughput: 0: 1017.5. Samples: 520288. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:17:00,222][00186] Avg episode reward: [(0, '18.786')] [2024-09-19 19:17:00,286][02172] Updated weights for policy 0, policy_version 510 (0.0023) [2024-09-19 19:17:05,221][00186] Fps is (10 sec: 3683.9, 60 sec: 3890.8, 300 sec: 3929.3). Total num frames: 2101248. Throughput: 0: 997.6. Samples: 525280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:17:05,226][00186] Avg episode reward: [(0, '19.726')] [2024-09-19 19:17:05,242][02159] Saving new best policy, reward=19.726! [2024-09-19 19:17:10,214][00186] Fps is (10 sec: 4096.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2125824. Throughput: 0: 973.5. Samples: 531348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:17:10,221][00186] Avg episode reward: [(0, '20.050')] [2024-09-19 19:17:10,225][02159] Saving new best policy, reward=20.050! [2024-09-19 19:17:11,056][02172] Updated weights for policy 0, policy_version 520 (0.0031) [2024-09-19 19:17:15,214][00186] Fps is (10 sec: 4508.7, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 2146304. Throughput: 0: 1000.3. Samples: 534870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:17:15,219][00186] Avg episode reward: [(0, '19.316')] [2024-09-19 19:17:20,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2162688. Throughput: 0: 1020.8. Samples: 540740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:17:20,220][00186] Avg episode reward: [(0, '20.235')] [2024-09-19 19:17:20,224][02159] Saving new best policy, reward=20.235! [2024-09-19 19:17:22,216][02172] Updated weights for policy 0, policy_version 530 (0.0015) [2024-09-19 19:17:25,214][00186] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3957.1). Total num frames: 2183168. Throughput: 0: 972.7. Samples: 545818. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:17:25,217][00186] Avg episode reward: [(0, '20.483')] [2024-09-19 19:17:25,224][02159] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000533_2183168.pth... [2024-09-19 19:17:25,341][02159] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth [2024-09-19 19:17:25,368][02159] Saving new best policy, reward=20.483! [2024-09-19 19:17:30,214][00186] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2203648. Throughput: 0: 969.2. Samples: 549064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:17:30,222][00186] Avg episode reward: [(0, '18.331')] [2024-09-19 19:17:31,425][02172] Updated weights for policy 0, policy_version 540 (0.0040) [2024-09-19 19:17:35,219][00186] Fps is (10 sec: 4094.3, 60 sec: 4027.4, 300 sec: 3943.2). Total num frames: 2224128. Throughput: 0: 1026.0. Samples: 555988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-19 19:17:35,223][00186] Avg episode reward: [(0, '17.221')] [2024-09-19 19:17:40,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2240512. Throughput: 0: 967.8. Samples: 560374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:17:40,221][00186] Avg episode reward: [(0, '16.329')] [2024-09-19 19:17:42,526][02172] Updated weights for policy 0, policy_version 550 (0.0017) [2024-09-19 19:17:45,214][00186] Fps is (10 sec: 4097.8, 60 sec: 4027.8, 300 sec: 3957.2). Total num frames: 2265088. Throughput: 0: 968.3. Samples: 563858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:17:45,216][00186] Avg episode reward: [(0, '16.615')] [2024-09-19 19:17:50,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2285568. Throughput: 0: 1018.2. Samples: 571090. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:17:50,219][00186] Avg episode reward: [(0, '16.631')] [2024-09-19 19:17:51,775][02172] Updated weights for policy 0, policy_version 560 (0.0023) [2024-09-19 19:17:55,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2301952. Throughput: 0: 994.2. Samples: 576086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:17:55,220][00186] Avg episode reward: [(0, '17.877')] [2024-09-19 19:18:00,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3957.2). Total num frames: 2322432. Throughput: 0: 967.5. Samples: 578406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:18:00,216][00186] Avg episode reward: [(0, '19.158')] [2024-09-19 19:18:02,485][02172] Updated weights for policy 0, policy_version 570 (0.0032) [2024-09-19 19:18:05,214][00186] Fps is (10 sec: 4505.5, 60 sec: 4096.5, 300 sec: 3957.1). Total num frames: 2347008. Throughput: 0: 994.9. Samples: 585510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:18:05,220][00186] Avg episode reward: [(0, '20.039')] [2024-09-19 19:18:10,214][00186] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2363392. Throughput: 0: 1014.3. Samples: 591460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:18:10,221][00186] Avg episode reward: [(0, '19.899')] [2024-09-19 19:18:13,563][02172] Updated weights for policy 0, policy_version 580 (0.0027) [2024-09-19 19:18:15,216][00186] Fps is (10 sec: 3276.3, 60 sec: 3891.1, 300 sec: 3943.2). Total num frames: 2379776. Throughput: 0: 988.7. Samples: 593556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:18:15,219][00186] Avg episode reward: [(0, '20.463')] [2024-09-19 19:18:20,214][00186] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2404352. Throughput: 0: 984.3. Samples: 600278. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:18:20,216][00186] Avg episode reward: [(0, '19.103')] [2024-09-19 19:18:22,151][02172] Updated weights for policy 0, policy_version 590 (0.0021) [2024-09-19 19:18:25,214][00186] Fps is (10 sec: 4916.2, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 2428928. Throughput: 0: 1040.5. Samples: 607198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:18:25,222][00186] Avg episode reward: [(0, '19.779')] [2024-09-19 19:18:30,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2441216. Throughput: 0: 1010.5. Samples: 609330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:18:30,218][00186] Avg episode reward: [(0, '19.925')] [2024-09-19 19:18:33,582][02172] Updated weights for policy 0, policy_version 600 (0.0052) [2024-09-19 19:18:35,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3959.8, 300 sec: 3971.0). Total num frames: 2461696. Throughput: 0: 974.4. Samples: 614938. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:18:35,217][00186] Avg episode reward: [(0, '20.402')] [2024-09-19 19:18:40,216][00186] Fps is (10 sec: 3685.7, 60 sec: 3959.3, 300 sec: 3971.0). Total num frames: 2478080. Throughput: 0: 987.6. Samples: 620528. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-19 19:18:40,219][00186] Avg episode reward: [(0, '20.077')] [2024-09-19 19:18:45,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 2494464. Throughput: 0: 979.3. Samples: 622476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-19 19:18:45,220][00186] Avg episode reward: [(0, '21.217')] [2024-09-19 19:18:45,229][02159] Saving new best policy, reward=21.217! [2024-09-19 19:18:46,746][02172] Updated weights for policy 0, policy_version 610 (0.0041) [2024-09-19 19:18:50,214][00186] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 2510848. Throughput: 0: 909.7. Samples: 626448. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:18:50,225][00186] Avg episode reward: [(0, '20.751')] [2024-09-19 19:18:55,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 2531328. Throughput: 0: 934.5. Samples: 633512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:18:55,221][00186] Avg episode reward: [(0, '21.553')] [2024-09-19 19:18:55,234][02159] Saving new best policy, reward=21.553! [2024-09-19 19:18:56,366][02172] Updated weights for policy 0, policy_version 620 (0.0046) [2024-09-19 19:19:00,214][00186] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 2551808. Throughput: 0: 964.6. Samples: 636962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:19:00,217][00186] Avg episode reward: [(0, '23.339')] [2024-09-19 19:19:00,243][02159] Saving new best policy, reward=23.339! [2024-09-19 19:19:05,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3915.5). Total num frames: 2568192. Throughput: 0: 920.8. Samples: 641714. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:19:05,218][00186] Avg episode reward: [(0, '23.227')] [2024-09-19 19:19:07,815][02172] Updated weights for policy 0, policy_version 630 (0.0053) [2024-09-19 19:19:10,214][00186] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 2588672. Throughput: 0: 901.6. Samples: 647768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:19:10,221][00186] Avg episode reward: [(0, '23.377')] [2024-09-19 19:19:10,226][02159] Saving new best policy, reward=23.377! [2024-09-19 19:19:15,214][00186] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 2613248. Throughput: 0: 932.1. Samples: 651276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:19:15,220][00186] Avg episode reward: [(0, '23.678')] [2024-09-19 19:19:15,230][02159] Saving new best policy, reward=23.678! 
[2024-09-19 19:19:16,957][02172] Updated weights for policy 0, policy_version 640 (0.0019) [2024-09-19 19:19:20,214][00186] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 2629632. Throughput: 0: 936.6. Samples: 657086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:19:20,222][00186] Avg episode reward: [(0, '22.445')] [2024-09-19 19:19:25,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3943.3). Total num frames: 2650112. Throughput: 0: 928.4. Samples: 662306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:19:25,217][00186] Avg episode reward: [(0, '21.005')] [2024-09-19 19:19:25,228][02159] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000647_2650112.pth... [2024-09-19 19:19:25,350][02159] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000416_1703936.pth [2024-09-19 19:19:27,899][02172] Updated weights for policy 0, policy_version 650 (0.0038) [2024-09-19 19:19:30,214][00186] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 2670592. Throughput: 0: 964.7. Samples: 665886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:19:30,221][00186] Avg episode reward: [(0, '19.922')] [2024-09-19 19:19:35,215][00186] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 2691072. Throughput: 0: 1023.9. Samples: 672524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:19:35,222][00186] Avg episode reward: [(0, '20.387')] [2024-09-19 19:19:38,873][02172] Updated weights for policy 0, policy_version 660 (0.0052) [2024-09-19 19:19:40,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3929.4). Total num frames: 2707456. Throughput: 0: 963.6. Samples: 676876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:19:40,220][00186] Avg episode reward: [(0, '21.685')] [2024-09-19 19:19:45,214][00186] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3957.2). 
Total num frames: 2732032. Throughput: 0: 963.6. Samples: 680322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:19:45,216][00186] Avg episode reward: [(0, '22.166')] [2024-09-19 19:19:47,789][02172] Updated weights for policy 0, policy_version 670 (0.0031) [2024-09-19 19:19:50,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2752512. Throughput: 0: 1017.4. Samples: 687496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:19:50,218][00186] Avg episode reward: [(0, '22.029')] [2024-09-19 19:19:55,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2768896. Throughput: 0: 992.0. Samples: 692410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:19:55,224][00186] Avg episode reward: [(0, '23.823')] [2024-09-19 19:19:55,233][02159] Saving new best policy, reward=23.823! [2024-09-19 19:19:59,216][02172] Updated weights for policy 0, policy_version 680 (0.0019) [2024-09-19 19:20:00,214][00186] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3957.1). Total num frames: 2789376. Throughput: 0: 968.9. Samples: 694876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:20:00,218][00186] Avg episode reward: [(0, '23.579')] [2024-09-19 19:20:05,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 2813952. Throughput: 0: 997.8. Samples: 701988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:20:05,216][00186] Avg episode reward: [(0, '22.664')] [2024-09-19 19:20:08,282][02172] Updated weights for policy 0, policy_version 690 (0.0028) [2024-09-19 19:20:10,215][00186] Fps is (10 sec: 4095.7, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2830336. Throughput: 0: 1014.1. Samples: 707942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:20:10,219][00186] Avg episode reward: [(0, '22.446')] [2024-09-19 19:20:15,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3929.4). 
Total num frames: 2846720. Throughput: 0: 981.8. Samples: 710066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-19 19:20:15,220][00186] Avg episode reward: [(0, '23.140')] [2024-09-19 19:20:19,128][02172] Updated weights for policy 0, policy_version 700 (0.0037) [2024-09-19 19:20:20,214][00186] Fps is (10 sec: 4096.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2871296. Throughput: 0: 980.6. Samples: 716652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:20:20,216][00186] Avg episode reward: [(0, '22.609')] [2024-09-19 19:20:25,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2891776. Throughput: 0: 1037.8. Samples: 723578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:20:25,216][00186] Avg episode reward: [(0, '22.890')] [2024-09-19 19:20:30,115][02172] Updated weights for policy 0, policy_version 710 (0.0018) [2024-09-19 19:20:30,218][00186] Fps is (10 sec: 3684.9, 60 sec: 3959.2, 300 sec: 3929.3). Total num frames: 2908160. Throughput: 0: 1008.4. Samples: 725702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:20:30,221][00186] Avg episode reward: [(0, '23.137')] [2024-09-19 19:20:35,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2928640. Throughput: 0: 971.7. Samples: 731222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:20:35,223][00186] Avg episode reward: [(0, '23.689')] [2024-09-19 19:20:39,206][02172] Updated weights for policy 0, policy_version 720 (0.0028) [2024-09-19 19:20:40,214][00186] Fps is (10 sec: 4507.4, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 2953216. Throughput: 0: 1022.1. Samples: 738404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-19 19:20:40,219][00186] Avg episode reward: [(0, '26.205')] [2024-09-19 19:20:40,221][02159] Saving new best policy, reward=26.205! [2024-09-19 19:20:45,214][00186] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). 
Total num frames: 2969600. Throughput: 0: 1028.9. Samples: 741176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:20:45,219][00186] Avg episode reward: [(0, '25.590')] [2024-09-19 19:20:50,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2985984. Throughput: 0: 971.6. Samples: 745708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:20:50,222][00186] Avg episode reward: [(0, '25.304')] [2024-09-19 19:20:50,593][02172] Updated weights for policy 0, policy_version 730 (0.0031) [2024-09-19 19:20:55,214][00186] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3010560. Throughput: 0: 998.3. Samples: 752864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:20:55,221][00186] Avg episode reward: [(0, '25.494')] [2024-09-19 19:21:00,061][02172] Updated weights for policy 0, policy_version 740 (0.0034) [2024-09-19 19:21:00,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 3031040. Throughput: 0: 1030.2. Samples: 756424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:21:00,216][00186] Avg episode reward: [(0, '25.779')] [2024-09-19 19:21:05,214][00186] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3043328. Throughput: 0: 982.7. Samples: 760876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:21:05,217][00186] Avg episode reward: [(0, '24.961')] [2024-09-19 19:21:10,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3067904. Throughput: 0: 973.3. Samples: 767378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:21:10,225][00186] Avg episode reward: [(0, '22.868')] [2024-09-19 19:21:10,637][02172] Updated weights for policy 0, policy_version 750 (0.0028) [2024-09-19 19:21:15,218][00186] Fps is (10 sec: 4913.4, 60 sec: 4095.7, 300 sec: 3957.1). Total num frames: 3092480. Throughput: 0: 1005.9. Samples: 770968. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:21:15,224][00186] Avg episode reward: [(0, '22.245')] [2024-09-19 19:21:20,220][00186] Fps is (10 sec: 3684.3, 60 sec: 3890.8, 300 sec: 3915.4). Total num frames: 3104768. Throughput: 0: 1005.0. Samples: 776454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:21:20,226][00186] Avg episode reward: [(0, '21.428')] [2024-09-19 19:21:21,721][02172] Updated weights for policy 0, policy_version 760 (0.0014) [2024-09-19 19:21:25,214][00186] Fps is (10 sec: 3687.8, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3129344. Throughput: 0: 971.8. Samples: 782136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-19 19:21:25,219][00186] Avg episode reward: [(0, '20.691')] [2024-09-19 19:21:25,228][02159] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000764_3129344.pth... [2024-09-19 19:21:25,359][02159] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000533_2183168.pth [2024-09-19 19:21:30,214][00186] Fps is (10 sec: 4508.2, 60 sec: 4028.0, 300 sec: 3957.2). Total num frames: 3149824. Throughput: 0: 988.3. Samples: 785648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-19 19:21:30,217][00186] Avg episode reward: [(0, '21.611')] [2024-09-19 19:21:30,557][02172] Updated weights for policy 0, policy_version 770 (0.0018) [2024-09-19 19:21:35,214][00186] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3170304. Throughput: 0: 1028.7. Samples: 791998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:21:35,219][00186] Avg episode reward: [(0, '22.810')] [2024-09-19 19:21:40,214][00186] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3186688. Throughput: 0: 972.0. Samples: 796602. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-19 19:21:40,217][00186] Avg episode reward: [(0, '22.955')] [2024-09-19 19:21:41,791][02172] Updated weights for policy 0, policy_version 780 (0.0025) [2024-09-19 19:21:45,214][00186] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3211264. Throughput: 0: 974.4. Samples: 800270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:21:45,218][00186] Avg episode reward: [(0, '22.949')] [2024-09-19 19:21:50,214][00186] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3231744. Throughput: 0: 1035.6. Samples: 807480. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:21:50,224][00186] Avg episode reward: [(0, '24.177')] [2024-09-19 19:21:51,228][02172] Updated weights for policy 0, policy_version 790 (0.0041) [2024-09-19 19:21:55,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3244032. Throughput: 0: 991.2. Samples: 811984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:21:55,216][00186] Avg episode reward: [(0, '26.315')] [2024-09-19 19:21:55,310][02159] Saving new best policy, reward=26.315! [2024-09-19 19:22:00,214][00186] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3957.2). Total num frames: 3268608. Throughput: 0: 979.0. Samples: 815018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:22:00,222][00186] Avg episode reward: [(0, '25.650')] [2024-09-19 19:22:01,642][02172] Updated weights for policy 0, policy_version 800 (0.0020) [2024-09-19 19:22:05,214][00186] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3957.2). Total num frames: 3293184. Throughput: 0: 1013.5. Samples: 822056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-19 19:22:05,221][00186] Avg episode reward: [(0, '26.114')] [2024-09-19 19:22:10,217][00186] Fps is (10 sec: 3685.3, 60 sec: 3959.2, 300 sec: 3929.3). Total num frames: 3305472. Throughput: 0: 1006.5. Samples: 827432. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:22:10,220][00186] Avg episode reward: [(0, '26.416')] [2024-09-19 19:22:10,236][02159] Saving new best policy, reward=26.416! [2024-09-19 19:22:13,004][02172] Updated weights for policy 0, policy_version 810 (0.0045) [2024-09-19 19:22:15,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3891.5, 300 sec: 3943.3). Total num frames: 3325952. Throughput: 0: 976.2. Samples: 829578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:22:15,222][00186] Avg episode reward: [(0, '26.590')] [2024-09-19 19:22:15,232][02159] Saving new best policy, reward=26.590! [2024-09-19 19:22:20,214][00186] Fps is (10 sec: 4507.1, 60 sec: 4096.4, 300 sec: 3957.2). Total num frames: 3350528. Throughput: 0: 992.3. Samples: 836652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:22:20,221][00186] Avg episode reward: [(0, '25.416')] [2024-09-19 19:22:21,553][02172] Updated weights for policy 0, policy_version 820 (0.0026) [2024-09-19 19:22:25,214][00186] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3957.1). Total num frames: 3371008. Throughput: 0: 1029.6. Samples: 842934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:22:25,219][00186] Avg episode reward: [(0, '24.530')] [2024-09-19 19:22:30,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3387392. Throughput: 0: 996.4. Samples: 845108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:22:30,216][00186] Avg episode reward: [(0, '24.387')] [2024-09-19 19:22:32,948][02172] Updated weights for policy 0, policy_version 830 (0.0025) [2024-09-19 19:22:35,214][00186] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3407872. Throughput: 0: 974.7. Samples: 851342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:22:35,219][00186] Avg episode reward: [(0, '23.442')] [2024-09-19 19:22:40,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). 
Total num frames: 3432448. Throughput: 0: 1033.7. Samples: 858502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:22:40,220][00186] Avg episode reward: [(0, '23.078')] [2024-09-19 19:22:42,652][02172] Updated weights for policy 0, policy_version 840 (0.0032) [2024-09-19 19:22:45,216][00186] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3929.4). Total num frames: 3444736. Throughput: 0: 1014.5. Samples: 860672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:22:45,218][00186] Avg episode reward: [(0, '23.203')] [2024-09-19 19:22:50,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3469312. Throughput: 0: 977.7. Samples: 866054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:22:50,216][00186] Avg episode reward: [(0, '23.466')] [2024-09-19 19:22:52,606][02172] Updated weights for policy 0, policy_version 850 (0.0028) [2024-09-19 19:22:55,214][00186] Fps is (10 sec: 4915.9, 60 sec: 4164.2, 300 sec: 3971.0). Total num frames: 3493888. Throughput: 0: 1021.3. Samples: 873386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:22:55,218][00186] Avg episode reward: [(0, '23.993')] [2024-09-19 19:23:00,216][00186] Fps is (10 sec: 4095.2, 60 sec: 4027.6, 300 sec: 3943.2). Total num frames: 3510272. Throughput: 0: 1040.8. Samples: 876414. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-19 19:23:00,218][00186] Avg episode reward: [(0, '23.736')] [2024-09-19 19:23:04,013][02172] Updated weights for policy 0, policy_version 860 (0.0028) [2024-09-19 19:23:05,214][00186] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3526656. Throughput: 0: 980.4. Samples: 880770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:23:05,217][00186] Avg episode reward: [(0, '24.501')] [2024-09-19 19:23:10,214][00186] Fps is (10 sec: 4096.8, 60 sec: 4096.2, 300 sec: 3971.1). Total num frames: 3551232. Throughput: 0: 1000.7. Samples: 887966. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:23:10,217][00186] Avg episode reward: [(0, '23.818')] [2024-09-19 19:23:12,532][02172] Updated weights for policy 0, policy_version 870 (0.0028) [2024-09-19 19:23:15,216][00186] Fps is (10 sec: 4504.8, 60 sec: 4095.9, 300 sec: 3957.1). Total num frames: 3571712. Throughput: 0: 1031.5. Samples: 891528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:23:15,222][00186] Avg episode reward: [(0, '24.046')] [2024-09-19 19:23:20,217][00186] Fps is (10 sec: 3685.4, 60 sec: 3959.3, 300 sec: 3929.3). Total num frames: 3588096. Throughput: 0: 999.0. Samples: 896298. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-19 19:23:20,219][00186] Avg episode reward: [(0, '24.509')] [2024-09-19 19:23:23,674][02172] Updated weights for policy 0, policy_version 880 (0.0033) [2024-09-19 19:23:25,214][00186] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3608576. Throughput: 0: 981.2. Samples: 902656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:23:25,217][00186] Avg episode reward: [(0, '24.029')] [2024-09-19 19:23:25,228][02159] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000881_3608576.pth... [2024-09-19 19:23:25,352][02159] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000647_2650112.pth [2024-09-19 19:23:30,214][00186] Fps is (10 sec: 4506.8, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3633152. Throughput: 0: 1013.6. Samples: 906280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:23:30,217][00186] Avg episode reward: [(0, '23.488')] [2024-09-19 19:23:34,401][02172] Updated weights for policy 0, policy_version 890 (0.0026) [2024-09-19 19:23:35,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3645440. Throughput: 0: 1008.9. Samples: 911454. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:23:35,220][00186] Avg episode reward: [(0, '23.271')] [2024-09-19 19:23:40,214][00186] Fps is (10 sec: 2457.6, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 3657728. Throughput: 0: 923.5. Samples: 914942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:23:40,217][00186] Avg episode reward: [(0, '23.154')] [2024-09-19 19:23:45,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 3678208. Throughput: 0: 916.8. Samples: 917670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:23:45,219][00186] Avg episode reward: [(0, '24.514')] [2024-09-19 19:23:46,370][02172] Updated weights for policy 0, policy_version 900 (0.0035) [2024-09-19 19:23:50,214][00186] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3702784. Throughput: 0: 981.2. Samples: 924926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:23:50,221][00186] Avg episode reward: [(0, '24.719')] [2024-09-19 19:23:55,214][00186] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3957.2). Total num frames: 3719168. Throughput: 0: 944.6. Samples: 930474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:23:55,217][00186] Avg episode reward: [(0, '26.358')] [2024-09-19 19:23:57,197][02172] Updated weights for policy 0, policy_version 910 (0.0041) [2024-09-19 19:24:00,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3971.0). Total num frames: 3739648. Throughput: 0: 915.3. Samples: 932714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:24:00,216][00186] Avg episode reward: [(0, '26.712')] [2024-09-19 19:24:00,223][02159] Saving new best policy, reward=26.712! [2024-09-19 19:24:05,214][00186] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3760128. Throughput: 0: 960.9. Samples: 939538. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:24:05,222][00186] Avg episode reward: [(0, '26.639')] [2024-09-19 19:24:06,257][02172] Updated weights for policy 0, policy_version 920 (0.0048) [2024-09-19 19:24:10,215][00186] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3957.1). Total num frames: 3780608. Throughput: 0: 962.4. Samples: 945964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-19 19:24:10,220][00186] Avg episode reward: [(0, '24.990')] [2024-09-19 19:24:15,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3957.2). Total num frames: 3796992. Throughput: 0: 928.8. Samples: 948078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-19 19:24:15,219][00186] Avg episode reward: [(0, '24.809')] [2024-09-19 19:24:17,499][02172] Updated weights for policy 0, policy_version 930 (0.0015) [2024-09-19 19:24:20,214][00186] Fps is (10 sec: 4096.6, 60 sec: 3891.4, 300 sec: 3971.0). Total num frames: 3821568. Throughput: 0: 949.6. Samples: 954186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:24:20,221][00186] Avg episode reward: [(0, '25.558')] [2024-09-19 19:24:25,214][00186] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3846144. Throughput: 0: 1033.6. Samples: 961454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:24:25,221][00186] Avg episode reward: [(0, '24.744')] [2024-09-19 19:24:26,549][02172] Updated weights for policy 0, policy_version 940 (0.0030) [2024-09-19 19:24:30,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3957.2). Total num frames: 3858432. Throughput: 0: 1024.3. Samples: 963764. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:24:30,217][00186] Avg episode reward: [(0, '24.042')] [2024-09-19 19:24:35,214][00186] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3878912. Throughput: 0: 975.1. Samples: 968808. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-19 19:24:35,217][00186] Avg episode reward: [(0, '24.951')] [2024-09-19 19:24:37,433][02172] Updated weights for policy 0, policy_version 950 (0.0022) [2024-09-19 19:24:40,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3903488. Throughput: 0: 1012.8. Samples: 976048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-19 19:24:40,217][00186] Avg episode reward: [(0, '24.979')] [2024-09-19 19:24:45,214][00186] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3919872. Throughput: 0: 1034.7. Samples: 979276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-19 19:24:45,219][00186] Avg episode reward: [(0, '23.879')] [2024-09-19 19:24:48,361][02172] Updated weights for policy 0, policy_version 960 (0.0015) [2024-09-19 19:24:50,214][00186] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3940352. Throughput: 0: 980.8. Samples: 983672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:24:50,218][00186] Avg episode reward: [(0, '23.787')] [2024-09-19 19:24:55,214][00186] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 3964928. Throughput: 0: 997.7. Samples: 990858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-19 19:24:55,218][00186] Avg episode reward: [(0, '25.713')] [2024-09-19 19:24:56,916][02172] Updated weights for policy 0, policy_version 970 (0.0019) [2024-09-19 19:25:00,215][00186] Fps is (10 sec: 4505.2, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3985408. Throughput: 0: 1032.0. Samples: 994518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-19 19:25:00,217][00186] Avg episode reward: [(0, '24.168')] [2024-09-19 19:25:05,214][00186] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3997696. Throughput: 0: 1003.5. Samples: 999342. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-19 19:25:05,219][00186] Avg episode reward: [(0, '23.498')] [2024-09-19 19:25:06,758][02159] Stopping Batcher_0... [2024-09-19 19:25:06,758][02159] Loop batcher_evt_loop terminating... [2024-09-19 19:25:06,759][00186] Component Batcher_0 stopped! [2024-09-19 19:25:06,765][02159] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-19 19:25:06,804][02172] Weights refcount: 2 0 [2024-09-19 19:25:06,808][02172] Stopping InferenceWorker_p0-w0... [2024-09-19 19:25:06,807][00186] Component InferenceWorker_p0-w0 stopped! [2024-09-19 19:25:06,809][02172] Loop inference_proc0-0_evt_loop terminating... [2024-09-19 19:25:06,892][02159] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000764_3129344.pth [2024-09-19 19:25:06,912][02159] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-19 19:25:07,097][02159] Stopping LearnerWorker_p0... [2024-09-19 19:25:07,097][02159] Loop learner_proc0_evt_loop terminating... [2024-09-19 19:25:07,098][00186] Component LearnerWorker_p0 stopped! [2024-09-19 19:25:07,109][00186] Component RolloutWorker_w3 stopped! [2024-09-19 19:25:07,113][02177] Stopping RolloutWorker_w3... [2024-09-19 19:25:07,115][02177] Loop rollout_proc3_evt_loop terminating... [2024-09-19 19:25:07,124][00186] Component RolloutWorker_w1 stopped! [2024-09-19 19:25:07,130][02178] Stopping RolloutWorker_w1... [2024-09-19 19:25:07,137][00186] Component RolloutWorker_w7 stopped! [2024-09-19 19:25:07,141][02182] Stopping RolloutWorker_w7... [2024-09-19 19:25:07,131][02178] Loop rollout_proc1_evt_loop terminating... [2024-09-19 19:25:07,144][02182] Loop rollout_proc7_evt_loop terminating... [2024-09-19 19:25:07,158][00186] Component RolloutWorker_w5 stopped! [2024-09-19 19:25:07,162][02180] Stopping RolloutWorker_w5... [2024-09-19 19:25:07,164][02180] Loop rollout_proc5_evt_loop terminating... 
[2024-09-19 19:25:07,228][00186] Component RolloutWorker_w6 stopped!
[2024-09-19 19:25:07,230][02181] Stopping RolloutWorker_w6...
[2024-09-19 19:25:07,235][02181] Loop rollout_proc6_evt_loop terminating...
[2024-09-19 19:25:07,282][00186] Component RolloutWorker_w0 stopped!
[2024-09-19 19:25:07,287][00186] Component RolloutWorker_w4 stopped!
[2024-09-19 19:25:07,289][02179] Stopping RolloutWorker_w4...
[2024-09-19 19:25:07,285][02173] Stopping RolloutWorker_w0...
[2024-09-19 19:25:07,294][00186] Component RolloutWorker_w2 stopped!
[2024-09-19 19:25:07,296][00186] Waiting for process learner_proc0 to stop...
[2024-09-19 19:25:07,296][02174] Stopping RolloutWorker_w2...
[2024-09-19 19:25:07,299][02174] Loop rollout_proc2_evt_loop terminating...
[2024-09-19 19:25:07,300][02179] Loop rollout_proc4_evt_loop terminating...
[2024-09-19 19:25:07,292][02173] Loop rollout_proc0_evt_loop terminating...
[2024-09-19 19:25:08,645][00186] Waiting for process inference_proc0-0 to join...
[2024-09-19 19:25:08,649][00186] Waiting for process rollout_proc0 to join...
[2024-09-19 19:25:10,581][00186] Waiting for process rollout_proc1 to join...
[2024-09-19 19:25:10,587][00186] Waiting for process rollout_proc2 to join...
[2024-09-19 19:25:10,593][00186] Waiting for process rollout_proc3 to join...
[2024-09-19 19:25:10,596][00186] Waiting for process rollout_proc4 to join...
[2024-09-19 19:25:10,601][00186] Waiting for process rollout_proc5 to join...
[2024-09-19 19:25:10,604][00186] Waiting for process rollout_proc6 to join...
[2024-09-19 19:25:10,607][00186] Waiting for process rollout_proc7 to join...
[2024-09-19 19:25:10,614][00186] Batcher 0 profile tree view:
batching: 26.6068, releasing_batches: 0.0286
[2024-09-19 19:25:10,615][00186] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 389.5252
update_model: 9.1650
  weight_update: 0.0028
one_step: 0.0059
  handle_policy_step: 590.2036
    deserialize: 14.3118, stack: 3.2379, obs_to_device_normalize: 119.7810, forward: 314.3029, send_messages: 28.5452
    prepare_outputs: 81.5685
      to_cpu: 47.1508
[2024-09-19 19:25:10,617][00186] Learner 0 profile tree view:
misc: 0.0049, prepare_batch: 13.5013
train: 74.7131
  epoch_init: 0.0059, minibatch_init: 0.0067, losses_postprocess: 0.6950, kl_divergence: 0.7441, after_optimizer: 33.7933
  calculate_losses: 26.5339
    losses_init: 0.0063, forward_head: 1.2828, bptt_initial: 17.6417, tail: 1.0796, advantages_returns: 0.2927, losses: 3.9626
    bptt: 1.9406
      bptt_forward_core: 1.8266
  update: 12.3265
    clip: 0.8740
[2024-09-19 19:25:10,619][00186] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2529, enqueue_policy_requests: 91.8572, env_step: 806.7612, overhead: 11.9251, complete_rollouts: 6.7556
save_policy_outputs: 20.1596
  split_output_tensors: 8.0251
[2024-09-19 19:25:10,620][00186] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3016, enqueue_policy_requests: 90.4724, env_step: 803.0621, overhead: 12.2329, complete_rollouts: 7.0381
save_policy_outputs: 19.7129
  split_output_tensors: 8.0215
[2024-09-19 19:25:10,621][00186] Loop Runner_EvtLoop terminating...
[2024-09-19 19:25:10,622][00186] Runner profile tree view:
main_loop: 1057.9504
[2024-09-19 19:25:10,623][00186] Collected {0: 4005888}, FPS: 3786.5
[2024-09-19 19:44:38,716][00186] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-19 19:44:38,717][00186] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-19 19:44:38,721][00186] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-19 19:44:38,723][00186] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-19 19:44:38,725][00186] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-19 19:44:38,728][00186] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-19 19:44:38,729][00186] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-19 19:44:38,731][00186] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-19 19:44:38,732][00186] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-19 19:44:38,733][00186] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-19 19:44:38,734][00186] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-19 19:44:38,735][00186] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-19 19:44:38,736][00186] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-19 19:44:38,737][00186] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-19 19:44:38,738][00186] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-19 19:44:38,773][00186] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-19 19:44:38,776][00186] RunningMeanStd input shape: (3, 72, 128)
[2024-09-19 19:44:38,778][00186] RunningMeanStd input shape: (1,)
[2024-09-19 19:44:38,793][00186] ConvEncoder: input_channels=3
[2024-09-19 19:44:38,898][00186] Conv encoder output size: 512
[2024-09-19 19:44:38,899][00186] Policy head output size: 512
[2024-09-19 19:44:39,162][00186] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-19 19:44:40,030][00186] Num frames 100...
[2024-09-19 19:44:40,155][00186] Num frames 200... [2024-09-19 19:44:40,290][00186] Num frames 300... [2024-09-19 19:44:40,415][00186] Num frames 400... [2024-09-19 19:44:40,534][00186] Num frames 500... [2024-09-19 19:44:40,687][00186] Num frames 600... [2024-09-19 19:44:40,859][00186] Num frames 700... [2024-09-19 19:44:41,039][00186] Num frames 800... [2024-09-19 19:44:41,094][00186] Avg episode rewards: #0: 16.000, true rewards: #0: 8.000 [2024-09-19 19:44:41,099][00186] Avg episode reward: 16.000, avg true_objective: 8.000 [2024-09-19 19:44:41,292][00186] Num frames 900... [2024-09-19 19:44:41,458][00186] Num frames 1000... [2024-09-19 19:44:41,633][00186] Num frames 1100... [2024-09-19 19:44:41,807][00186] Num frames 1200... [2024-09-19 19:44:41,981][00186] Num frames 1300... [2024-09-19 19:44:42,160][00186] Num frames 1400... [2024-09-19 19:44:42,337][00186] Num frames 1500... [2024-09-19 19:44:42,510][00186] Num frames 1600... [2024-09-19 19:44:42,688][00186] Num frames 1700... [2024-09-19 19:44:42,861][00186] Num frames 1800... [2024-09-19 19:44:43,047][00186] Num frames 1900... [2024-09-19 19:44:43,166][00186] Avg episode rewards: #0: 21.170, true rewards: #0: 9.670 [2024-09-19 19:44:43,168][00186] Avg episode reward: 21.170, avg true_objective: 9.670 [2024-09-19 19:44:43,252][00186] Num frames 2000... [2024-09-19 19:44:43,383][00186] Num frames 2100... [2024-09-19 19:44:43,507][00186] Num frames 2200... [2024-09-19 19:44:43,640][00186] Num frames 2300... [2024-09-19 19:44:43,765][00186] Num frames 2400... [2024-09-19 19:44:43,886][00186] Num frames 2500... [2024-09-19 19:44:44,007][00186] Num frames 2600... [2024-09-19 19:44:44,135][00186] Num frames 2700... [2024-09-19 19:44:44,260][00186] Num frames 2800... [2024-09-19 19:44:44,394][00186] Num frames 2900... [2024-09-19 19:44:44,521][00186] Num frames 3000... [2024-09-19 19:44:44,648][00186] Num frames 3100... [2024-09-19 19:44:44,775][00186] Num frames 3200... 
[2024-09-19 19:44:44,902][00186] Num frames 3300... [2024-09-19 19:44:45,026][00186] Num frames 3400... [2024-09-19 19:44:45,135][00186] Avg episode rewards: #0: 25.807, true rewards: #0: 11.473 [2024-09-19 19:44:45,138][00186] Avg episode reward: 25.807, avg true_objective: 11.473 [2024-09-19 19:44:45,211][00186] Num frames 3500... [2024-09-19 19:44:45,331][00186] Num frames 3600... [2024-09-19 19:44:45,467][00186] Num frames 3700... [2024-09-19 19:44:45,599][00186] Num frames 3800... [2024-09-19 19:44:45,725][00186] Num frames 3900... [2024-09-19 19:44:45,851][00186] Num frames 4000... [2024-09-19 19:44:45,986][00186] Num frames 4100... [2024-09-19 19:44:46,060][00186] Avg episode rewards: #0: 23.035, true rewards: #0: 10.285 [2024-09-19 19:44:46,061][00186] Avg episode reward: 23.035, avg true_objective: 10.285 [2024-09-19 19:44:46,173][00186] Num frames 4200... [2024-09-19 19:44:46,296][00186] Num frames 4300... [2024-09-19 19:44:46,425][00186] Num frames 4400... [2024-09-19 19:44:46,550][00186] Num frames 4500... [2024-09-19 19:44:46,683][00186] Num frames 4600... [2024-09-19 19:44:46,804][00186] Num frames 4700... [2024-09-19 19:44:46,928][00186] Num frames 4800... [2024-09-19 19:44:47,051][00186] Num frames 4900... [2024-09-19 19:44:47,179][00186] Num frames 5000... [2024-09-19 19:44:47,306][00186] Num frames 5100... [2024-09-19 19:44:47,429][00186] Num frames 5200... [2024-09-19 19:44:47,559][00186] Num frames 5300... [2024-09-19 19:44:47,691][00186] Num frames 5400... [2024-09-19 19:44:47,816][00186] Num frames 5500... [2024-09-19 19:44:47,947][00186] Num frames 5600... [2024-09-19 19:44:48,073][00186] Num frames 5700... [2024-09-19 19:44:48,201][00186] Num frames 5800... [2024-09-19 19:44:48,326][00186] Num frames 5900... [2024-09-19 19:44:48,449][00186] Num frames 6000... 
[2024-09-19 19:44:48,591][00186] Avg episode rewards: #0: 28.132, true rewards: #0: 12.132 [2024-09-19 19:44:48,593][00186] Avg episode reward: 28.132, avg true_objective: 12.132 [2024-09-19 19:44:48,644][00186] Num frames 6100... [2024-09-19 19:44:48,768][00186] Num frames 6200... [2024-09-19 19:44:48,889][00186] Num frames 6300... [2024-09-19 19:44:49,011][00186] Num frames 6400... [2024-09-19 19:44:49,135][00186] Num frames 6500... [2024-09-19 19:44:49,258][00186] Num frames 6600... [2024-09-19 19:44:49,379][00186] Num frames 6700... [2024-09-19 19:44:49,513][00186] Num frames 6800... [2024-09-19 19:44:49,639][00186] Num frames 6900... [2024-09-19 19:44:49,760][00186] Num frames 7000... [2024-09-19 19:44:49,881][00186] Num frames 7100... [2024-09-19 19:44:50,001][00186] Num frames 7200... [2024-09-19 19:44:50,155][00186] Avg episode rewards: #0: 28.137, true rewards: #0: 12.137 [2024-09-19 19:44:50,158][00186] Avg episode reward: 28.137, avg true_objective: 12.137 [2024-09-19 19:44:50,183][00186] Num frames 7300... [2024-09-19 19:44:50,304][00186] Num frames 7400... [2024-09-19 19:44:50,427][00186] Num frames 7500... [2024-09-19 19:44:50,557][00186] Num frames 7600... [2024-09-19 19:44:50,685][00186] Num frames 7700... [2024-09-19 19:44:50,819][00186] Num frames 7800... [2024-09-19 19:44:50,946][00186] Num frames 7900... [2024-09-19 19:44:51,070][00186] Num frames 8000... [2024-09-19 19:44:51,196][00186] Num frames 8100... [2024-09-19 19:44:51,318][00186] Num frames 8200... [2024-09-19 19:44:51,460][00186] Avg episode rewards: #0: 27.386, true rewards: #0: 11.814 [2024-09-19 19:44:51,461][00186] Avg episode reward: 27.386, avg true_objective: 11.814 [2024-09-19 19:44:51,501][00186] Num frames 8300... [2024-09-19 19:44:51,636][00186] Num frames 8400... [2024-09-19 19:44:51,760][00186] Num frames 8500... [2024-09-19 19:44:51,884][00186] Num frames 8600... [2024-09-19 19:44:52,007][00186] Num frames 8700... [2024-09-19 19:44:52,131][00186] Num frames 8800... 
[2024-09-19 19:44:52,254][00186] Num frames 8900... [2024-09-19 19:44:52,375][00186] Num frames 9000... [2024-09-19 19:44:52,512][00186] Avg episode rewards: #0: 25.838, true rewards: #0: 11.337 [2024-09-19 19:44:52,514][00186] Avg episode reward: 25.838, avg true_objective: 11.337 [2024-09-19 19:44:52,552][00186] Num frames 9100... [2024-09-19 19:44:52,691][00186] Num frames 9200... [2024-09-19 19:44:52,815][00186] Num frames 9300... [2024-09-19 19:44:52,939][00186] Num frames 9400... [2024-09-19 19:44:53,069][00186] Num frames 9500... [2024-09-19 19:44:53,239][00186] Num frames 9600... [2024-09-19 19:44:53,410][00186] Num frames 9700... [2024-09-19 19:44:53,585][00186] Num frames 9800... [2024-09-19 19:44:53,759][00186] Num frames 9900... [2024-09-19 19:44:53,930][00186] Num frames 10000... [2024-09-19 19:44:54,105][00186] Num frames 10100... [2024-09-19 19:44:54,278][00186] Num frames 10200... [2024-09-19 19:44:54,445][00186] Num frames 10300... [2024-09-19 19:44:54,621][00186] Num frames 10400... [2024-09-19 19:44:54,809][00186] Num frames 10500... [2024-09-19 19:44:54,992][00186] Num frames 10600... [2024-09-19 19:44:55,178][00186] Num frames 10700... [2024-09-19 19:44:55,247][00186] Avg episode rewards: #0: 27.562, true rewards: #0: 11.896 [2024-09-19 19:44:55,249][00186] Avg episode reward: 27.562, avg true_objective: 11.896 [2024-09-19 19:44:55,414][00186] Num frames 10800... [2024-09-19 19:44:55,582][00186] Num frames 10900... [2024-09-19 19:44:55,719][00186] Num frames 11000... [2024-09-19 19:44:55,842][00186] Num frames 11100... [2024-09-19 19:44:55,963][00186] Num frames 11200... [2024-09-19 19:44:56,089][00186] Num frames 11300... [2024-09-19 19:44:56,223][00186] Num frames 11400... [2024-09-19 19:44:56,353][00186] Num frames 11500... 
[2024-09-19 19:44:56,417][00186] Avg episode rewards: #0: 26.806, true rewards: #0: 11.506
[2024-09-19 19:44:56,419][00186] Avg episode reward: 26.806, avg true_objective: 11.506
[2024-09-19 19:46:04,100][00186] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-19 19:49:04,527][00186] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-19 19:49:04,529][00186] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-19 19:49:04,531][00186] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-19 19:49:04,533][00186] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-19 19:49:04,535][00186] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-19 19:49:04,536][00186] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-19 19:49:04,538][00186] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-19 19:49:04,539][00186] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-19 19:49:04,540][00186] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-19 19:49:04,541][00186] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-19 19:49:04,542][00186] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-19 19:49:04,546][00186] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-19 19:49:04,547][00186] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-19 19:49:04,548][00186] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-19 19:49:04,549][00186] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-19 19:49:04,579][00186] RunningMeanStd input shape: (3, 72, 128) [2024-09-19 19:49:04,582][00186] RunningMeanStd input shape: (1,) [2024-09-19 19:49:04,596][00186] ConvEncoder: input_channels=3 [2024-09-19 19:49:04,641][00186] Conv encoder output size: 512 [2024-09-19 19:49:04,643][00186] Policy head output size: 512 [2024-09-19 19:49:04,664][00186] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-19 19:49:05,085][00186] Num frames 100... [2024-09-19 19:49:05,209][00186] Num frames 200... [2024-09-19 19:49:05,339][00186] Num frames 300... [2024-09-19 19:49:05,461][00186] Num frames 400... [2024-09-19 19:49:05,590][00186] Num frames 500... [2024-09-19 19:49:05,721][00186] Num frames 600... [2024-09-19 19:49:05,848][00186] Num frames 700... [2024-09-19 19:49:05,970][00186] Num frames 800... [2024-09-19 19:49:06,143][00186] Avg episode rewards: #0: 18.960, true rewards: #0: 8.960 [2024-09-19 19:49:06,144][00186] Avg episode reward: 18.960, avg true_objective: 8.960 [2024-09-19 19:49:06,153][00186] Num frames 900... [2024-09-19 19:49:06,278][00186] Num frames 1000... [2024-09-19 19:49:06,408][00186] Num frames 1100... [2024-09-19 19:49:06,534][00186] Num frames 1200... [2024-09-19 19:49:06,665][00186] Num frames 1300... [2024-09-19 19:49:06,789][00186] Num frames 1400... [2024-09-19 19:49:06,911][00186] Num frames 1500... [2024-09-19 19:49:07,028][00186] Avg episode rewards: #0: 15.755, true rewards: #0: 7.755 [2024-09-19 19:49:07,030][00186] Avg episode reward: 15.755, avg true_objective: 7.755 [2024-09-19 19:49:07,093][00186] Num frames 1600... [2024-09-19 19:49:07,218][00186] Num frames 1700... [2024-09-19 19:49:07,340][00186] Num frames 1800... [2024-09-19 19:49:07,477][00186] Num frames 1900... [2024-09-19 19:49:07,609][00186] Num frames 2000... 
[2024-09-19 19:49:12,342][00186] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-19 19:49:12,344][00186] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-19 19:49:12,346][00186] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-19 19:49:12,347][00186] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-19 19:49:12,350][00186] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-19 19:49:12,352][00186] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-19 19:49:12,353][00186] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-19 19:49:12,355][00186] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-19 19:49:12,356][00186] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-19 19:49:12,358][00186] Adding new argument 'hf_repository'='evgeniypark/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-19 19:49:12,359][00186] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-19 19:49:12,360][00186] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-19 19:49:12,361][00186] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-19 19:49:12,363][00186] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-09-19 19:49:12,364][00186] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-19 19:49:12,395][00186] RunningMeanStd input shape: (3, 72, 128) [2024-09-19 19:49:12,398][00186] RunningMeanStd input shape: (1,) [2024-09-19 19:49:12,410][00186] ConvEncoder: input_channels=3 [2024-09-19 19:49:12,458][00186] Conv encoder output size: 512 [2024-09-19 19:49:12,459][00186] Policy head output size: 512 [2024-09-19 19:49:12,478][00186] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-19 19:49:12,996][00186] Num frames 100... [2024-09-19 19:49:13,190][00186] Num frames 200... [2024-09-19 19:49:13,354][00186] Avg episode rewards: #0: 5.560, true rewards: #0: 2.560 [2024-09-19 19:49:13,356][00186] Avg episode reward: 5.560, avg true_objective: 2.560 [2024-09-19 19:49:13,461][00186] Num frames 300... [2024-09-19 19:49:13,949][00186] Num frames 400... [2024-09-19 19:49:14,324][00186] Num frames 500... [2024-09-19 19:49:14,491][00186] Num frames 600... [2024-09-19 19:49:14,663][00186] Num frames 700... [2024-09-19 19:49:14,826][00186] Num frames 800... [2024-09-19 19:49:14,999][00186] Num frames 900... [2024-09-19 19:49:15,173][00186] Num frames 1000... [2024-09-19 19:49:15,352][00186] Num frames 1100... [2024-09-19 19:49:15,534][00186] Num frames 1200... [2024-09-19 19:49:15,723][00186] Num frames 1300... [2024-09-19 19:49:15,890][00186] Num frames 1400... [2024-09-19 19:49:16,063][00186] Num frames 1500... [2024-09-19 19:49:16,229][00186] Num frames 1600... [2024-09-19 19:49:16,337][00186] Avg episode rewards: #0: 19.715, true rewards: #0: 8.215 [2024-09-19 19:49:16,339][00186] Avg episode reward: 19.715, avg true_objective: 8.215 [2024-09-19 19:49:16,412][00186] Num frames 1700... [2024-09-19 19:49:16,532][00186] Num frames 1800... [2024-09-19 19:49:16,672][00186] Num frames 1900... [2024-09-19 19:49:16,797][00186] Num frames 2000... 
[2024-09-19 19:49:16,916][00186] Num frames 2100... [2024-09-19 19:49:17,037][00186] Num frames 2200... [2024-09-19 19:49:17,161][00186] Num frames 2300... [2024-09-19 19:49:17,283][00186] Num frames 2400... [2024-09-19 19:49:17,411][00186] Num frames 2500... [2024-09-19 19:49:17,494][00186] Avg episode rewards: #0: 20.073, true rewards: #0: 8.407 [2024-09-19 19:49:17,496][00186] Avg episode reward: 20.073, avg true_objective: 8.407 [2024-09-19 19:49:17,596][00186] Num frames 2600... [2024-09-19 19:49:17,736][00186] Num frames 2700... [2024-09-19 19:49:17,857][00186] Num frames 2800... [2024-09-19 19:49:17,980][00186] Num frames 2900... [2024-09-19 19:49:18,100][00186] Num frames 3000... [2024-09-19 19:49:18,219][00186] Num frames 3100... [2024-09-19 19:49:18,342][00186] Num frames 3200... [2024-09-19 19:49:18,461][00186] Num frames 3300... [2024-09-19 19:49:18,588][00186] Num frames 3400... [2024-09-19 19:49:18,721][00186] Num frames 3500... [2024-09-19 19:49:18,841][00186] Num frames 3600... [2024-09-19 19:49:18,910][00186] Avg episode rewards: #0: 21.275, true rewards: #0: 9.025 [2024-09-19 19:49:18,912][00186] Avg episode reward: 21.275, avg true_objective: 9.025 [2024-09-19 19:49:19,024][00186] Num frames 3700... [2024-09-19 19:49:19,147][00186] Num frames 3800... [2024-09-19 19:49:19,272][00186] Num frames 3900... [2024-09-19 19:49:19,393][00186] Num frames 4000... [2024-09-19 19:49:19,515][00186] Num frames 4100... [2024-09-19 19:49:19,652][00186] Num frames 4200... [2024-09-19 19:49:19,782][00186] Num frames 4300... [2024-09-19 19:49:19,912][00186] Num frames 4400... [2024-09-19 19:49:20,059][00186] Avg episode rewards: #0: 21.152, true rewards: #0: 8.952 [2024-09-19 19:49:20,061][00186] Avg episode reward: 21.152, avg true_objective: 8.952 [2024-09-19 19:49:20,094][00186] Num frames 4500... [2024-09-19 19:49:20,213][00186] Num frames 4600... [2024-09-19 19:49:20,338][00186] Num frames 4700... [2024-09-19 19:49:20,457][00186] Num frames 4800... 
[2024-09-19 19:49:20,578][00186] Num frames 4900... [2024-09-19 19:49:20,710][00186] Num frames 5000... [2024-09-19 19:49:20,837][00186] Num frames 5100... [2024-09-19 19:49:20,960][00186] Num frames 5200... [2024-09-19 19:49:21,032][00186] Avg episode rewards: #0: 20.020, true rewards: #0: 8.687 [2024-09-19 19:49:21,033][00186] Avg episode reward: 20.020, avg true_objective: 8.687 [2024-09-19 19:49:21,140][00186] Num frames 5300... [2024-09-19 19:49:21,264][00186] Num frames 5400... [2024-09-19 19:49:21,384][00186] Num frames 5500... [2024-09-19 19:49:21,500][00186] Num frames 5600... [2024-09-19 19:49:21,628][00186] Num frames 5700... [2024-09-19 19:49:21,750][00186] Num frames 5800... [2024-09-19 19:49:21,876][00186] Num frames 5900... [2024-09-19 19:49:21,995][00186] Num frames 6000... [2024-09-19 19:49:22,117][00186] Num frames 6100... [2024-09-19 19:49:22,238][00186] Num frames 6200... [2024-09-19 19:49:22,361][00186] Num frames 6300... [2024-09-19 19:49:22,482][00186] Num frames 6400... [2024-09-19 19:49:22,614][00186] Num frames 6500... [2024-09-19 19:49:22,736][00186] Num frames 6600... [2024-09-19 19:49:22,872][00186] Num frames 6700... [2024-09-19 19:49:22,988][00186] Avg episode rewards: #0: 22.497, true rewards: #0: 9.640 [2024-09-19 19:49:22,990][00186] Avg episode reward: 22.497, avg true_objective: 9.640 [2024-09-19 19:49:23,058][00186] Num frames 6800... [2024-09-19 19:49:23,179][00186] Num frames 6900... [2024-09-19 19:49:23,305][00186] Num frames 7000... [2024-09-19 19:49:23,425][00186] Num frames 7100... [2024-09-19 19:49:23,545][00186] Num frames 7200... [2024-09-19 19:49:23,678][00186] Num frames 7300... [2024-09-19 19:49:23,802][00186] Avg episode rewards: #0: 20.820, true rewards: #0: 9.195 [2024-09-19 19:49:23,805][00186] Avg episode reward: 20.820, avg true_objective: 9.195 [2024-09-19 19:49:23,867][00186] Num frames 7400... [2024-09-19 19:49:23,988][00186] Num frames 7500... [2024-09-19 19:49:24,114][00186] Num frames 7600... 
[2024-09-19 19:49:24,237][00186] Num frames 7700... [2024-09-19 19:49:24,364][00186] Num frames 7800... [2024-09-19 19:49:24,489][00186] Num frames 7900... [2024-09-19 19:49:24,620][00186] Num frames 8000... [2024-09-19 19:49:24,744][00186] Num frames 8100... [2024-09-19 19:49:24,875][00186] Num frames 8200... [2024-09-19 19:49:24,997][00186] Num frames 8300... [2024-09-19 19:49:25,122][00186] Num frames 8400... [2024-09-19 19:49:25,248][00186] Num frames 8500... [2024-09-19 19:49:25,387][00186] Avg episode rewards: #0: 21.519, true rewards: #0: 9.519 [2024-09-19 19:49:25,389][00186] Avg episode reward: 21.519, avg true_objective: 9.519 [2024-09-19 19:49:25,434][00186] Num frames 8600... [2024-09-19 19:49:25,555][00186] Num frames 8700... [2024-09-19 19:49:25,685][00186] Num frames 8800... [2024-09-19 19:49:25,815][00186] Num frames 8900... [2024-09-19 19:49:25,950][00186] Num frames 9000... [2024-09-19 19:49:26,125][00186] Avg episode rewards: #0: 20.197, true rewards: #0: 9.097 [2024-09-19 19:49:26,128][00186] Avg episode reward: 20.197, avg true_objective: 9.097 [2024-09-19 19:49:26,134][00186] Num frames 9100... [2024-09-19 19:50:18,736][00186] Replay video saved to /content/train_dir/default_experiment/replay.mp4!