[2023-03-03 18:43:06,141][01413] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-03-03 18:43:06,145][01413] Rollout worker 0 uses device cpu
[2023-03-03 18:43:06,148][01413] Rollout worker 1 uses device cpu
[2023-03-03 18:43:06,150][01413] Rollout worker 2 uses device cpu
[2023-03-03 18:43:06,152][01413] Rollout worker 3 uses device cpu
[2023-03-03 18:43:06,153][01413] Rollout worker 4 uses device cpu
[2023-03-03 18:43:06,154][01413] Rollout worker 5 uses device cpu
[2023-03-03 18:43:06,155][01413] Rollout worker 6 uses device cpu
[2023-03-03 18:43:06,156][01413] Rollout worker 7 uses device cpu
[2023-03-03 18:43:06,346][01413] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-03 18:43:06,350][01413] InferenceWorker_p0-w0: min num requests: 2
[2023-03-03 18:43:06,381][01413] Starting all processes...
[2023-03-03 18:43:06,382][01413] Starting process learner_proc0
[2023-03-03 18:43:06,437][01413] Starting all processes...
[2023-03-03 18:43:06,450][01413] Starting process inference_proc0-0
[2023-03-03 18:43:06,451][01413] Starting process rollout_proc0
[2023-03-03 18:43:06,452][01413] Starting process rollout_proc1
[2023-03-03 18:43:06,453][01413] Starting process rollout_proc2
[2023-03-03 18:43:06,454][01413] Starting process rollout_proc3
[2023-03-03 18:43:06,454][01413] Starting process rollout_proc4
[2023-03-03 18:43:06,454][01413] Starting process rollout_proc5
[2023-03-03 18:43:06,454][01413] Starting process rollout_proc6
[2023-03-03 18:43:06,454][01413] Starting process rollout_proc7
[2023-03-03 18:43:17,706][12968] Worker 0 uses CPU cores [0]
[2023-03-03 18:43:17,727][12970] Worker 3 uses CPU cores [1]
[2023-03-03 18:43:17,926][12978] Worker 5 uses CPU cores [1]
[2023-03-03 18:43:17,949][12979] Worker 6 uses CPU cores [0]
[2023-03-03 18:43:17,977][12953] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-03 18:43:17,978][12953] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-03-03 18:43:18,094][12969] Worker 2 uses CPU cores [0]
[2023-03-03 18:43:18,099][12972] Worker 4 uses CPU cores [0]
[2023-03-03 18:43:18,105][12977] Worker 7 uses CPU cores [1]
[2023-03-03 18:43:18,168][12967] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-03 18:43:18,169][12967] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-03-03 18:43:18,177][12971] Worker 1 uses CPU cores [1]
[2023-03-03 18:43:18,645][12953] Num visible devices: 1
[2023-03-03 18:43:18,647][12967] Num visible devices: 1
[2023-03-03 18:43:18,655][12953] Starting seed is not provided
[2023-03-03 18:43:18,655][12953] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-03 18:43:18,655][12953] Initializing actor-critic model on device cuda:0
[2023-03-03 18:43:18,656][12953] RunningMeanStd input shape: (3, 72, 128)
[2023-03-03 18:43:18,658][12953] RunningMeanStd input shape: (1,)
[2023-03-03 18:43:18,670][12953] ConvEncoder: input_channels=3
[2023-03-03 18:43:18,936][12953] Conv encoder output size: 512
[2023-03-03 18:43:18,936][12953] Policy head output size: 512
[2023-03-03 18:43:18,986][12953] Created Actor Critic model with architecture:
[2023-03-03 18:43:18,986][12953] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-03-03 18:43:26,338][01413] Heartbeat connected on Batcher_0
[2023-03-03 18:43:26,347][01413] Heartbeat connected on InferenceWorker_p0-w0
[2023-03-03 18:43:26,357][01413] Heartbeat connected on RolloutWorker_w0
[2023-03-03 18:43:26,362][01413] Heartbeat connected on RolloutWorker_w1
[2023-03-03 18:43:26,365][01413] Heartbeat connected on RolloutWorker_w2
[2023-03-03 18:43:26,367][01413] Heartbeat connected on RolloutWorker_w3
[2023-03-03 18:43:26,371][01413] Heartbeat connected on RolloutWorker_w4
[2023-03-03 18:43:26,374][01413] Heartbeat connected on RolloutWorker_w5
[2023-03-03 18:43:26,378][01413] Heartbeat connected on RolloutWorker_w6
[2023-03-03 18:43:26,381][01413] Heartbeat connected on RolloutWorker_w7
[2023-03-03 18:43:26,731][12953] Using optimizer
[2023-03-03 18:43:26,732][12953] No checkpoints found
[2023-03-03 18:43:26,733][12953] Did not load from checkpoint, starting from scratch!
[2023-03-03 18:43:26,733][12953] Initialized policy 0 weights for model version 0
[2023-03-03 18:43:26,745][12953] LearnerWorker_p0 finished initialization!
[2023-03-03 18:43:26,745][12953] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-03 18:43:26,751][01413] Heartbeat connected on LearnerWorker_p0
[2023-03-03 18:43:26,962][12967] RunningMeanStd input shape: (3, 72, 128)
[2023-03-03 18:43:26,965][12967] RunningMeanStd input shape: (1,)
[2023-03-03 18:43:26,990][12967] ConvEncoder: input_channels=3
[2023-03-03 18:43:27,150][12967] Conv encoder output size: 512
[2023-03-03 18:43:27,151][12967] Policy head output size: 512
[2023-03-03 18:43:27,438][01413] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-03 18:43:30,249][01413] Inference worker 0-0 is ready!
[2023-03-03 18:43:30,252][01413] All inference workers are ready! Signal rollout workers to start!
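A minimal PyTorch sketch of the network printed above, for orientation. The log gives the module tree, the (3, 72, 128) input, the 512-d encoder/core sizes, and the 5-action head, but it does not print conv kernel sizes or strides; the [32,8,4]/[64,4,2]/[128,3,2] layout below is an assumption (Sample Factory's default "convnet_simple"), and all class/variable names here are illustrative, not the library's actual code.

```python
import torch
import torch.nn as nn

class DoomActorCriticSketch(nn.Module):
    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        # Conv head for the resized (3, 72, 128) observation, ELU activations,
        # matching the Conv2d/ELU alternation in the printed module tree.
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # With the assumed layout the conv output flattens to 128*3*6 = 2304,
        # which the MLP projects to the 512-d "Conv encoder output" in the log.
        self.mlp = nn.Sequential(nn.Linear(2304, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)                    # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)             # value head
        self.action_logits = nn.Linear(hidden, num_actions)   # 5 discrete actions

    def forward(self, obs, rnn_state):
        x = self.conv_head(obs).flatten(1)
        x = self.mlp(x)
        core_out, new_state = self.core(x.unsqueeze(0), rnn_state)
        core_out = core_out.squeeze(0)
        return self.action_logits(core_out), self.critic_linear(core_out), new_state
```

Shared-weights actor-critic means the conv encoder and GRU core feed both heads; only the final linear layers differ, which is why the log reports a single "Policy head output size: 512".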
[2023-03-03 18:43:30,328][12968] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-03 18:43:30,393][12970] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-03 18:43:30,432][12978] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-03 18:43:30,440][12971] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-03 18:43:30,450][12977] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-03 18:43:30,516][12979] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-03 18:43:30,521][12972] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-03 18:43:30,544][12969] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-03 18:43:31,897][12978] Decorrelating experience for 0 frames...
[2023-03-03 18:43:31,896][12971] Decorrelating experience for 0 frames...
[2023-03-03 18:43:31,899][12970] Decorrelating experience for 0 frames...
[2023-03-03 18:43:32,191][12968] Decorrelating experience for 0 frames...
[2023-03-03 18:43:32,226][12969] Decorrelating experience for 0 frames...
[2023-03-03 18:43:32,235][12972] Decorrelating experience for 0 frames...
[2023-03-03 18:43:32,237][12979] Decorrelating experience for 0 frames...
[2023-03-03 18:43:32,292][12977] Decorrelating experience for 0 frames...
[2023-03-03 18:43:32,438][01413] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-03 18:43:32,954][12978] Decorrelating experience for 32 frames...
[2023-03-03 18:43:33,019][12971] Decorrelating experience for 32 frames...
[2023-03-03 18:43:33,240][12969] Decorrelating experience for 32 frames...
[2023-03-03 18:43:33,256][12968] Decorrelating experience for 32 frames...
[2023-03-03 18:43:33,254][12979] Decorrelating experience for 32 frames...
[2023-03-03 18:43:33,811][12968] Decorrelating experience for 64 frames...
[2023-03-03 18:43:33,871][12978] Decorrelating experience for 64 frames...
[2023-03-03 18:43:34,148][12971] Decorrelating experience for 64 frames...
[2023-03-03 18:43:34,182][12977] Decorrelating experience for 32 frames...
[2023-03-03 18:43:34,845][12968] Decorrelating experience for 96 frames...
[2023-03-03 18:43:34,871][12978] Decorrelating experience for 96 frames...
[2023-03-03 18:43:34,978][12979] Decorrelating experience for 64 frames...
[2023-03-03 18:43:35,443][12971] Decorrelating experience for 96 frames...
[2023-03-03 18:43:35,648][12977] Decorrelating experience for 64 frames...
[2023-03-03 18:43:35,749][12969] Decorrelating experience for 64 frames...
[2023-03-03 18:43:35,938][12979] Decorrelating experience for 96 frames...
[2023-03-03 18:43:36,458][12970] Decorrelating experience for 32 frames...
[2023-03-03 18:43:36,637][12972] Decorrelating experience for 32 frames...
[2023-03-03 18:43:36,694][12969] Decorrelating experience for 96 frames...
[2023-03-03 18:43:37,156][12977] Decorrelating experience for 96 frames...
[2023-03-03 18:43:37,186][12972] Decorrelating experience for 64 frames...
[2023-03-03 18:43:37,438][01413] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-03 18:43:37,582][12972] Decorrelating experience for 96 frames...
[2023-03-03 18:43:37,699][12970] Decorrelating experience for 64 frames...
[2023-03-03 18:43:38,035][12970] Decorrelating experience for 96 frames...
[2023-03-03 18:43:42,027][12953] Signal inference workers to stop experience collection...
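The "Decorrelating experience for N frames..." records show each rollout worker stepping its environment through a random-action warm-up, reported in 32-frame chunks, so the eight workers do not start collection in lockstep with identical trajectories. A hedged sketch of that idea, assuming the Gymnasium step API; the function name and chunk bookkeeping are illustrative, not Sample Factory's actual implementation:

```python
def decorrelate(env, total_frames: int = 96, chunk: int = 32):
    """Warm an env up with random actions, logging progress like the records above."""
    env.reset()
    for done in range(0, total_frames + 1, chunk):
        print(f"Decorrelating experience for {done} frames...")
        for _ in range(chunk):
            obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()
```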
[2023-03-03 18:43:42,077][12967] InferenceWorker_p0-w0: stopping experience collection
[2023-03-03 18:43:42,438][01413] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 153.9. Samples: 2308. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-03 18:43:42,444][01413] Avg episode reward: [(0, '1.998')]
[2023-03-03 18:43:44,999][12953] Signal inference workers to resume experience collection...
[2023-03-03 18:43:44,999][12967] InferenceWorker_p0-w0: resuming experience collection
[2023-03-03 18:43:47,438][01413] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 121.1. Samples: 2422. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-03-03 18:43:47,444][01413] Avg episode reward: [(0, '2.873')]
[2023-03-03 18:43:52,438][01413] Fps is (10 sec: 2867.2, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 267.0. Samples: 6674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-03-03 18:43:52,440][01413] Avg episode reward: [(0, '3.882')]
[2023-03-03 18:43:54,919][12967] Updated weights for policy 0, policy_version 10 (0.0013)
[2023-03-03 18:43:57,438][01413] Fps is (10 sec: 4505.7, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 49152. Throughput: 0: 448.3. Samples: 13448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:43:57,443][01413] Avg episode reward: [(0, '4.501')]
[2023-03-03 18:44:02,439][01413] Fps is (10 sec: 4095.7, 60 sec: 1989.4, 300 sec: 1989.4). Total num frames: 69632. Throughput: 0: 468.9. Samples: 16412. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-03-03 18:44:02,443][01413] Avg episode reward: [(0, '4.544')]
[2023-03-03 18:44:06,323][12967] Updated weights for policy 0, policy_version 20 (0.0011)
[2023-03-03 18:44:07,438][01413] Fps is (10 sec: 3276.8, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 81920. Throughput: 0: 517.5. Samples: 20702. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-03-03 18:44:07,443][01413] Avg episode reward: [(0, '4.601')]
[2023-03-03 18:44:12,438][01413] Fps is (10 sec: 3277.0, 60 sec: 2275.6, 300 sec: 2275.6). Total num frames: 102400. Throughput: 0: 590.1. Samples: 26556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-03 18:44:12,448][01413] Avg episode reward: [(0, '4.478')]
[2023-03-03 18:44:12,455][12953] Saving new best policy, reward=4.478!
[2023-03-03 18:44:16,442][12967] Updated weights for policy 0, policy_version 30 (0.0022)
[2023-03-03 18:44:17,438][01413] Fps is (10 sec: 4505.6, 60 sec: 2539.5, 300 sec: 2539.5). Total num frames: 126976. Throughput: 0: 665.6. Samples: 29952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:44:17,440][01413] Avg episode reward: [(0, '4.575')]
[2023-03-03 18:44:17,447][12953] Saving new best policy, reward=4.575!
[2023-03-03 18:44:22,438][01413] Fps is (10 sec: 4095.9, 60 sec: 2606.5, 300 sec: 2606.5). Total num frames: 143360. Throughput: 0: 794.6. Samples: 35756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:44:22,445][01413] Avg episode reward: [(0, '4.412')]
[2023-03-03 18:44:27,439][01413] Fps is (10 sec: 2867.0, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 155648. Throughput: 0: 841.8. Samples: 40190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:44:27,444][01413] Avg episode reward: [(0, '4.297')]
[2023-03-03 18:44:28,918][12967] Updated weights for policy 0, policy_version 40 (0.0033)
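From here on the log is dominated by three recurring record types: the "Fps is (...) / Total num frames / Throughput / Policy #0 lag" status line, the "Avg episode reward" line, and "Updated weights" lines from the inference worker. A small parser sketch for pulling a training curve out of a log file with exactly this format; the log file path is an assumption, and the regexes are keyed to the record layout shown above:

```python
import re

FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.na]+), 60 sec: ([\d.na]+), 300 sec: ([\d.na]+)\)\. "
    r"Total num frames: (\d+)"
)
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([-\d.]+)'\)\]")

frames, rewards = [], []
last_frames = None
with open("train_dir/default_experiment/sf_log.txt") as f:  # hypothetical path
    for line in f:
        if (m := FPS_RE.search(line)):
            last_frames = int(m.group(4))          # frame count at this report
        elif (m := REWARD_RE.search(line)) and last_frames is not None:
            frames.append(last_frames)             # pair reward with latest frame count
            rewards.append(float(m.group(1)))
# `frames` vs `rewards` can then be plotted to reproduce the learning curve.
```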
[2023-03-03 18:44:32,438][01413] Fps is (10 sec: 3276.9, 60 sec: 2935.5, 300 sec: 2709.7). Total num frames: 176128. Throughput: 0: 904.8. Samples: 43140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:44:32,445][01413] Avg episode reward: [(0, '4.362')]
[2023-03-03 18:44:37,438][01413] Fps is (10 sec: 4505.9, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 964.3. Samples: 50066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:44:37,441][01413] Avg episode reward: [(0, '4.506')]
[2023-03-03 18:44:37,761][12967] Updated weights for policy 0, policy_version 50 (0.0014)
[2023-03-03 18:44:42,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 2894.5). Total num frames: 217088. Throughput: 0: 932.6. Samples: 55416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:44:42,444][01413] Avg episode reward: [(0, '4.370')]
[2023-03-03 18:44:47,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 2918.4). Total num frames: 233472. Throughput: 0: 914.5. Samples: 57564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-03 18:44:47,443][01413] Avg episode reward: [(0, '4.345')]
[2023-03-03 18:44:50,416][12967] Updated weights for policy 0, policy_version 60 (0.0019)
[2023-03-03 18:44:52,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 2987.7). Total num frames: 253952. Throughput: 0: 941.6. Samples: 63074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-03 18:44:52,440][01413] Avg episode reward: [(0, '4.474')]
[2023-03-03 18:44:57,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3049.2). Total num frames: 274432. Throughput: 0: 965.1. Samples: 69986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:44:57,446][01413] Avg episode reward: [(0, '4.612')]
[2023-03-03 18:44:57,501][12953] Saving new best policy, reward=4.612!
[2023-03-03 18:44:59,953][12967] Updated weights for policy 0, policy_version 70 (0.0014)
[2023-03-03 18:45:02,438][01413] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3061.2). Total num frames: 290816. Throughput: 0: 944.4. Samples: 72450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:45:02,445][01413] Avg episode reward: [(0, '4.419')]
[2023-03-03 18:45:02,457][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth...
[2023-03-03 18:45:07,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3072.0). Total num frames: 307200. Throughput: 0: 911.5. Samples: 76772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:45:07,443][01413] Avg episode reward: [(0, '4.388')]
[2023-03-03 18:45:11,786][12967] Updated weights for policy 0, policy_version 80 (0.0028)
[2023-03-03 18:45:12,438][01413] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3120.8). Total num frames: 327680. Throughput: 0: 951.4. Samples: 83002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:45:12,445][01413] Avg episode reward: [(0, '4.423')]
[2023-03-03 18:45:17,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3202.3). Total num frames: 352256. Throughput: 0: 961.4. Samples: 86404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:45:17,443][01413] Avg episode reward: [(0, '4.442')]
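Checkpoint filenames encode the policy version and the environment frame count: checkpoint_000000071_290816.pth above is policy version 71 at 290816 frames. These are ordinary torch checkpoint files, so one can inspect them offline; a hedged sketch below, where the exact dictionary keys inside the checkpoint ('model', 'train_step', and so on) are an assumption, which is why the code just enumerates whatever is there:

```python
import torch

ckpt = torch.load(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth",
    map_location="cpu",
)
# Enumerate the top-level entries rather than assuming specific key names.
for key, value in ckpt.items():
    print(key, type(value).__name__)
# If a state dict is stored under e.g. ckpt['model'], it can be restored into
# a structurally compatible module with module.load_state_dict(ckpt['model']).
```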
[2023-03-03 18:45:22,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3169.9). Total num frames: 364544. Throughput: 0: 925.8. Samples: 91728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-03 18:45:22,440][01413] Avg episode reward: [(0, '4.520')]
[2023-03-03 18:45:22,530][12967] Updated weights for policy 0, policy_version 90 (0.0027)
[2023-03-03 18:45:27,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3174.4). Total num frames: 380928. Throughput: 0: 905.8. Samples: 96176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:45:27,441][01413] Avg episode reward: [(0, '4.520')]
[2023-03-03 18:45:32,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3244.0). Total num frames: 405504. Throughput: 0: 932.4. Samples: 99520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:45:32,440][01413] Avg episode reward: [(0, '4.696')]
[2023-03-03 18:45:32,452][12953] Saving new best policy, reward=4.696!
[2023-03-03 18:45:33,438][12967] Updated weights for policy 0, policy_version 100 (0.0021)
[2023-03-03 18:45:37,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 425984. Throughput: 0: 955.4. Samples: 106066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:45:37,440][01413] Avg episode reward: [(0, '4.576')]
[2023-03-03 18:45:42,441][01413] Fps is (10 sec: 3275.9, 60 sec: 3686.2, 300 sec: 3246.4). Total num frames: 438272. Throughput: 0: 912.2. Samples: 111038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:45:42,443][01413] Avg episode reward: [(0, '4.350')]
[2023-03-03 18:45:45,385][12967] Updated weights for policy 0, policy_version 110 (0.0020)
[2023-03-03 18:45:47,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3247.5). Total num frames: 454656. Throughput: 0: 905.8. Samples: 113210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:45:47,440][01413] Avg episode reward: [(0, '4.389')]
[2023-03-03 18:45:52,438][01413] Fps is (10 sec: 4097.1, 60 sec: 3754.7, 300 sec: 3305.0). Total num frames: 479232. Throughput: 0: 942.3. Samples: 119176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:45:52,443][01413] Avg episode reward: [(0, '4.402')]
[2023-03-03 18:45:55,016][12967] Updated weights for policy 0, policy_version 120 (0.0023)
[2023-03-03 18:45:57,442][01413] Fps is (10 sec: 4503.9, 60 sec: 3754.4, 300 sec: 3331.3). Total num frames: 499712. Throughput: 0: 955.0. Samples: 125982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:45:57,444][01413] Avg episode reward: [(0, '4.541')]
[2023-03-03 18:46:02,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3329.7). Total num frames: 516096. Throughput: 0: 927.9. Samples: 128158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:46:02,443][01413] Avg episode reward: [(0, '4.623')]
[2023-03-03 18:46:07,438][01413] Fps is (10 sec: 2868.3, 60 sec: 3686.4, 300 sec: 3302.4). Total num frames: 528384. Throughput: 0: 905.8. Samples: 132490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:46:07,445][01413] Avg episode reward: [(0, '4.865')]
[2023-03-03 18:46:07,449][12953] Saving new best policy, reward=4.865!
[2023-03-03 18:46:07,715][12967] Updated weights for policy 0, policy_version 130 (0.0034)
[2023-03-03 18:46:12,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3351.3). Total num frames: 552960. Throughput: 0: 953.8. Samples: 139096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:46:12,440][01413] Avg episode reward: [(0, '4.790')]
[2023-03-03 18:46:16,458][12967] Updated weights for policy 0, policy_version 140 (0.0022)
[2023-03-03 18:46:17,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3373.2). Total num frames: 573440. Throughput: 0: 953.8. Samples: 142442. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-03 18:46:17,441][01413] Avg episode reward: [(0, '4.470')]
[2023-03-03 18:46:22,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3370.4). Total num frames: 589824. Throughput: 0: 923.3. Samples: 147616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-03 18:46:22,441][01413] Avg episode reward: [(0, '4.377')]
[2023-03-03 18:46:27,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3367.8). Total num frames: 606208. Throughput: 0: 917.9. Samples: 152340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:46:27,443][01413] Avg episode reward: [(0, '4.227')]
[2023-03-03 18:46:28,940][12967] Updated weights for policy 0, policy_version 150 (0.0011)
[2023-03-03 18:46:32,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3409.6). Total num frames: 630784. Throughput: 0: 947.5. Samples: 155846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:46:32,440][01413] Avg episode reward: [(0, '4.310')]
[2023-03-03 18:46:37,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3427.7). Total num frames: 651264. Throughput: 0: 966.5. Samples: 162670. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-03 18:46:37,443][01413] Avg episode reward: [(0, '4.697')]
[2023-03-03 18:46:38,689][12967] Updated weights for policy 0, policy_version 160 (0.0011)
[2023-03-03 18:46:42,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3402.8). Total num frames: 663552. Throughput: 0: 913.8. Samples: 167098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:46:42,441][01413] Avg episode reward: [(0, '4.743')]
[2023-03-03 18:46:47,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3399.7). Total num frames: 679936. Throughput: 0: 912.6. Samples: 169224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:46:47,443][01413] Avg episode reward: [(0, '4.680')]
[2023-03-03 18:46:50,371][12967] Updated weights for policy 0, policy_version 170 (0.0026)
[2023-03-03 18:46:52,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3436.6). Total num frames: 704512. Throughput: 0: 958.8. Samples: 175638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:46:52,444][01413] Avg episode reward: [(0, '4.515')]
[2023-03-03 18:46:57,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.9, 300 sec: 3452.3). Total num frames: 724992. Throughput: 0: 957.6. Samples: 182188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:46:57,441][01413] Avg episode reward: [(0, '4.675')]
[2023-03-03 18:47:01,079][12967] Updated weights for policy 0, policy_version 180 (0.0012)
[2023-03-03 18:47:02,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3429.2). Total num frames: 737280. Throughput: 0: 930.9. Samples: 184334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:47:02,446][01413] Avg episode reward: [(0, '4.585')]
[2023-03-03 18:47:02,460][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth...
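The "Policy #0 lag" triple reports, for the experience currently being processed, how many learner updates behind the live policy it was collected with; in this run (policy version around 180 here) the average lag stays well under one version. A tiny sketch under that interpretation, which is my reading of the statistic rather than Sample Factory's exact code:

```python
def policy_lag(current_version: int, sample_versions: list[int]):
    """Lag of each queued sample = current policy version minus the version
    that collected it; return (min, avg, max) like the log's triple."""
    lags = [current_version - v for v in sample_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

print(policy_lag(180, [180, 179, 180, 178]))  # -> (0, 0.75, 2)
```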
[2023-03-03 18:47:07,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3444.4). Total num frames: 757760. Throughput: 0: 918.5. Samples: 188950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:47:07,446][01413] Avg episode reward: [(0, '4.599')]
[2023-03-03 18:47:11,514][12967] Updated weights for policy 0, policy_version 190 (0.0042)
[2023-03-03 18:47:12,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3477.0). Total num frames: 782336. Throughput: 0: 967.9. Samples: 195894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:47:12,441][01413] Avg episode reward: [(0, '4.682')]
[2023-03-03 18:47:17,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3472.7). Total num frames: 798720. Throughput: 0: 964.8. Samples: 199264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:47:17,446][01413] Avg episode reward: [(0, '4.735')]
[2023-03-03 18:47:22,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3468.5). Total num frames: 815104. Throughput: 0: 911.6. Samples: 203692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-03 18:47:22,448][01413] Avg episode reward: [(0, '4.647')]
[2023-03-03 18:47:23,502][12967] Updated weights for policy 0, policy_version 200 (0.0018)
[2023-03-03 18:47:27,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3464.5). Total num frames: 831488. Throughput: 0: 929.1. Samples: 208908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:47:27,444][01413] Avg episode reward: [(0, '4.693')]
[2023-03-03 18:47:32,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3494.1). Total num frames: 856064. Throughput: 0: 958.4. Samples: 212352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:47:32,442][01413] Avg episode reward: [(0, '4.684')]
[2023-03-03 18:47:33,089][12967] Updated weights for policy 0, policy_version 210 (0.0020)
[2023-03-03 18:47:37,442][01413] Fps is (10 sec: 4094.5, 60 sec: 3686.2, 300 sec: 3489.7). Total num frames: 872448. Throughput: 0: 958.6. Samples: 218778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:47:37,447][01413] Avg episode reward: [(0, '4.521')]
[2023-03-03 18:47:42,439][01413] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3485.6). Total num frames: 888832. Throughput: 0: 906.3. Samples: 222970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:47:42,441][01413] Avg episode reward: [(0, '4.594')]
[2023-03-03 18:47:45,643][12967] Updated weights for policy 0, policy_version 220 (0.0026)
[2023-03-03 18:47:47,438][01413] Fps is (10 sec: 3278.0, 60 sec: 3754.7, 300 sec: 3481.6). Total num frames: 905216. Throughput: 0: 910.2. Samples: 225292. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-03-03 18:47:47,443][01413] Avg episode reward: [(0, '4.701')]
[2023-03-03 18:47:52,438][01413] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3508.6). Total num frames: 929792. Throughput: 0: 958.2. Samples: 232068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-03 18:47:52,446][01413] Avg episode reward: [(0, '4.610')]
[2023-03-03 18:47:54,876][12967] Updated weights for policy 0, policy_version 230 (0.0023)
[2023-03-03 18:47:57,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3504.4). Total num frames: 946176. Throughput: 0: 934.5. Samples: 237946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:47:57,445][01413] Avg episode reward: [(0, '4.544')]
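The "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" triple is throughput averaged over three trailing time windows, which is why the 300-second figure moves slowly while the 10-second figure is noisy. A sketch of that bookkeeping with timestamped frame-count snapshots; the class and its window handling are illustrative assumptions, not the library's reporting code:

```python
import time
from collections import deque

class FpsTracker:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # (timestamp, total_frames) snapshots

    def record(self, total_frames: int):
        now = time.monotonic()
        self.history.append((now, total_frames))
        # Keep slightly more history than the largest window needs.
        while self.history and now - self.history[0][0] > max(self.windows) + 5:
            self.history.popleft()

    def fps(self):
        now, latest = self.history[-1]
        out = {}
        for w in self.windows:
            past = [(t, f) for t, f in self.history if now - t <= w]
            t0, f0 = past[0]  # oldest snapshot inside this window
            out[w] = (latest - f0) / (now - t0) if now > t0 else float("nan")
        return out  # e.g. {10: 4096.2, 60: 3754.7, 300: 3508.6}
```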
[2023-03-03 18:48:02,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3500.2). Total num frames: 962560. Throughput: 0: 906.7. Samples: 240066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:48:02,443][01413] Avg episode reward: [(0, '4.748')]
[2023-03-03 18:48:07,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3496.2). Total num frames: 978944. Throughput: 0: 919.6. Samples: 245076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-03 18:48:07,440][01413] Avg episode reward: [(0, '4.685')]
[2023-03-03 18:48:07,464][12967] Updated weights for policy 0, policy_version 240 (0.0016)
[2023-03-03 18:48:12,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3521.1). Total num frames: 1003520. Throughput: 0: 956.9. Samples: 251970. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:48:12,442][01413] Avg episode reward: [(0, '4.377')]
[2023-03-03 18:48:17,281][12967] Updated weights for policy 0, policy_version 250 (0.0013)
[2023-03-03 18:48:17,440][01413] Fps is (10 sec: 4504.9, 60 sec: 3754.6, 300 sec: 3531.0). Total num frames: 1024000. Throughput: 0: 951.7. Samples: 255178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:48:17,441][01413] Avg episode reward: [(0, '4.480')]
[2023-03-03 18:48:22,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3512.8). Total num frames: 1036288. Throughput: 0: 904.5. Samples: 259478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:48:22,440][01413] Avg episode reward: [(0, '4.617')]
[2023-03-03 18:48:27,438][01413] Fps is (10 sec: 3277.3, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 1056768. Throughput: 0: 937.7. Samples: 265166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:48:27,445][01413] Avg episode reward: [(0, '4.628')]
[2023-03-03 18:48:28,635][12967] Updated weights for policy 0, policy_version 260 (0.0014)
[2023-03-03 18:48:32,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 963.2. Samples: 268636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:48:32,440][01413] Avg episode reward: [(0, '4.417')]
[2023-03-03 18:48:37,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.9, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 945.1. Samples: 274598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:48:37,440][01413] Avg episode reward: [(0, '4.415')]
[2023-03-03 18:48:39,680][12967] Updated weights for policy 0, policy_version 270 (0.0019)
[2023-03-03 18:48:42,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1110016. Throughput: 0: 910.2. Samples: 278906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:48:42,445][01413] Avg episode reward: [(0, '4.583')]
[2023-03-03 18:48:47,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1134592. Throughput: 0: 927.6. Samples: 281806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:48:47,449][01413] Avg episode reward: [(0, '4.905')]
[2023-03-03 18:48:47,452][12953] Saving new best policy, reward=4.905!
[2023-03-03 18:48:50,054][12967] Updated weights for policy 0, policy_version 280 (0.0020)
[2023-03-03 18:48:52,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1155072. Throughput: 0: 967.1. Samples: 288596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:48:52,440][01413] Avg episode reward: [(0, '5.035')]
[2023-03-03 18:48:52,452][12953] Saving new best policy, reward=5.035!
[2023-03-03 18:48:57,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1171456. Throughput: 0: 931.8. Samples: 293900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:48:57,445][01413] Avg episode reward: [(0, '4.861')]
[2023-03-03 18:49:02,262][12967] Updated weights for policy 0, policy_version 290 (0.0017)
[2023-03-03 18:49:02,439][01413] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 1187840. Throughput: 0: 909.7. Samples: 296114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:49:02,441][01413] Avg episode reward: [(0, '4.941')]
[2023-03-03 18:49:02,454][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000290_1187840.pth...
[2023-03-03 18:49:02,626][12953] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth
[2023-03-03 18:49:07,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1208320. Throughput: 0: 934.1. Samples: 301512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:49:07,443][01413] Avg episode reward: [(0, '4.781')]
[2023-03-03 18:49:11,588][12967] Updated weights for policy 0, policy_version 300 (0.0014)
[2023-03-03 18:49:12,438][01413] Fps is (10 sec: 4096.3, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1228800. Throughput: 0: 962.1. Samples: 308462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:49:12,440][01413] Avg episode reward: [(0, '4.795')]
[2023-03-03 18:49:17,445][01413] Fps is (10 sec: 3683.9, 60 sec: 3686.1, 300 sec: 3734.9). Total num frames: 1245184. Throughput: 0: 943.3. Samples: 311090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-03 18:49:17,451][01413] Avg episode reward: [(0, '4.927')]
[2023-03-03 18:49:22,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1261568. Throughput: 0: 906.8. Samples: 315402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:49:22,442][01413] Avg episode reward: [(0, '4.844')]
[2023-03-03 18:49:24,162][12967] Updated weights for policy 0, policy_version 310 (0.0019)
[2023-03-03 18:49:27,438][01413] Fps is (10 sec: 3688.9, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1282048. Throughput: 0: 947.4. Samples: 321540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-03 18:49:27,440][01413] Avg episode reward: [(0, '4.767')]
[2023-03-03 18:49:32,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1306624. Throughput: 0: 960.5. Samples: 325030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:49:32,441][01413] Avg episode reward: [(0, '4.941')]
[2023-03-03 18:49:33,167][12967] Updated weights for policy 0, policy_version 320 (0.0018)
[2023-03-03 18:49:37,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1323008. Throughput: 0: 933.0. Samples: 330580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:49:37,442][01413] Avg episode reward: [(0, '5.079')]
[2023-03-03 18:49:37,444][12953] Saving new best policy, reward=5.079!
[2023-03-03 18:49:42,439][01413] Fps is (10 sec: 2867.1, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 1335296. Throughput: 0: 912.9. Samples: 334980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:49:42,444][01413] Avg episode reward: [(0, '4.729')]
[2023-03-03 18:49:45,526][12967] Updated weights for policy 0, policy_version 330 (0.0022)
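The Saving/Removing pair above (checkpoint_000000290 written, checkpoint_000000071 deleted) is checkpoint rotation: only the newest few milestone checkpoints are kept on disk, while the best-policy snapshot is tracked separately. A sketch of that policy; keep_n=2 and the sort key are assumptions for illustration (the retention count is a configurable setting, not a fixed value):

```python
import os
from pathlib import Path

def rotate_checkpoints(ckpt_dir: str, keep_n: int = 2):
    # Filenames are zero-padded (checkpoint_000000290_...), so lexical order
    # matches policy-version order.
    ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"), key=lambda p: p.name)
    for old in ckpts[:-keep_n]:
        print(f"Removing {old}")
        os.remove(old)
```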
[2023-03-03 18:49:47,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1359872. Throughput: 0: 937.7. Samples: 338310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:49:47,445][01413] Avg episode reward: [(0, '4.839')]
[2023-03-03 18:49:52,438][01413] Fps is (10 sec: 4505.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1380352. Throughput: 0: 968.4. Samples: 345088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:49:52,445][01413] Avg episode reward: [(0, '5.048')]
[2023-03-03 18:49:55,294][12967] Updated weights for policy 0, policy_version 340 (0.0018)
[2023-03-03 18:49:57,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1396736. Throughput: 0: 923.7. Samples: 350028. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:49:57,440][01413] Avg episode reward: [(0, '5.031')]
[2023-03-03 18:50:02,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 1409024. Throughput: 0: 912.7. Samples: 352154. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:50:02,444][01413] Avg episode reward: [(0, '4.976')]
[2023-03-03 18:50:06,880][12967] Updated weights for policy 0, policy_version 350 (0.0023)
[2023-03-03 18:50:07,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1433600. Throughput: 0: 953.6. Samples: 358316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:50:07,445][01413] Avg episode reward: [(0, '4.939')]
[2023-03-03 18:50:12,438][01413] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1458176. Throughput: 0: 971.3. Samples: 365250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:50:12,447][01413] Avg episode reward: [(0, '5.067')]
[2023-03-03 18:50:17,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3755.1, 300 sec: 3748.9). Total num frames: 1470464. Throughput: 0: 942.7. Samples: 367452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:50:17,440][01413] Avg episode reward: [(0, '4.917')]
[2023-03-03 18:50:17,740][12967] Updated weights for policy 0, policy_version 360 (0.0032)
[2023-03-03 18:50:22,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1486848. Throughput: 0: 916.2. Samples: 371808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-03 18:50:22,443][01413] Avg episode reward: [(0, '4.940')]
[2023-03-03 18:50:27,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1511424. Throughput: 0: 967.9. Samples: 378536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:50:27,440][01413] Avg episode reward: [(0, '4.795')]
[2023-03-03 18:50:27,965][12967] Updated weights for policy 0, policy_version 370 (0.0016)
[2023-03-03 18:50:32,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1531904. Throughput: 0: 972.0. Samples: 382048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:50:32,448][01413] Avg episode reward: [(0, '4.905')]
[2023-03-03 18:50:37,440][01413] Fps is (10 sec: 3685.8, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 1548288. Throughput: 0: 931.7. Samples: 387014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:50:37,442][01413] Avg episode reward: [(0, '4.869')]
[2023-03-03 18:50:39,992][12967] Updated weights for policy 0, policy_version 380 (0.0028)
[2023-03-03 18:50:42,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 1564672. Throughput: 0: 924.8. Samples: 391646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:50:42,445][01413] Avg episode reward: [(0, '5.133')]
[2023-03-03 18:50:42,456][12953] Saving new best policy, reward=5.133!
[2023-03-03 18:50:47,438][01413] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1585152. Throughput: 0: 951.6. Samples: 394974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:50:47,445][01413] Avg episode reward: [(0, '5.138')]
[2023-03-03 18:50:47,448][12953] Saving new best policy, reward=5.138!
[2023-03-03 18:50:49,676][12967] Updated weights for policy 0, policy_version 390 (0.0018)
[2023-03-03 18:50:52,441][01413] Fps is (10 sec: 4094.9, 60 sec: 3754.5, 300 sec: 3748.9). Total num frames: 1605632. Throughput: 0: 966.3. Samples: 401804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:50:52,443][01413] Avg episode reward: [(0, '4.978')]
[2023-03-03 18:50:57,442][01413] Fps is (10 sec: 3685.0, 60 sec: 3754.4, 300 sec: 3748.8). Total num frames: 1622016. Throughput: 0: 909.7. Samples: 406192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:50:57,444][01413] Avg episode reward: [(0, '4.874')]
[2023-03-03 18:51:02,070][12967] Updated weights for policy 0, policy_version 400 (0.0026)
[2023-03-03 18:51:02,438][01413] Fps is (10 sec: 3277.7, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1638400. Throughput: 0: 909.7. Samples: 408388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:51:02,441][01413] Avg episode reward: [(0, '4.846')]
[2023-03-03 18:51:02,453][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000400_1638400.pth...
[2023-03-03 18:51:02,552][12953] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth
[2023-03-03 18:51:07,438][01413] Fps is (10 sec: 4097.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1662976. Throughput: 0: 958.5. Samples: 414942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:51:07,446][01413] Avg episode reward: [(0, '4.596')]
[2023-03-03 18:51:10,984][12967] Updated weights for policy 0, policy_version 410 (0.0012)
[2023-03-03 18:51:12,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1683456. Throughput: 0: 952.4. Samples: 421394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:51:12,442][01413] Avg episode reward: [(0, '4.891')]
[2023-03-03 18:51:17,443][01413] Fps is (10 sec: 3275.3, 60 sec: 3754.4, 300 sec: 3748.8). Total num frames: 1695744. Throughput: 0: 922.0. Samples: 423542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:51:17,451][01413] Avg episode reward: [(0, '5.009')]
[2023-03-03 18:51:22,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1712128. Throughput: 0: 916.1. Samples: 428236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:51:22,446][01413] Avg episode reward: [(0, '5.042')]
[2023-03-03 18:51:23,519][12967] Updated weights for policy 0, policy_version 420 (0.0011)
[2023-03-03 18:51:27,438][01413] Fps is (10 sec: 4097.9, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1736704. Throughput: 0: 961.6. Samples: 434916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-03 18:51:27,444][01413] Avg episode reward: [(0, '5.496')]
[2023-03-03 18:51:27,448][12953] Saving new best policy, reward=5.496!
[2023-03-03 18:51:32,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1757184. Throughput: 0: 962.3. Samples: 438278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:51:32,444][01413] Avg episode reward: [(0, '5.471')]
[2023-03-03 18:51:33,921][12967] Updated weights for policy 0, policy_version 430 (0.0017)
[2023-03-03 18:51:37,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3748.9). Total num frames: 1769472. Throughput: 0: 909.2. Samples: 442716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-03-03 18:51:37,444][01413] Avg episode reward: [(0, '5.403')]
[2023-03-03 18:51:42,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1785856. Throughput: 0: 928.5. Samples: 447970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-03 18:51:42,443][01413] Avg episode reward: [(0, '5.557')]
[2023-03-03 18:51:42,453][12953] Saving new best policy, reward=5.557!
[2023-03-03 18:51:45,143][12967] Updated weights for policy 0, policy_version 440 (0.0018)
[2023-03-03 18:51:47,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1810432. Throughput: 0: 954.3. Samples: 451330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-03 18:51:47,442][01413] Avg episode reward: [(0, '5.847')]
[2023-03-03 18:51:47,452][12953] Saving new best policy, reward=5.847!
[2023-03-03 18:51:52,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.8, 300 sec: 3748.9). Total num frames: 1830912. Throughput: 0: 949.2. Samples: 457658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:51:52,440][01413] Avg episode reward: [(0, '5.843')]
[2023-03-03 18:51:56,541][12967] Updated weights for policy 0, policy_version 450 (0.0011)
[2023-03-03 18:51:57,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3748.9). Total num frames: 1843200. Throughput: 0: 900.4. Samples: 461910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-03 18:51:57,442][01413] Avg episode reward: [(0, '5.967')]
[2023-03-03 18:51:57,447][12953] Saving new best policy, reward=5.967!
[2023-03-03 18:52:02,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1863680. Throughput: 0: 906.1. Samples: 464312. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:52:02,444][01413] Avg episode reward: [(0, '5.889')]
[2023-03-03 18:52:06,826][12967] Updated weights for policy 0, policy_version 460 (0.0013)
[2023-03-03 18:52:07,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1884160. Throughput: 0: 952.6. Samples: 471102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-03 18:52:07,440][01413] Avg episode reward: [(0, '6.176')]
[2023-03-03 18:52:07,443][12953] Saving new best policy, reward=6.176!
[2023-03-03 18:52:12,438][01413] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1904640. Throughput: 0: 931.9. Samples: 476854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:52:12,441][01413] Avg episode reward: [(0, '5.984')]
[2023-03-03 18:52:17,439][01413] Fps is (10 sec: 3276.6, 60 sec: 3686.6, 300 sec: 3735.0). Total num frames: 1916928. Throughput: 0: 905.6. Samples: 479030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-03 18:52:17,441][01413] Avg episode reward: [(0, '5.397')]
[2023-03-03 18:52:19,201][12967] Updated weights for policy 0, policy_version 470 (0.0035)
[2023-03-03 18:52:22,438][01413] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1937408. Throughput: 0: 922.3. Samples: 484218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:52:22,440][01413] Avg episode reward: [(0, '5.645')]
[2023-03-03 18:52:27,438][01413] Fps is (10 sec: 4505.9, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1961984. Throughput: 0: 956.7. Samples: 491020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:52:27,441][01413] Avg episode reward: [(0, '5.952')]
[2023-03-03 18:52:28,198][12967] Updated weights for policy 0, policy_version 480 (0.0012)
[2023-03-03 18:52:32,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1978368. Throughput: 0: 949.3. Samples: 494048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:52:32,451][01413] Avg episode reward: [(0, '5.727')]
[2023-03-03 18:52:37,438][01413] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1994752. Throughput: 0: 902.8. Samples: 498286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-03 18:52:37,443][01413] Avg episode reward: [(0, '5.845')]
[2023-03-03 18:52:40,606][12967] Updated weights for policy 0, policy_version 490 (0.0038)
[2023-03-03 18:52:42,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2011136. Throughput: 0: 938.7. Samples: 504152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:52:42,440][01413] Avg episode reward: [(0, '6.005')]
[2023-03-03 18:52:47,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2035712. Throughput: 0: 959.9. Samples: 507508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:52:47,440][01413] Avg episode reward: [(0, '6.090')]
[2023-03-03 18:52:50,284][12967] Updated weights for policy 0, policy_version 500 (0.0013)
[2023-03-03 18:52:52,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 2052096. Throughput: 0: 937.7. Samples: 513300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:52:52,445][01413] Avg episode reward: [(0, '5.836')]
[2023-03-03 18:52:57,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2064384. Throughput: 0: 904.8. Samples: 517572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:52:57,446][01413] Avg episode reward: [(0, '5.700')]
[2023-03-03 18:53:02,302][12967] Updated weights for policy 0, policy_version 510 (0.0013)
[2023-03-03 18:53:02,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2088960. Throughput: 0: 918.9. Samples: 520382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-03 18:53:02,445][01413] Avg episode reward: [(0, '5.397')]
[2023-03-03 18:53:02,459][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000510_2088960.pth...
[2023-03-03 18:53:02,577][12953] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000290_1187840.pth
[2023-03-03 18:53:07,438][01413] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2109440. Throughput: 0: 953.9. Samples: 527144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:53:07,440][01413] Avg episode reward: [(0, '5.373')]
[2023-03-03 18:53:12,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2125824. Throughput: 0: 923.7. Samples: 532588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:53:12,447][01413] Avg episode reward: [(0, '5.712')]
[2023-03-03 18:53:12,712][12967] Updated weights for policy 0, policy_version 520 (0.0026)
[2023-03-03 18:53:17,440][01413] Fps is (10 sec: 3276.1, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 2142208. Throughput: 0: 904.5. Samples: 534752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:53:17,443][01413] Avg episode reward: [(0, '5.866')]
[2023-03-03 18:53:22,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2162688. Throughput: 0: 933.7. Samples: 540302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:53:22,440][01413] Avg episode reward: [(0, '6.238')]
[2023-03-03 18:53:22,459][12953] Saving new best policy, reward=6.238!
[2023-03-03 18:53:23,816][12967] Updated weights for policy 0, policy_version 530 (0.0019)
[2023-03-03 18:53:27,438][01413] Fps is (10 sec: 4096.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2183168. Throughput: 0: 955.3. Samples: 547140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:53:27,441][01413] Avg episode reward: [(0, '6.210')]
[2023-03-03 18:53:32,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2199552. Throughput: 0: 938.2. Samples: 549726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:53:32,441][01413] Avg episode reward: [(0, '6.476')]
[2023-03-03 18:53:32,471][12953] Saving new best policy, reward=6.476!
[2023-03-03 18:53:35,404][12967] Updated weights for policy 0, policy_version 540 (0.0023)
[2023-03-03 18:53:37,439][01413] Fps is (10 sec: 3276.3, 60 sec: 3686.3, 300 sec: 3748.9). Total num frames: 2215936. Throughput: 0: 904.3. Samples: 553994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-03 18:53:37,444][01413] Avg episode reward: [(0, '6.329')]
[2023-03-03 18:53:42,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2236416. Throughput: 0: 949.7. Samples: 560308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:53:42,440][01413] Avg episode reward: [(0, '6.473')]
[2023-03-03 18:53:45,284][12967] Updated weights for policy 0, policy_version 550 (0.0033)
[2023-03-03 18:53:47,438][01413] Fps is (10 sec: 4506.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2260992. Throughput: 0: 964.5. Samples: 563784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:53:47,444][01413] Avg episode reward: [(0, '6.631')]
[2023-03-03 18:53:47,450][12953] Saving new best policy, reward=6.631!
[2023-03-03 18:53:52,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2277376. Throughput: 0: 933.6. Samples: 569156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-03-03 18:53:52,440][01413] Avg episode reward: [(0, '6.557')]
[2023-03-03 18:53:57,441][01413] Fps is (10 sec: 2866.5, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 2289664. Throughput: 0: 909.3. Samples: 573508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-03 18:53:57,442][01413] Avg episode reward: [(0, '6.718')]
[2023-03-03 18:53:57,451][12953] Saving new best policy, reward=6.718!
[2023-03-03 18:53:57,809][12967] Updated weights for policy 0, policy_version 560 (0.0012)
[2023-03-03 18:54:02,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2314240. Throughput: 0: 934.8. Samples: 576816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:54:02,440][01413] Avg episode reward: [(0, '7.310')]
[2023-03-03 18:54:02,451][12953] Saving new best policy, reward=7.310!
[2023-03-03 18:54:06,601][12967] Updated weights for policy 0, policy_version 570 (0.0020)
[2023-03-03 18:54:07,438][01413] Fps is (10 sec: 4506.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2334720. Throughput: 0: 964.7. Samples: 583712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:54:07,444][01413] Avg episode reward: [(0, '7.606')]
[2023-03-03 18:54:07,516][12953] Saving new best policy, reward=7.606!
[2023-03-03 18:54:12,441][01413] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3748.9). Total num frames: 2351104. Throughput: 0: 919.4. Samples: 588516. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:54:12,447][01413] Avg episode reward: [(0, '7.483')]
[2023-03-03 18:54:17,439][01413] Fps is (10 sec: 3276.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2367488. Throughput: 0: 908.9. Samples: 590628. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-03-03 18:54:17,448][01413] Avg episode reward: [(0, '7.591')]
[2023-03-03 18:54:19,038][12967] Updated weights for policy 0, policy_version 580 (0.0028)
[2023-03-03 18:54:22,438][01413] Fps is (10 sec: 3687.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2387968. Throughput: 0: 954.8. Samples: 596960. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-03-03 18:54:22,441][01413] Avg episode reward: [(0, '7.494')]
[2023-03-03 18:54:27,440][01413] Fps is (10 sec: 4505.4, 60 sec: 3822.8, 300 sec: 3748.9). Total num frames: 2412544. Throughput: 0: 965.4. Samples: 603752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:54:27,442][01413] Avg episode reward: [(0, '7.864')]
[2023-03-03 18:54:27,444][12953] Saving new best policy, reward=7.864!
[2023-03-03 18:54:28,509][12967] Updated weights for policy 0, policy_version 590 (0.0013)
[2023-03-03 18:54:32,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2424832. Throughput: 0: 934.9. Samples: 605856. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:54:32,443][01413] Avg episode reward: [(0, '8.017')]
[2023-03-03 18:54:32,459][12953] Saving new best policy, reward=8.017!
[2023-03-03 18:54:37,438][01413] Fps is (10 sec: 2867.6, 60 sec: 3754.8, 300 sec: 3748.9). Total num frames: 2441216. Throughput: 0: 912.8. Samples: 610234. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:54:37,440][01413] Avg episode reward: [(0, '8.079')]
[2023-03-03 18:54:37,443][12953] Saving new best policy, reward=8.079!
[2023-03-03 18:54:40,589][12967] Updated weights for policy 0, policy_version 600 (0.0027)
[2023-03-03 18:54:42,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2465792. Throughput: 0: 966.1. Samples: 616982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:54:42,446][01413] Avg episode reward: [(0, '8.078')]
[2023-03-03 18:54:47,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2486272. Throughput: 0: 967.9. Samples: 620372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-03 18:54:47,440][01413] Avg episode reward: [(0, '7.803')]
[2023-03-03 18:54:51,120][12967] Updated weights for policy 0, policy_version 610 (0.0026)
[2023-03-03 18:54:52,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2498560. Throughput: 0: 921.9. Samples: 625196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-03 18:54:52,440][01413] Avg episode reward: [(0, '8.067')]
[2023-03-03 18:54:57,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.8, 300 sec: 3748.9). Total num frames: 2514944. Throughput: 0: 921.9. Samples: 630000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-03 18:54:57,446][01413] Avg episode reward: [(0, '8.353')]
[2023-03-03 18:54:57,524][12953] Saving new best policy, reward=8.353!
[2023-03-03 18:55:01,919][12967] Updated weights for policy 0, policy_version 620 (0.0023)
[2023-03-03 18:55:02,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2539520. Throughput: 0: 950.6. Samples: 633406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-03 18:55:02,446][01413] Avg episode reward: [(0, '8.415')]
[2023-03-03 18:55:02,459][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000620_2539520.pth...
[2023-03-03 18:55:02,569][12953] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000400_1638400.pth
[2023-03-03 18:55:02,580][12953] Saving new best policy, reward=8.415!
[2023-03-03 18:55:07,439][01413] Fps is (10 sec: 4505.2, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 2560000. Throughput: 0: 964.1. Samples: 640346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:55:07,443][01413] Avg episode reward: [(0, '8.883')]
[2023-03-03 18:55:07,449][12953] Saving new best policy, reward=8.883!
[2023-03-03 18:55:12,442][01413] Fps is (10 sec: 3685.0, 60 sec: 3754.6, 300 sec: 3748.8). Total num frames: 2576384. Throughput: 0: 908.2. Samples: 644622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:55:12,444][01413] Avg episode reward: [(0, '9.097')]
[2023-03-03 18:55:12,463][12953] Saving new best policy, reward=9.097!
[2023-03-03 18:55:13,877][12967] Updated weights for policy 0, policy_version 630 (0.0011)
[2023-03-03 18:55:17,438][01413] Fps is (10 sec: 3277.1, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2592768. Throughput: 0: 909.0. Samples: 646760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:55:17,440][01413] Avg episode reward: [(0, '9.345')]
[2023-03-03 18:55:17,445][12953] Saving new best policy, reward=9.345!
[2023-03-03 18:55:22,438][01413] Fps is (10 sec: 3687.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2613248. Throughput: 0: 961.3. Samples: 653492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:55:22,440][01413] Avg episode reward: [(0, '9.583')]
[2023-03-03 18:55:22,519][12953] Saving new best policy, reward=9.583!
[2023-03-03 18:55:23,563][12967] Updated weights for policy 0, policy_version 640 (0.0015)
[2023-03-03 18:55:27,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 2633728. Throughput: 0: 945.3. Samples: 659520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:55:27,445][01413] Avg episode reward: [(0, '9.321')]
[2023-03-03 18:55:32,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2650112. Throughput: 0: 919.2. Samples: 661736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-03 18:55:32,445][01413] Avg episode reward: [(0, '9.700')]
[2023-03-03 18:55:32,462][12953] Saving new best policy, reward=9.700!
[2023-03-03 18:55:35,985][12967] Updated weights for policy 0, policy_version 650 (0.0026)
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 18:55:37,441][01413] Avg episode reward: [(0, '10.028')] [2023-03-03 18:55:37,447][12953] Saving new best policy, reward=10.028! [2023-03-03 18:55:42,438][01413] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2691072. Throughput: 0: 962.5. Samples: 673314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:55:42,440][01413] Avg episode reward: [(0, '9.781')] [2023-03-03 18:55:44,893][12967] Updated weights for policy 0, policy_version 660 (0.0018) [2023-03-03 18:55:47,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2711552. Throughput: 0: 966.4. Samples: 676896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:55:47,443][01413] Avg episode reward: [(0, '9.251')] [2023-03-03 18:55:52,441][01413] Fps is (10 sec: 3275.8, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 2723840. Throughput: 0: 910.0. Samples: 681298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:55:52,443][01413] Avg episode reward: [(0, '9.455')] [2023-03-03 18:55:57,118][12967] Updated weights for policy 0, policy_version 670 (0.0016) [2023-03-03 18:55:57,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2744320. Throughput: 0: 936.0. Samples: 686740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 18:55:57,440][01413] Avg episode reward: [(0, '9.464')] [2023-03-03 18:56:02,438][01413] Fps is (10 sec: 4097.2, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 2764800. Throughput: 0: 966.2. Samples: 690238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 18:56:02,447][01413] Avg episode reward: [(0, '9.759')] [2023-03-03 18:56:06,870][12967] Updated weights for policy 0, policy_version 680 (0.0025) [2023-03-03 18:56:07,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2785280. Throughput: 0: 954.0. Samples: 696424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:56:07,441][01413] Avg episode reward: [(0, '9.344')] [2023-03-03 18:56:12,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3735.0). Total num frames: 2797568. Throughput: 0: 918.6. Samples: 700858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:56:12,447][01413] Avg episode reward: [(0, '9.161')] [2023-03-03 18:56:17,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2818048. Throughput: 0: 928.3. Samples: 703510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:56:17,440][01413] Avg episode reward: [(0, '8.384')] [2023-03-03 18:56:18,404][12967] Updated weights for policy 0, policy_version 690 (0.0013) [2023-03-03 18:56:22,438][01413] Fps is (10 sec: 4505.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2842624. Throughput: 0: 978.8. Samples: 710532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 18:56:22,440][01413] Avg episode reward: [(0, '8.509')] [2023-03-03 18:56:27,443][01413] Fps is (10 sec: 4503.5, 60 sec: 3822.6, 300 sec: 3748.8). Total num frames: 2863104. Throughput: 0: 956.6. Samples: 716366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 18:56:27,445][01413] Avg episode reward: [(0, '9.465')] [2023-03-03 18:56:28,910][12967] Updated weights for policy 0, policy_version 700 (0.0028) [2023-03-03 18:56:32,439][01413] Fps is (10 sec: 3276.4, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 2875392. Throughput: 0: 925.0. Samples: 718520. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 18:56:32,441][01413] Avg episode reward: [(0, '10.048')] [2023-03-03 18:56:32,457][12953] Saving new best policy, reward=10.048! [2023-03-03 18:56:37,438][01413] Fps is (10 sec: 3278.2, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2895872. Throughput: 0: 944.8. Samples: 723810. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 18:56:37,440][01413] Avg episode reward: [(0, '10.700')] [2023-03-03 18:56:37,446][12953] Saving new best policy, reward=10.700! [2023-03-03 18:56:39,522][12967] Updated weights for policy 0, policy_version 710 (0.0034) [2023-03-03 18:56:42,438][01413] Fps is (10 sec: 4506.2, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2920448. Throughput: 0: 975.9. Samples: 730656. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 18:56:42,440][01413] Avg episode reward: [(0, '9.804')] [2023-03-03 18:56:47,438][01413] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2936832. Throughput: 0: 965.4. Samples: 733680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-03 18:56:47,444][01413] Avg episode reward: [(0, '9.878')] [2023-03-03 18:56:50,944][12967] Updated weights for policy 0, policy_version 720 (0.0018) [2023-03-03 18:56:52,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3762.8). Total num frames: 2953216. Throughput: 0: 925.2. Samples: 738058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:56:52,445][01413] Avg episode reward: [(0, '10.114')] [2023-03-03 18:56:57,439][01413] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2973696. Throughput: 0: 958.5. Samples: 743992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:56:57,444][01413] Avg episode reward: [(0, '11.013')] [2023-03-03 18:56:57,447][12953] Saving new best policy, reward=11.013! [2023-03-03 18:57:00,794][12967] Updated weights for policy 0, policy_version 730 (0.0023) [2023-03-03 18:57:02,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 2994176. Throughput: 0: 973.7. Samples: 747328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:57:02,441][01413] Avg episode reward: [(0, '11.581')] [2023-03-03 18:57:02,455][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000731_2994176.pth... [2023-03-03 18:57:02,573][12953] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000510_2088960.pth [2023-03-03 18:57:02,587][12953] Saving new best policy, reward=11.581! [2023-03-03 18:57:07,438][01413] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3010560. Throughput: 0: 945.6. Samples: 753086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 18:57:07,441][01413] Avg episode reward: [(0, '12.280')] [2023-03-03 18:57:07,443][12953] Saving new best policy, reward=12.280! [2023-03-03 18:57:12,438][01413] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3026944. Throughput: 0: 913.2. Samples: 757454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:57:12,446][01413] Avg episode reward: [(0, '12.164')] [2023-03-03 18:57:13,307][12967] Updated weights for policy 0, policy_version 740 (0.0039) [2023-03-03 18:57:17,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3047424. Throughput: 0: 933.0. Samples: 760506. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:57:17,441][01413] Avg episode reward: [(0, '13.119')] [2023-03-03 18:57:17,449][12953] Saving new best policy, reward=13.119! [2023-03-03 18:57:22,320][12967] Updated weights for policy 0, policy_version 750 (0.0012) [2023-03-03 18:57:22,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3072000. Throughput: 0: 967.6. Samples: 767354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 18:57:22,443][01413] Avg episode reward: [(0, '12.705')] [2023-03-03 18:57:27,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3755.0, 300 sec: 3762.8). Total num frames: 3088384. Throughput: 0: 932.1. Samples: 772602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:57:27,443][01413] Avg episode reward: [(0, '12.612')] [2023-03-03 18:57:32,439][01413] Fps is (10 sec: 2867.1, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3100672. Throughput: 0: 911.5. Samples: 774698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:57:32,444][01413] Avg episode reward: [(0, '12.511')] [2023-03-03 18:57:34,751][12967] Updated weights for policy 0, policy_version 760 (0.0012) [2023-03-03 18:57:37,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3121152. Throughput: 0: 944.8. Samples: 780572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:57:37,442][01413] Avg episode reward: [(0, '11.984')] [2023-03-03 18:57:42,438][01413] Fps is (10 sec: 4505.9, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3145728. Throughput: 0: 966.2. Samples: 787472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:57:42,446][01413] Avg episode reward: [(0, '13.389')] [2023-03-03 18:57:42,458][12953] Saving new best policy, reward=13.389! [2023-03-03 18:57:44,166][12967] Updated weights for policy 0, policy_version 770 (0.0014) [2023-03-03 18:57:47,443][01413] Fps is (10 sec: 4094.1, 60 sec: 3754.4, 300 sec: 3762.7). Total num frames: 3162112. Throughput: 0: 947.9. Samples: 789986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 18:57:47,445][01413] Avg episode reward: [(0, '13.474')] [2023-03-03 18:57:47,448][12953] Saving new best policy, reward=13.474! [2023-03-03 18:57:52,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3178496. Throughput: 0: 916.1. Samples: 794312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:57:52,441][01413] Avg episode reward: [(0, '13.985')] [2023-03-03 18:57:52,451][12953] Saving new best policy, reward=13.985! [2023-03-03 18:57:56,106][12967] Updated weights for policy 0, policy_version 780 (0.0022) [2023-03-03 18:57:57,438][01413] Fps is (10 sec: 3688.1, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3198976. Throughput: 0: 959.1. Samples: 800612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:57:57,445][01413] Avg episode reward: [(0, '15.525')] [2023-03-03 18:57:57,447][12953] Saving new best policy, reward=15.525! [2023-03-03 18:58:02,438][01413] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 3223552. Throughput: 0: 964.9. Samples: 803926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:58:02,445][01413] Avg episode reward: [(0, '14.895')] [2023-03-03 18:58:06,219][12967] Updated weights for policy 0, policy_version 790 (0.0018) [2023-03-03 18:58:07,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3235840. Throughput: 0: 934.7. Samples: 809416. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:58:07,445][01413] Avg episode reward: [(0, '16.416')] [2023-03-03 18:58:07,446][12953] Saving new best policy, reward=16.416! [2023-03-03 18:58:12,438][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3252224. Throughput: 0: 915.2. Samples: 813784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:58:12,446][01413] Avg episode reward: [(0, '16.533')] [2023-03-03 18:58:12,460][12953] Saving new best policy, reward=16.533! [2023-03-03 18:58:17,406][12967] Updated weights for policy 0, policy_version 800 (0.0014) [2023-03-03 18:58:17,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3276800. Throughput: 0: 945.0. Samples: 817224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 18:58:17,447][01413] Avg episode reward: [(0, '15.955')] [2023-03-03 18:58:22,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 3297280. Throughput: 0: 970.0. Samples: 824224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 18:58:22,443][01413] Avg episode reward: [(0, '15.302')] [2023-03-03 18:58:27,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3313664. Throughput: 0: 922.8. Samples: 828996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 18:58:27,449][01413] Avg episode reward: [(0, '15.269')] [2023-03-03 18:58:28,659][12967] Updated weights for policy 0, policy_version 810 (0.0012) [2023-03-03 18:58:32,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 3330048. Throughput: 0: 916.4. Samples: 831220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:58:32,441][01413] Avg episode reward: [(0, '14.690')] [2023-03-03 18:58:37,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3350528. Throughput: 0: 958.0. Samples: 837422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 18:58:37,440][01413] Avg episode reward: [(0, '14.799')] [2023-03-03 18:58:38,711][12967] Updated weights for policy 0, policy_version 820 (0.0019) [2023-03-03 18:58:42,438][01413] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 3375104. Throughput: 0: 969.2. Samples: 844228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:58:42,445][01413] Avg episode reward: [(0, '15.430')] [2023-03-03 18:58:47,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3755.0, 300 sec: 3762.8). Total num frames: 3387392. Throughput: 0: 943.1. Samples: 846366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:58:47,440][01413] Avg episode reward: [(0, '17.051')] [2023-03-03 18:58:47,444][12953] Saving new best policy, reward=17.051! [2023-03-03 18:58:50,922][12967] Updated weights for policy 0, policy_version 830 (0.0019) [2023-03-03 18:58:52,438][01413] Fps is (10 sec: 2867.3, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3403776. Throughput: 0: 917.7. Samples: 850714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 18:58:52,446][01413] Avg episode reward: [(0, '17.031')] [2023-03-03 18:58:57,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3424256. Throughput: 0: 971.9. Samples: 857520. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:58:57,440][01413] Avg episode reward: [(0, '16.798')] [2023-03-03 18:59:00,059][12967] Updated weights for policy 0, policy_version 840 (0.0015) [2023-03-03 18:59:02,440][01413] Fps is (10 sec: 4504.7, 60 sec: 3754.6, 300 sec: 3776.6). Total num frames: 3448832. Throughput: 0: 972.5. Samples: 860988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:59:02,447][01413] Avg episode reward: [(0, '16.723')] [2023-03-03 18:59:02,462][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000842_3448832.pth... [2023-03-03 18:59:02,591][12953] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000620_2539520.pth [2023-03-03 18:59:07,438][01413] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3465216. Throughput: 0: 927.9. Samples: 865980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:59:07,445][01413] Avg episode reward: [(0, '15.545')] [2023-03-03 18:59:12,438][01413] Fps is (10 sec: 3277.5, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3481600. Throughput: 0: 929.5. Samples: 870824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:59:12,444][01413] Avg episode reward: [(0, '15.394')] [2023-03-03 18:59:12,440][12967] Updated weights for policy 0, policy_version 850 (0.0012) [2023-03-03 18:59:17,438][01413] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3776.6). Total num frames: 3502080. Throughput: 0: 956.7. Samples: 874270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:59:17,441][01413] Avg episode reward: [(0, '16.293')] [2023-03-03 18:59:21,096][12967] Updated weights for policy 0, policy_version 860 (0.0027) [2023-03-03 18:59:22,441][01413] Fps is (10 sec: 4504.4, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 3526656. Throughput: 0: 974.7. Samples: 881284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 18:59:22,443][01413] Avg episode reward: [(0, '15.778')] [2023-03-03 18:59:27,438][01413] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3538944. Throughput: 0: 923.9. Samples: 885804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 18:59:27,445][01413] Avg episode reward: [(0, '15.857')] [2023-03-03 18:59:32,438][01413] Fps is (10 sec: 2868.0, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 3555328. Throughput: 0: 925.7. Samples: 888022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:59:32,441][01413] Avg episode reward: [(0, '17.119')] [2023-03-03 18:59:32,451][12953] Saving new best policy, reward=17.119! [2023-03-03 18:59:33,785][12967] Updated weights for policy 0, policy_version 870 (0.0034) [2023-03-03 18:59:37,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3579904. Throughput: 0: 973.9. Samples: 894538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 18:59:37,446][01413] Avg episode reward: [(0, '16.355')] [2023-03-03 18:59:42,446][01413] Fps is (10 sec: 4502.2, 60 sec: 3754.2, 300 sec: 3776.6). Total num frames: 3600384. Throughput: 0: 965.9. Samples: 900992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 18:59:42,451][01413] Avg episode reward: [(0, '16.983')] [2023-03-03 18:59:43,147][12967] Updated weights for policy 0, policy_version 880 (0.0013) [2023-03-03 18:59:47,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3616768. Throughput: 0: 937.2. Samples: 903160. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 18:59:47,445][01413] Avg episode reward: [(0, '15.807')] [2023-03-03 18:59:52,438][01413] Fps is (10 sec: 3279.3, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3633152. Throughput: 0: 929.1. Samples: 907788. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 18:59:52,440][01413] Avg episode reward: [(0, '14.853')] [2023-03-03 18:59:54,812][12967] Updated weights for policy 0, policy_version 890 (0.0011) [2023-03-03 18:59:57,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3657728. Throughput: 0: 978.8. Samples: 914868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 18:59:57,440][01413] Avg episode reward: [(0, '16.040')] [2023-03-03 19:00:02,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 3674112. Throughput: 0: 976.8. Samples: 918224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:00:02,443][01413] Avg episode reward: [(0, '17.499')] [2023-03-03 19:00:02,466][12953] Saving new best policy, reward=17.499! [2023-03-03 19:00:05,594][12967] Updated weights for policy 0, policy_version 900 (0.0048) [2023-03-03 19:00:07,441][01413] Fps is (10 sec: 3275.9, 60 sec: 3754.5, 300 sec: 3776.7). Total num frames: 3690496. Throughput: 0: 920.5. Samples: 922706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:00:07,443][01413] Avg episode reward: [(0, '17.684')] [2023-03-03 19:00:07,452][12953] Saving new best policy, reward=17.684! [2023-03-03 19:00:12,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3706880. Throughput: 0: 936.9. Samples: 927966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:00:12,440][01413] Avg episode reward: [(0, '19.317')] [2023-03-03 19:00:12,455][12953] Saving new best policy, reward=19.317! [2023-03-03 19:00:16,169][12967] Updated weights for policy 0, policy_version 910 (0.0020) [2023-03-03 19:00:17,438][01413] Fps is (10 sec: 4097.1, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 3731456. Throughput: 0: 963.4. Samples: 931374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:00:17,441][01413] Avg episode reward: [(0, '18.957')] [2023-03-03 19:00:22,443][01413] Fps is (10 sec: 4503.5, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 3751936. Throughput: 0: 965.7. Samples: 938000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:00:22,445][01413] Avg episode reward: [(0, '17.262')] [2023-03-03 19:00:27,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3764224. Throughput: 0: 919.0. Samples: 942340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:00:27,440][01413] Avg episode reward: [(0, '17.409')] [2023-03-03 19:00:27,682][12967] Updated weights for policy 0, policy_version 920 (0.0013) [2023-03-03 19:00:32,438][01413] Fps is (10 sec: 3278.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3784704. Throughput: 0: 925.7. Samples: 944816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:00:32,444][01413] Avg episode reward: [(0, '17.983')] [2023-03-03 19:00:37,324][12967] Updated weights for policy 0, policy_version 930 (0.0027) [2023-03-03 19:00:37,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3809280. Throughput: 0: 979.1. Samples: 951848. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:00:37,441][01413] Avg episode reward: [(0, '18.868')] [2023-03-03 19:00:42,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3755.1, 300 sec: 3776.6). Total num frames: 3825664. Throughput: 0: 949.4. Samples: 957592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:00:42,442][01413] Avg episode reward: [(0, '18.647')] [2023-03-03 19:00:47,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 3842048. Throughput: 0: 923.7. Samples: 959792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:00:47,447][01413] Avg episode reward: [(0, '18.885')] [2023-03-03 19:00:49,541][12967] Updated weights for policy 0, policy_version 940 (0.0016) [2023-03-03 19:00:52,438][01413] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3862528. Throughput: 0: 941.3. Samples: 965062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:00:52,440][01413] Avg episode reward: [(0, '17.934')] [2023-03-03 19:00:57,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3883008. Throughput: 0: 980.8. Samples: 972100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:00:57,443][01413] Avg episode reward: [(0, '17.628')] [2023-03-03 19:00:58,414][12967] Updated weights for policy 0, policy_version 950 (0.0014) [2023-03-03 19:01:02,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3903488. Throughput: 0: 974.5. Samples: 975228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:01:02,442][01413] Avg episode reward: [(0, '17.653')] [2023-03-03 19:01:02,456][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000953_3903488.pth... [2023-03-03 19:01:02,582][12953] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000731_2994176.pth [2023-03-03 19:01:07,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3790.5). Total num frames: 3915776. Throughput: 0: 920.0. Samples: 979394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:01:07,440][01413] Avg episode reward: [(0, '18.079')] [2023-03-03 19:01:11,008][12967] Updated weights for policy 0, policy_version 960 (0.0035) [2023-03-03 19:01:12,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3936256. Throughput: 0: 949.6. Samples: 985070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:01:12,443][01413] Avg episode reward: [(0, '18.461')] [2023-03-03 19:01:17,438][01413] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3960832. Throughput: 0: 972.2. Samples: 988566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:01:17,442][01413] Avg episode reward: [(0, '18.276')] [2023-03-03 19:01:20,267][12967] Updated weights for policy 0, policy_version 970 (0.0017) [2023-03-03 19:01:22,438][01413] Fps is (10 sec: 4096.0, 60 sec: 3755.0, 300 sec: 3776.7). Total num frames: 3977216. Throughput: 0: 951.2. Samples: 994650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:01:22,443][01413] Avg episode reward: [(0, '18.164')] [2023-03-03 19:01:27,438][01413] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 3993600. Throughput: 0: 919.8. Samples: 998982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:01:27,441][01413] Avg episode reward: [(0, '17.316')] [2023-03-03 19:01:30,427][12953] Stopping Batcher_0... 
[2023-03-03 19:01:30,428][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-03 19:01:30,429][12953] Loop batcher_evt_loop terminating... [2023-03-03 19:01:30,437][01413] Component Batcher_0 stopped! [2023-03-03 19:01:30,478][01413] Component RolloutWorker_w3 stopped! [2023-03-03 19:01:30,485][12970] Stopping RolloutWorker_w3... [2023-03-03 19:01:30,490][12970] Loop rollout_proc3_evt_loop terminating... [2023-03-03 19:01:30,506][12967] Weights refcount: 2 0 [2023-03-03 19:01:30,522][01413] Component InferenceWorker_p0-w0 stopped! [2023-03-03 19:01:30,526][12967] Stopping InferenceWorker_p0-w0... [2023-03-03 19:01:30,528][12967] Loop inference_proc0-0_evt_loop terminating... [2023-03-03 19:01:30,531][01413] Component RolloutWorker_w5 stopped! [2023-03-03 19:01:30,536][12978] Stopping RolloutWorker_w5... [2023-03-03 19:01:30,537][12978] Loop rollout_proc5_evt_loop terminating... [2023-03-03 19:01:30,541][12968] Stopping RolloutWorker_w0... [2023-03-03 19:01:30,542][12968] Loop rollout_proc0_evt_loop terminating... [2023-03-03 19:01:30,541][01413] Component RolloutWorker_w0 stopped! [2023-03-03 19:01:30,547][12972] Stopping RolloutWorker_w4... [2023-03-03 19:01:30,548][12972] Loop rollout_proc4_evt_loop terminating... [2023-03-03 19:01:30,551][12969] Stopping RolloutWorker_w2... [2023-03-03 19:01:30,552][12979] Stopping RolloutWorker_w6... [2023-03-03 19:01:30,552][12979] Loop rollout_proc6_evt_loop terminating... [2023-03-03 19:01:30,554][12969] Loop rollout_proc2_evt_loop terminating... [2023-03-03 19:01:30,554][01413] Component RolloutWorker_w4 stopped! [2023-03-03 19:01:30,556][01413] Component RolloutWorker_w2 stopped! [2023-03-03 19:01:30,558][01413] Component RolloutWorker_w6 stopped! [2023-03-03 19:01:30,578][12953] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000842_3448832.pth [2023-03-03 19:01:30,593][12953] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-03 19:01:30,593][01413] Component RolloutWorker_w7 stopped! [2023-03-03 19:01:30,599][12977] Stopping RolloutWorker_w7... [2023-03-03 19:01:30,600][12977] Loop rollout_proc7_evt_loop terminating... [2023-03-03 19:01:30,603][01413] Component RolloutWorker_w1 stopped! [2023-03-03 19:01:30,609][12971] Stopping RolloutWorker_w1... [2023-03-03 19:01:30,609][12971] Loop rollout_proc1_evt_loop terminating... [2023-03-03 19:01:30,780][01413] Component LearnerWorker_p0 stopped! [2023-03-03 19:01:30,788][01413] Waiting for process learner_proc0 to stop... [2023-03-03 19:01:30,791][12953] Stopping LearnerWorker_p0... [2023-03-03 19:01:30,792][12953] Loop learner_proc0_evt_loop terminating... [2023-03-03 19:01:32,565][01413] Waiting for process inference_proc0-0 to join... [2023-03-03 19:01:32,933][01413] Waiting for process rollout_proc0 to join... [2023-03-03 19:01:32,935][01413] Waiting for process rollout_proc1 to join... [2023-03-03 19:01:33,410][01413] Waiting for process rollout_proc2 to join... [2023-03-03 19:01:33,411][01413] Waiting for process rollout_proc3 to join... [2023-03-03 19:01:33,415][01413] Waiting for process rollout_proc4 to join... [2023-03-03 19:01:33,417][01413] Waiting for process rollout_proc5 to join... [2023-03-03 19:01:33,431][01413] Waiting for process rollout_proc6 to join... [2023-03-03 19:01:33,432][01413] Waiting for process rollout_proc7 to join... 
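The first run has just hit its frame budget (train_for_env_steps=4000000, exceeded at 4,005,888 frames) and shut every component down cleanly. The recurring `Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...). Total num frames: ...` and `Avg episode reward: [(0, '...')]` entries above are regular enough to scrape for plotting. A minimal sketch using only the Python standard library; the regexes assume exactly the entry format shown in this log:

```python
import re

# Field patterns copied from the log entries above; the per-window fps
# values read "nan" until the first measurement window completes.
FPS_RE = re.compile(
    r"Fps is \(10 sec: (?P<fps10>nan|[\d.]+), 60 sec: (?:nan|[\d.]+), "
    r"300 sec: (?:nan|[\d.]+)\)\. Total num frames: (?P<frames>\d+)"
)
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '(?P<reward>-?[\d.]+)'\)\]")

def throughput_curve(log_text: str):
    """Return [(total_frames, fps_over_last_10s), ...] in log order."""
    return [(int(m["frames"]), float(m["fps10"])) for m in FPS_RE.finditer(log_text)]

def reward_curve(log_text: str):
    """Return the reported average episode rewards in log order."""
    return [float(m["reward"]) for m in REWARD_RE.finditer(log_text)]
```

Feeding the whole file to `throughput_curve(open(path).read())` recovers the frames-vs-FPS series. As a sanity check, the `Collected {0: 4005888}, FPS: 3618.5` summary just below is simply total frames divided by the 1107.0635 s `main_loop` wall time in the Runner profile: 4005888 / 1107.0635 ≈ 3618.5.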
[2023-03-03 19:01:33,433][01413] Batcher 0 profile tree view: batching: 25.1310, releasing_batches: 0.0253 [2023-03-03 19:01:33,436][01413] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0149 wait_policy_total: 546.1123 update_model: 7.6312 weight_update: 0.0030 one_step: 0.0049 handle_policy_step: 486.1538 deserialize: 14.6856, stack: 2.7954, obs_to_device_normalize: 111.0544, forward: 230.9038, send_messages: 24.7416 prepare_outputs: 77.8759 to_cpu: 48.4307 [2023-03-03 19:01:33,438][01413] Learner 0 profile tree view: misc: 0.0058, prepare_batch: 17.0338 train: 75.2196 epoch_init: 0.0112, minibatch_init: 0.0059, losses_postprocess: 0.6037, kl_divergence: 0.5793, after_optimizer: 33.2569 calculate_losses: 26.7228 losses_init: 0.0032, forward_head: 1.7327, bptt_initial: 17.8141, tail: 1.1496, advantages_returns: 0.3218, losses: 3.3976 bptt: 1.9892 bptt_forward_core: 1.9123 update: 13.4732 clip: 1.3284 [2023-03-03 19:01:33,439][01413] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3635, enqueue_policy_requests: 150.9250, env_step: 805.5556, overhead: 20.2514, complete_rollouts: 7.5988 save_policy_outputs: 19.1495 split_output_tensors: 9.0648 [2023-03-03 19:01:33,440][01413] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3947, enqueue_policy_requests: 150.8822, env_step: 806.9332, overhead: 19.8105, complete_rollouts: 6.8501 save_policy_outputs: 19.3773 split_output_tensors: 9.2869 [2023-03-03 19:01:33,441][01413] Loop Runner_EvtLoop terminating... [2023-03-03 19:01:33,444][01413] Runner profile tree view: main_loop: 1107.0635 [2023-03-03 19:01:33,450][01413] Collected {0: 4005888}, FPS: 3618.5 [2023-03-03 19:08:07,363][01413] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-03 19:08:07,365][01413] Overriding arg 'num_workers' with value 1 passed from command line [2023-03-03 19:08:07,367][01413] Adding new argument 'no_render'=True that is not in the saved config file! [2023-03-03 19:08:07,371][01413] Adding new argument 'save_video'=True that is not in the saved config file! [2023-03-03 19:08:07,373][01413] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-03-03 19:08:07,374][01413] Adding new argument 'video_name'=None that is not in the saved config file! [2023-03-03 19:08:07,376][01413] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-03-03 19:08:07,377][01413] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-03-03 19:08:07,378][01413] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-03-03 19:08:07,382][01413] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-03-03 19:08:07,384][01413] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-03-03 19:08:07,385][01413] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-03-03 19:08:07,386][01413] Adding new argument 'train_script'=None that is not in the saved config file! [2023-03-03 19:08:07,387][01413] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
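The entries above show the finished run's profiling summary, then the saved training config being reloaded and patched with evaluation-only arguments (`no_render`, `save_video`, `max_num_episodes=10`, ...). What follows replays the checkpoint for 10 episodes and writes `replay.mp4`. A hedged sketch of driving such an evaluation pass programmatically, assuming the Sample Factory 2.x API (`sample_factory.enjoy.enjoy` plus the parsers in `sample_factory.cfg.arguments`; verify the import paths against your installed version, and note that the Doom envs must already be registered with Sample Factory, as the course notebook does earlier):

```python
from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.enjoy import enjoy

def evaluate(argv):
    # evaluation=True adds the enjoy-specific arguments that the log shows
    # being merged into the saved config (no_render, save_video, ...).
    parser, _ = parse_sf_args(argv=argv, evaluation=True)
    cfg = parse_full_cfg(parser, argv=argv)
    return enjoy(cfg)

evaluate([
    "--env=doom_health_gathering_supreme",
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",
    "--no_render",
    "--save_video",
    "--max_num_episodes=10",
    # The second evaluation pass further below also uploads the result:
    # "--push_to_hub",
    # "--hf_repository=DiegoD616/rl_course_vizdoom_health_gathering_supreme",
])
```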
[2023-03-03 19:08:07,389][01413] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-03 19:08:07,426][01413] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:08:07,434][01413] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:08:07,438][01413] RunningMeanStd input shape: (1,) [2023-03-03 19:08:07,458][01413] ConvEncoder: input_channels=3 [2023-03-03 19:08:08,134][01413] Conv encoder output size: 512 [2023-03-03 19:08:08,135][01413] Policy head output size: 512 [2023-03-03 19:08:10,523][01413] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-03 19:08:11,788][01413] Num frames 100... [2023-03-03 19:08:11,903][01413] Num frames 200... [2023-03-03 19:08:12,056][01413] Num frames 300... [2023-03-03 19:08:12,227][01413] Num frames 400... [2023-03-03 19:08:12,393][01413] Num frames 500... [2023-03-03 19:08:12,562][01413] Num frames 600... [2023-03-03 19:08:12,728][01413] Num frames 700... [2023-03-03 19:08:12,887][01413] Num frames 800... [2023-03-03 19:08:13,033][01413] Avg episode rewards: #0: 17.530, true rewards: #0: 8.530 [2023-03-03 19:08:13,035][01413] Avg episode reward: 17.530, avg true_objective: 8.530 [2023-03-03 19:08:13,112][01413] Num frames 900... [2023-03-03 19:08:13,277][01413] Num frames 1000... [2023-03-03 19:08:13,450][01413] Num frames 1100... [2023-03-03 19:08:13,614][01413] Num frames 1200... [2023-03-03 19:08:13,782][01413] Num frames 1300... [2023-03-03 19:08:13,948][01413] Num frames 1400... [2023-03-03 19:08:14,021][01413] Avg episode rewards: #0: 14.040, true rewards: #0: 7.040 [2023-03-03 19:08:14,023][01413] Avg episode reward: 14.040, avg true_objective: 7.040 [2023-03-03 19:08:14,178][01413] Num frames 1500... [2023-03-03 19:08:14,351][01413] Num frames 1600... [2023-03-03 19:08:14,522][01413] Num frames 1700... [2023-03-03 19:08:14,691][01413] Num frames 1800... [2023-03-03 19:08:14,854][01413] Num frames 1900... [2023-03-03 19:08:15,021][01413] Num frames 2000... [2023-03-03 19:08:15,195][01413] Num frames 2100... [2023-03-03 19:08:15,357][01413] Num frames 2200... [2023-03-03 19:08:15,478][01413] Num frames 2300... [2023-03-03 19:08:15,601][01413] Avg episode rewards: #0: 16.860, true rewards: #0: 7.860 [2023-03-03 19:08:15,603][01413] Avg episode reward: 16.860, avg true_objective: 7.860 [2023-03-03 19:08:15,661][01413] Num frames 2400... [2023-03-03 19:08:15,777][01413] Num frames 2500... [2023-03-03 19:08:15,901][01413] Num frames 2600... [2023-03-03 19:08:16,018][01413] Num frames 2700... [2023-03-03 19:08:16,133][01413] Num frames 2800... [2023-03-03 19:08:16,248][01413] Num frames 2900... [2023-03-03 19:08:16,370][01413] Num frames 3000... [2023-03-03 19:08:16,489][01413] Num frames 3100... [2023-03-03 19:08:16,619][01413] Num frames 3200... [2023-03-03 19:08:16,754][01413] Num frames 3300... [2023-03-03 19:08:16,889][01413] Num frames 3400... [2023-03-03 19:08:17,006][01413] Num frames 3500... [2023-03-03 19:08:17,123][01413] Num frames 3600... [2023-03-03 19:08:17,239][01413] Num frames 3700... [2023-03-03 19:08:17,358][01413] Num frames 3800... [2023-03-03 19:08:17,518][01413] Avg episode rewards: #0: 21.973, true rewards: #0: 9.722 [2023-03-03 19:08:17,520][01413] Avg episode reward: 21.973, avg true_objective: 9.722 [2023-03-03 19:08:17,535][01413] Num frames 3900... [2023-03-03 19:08:17,650][01413] Num frames 4000... [2023-03-03 19:08:17,775][01413] Num frames 4100... [2023-03-03 19:08:17,904][01413] Num frames 4200... 
[2023-03-03 19:08:18,038][01413] Num frames 4300... [2023-03-03 19:08:18,173][01413] Num frames 4400... [2023-03-03 19:08:18,300][01413] Num frames 4500... [2023-03-03 19:08:18,436][01413] Num frames 4600... [2023-03-03 19:08:18,564][01413] Num frames 4700... [2023-03-03 19:08:18,698][01413] Num frames 4800... [2023-03-03 19:08:18,821][01413] Avg episode rewards: #0: 21.498, true rewards: #0: 9.698 [2023-03-03 19:08:18,824][01413] Avg episode reward: 21.498, avg true_objective: 9.698 [2023-03-03 19:08:18,889][01413] Num frames 4900... [2023-03-03 19:08:19,016][01413] Num frames 5000... [2023-03-03 19:08:19,132][01413] Num frames 5100... [2023-03-03 19:08:19,253][01413] Num frames 5200... [2023-03-03 19:08:19,372][01413] Num frames 5300... [2023-03-03 19:08:19,493][01413] Num frames 5400... [2023-03-03 19:08:19,615][01413] Num frames 5500... [2023-03-03 19:08:19,740][01413] Num frames 5600... [2023-03-03 19:08:19,858][01413] Num frames 5700... [2023-03-03 19:08:19,975][01413] Num frames 5800... [2023-03-03 19:08:20,098][01413] Num frames 5900... [2023-03-03 19:08:20,218][01413] Num frames 6000... [2023-03-03 19:08:20,330][01413] Num frames 6100... [2023-03-03 19:08:20,447][01413] Num frames 6200... [2023-03-03 19:08:20,559][01413] Num frames 6300... [2023-03-03 19:08:20,679][01413] Num frames 6400... [2023-03-03 19:08:20,798][01413] Num frames 6500... [2023-03-03 19:08:20,911][01413] Num frames 6600... [2023-03-03 19:08:21,028][01413] Num frames 6700... [2023-03-03 19:08:21,092][01413] Avg episode rewards: #0: 25.842, true rewards: #0: 11.175 [2023-03-03 19:08:21,094][01413] Avg episode reward: 25.842, avg true_objective: 11.175 [2023-03-03 19:08:21,201][01413] Num frames 6800... [2023-03-03 19:08:21,312][01413] Num frames 6900... [2023-03-03 19:08:21,426][01413] Num frames 7000... [2023-03-03 19:08:21,537][01413] Num frames 7100... [2023-03-03 19:08:21,658][01413] Num frames 7200... [2023-03-03 19:08:21,794][01413] Num frames 7300... [2023-03-03 19:08:21,909][01413] Num frames 7400... [2023-03-03 19:08:22,024][01413] Avg episode rewards: #0: 24.631, true rewards: #0: 10.631 [2023-03-03 19:08:22,026][01413] Avg episode reward: 24.631, avg true_objective: 10.631 [2023-03-03 19:08:22,094][01413] Num frames 7500... [2023-03-03 19:08:22,207][01413] Num frames 7600... [2023-03-03 19:08:22,321][01413] Num frames 7700... [2023-03-03 19:08:22,435][01413] Num frames 7800... [2023-03-03 19:08:22,549][01413] Num frames 7900... [2023-03-03 19:08:22,662][01413] Num frames 8000... [2023-03-03 19:08:22,774][01413] Num frames 8100... [2023-03-03 19:08:22,894][01413] Num frames 8200... [2023-03-03 19:08:23,007][01413] Num frames 8300... [2023-03-03 19:08:23,130][01413] Num frames 8400... [2023-03-03 19:08:23,248][01413] Num frames 8500... [2023-03-03 19:08:23,364][01413] Num frames 8600... [2023-03-03 19:08:23,479][01413] Num frames 8700... [2023-03-03 19:08:23,601][01413] Num frames 8800... [2023-03-03 19:08:23,716][01413] Num frames 8900... [2023-03-03 19:08:23,846][01413] Num frames 9000... [2023-03-03 19:08:23,976][01413] Num frames 9100... [2023-03-03 19:08:24,111][01413] Num frames 9200... [2023-03-03 19:08:24,235][01413] Num frames 9300... [2023-03-03 19:08:24,357][01413] Num frames 9400... [2023-03-03 19:08:24,480][01413] Num frames 9500... [2023-03-03 19:08:24,587][01413] Avg episode rewards: #0: 28.427, true rewards: #0: 11.927 [2023-03-03 19:08:24,589][01413] Avg episode reward: 28.427, avg true_objective: 11.927 [2023-03-03 19:08:24,664][01413] Num frames 9600... 
[2023-03-03 19:08:24,778][01413] Num frames 9700... [2023-03-03 19:08:24,896][01413] Num frames 9800... [2023-03-03 19:08:25,012][01413] Num frames 9900... [2023-03-03 19:08:25,168][01413] Avg episode rewards: #0: 25.878, true rewards: #0: 11.100 [2023-03-03 19:08:25,169][01413] Avg episode reward: 25.878, avg true_objective: 11.100 [2023-03-03 19:08:25,187][01413] Num frames 10000... [2023-03-03 19:08:25,300][01413] Num frames 10100... [2023-03-03 19:08:25,458][01413] Num frames 10200... [2023-03-03 19:08:25,616][01413] Num frames 10300... [2023-03-03 19:08:25,772][01413] Num frames 10400... [2023-03-03 19:08:25,934][01413] Num frames 10500... [2023-03-03 19:08:25,998][01413] Avg episode rewards: #0: 24.002, true rewards: #0: 10.502 [2023-03-03 19:08:26,004][01413] Avg episode reward: 24.002, avg true_objective: 10.502 [2023-03-03 19:09:26,493][01413] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-03-03 19:14:18,948][01413] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-03 19:14:18,950][01413] Overriding arg 'num_workers' with value 1 passed from command line [2023-03-03 19:14:18,952][01413] Adding new argument 'no_render'=True that is not in the saved config file! [2023-03-03 19:14:18,955][01413] Adding new argument 'save_video'=True that is not in the saved config file! [2023-03-03 19:14:18,958][01413] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-03-03 19:14:18,959][01413] Adding new argument 'video_name'=None that is not in the saved config file! [2023-03-03 19:14:18,963][01413] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-03-03 19:14:18,966][01413] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-03-03 19:14:18,967][01413] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-03-03 19:14:18,968][01413] Adding new argument 'hf_repository'='DiegoD616/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-03-03 19:14:18,970][01413] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-03-03 19:14:18,971][01413] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-03-03 19:14:18,972][01413] Adding new argument 'train_script'=None that is not in the saved config file! [2023-03-03 19:14:18,973][01413] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-03-03 19:14:18,974][01413] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-03 19:14:19,004][01413] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:14:19,007][01413] RunningMeanStd input shape: (1,) [2023-03-03 19:14:19,026][01413] ConvEncoder: input_channels=3 [2023-03-03 19:14:19,074][01413] Conv encoder output size: 512 [2023-03-03 19:14:19,075][01413] Policy head output size: 512 [2023-03-03 19:14:19,098][01413] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-03 19:14:19,572][01413] Num frames 100... [2023-03-03 19:14:19,698][01413] Num frames 200... [2023-03-03 19:14:19,825][01413] Num frames 300... [2023-03-03 19:14:19,950][01413] Num frames 400... [2023-03-03 19:14:20,066][01413] Num frames 500... [2023-03-03 19:14:20,185][01413] Num frames 600... [2023-03-03 19:14:20,302][01413] Num frames 700... [2023-03-03 19:14:20,424][01413] Num frames 800... 
[2023-03-03 19:14:20,591][01413] Avg episode rewards: #0: 19.960, true rewards: #0: 8.960 [2023-03-03 19:14:20,593][01413] Avg episode reward: 19.960, avg true_objective: 8.960 [2023-03-03 19:14:20,603][01413] Num frames 900... [2023-03-03 19:14:20,716][01413] Num frames 1000... [2023-03-03 19:14:20,849][01413] Num frames 1100... [2023-03-03 19:14:20,961][01413] Num frames 1200... [2023-03-03 19:14:21,076][01413] Num frames 1300... [2023-03-03 19:14:21,180][01413] Avg episode rewards: #0: 12.720, true rewards: #0: 6.720 [2023-03-03 19:14:21,182][01413] Avg episode reward: 12.720, avg true_objective: 6.720 [2023-03-03 19:14:21,253][01413] Num frames 1400... [2023-03-03 19:14:21,366][01413] Num frames 1500... [2023-03-03 19:14:21,486][01413] Num frames 1600... [2023-03-03 19:14:21,601][01413] Num frames 1700... [2023-03-03 19:14:21,714][01413] Num frames 1800... [2023-03-03 19:14:21,827][01413] Num frames 1900... [2023-03-03 19:14:21,942][01413] Num frames 2000... [2023-03-03 19:14:22,055][01413] Num frames 2100... [2023-03-03 19:14:22,184][01413] Num frames 2200... [2023-03-03 19:14:22,300][01413] Num frames 2300... [2023-03-03 19:14:22,422][01413] Num frames 2400... [2023-03-03 19:14:22,541][01413] Num frames 2500... [2023-03-03 19:14:22,669][01413] Num frames 2600... [2023-03-03 19:14:22,785][01413] Num frames 2700... [2023-03-03 19:14:22,898][01413] Num frames 2800... [2023-03-03 19:14:23,013][01413] Num frames 2900... [2023-03-03 19:14:23,129][01413] Num frames 3000... [2023-03-03 19:14:23,244][01413] Num frames 3100... [2023-03-03 19:14:23,359][01413] Num frames 3200... [2023-03-03 19:14:23,494][01413] Avg episode rewards: #0: 22.227, true rewards: #0: 10.893 [2023-03-03 19:14:23,496][01413] Avg episode reward: 22.227, avg true_objective: 10.893 [2023-03-03 19:14:23,536][01413] Num frames 3300... [2023-03-03 19:14:23,650][01413] Num frames 3400... [2023-03-03 19:14:23,774][01413] Num frames 3500... [2023-03-03 19:14:23,887][01413] Num frames 3600... [2023-03-03 19:14:24,002][01413] Num frames 3700... [2023-03-03 19:14:24,117][01413] Num frames 3800... [2023-03-03 19:14:24,239][01413] Num frames 3900... [2023-03-03 19:14:24,355][01413] Num frames 4000... [2023-03-03 19:14:24,474][01413] Num frames 4100... [2023-03-03 19:14:24,530][01413] Avg episode rewards: #0: 20.500, true rewards: #0: 10.250 [2023-03-03 19:14:24,532][01413] Avg episode reward: 20.500, avg true_objective: 10.250 [2023-03-03 19:14:24,652][01413] Num frames 4200... [2023-03-03 19:14:24,766][01413] Num frames 4300... [2023-03-03 19:14:24,877][01413] Num frames 4400... [2023-03-03 19:14:24,997][01413] Num frames 4500... [2023-03-03 19:14:25,116][01413] Num frames 4600... [2023-03-03 19:14:25,232][01413] Num frames 4700... [2023-03-03 19:14:25,346][01413] Num frames 4800... [2023-03-03 19:14:25,469][01413] Num frames 4900... [2023-03-03 19:14:25,643][01413] Num frames 5000... [2023-03-03 19:14:25,801][01413] Num frames 5100... [2023-03-03 19:14:25,963][01413] Num frames 5200... [2023-03-03 19:14:26,126][01413] Num frames 5300... [2023-03-03 19:14:26,283][01413] Num frames 5400... [2023-03-03 19:14:26,451][01413] Num frames 5500... [2023-03-03 19:14:26,622][01413] Num frames 5600... [2023-03-03 19:14:26,776][01413] Num frames 5700... [2023-03-03 19:14:26,934][01413] Num frames 5800... [2023-03-03 19:14:27,115][01413] Num frames 5900... [2023-03-03 19:14:27,273][01413] Num frames 6000... [2023-03-03 19:14:27,446][01413] Num frames 6100... [2023-03-03 19:14:27,616][01413] Num frames 6200... 
[2023-03-03 19:14:27,672][01413] Avg episode rewards: #0: 27.400, true rewards: #0: 12.400 [2023-03-03 19:14:27,674][01413] Avg episode reward: 27.400, avg true_objective: 12.400 [2023-03-03 19:14:27,837][01413] Num frames 6300... [2023-03-03 19:14:28,011][01413] Num frames 6400... [2023-03-03 19:14:28,172][01413] Num frames 6500... [2023-03-03 19:14:28,333][01413] Num frames 6600... [2023-03-03 19:14:28,500][01413] Num frames 6700... [2023-03-03 19:14:28,632][01413] Avg episode rewards: #0: 24.406, true rewards: #0: 11.240 [2023-03-03 19:14:28,635][01413] Avg episode reward: 24.406, avg true_objective: 11.240 [2023-03-03 19:14:28,732][01413] Num frames 6800... [2023-03-03 19:14:28,892][01413] Num frames 6900... [2023-03-03 19:14:29,019][01413] Num frames 7000... [2023-03-03 19:14:29,133][01413] Num frames 7100... [2023-03-03 19:14:29,249][01413] Num frames 7200... [2023-03-03 19:14:29,361][01413] Num frames 7300... [2023-03-03 19:14:29,481][01413] Num frames 7400... [2023-03-03 19:14:29,598][01413] Avg episode rewards: #0: 23.068, true rewards: #0: 10.640 [2023-03-03 19:14:29,600][01413] Avg episode reward: 23.068, avg true_objective: 10.640 [2023-03-03 19:14:29,670][01413] Num frames 7500... [2023-03-03 19:14:29,784][01413] Num frames 7600... [2023-03-03 19:14:29,897][01413] Num frames 7700... [2023-03-03 19:14:30,019][01413] Num frames 7800... [2023-03-03 19:14:30,136][01413] Num frames 7900... [2023-03-03 19:14:30,251][01413] Num frames 8000... [2023-03-03 19:14:30,364][01413] Num frames 8100... [2023-03-03 19:14:30,480][01413] Avg episode rewards: #0: 22.315, true rewards: #0: 10.190 [2023-03-03 19:14:30,484][01413] Avg episode reward: 22.315, avg true_objective: 10.190 [2023-03-03 19:14:30,541][01413] Num frames 8200... [2023-03-03 19:14:30,663][01413] Num frames 8300... [2023-03-03 19:14:30,784][01413] Num frames 8400... [2023-03-03 19:14:30,897][01413] Num frames 8500... [2023-03-03 19:14:31,016][01413] Num frames 8600... [2023-03-03 19:14:31,134][01413] Num frames 8700... [2023-03-03 19:14:31,268][01413] Avg episode rewards: #0: 20.965, true rewards: #0: 9.743 [2023-03-03 19:14:31,269][01413] Avg episode reward: 20.965, avg true_objective: 9.743 [2023-03-03 19:14:31,310][01413] Num frames 8800... [2023-03-03 19:14:31,447][01413] Num frames 8900... [2023-03-03 19:14:31,560][01413] Num frames 9000... [2023-03-03 19:14:31,688][01413] Num frames 9100... [2023-03-03 19:14:31,805][01413] Num frames 9200... [2023-03-03 19:14:31,919][01413] Num frames 9300... [2023-03-03 19:14:32,037][01413] Num frames 9400... [2023-03-03 19:14:32,153][01413] Num frames 9500... [2023-03-03 19:14:32,279][01413] Num frames 9600... [2023-03-03 19:14:32,403][01413] Num frames 9700... [2023-03-03 19:14:32,522][01413] Num frames 9800... [2023-03-03 19:14:32,636][01413] Num frames 9900... [2023-03-03 19:14:32,770][01413] Num frames 10000... [2023-03-03 19:14:32,886][01413] Num frames 10100... [2023-03-03 19:14:33,002][01413] Num frames 10200... [2023-03-03 19:14:33,127][01413] Num frames 10300... [2023-03-03 19:14:33,280][01413] Avg episode rewards: #0: 22.586, true rewards: #0: 10.386 [2023-03-03 19:14:33,283][01413] Avg episode reward: 22.586, avg true_objective: 10.386 [2023-03-03 19:15:32,206][01413] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-03-03 19:15:49,031][01413] The model has been pushed to https://huggingface.co/DiegoD616/rl_course_vizdoom_health_gathering_supreme [2023-03-03 19:21:43,604][01413] Environment doom_basic already registered, overwriting... 
[2023-03-03 19:21:43,609][01413] Environment doom_two_colors_easy already registered, overwriting... [2023-03-03 19:21:43,612][01413] Environment doom_two_colors_hard already registered, overwriting... [2023-03-03 19:21:43,614][01413] Environment doom_dm already registered, overwriting... [2023-03-03 19:21:43,615][01413] Environment doom_dwango5 already registered, overwriting... [2023-03-03 19:21:43,617][01413] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-03-03 19:21:43,618][01413] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-03-03 19:21:43,623][01413] Environment doom_my_way_home already registered, overwriting... [2023-03-03 19:21:43,624][01413] Environment doom_deadly_corridor already registered, overwriting... [2023-03-03 19:21:43,625][01413] Environment doom_defend_the_center already registered, overwriting... [2023-03-03 19:21:43,626][01413] Environment doom_defend_the_line already registered, overwriting... [2023-03-03 19:21:43,627][01413] Environment doom_health_gathering already registered, overwriting... [2023-03-03 19:21:43,628][01413] Environment doom_health_gathering_supreme already registered, overwriting... [2023-03-03 19:21:43,629][01413] Environment doom_battle already registered, overwriting... [2023-03-03 19:21:43,631][01413] Environment doom_battle2 already registered, overwriting... [2023-03-03 19:21:43,633][01413] Environment doom_duel_bots already registered, overwriting... [2023-03-03 19:21:43,635][01413] Environment doom_deathmatch_bots already registered, overwriting... [2023-03-03 19:21:43,637][01413] Environment doom_duel already registered, overwriting... [2023-03-03 19:21:43,639][01413] Environment doom_deathmatch_full already registered, overwriting... [2023-03-03 19:21:43,641][01413] Environment doom_benchmark already registered, overwriting... [2023-03-03 19:21:43,643][01413] register_encoder_factory: [2023-03-03 19:21:43,682][01413] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-03 19:21:43,688][01413] Experiment dir /content/train_dir/default_experiment already exists! [2023-03-03 19:21:43,689][01413] Resuming existing experiment from /content/train_dir/default_experiment... 
[2023-03-03 19:21:43,691][01413] Weights and Biases integration disabled [2023-03-03 19:21:43,698][01413] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-03-03 19:21:45,420][01413] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=4000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-03-03 19:21:45,426][01413] Saving configuration to /content/train_dir/default_experiment/config.json... 
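The `command_line=` field recorded in the config dump above is the ground truth for how this experiment was launched, and `restart_behavior=resume` is why the restart below reloads `checkpoint_000000978_4005888.pth` rather than starting fresh. In Python-API terms it corresponds to roughly the following (same Sample Factory 2.x assumptions as the evaluation sketch earlier, with `sample_factory.train.run_rl` as the assumed entry point):

```python
from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.train import run_rl

def train(argv):
    parser, _ = parse_sf_args(argv=argv)
    cfg = parse_full_cfg(parser, argv=argv)
    return run_rl(cfg)

# Exactly the flags recorded under command_line= in the config above.
train([
    "--env=doom_health_gathering_supreme",
    "--num_workers=8",
    "--num_envs_per_worker=4",
    "--train_for_env_steps=4000000",
])
```

Because the restored state already reports env_steps=4005888, i.e. past the 4,000,000-step budget, expect the resumed run below to collect only a brief decorrelation burst before signaling all workers to stop.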
[2023-03-03 19:21:45,432][01413] Rollout worker 0 uses device cpu [2023-03-03 19:21:45,433][01413] Rollout worker 1 uses device cpu [2023-03-03 19:21:45,437][01413] Rollout worker 2 uses device cpu [2023-03-03 19:21:45,439][01413] Rollout worker 3 uses device cpu [2023-03-03 19:21:45,440][01413] Rollout worker 4 uses device cpu [2023-03-03 19:21:45,441][01413] Rollout worker 5 uses device cpu [2023-03-03 19:21:45,446][01413] Rollout worker 6 uses device cpu [2023-03-03 19:21:45,446][01413] Rollout worker 7 uses device cpu [2023-03-03 19:21:45,595][01413] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:21:45,600][01413] InferenceWorker_p0-w0: min num requests: 2 [2023-03-03 19:21:45,640][01413] Starting all processes... [2023-03-03 19:21:45,642][01413] Starting process learner_proc0 [2023-03-03 19:21:45,834][01413] Starting all processes... [2023-03-03 19:21:45,928][01413] Starting process inference_proc0-0 [2023-03-03 19:21:45,928][01413] Starting process rollout_proc0 [2023-03-03 19:21:45,928][01413] Starting process rollout_proc1 [2023-03-03 19:21:45,928][01413] Starting process rollout_proc2 [2023-03-03 19:21:45,928][01413] Starting process rollout_proc3 [2023-03-03 19:21:45,929][01413] Starting process rollout_proc4 [2023-03-03 19:21:45,929][01413] Starting process rollout_proc5 [2023-03-03 19:21:45,929][01413] Starting process rollout_proc6 [2023-03-03 19:21:45,929][01413] Starting process rollout_proc7 [2023-03-03 19:21:54,244][24817] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:21:54,249][24817] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-03-03 19:21:54,293][24817] Num visible devices: 1 [2023-03-03 19:21:54,325][24817] Starting seed is not provided [2023-03-03 19:21:54,327][24817] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:21:54,328][24817] Initializing actor-critic model on device cuda:0 [2023-03-03 19:21:54,329][24817] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:21:54,330][24817] RunningMeanStd input shape: (1,) [2023-03-03 19:21:54,422][24817] ConvEncoder: input_channels=3 [2023-03-03 19:21:55,364][24817] Conv encoder output size: 512 [2023-03-03 19:21:55,374][24817] Policy head output size: 512 [2023-03-03 19:21:55,484][24817] Created Actor Critic model with architecture:
[2023-03-03 19:21:55,498][24817] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-03-03 19:21:55,946][24835] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:21:55,950][24835] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-03-03 19:21:55,986][24837] Worker 1 uses CPU cores [1] [2023-03-03 19:21:56,026][24835] Num visible devices: 1 [2023-03-03 19:21:56,232][24836] Worker 0 uses CPU cores [0] [2023-03-03 19:21:56,249][24840] Worker 4 uses CPU cores [0] [2023-03-03 19:21:56,642][24846] Worker 2 uses CPU cores [0] [2023-03-03 19:21:56,836][24850] Worker 3 uses CPU cores [1] [2023-03-03 19:21:56,886][24848] Worker 6 uses CPU cores [0] [2023-03-03 19:21:57,154][24856] Worker 5 uses CPU cores [1] [2023-03-03 19:21:57,169][24858] Worker 7 uses CPU cores [1] [2023-03-03 19:21:59,019][24817] Using optimizer [2023-03-03 19:21:59,021][24817] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-03 19:21:59,071][24817] Loading model from checkpoint [2023-03-03 19:21:59,086][24817] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2023-03-03 19:21:59,087][24817] Initialized policy 0 weights for model version 978 [2023-03-03 19:21:59,099][24817] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:21:59,105][24817] LearnerWorker_p0 finished initialization! [2023-03-03 19:21:59,369][24835] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:21:59,371][24835] RunningMeanStd input shape: (1,) [2023-03-03 19:21:59,392][24835] ConvEncoder: input_channels=3 [2023-03-03 19:21:59,564][24835] Conv encoder output size: 512 [2023-03-03 19:21:59,565][24835] Policy head output size: 512 [2023-03-03 19:22:02,822][01413] Inference worker 0-0 is ready! [2023-03-03 19:22:02,825][01413] All inference workers are ready! Signal rollout workers to start! [2023-03-03 19:22:02,962][24840] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:22:02,966][24846] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:22:03,005][24848] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:22:03,007][24836] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:22:03,033][24850] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:22:03,029][24858] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:22:03,043][24837] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:22:03,052][24856] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:22:03,698][01413] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-03 19:22:04,367][24840] Decorrelating experience for 0 frames... [2023-03-03 19:22:04,370][24848] Decorrelating experience for 0 frames... [2023-03-03 19:22:04,372][24846] Decorrelating experience for 0 frames... [2023-03-03 19:22:04,443][24858] Decorrelating experience for 0 frames... [2023-03-03 19:22:04,444][24850] Decorrelating experience for 0 frames... [2023-03-03 19:22:04,450][24837] Decorrelating experience for 0 frames... [2023-03-03 19:22:05,124][24837] Decorrelating experience for 32 frames... [2023-03-03 19:22:05,131][24858] Decorrelating experience for 32 frames... [2023-03-03 19:22:05,336][24840] Decorrelating experience for 32 frames... [2023-03-03 19:22:05,338][24836] Decorrelating experience for 0 frames...
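The ActorCriticSharedWeights printout above reads as a straight data path: (3, 72, 128) observations are normalized, pushed through three Conv2d+ELU stages and a Linear+ELU into a 512-dim embedding, carried through a GRU(512, 512) core, and split into a 1-unit value head and a 5-logit action head. A minimal PyTorch sketch of the post-encoder shapes (sizes copied from the log; the code is illustrative, not the library's):

import torch
import torch.nn as nn

encoder_out = 512                    # "Conv encoder output size: 512"
core = nn.GRU(encoder_out, 512)      # "(core): GRU(512, 512)"
critic = nn.Linear(512, 1)           # "(critic_linear): Linear(512 -> 1)"
actor = nn.Linear(512, 5)            # "(distribution_linear): Linear(512 -> 5)"

x = torch.randn(1, 1, encoder_out)   # (seq_len, batch, features): one step, one env
h, _ = core(x)
print(critic(h).shape, actor(h).shape)  # torch.Size([1, 1, 1]) torch.Size([1, 1, 5])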
[2023-03-03 19:22:05,540][24848] Decorrelating experience for 32 frames... [2023-03-03 19:22:05,585][01413] Heartbeat connected on Batcher_0 [2023-03-03 19:22:05,592][01413] Heartbeat connected on LearnerWorker_p0 [2023-03-03 19:22:05,619][24837] Decorrelating experience for 64 frames... [2023-03-03 19:22:05,637][01413] Heartbeat connected on InferenceWorker_p0-w0 [2023-03-03 19:22:06,001][24858] Decorrelating experience for 64 frames... [2023-03-03 19:22:06,434][24846] Decorrelating experience for 32 frames... [2023-03-03 19:22:06,436][24858] Decorrelating experience for 96 frames... [2023-03-03 19:22:06,441][24836] Decorrelating experience for 32 frames... [2023-03-03 19:22:06,576][01413] Heartbeat connected on RolloutWorker_w7 [2023-03-03 19:22:06,940][24848] Decorrelating experience for 64 frames... [2023-03-03 19:22:07,290][24850] Decorrelating experience for 32 frames... [2023-03-03 19:22:07,992][24856] Decorrelating experience for 0 frames... [2023-03-03 19:22:08,155][24840] Decorrelating experience for 64 frames... [2023-03-03 19:22:08,547][24836] Decorrelating experience for 64 frames... [2023-03-03 19:22:08,689][24856] Decorrelating experience for 32 frames... [2023-03-03 19:22:08,698][01413] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-03 19:22:08,785][24848] Decorrelating experience for 96 frames... [2023-03-03 19:22:08,995][01413] Heartbeat connected on RolloutWorker_w6 [2023-03-03 19:22:09,730][24846] Decorrelating experience for 64 frames... [2023-03-03 19:22:09,885][24840] Decorrelating experience for 96 frames... [2023-03-03 19:22:10,083][01413] Heartbeat connected on RolloutWorker_w4 [2023-03-03 19:22:10,359][24850] Decorrelating experience for 64 frames... [2023-03-03 19:22:10,519][24836] Decorrelating experience for 96 frames... [2023-03-03 19:22:10,864][24856] Decorrelating experience for 64 frames... [2023-03-03 19:22:10,924][01413] Heartbeat connected on RolloutWorker_w0 [2023-03-03 19:22:11,930][24837] Decorrelating experience for 96 frames... [2023-03-03 19:22:12,251][01413] Heartbeat connected on RolloutWorker_w1 [2023-03-03 19:22:12,704][24850] Decorrelating experience for 96 frames... [2023-03-03 19:22:13,058][01413] Heartbeat connected on RolloutWorker_w3 [2023-03-03 19:22:13,541][24817] Signal inference workers to stop experience collection... [2023-03-03 19:22:13,552][24835] InferenceWorker_p0-w0: stopping experience collection [2023-03-03 19:22:13,698][01413] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 163.0. Samples: 1630. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-03 19:22:13,704][01413] Avg episode reward: [(0, '7.849')] [2023-03-03 19:22:13,741][24856] Decorrelating experience for 96 frames... [2023-03-03 19:22:13,830][24846] Decorrelating experience for 96 frames... [2023-03-03 19:22:13,843][01413] Heartbeat connected on RolloutWorker_w5 [2023-03-03 19:22:13,899][01413] Heartbeat connected on RolloutWorker_w2 [2023-03-03 19:22:16,171][24817] Signal inference workers to resume experience collection... [2023-03-03 19:22:16,172][24835] InferenceWorker_p0-w0: resuming experience collection [2023-03-03 19:22:16,173][24817] Stopping Batcher_0... [2023-03-03 19:22:16,173][24817] Loop batcher_evt_loop terminating... [2023-03-03 19:22:16,174][01413] Component Batcher_0 stopped! 
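The staggered "Decorrelating experience for 0/32/64/96 frames..." lines above are a warm-up pass: each environment is stepped a different number of frames before collection begins, so the parallel episodes do not start and terminate in lockstep. A rough sketch of the idea, assuming a classic Gym-style step() that returns a 4-tuple (this is hypothetical code, not Sample Factory's implementation):

def decorrelate(envs, block=32, num_blocks=4):
    # Step env i for 0, 32, 64, or 96 frames, matching the log above.
    for i, env in enumerate(envs):
        env.reset()
        for _ in range(block * (i % num_blocks)):
            _, _, done, _ = env.step(env.action_space.sample())
            if done:
                env.reset()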
[2023-03-03 19:22:16,209][24835] Weights refcount: 2 0 [2023-03-03 19:22:16,215][24835] Stopping InferenceWorker_p0-w0... [2023-03-03 19:22:16,216][24835] Loop inference_proc0-0_evt_loop terminating... [2023-03-03 19:22:16,217][01413] Component InferenceWorker_p0-w0 stopped! [2023-03-03 19:22:16,305][01413] Component RolloutWorker_w5 stopped! [2023-03-03 19:22:16,308][24856] Stopping RolloutWorker_w5... [2023-03-03 19:22:16,319][01413] Component RolloutWorker_w6 stopped! [2023-03-03 19:22:16,323][24850] Stopping RolloutWorker_w3... [2023-03-03 19:22:16,327][24850] Loop rollout_proc3_evt_loop terminating... [2023-03-03 19:22:16,325][01413] Component RolloutWorker_w3 stopped! [2023-03-03 19:22:16,327][24848] Stopping RolloutWorker_w6... [2023-03-03 19:22:16,328][24848] Loop rollout_proc6_evt_loop terminating... [2023-03-03 19:22:16,334][24837] Stopping RolloutWorker_w1... [2023-03-03 19:22:16,335][24837] Loop rollout_proc1_evt_loop terminating... [2023-03-03 19:22:16,315][24856] Loop rollout_proc5_evt_loop terminating... [2023-03-03 19:22:16,335][01413] Component RolloutWorker_w1 stopped! [2023-03-03 19:22:16,345][01413] Component RolloutWorker_w7 stopped! [2023-03-03 19:22:16,344][24858] Stopping RolloutWorker_w7... [2023-03-03 19:22:16,347][24858] Loop rollout_proc7_evt_loop terminating... [2023-03-03 19:22:16,357][01413] Component RolloutWorker_w0 stopped! [2023-03-03 19:22:16,361][24836] Stopping RolloutWorker_w0... [2023-03-03 19:22:16,377][01413] Component RolloutWorker_w2 stopped! [2023-03-03 19:22:16,381][24846] Stopping RolloutWorker_w2... [2023-03-03 19:22:16,407][24836] Loop rollout_proc0_evt_loop terminating... [2023-03-03 19:22:16,381][24846] Loop rollout_proc2_evt_loop terminating... [2023-03-03 19:22:16,441][01413] Component RolloutWorker_w4 stopped! [2023-03-03 19:22:16,447][24840] Stopping RolloutWorker_w4... [2023-03-03 19:22:16,447][24840] Loop rollout_proc4_evt_loop terminating... [2023-03-03 19:22:19,457][24817] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2023-03-03 19:22:19,592][24817] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000953_3903488.pth [2023-03-03 19:22:19,606][24817] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2023-03-03 19:22:19,761][24817] Stopping LearnerWorker_p0... [2023-03-03 19:22:19,763][24817] Loop learner_proc0_evt_loop terminating... [2023-03-03 19:22:19,761][01413] Component LearnerWorker_p0 stopped! [2023-03-03 19:22:19,768][01413] Waiting for process learner_proc0 to stop... [2023-03-03 19:22:21,098][01413] Waiting for process inference_proc0-0 to join... [2023-03-03 19:22:21,099][01413] Waiting for process rollout_proc0 to join... [2023-03-03 19:22:21,105][01413] Waiting for process rollout_proc1 to join... [2023-03-03 19:22:21,108][01413] Waiting for process rollout_proc2 to join... [2023-03-03 19:22:21,109][01413] Waiting for process rollout_proc3 to join... [2023-03-03 19:22:21,110][01413] Waiting for process rollout_proc4 to join... [2023-03-03 19:22:21,111][01413] Waiting for process rollout_proc5 to join... [2023-03-03 19:22:21,112][01413] Waiting for process rollout_proc6 to join... [2023-03-03 19:22:21,115][01413] Waiting for process rollout_proc7 to join... 
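The checkpoint names above encode both counters the learner later logs when resuming: the train step and the total env steps. The numbers are consistent with each train step consuming batch_size=1024 samples at env_frameskip=4, i.e. 4096 env frames per step, and keep_checkpoints=2 explains why the older checkpoint_000000953_3903488.pth is removed right after the new save. A worked check (the naming formula is inferred from this log, not quoted from the library):

BATCH_SIZE, FRAMESKIP = 1024, 4

def checkpoint_name(train_step):
    env_steps = train_step * BATCH_SIZE * FRAMESKIP
    return f"checkpoint_{train_step:09d}_{env_steps}.pth"

assert checkpoint_name(953) == "checkpoint_000000953_3903488.pth"
assert checkpoint_name(978) == "checkpoint_000000978_4005888.pth"
assert checkpoint_name(980) == "checkpoint_000000980_4014080.pth"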
[2023-03-03 19:22:21,116][01413] Batcher 0 profile tree view:
batching: 0.0377, releasing_batches: 0.0005
[2023-03-03 19:22:21,118][01413] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0172
wait_policy: 0.0000
  wait_policy_total: 6.9901
one_step: 0.0028
  handle_policy_step: 3.5149
    deserialize: 0.0474, stack: 0.0086, obs_to_device_normalize: 0.3634, forward: 2.6996, send_messages: 0.0727
    prepare_outputs: 0.2298
      to_cpu: 0.1218
[2023-03-03 19:22:21,120][01413] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 6.9874
train: 1.6930
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0005, kl_divergence: 0.0068, after_optimizer: 0.0278
  calculate_losses: 0.2558
    losses_init: 0.0000, forward_head: 0.1143, bptt_initial: 0.0974, tail: 0.0044, advantages_returns: 0.0012, losses: 0.0302
    bptt: 0.0078
      bptt_forward_core: 0.0077
  update: 1.3976
    clip: 0.0057
[2023-03-03 19:22:21,122][01413] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0005, enqueue_policy_requests: 0.5381, env_step: 1.8304, overhead: 0.0632, complete_rollouts: 0.0143
save_policy_outputs: 0.0280
  split_output_tensors: 0.0079
[2023-03-03 19:22:21,123][01413] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0038, enqueue_policy_requests: 1.5041, env_step: 2.9376, overhead: 0.1378, complete_rollouts: 0.0101
save_policy_outputs: 0.1014
  split_output_tensors: 0.0543
[2023-03-03 19:22:21,125][01413] Loop Runner_EvtLoop terminating... [2023-03-03 19:22:21,127][01413] Runner profile tree view: main_loop: 35.4870 [2023-03-03 19:22:21,128][01413] Collected {0: 4014080}, FPS: 230.8 [2023-03-03 19:22:21,171][01413] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-03 19:22:21,173][01413] Overriding arg 'num_workers' with value 1 passed from command line [2023-03-03 19:22:21,176][01413] Adding new argument 'no_render'=True that is not in the saved config file! [2023-03-03 19:22:21,178][01413] Adding new argument 'save_video'=True that is not in the saved config file! [2023-03-03 19:22:21,179][01413] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-03-03 19:22:21,183][01413] Adding new argument 'video_name'=None that is not in the saved config file! [2023-03-03 19:22:21,184][01413] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-03-03 19:22:21,185][01413] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-03-03 19:22:21,188][01413] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-03-03 19:22:21,190][01413] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-03-03 19:22:21,195][01413] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-03-03 19:22:21,196][01413] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-03-03 19:22:21,203][01413] Adding new argument 'train_script'=None that is not in the saved config file! [2023-03-03 19:22:21,204][01413] Adding new argument 'enjoy_script'=None that is not in the saved config file!
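A note on the Runner summary above: the reported FPS counts only the frames collected by this short resumed run, not the lifetime total. The run loaded a checkpoint at 4,005,888 env steps and stopped at 4,014,080, so 8,192 frames over the 35.487 s main loop. A quick check of the logged 230.8:

frames_end, frames_start = 4_014_080, 4_005_888
main_loop_seconds = 35.4870
print((frames_end - frames_start) / main_loop_seconds)  # ~230.8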
[2023-03-03 19:22:21,205][01413] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-03 19:22:21,226][01413] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:22:21,228][01413] RunningMeanStd input shape: (1,) [2023-03-03 19:22:21,246][01413] ConvEncoder: input_channels=3 [2023-03-03 19:22:21,302][01413] Conv encoder output size: 512 [2023-03-03 19:22:21,304][01413] Policy head output size: 512 [2023-03-03 19:22:21,335][01413] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2023-03-03 19:22:21,843][01413] Num frames 100... [2023-03-03 19:22:21,974][01413] Num frames 200... [2023-03-03 19:22:22,088][01413] Num frames 300... [2023-03-03 19:22:22,205][01413] Num frames 400... [2023-03-03 19:22:22,324][01413] Num frames 500... [2023-03-03 19:22:22,437][01413] Num frames 600... [2023-03-03 19:22:22,557][01413] Num frames 700... [2023-03-03 19:22:22,675][01413] Num frames 800... [2023-03-03 19:22:22,789][01413] Num frames 900... [2023-03-03 19:22:22,911][01413] Avg episode rewards: #0: 19.600, true rewards: #0: 9.600 [2023-03-03 19:22:22,913][01413] Avg episode reward: 19.600, avg true_objective: 9.600 [2023-03-03 19:22:22,961][01413] Num frames 1000... [2023-03-03 19:22:23,072][01413] Num frames 1100... [2023-03-03 19:22:23,187][01413] Num frames 1200... [2023-03-03 19:22:23,323][01413] Avg episode rewards: #0: 11.360, true rewards: #0: 6.360 [2023-03-03 19:22:23,325][01413] Avg episode reward: 11.360, avg true_objective: 6.360 [2023-03-03 19:22:23,364][01413] Num frames 1300... [2023-03-03 19:22:23,480][01413] Num frames 1400... [2023-03-03 19:22:23,605][01413] Num frames 1500... [2023-03-03 19:22:23,723][01413] Num frames 1600... [2023-03-03 19:22:23,845][01413] Num frames 1700... [2023-03-03 19:22:23,963][01413] Num frames 1800... [2023-03-03 19:22:24,110][01413] Num frames 1900... [2023-03-03 19:22:24,227][01413] Num frames 2000... [2023-03-03 19:22:24,349][01413] Num frames 2100... [2023-03-03 19:22:24,466][01413] Num frames 2200... [2023-03-03 19:22:24,589][01413] Num frames 2300... [2023-03-03 19:22:24,710][01413] Num frames 2400... [2023-03-03 19:22:24,834][01413] Num frames 2500... [2023-03-03 19:22:24,953][01413] Num frames 2600... [2023-03-03 19:22:25,072][01413] Num frames 2700... [2023-03-03 19:22:25,187][01413] Num frames 2800... [2023-03-03 19:22:25,309][01413] Num frames 2900... [2023-03-03 19:22:25,425][01413] Num frames 3000... [2023-03-03 19:22:25,538][01413] Num frames 3100... [2023-03-03 19:22:25,661][01413] Num frames 3200... [2023-03-03 19:22:25,751][01413] Avg episode rewards: #0: 23.080, true rewards: #0: 10.747 [2023-03-03 19:22:25,753][01413] Avg episode reward: 23.080, avg true_objective: 10.747 [2023-03-03 19:22:25,843][01413] Num frames 3300... [2023-03-03 19:22:25,958][01413] Num frames 3400... [2023-03-03 19:22:26,071][01413] Num frames 3500... [2023-03-03 19:22:26,192][01413] Num frames 3600... [2023-03-03 19:22:26,305][01413] Num frames 3700... [2023-03-03 19:22:26,435][01413] Avg episode rewards: #0: 19.170, true rewards: #0: 9.420 [2023-03-03 19:22:26,437][01413] Avg episode reward: 19.170, avg true_objective: 9.420 [2023-03-03 19:22:26,478][01413] Num frames 3800... [2023-03-03 19:22:26,592][01413] Num frames 3900... [2023-03-03 19:22:26,719][01413] Num frames 4000... [2023-03-03 19:22:26,840][01413] Num frames 4100... [2023-03-03 19:22:26,964][01413] Num frames 4200... [2023-03-03 19:22:27,081][01413] Num frames 4300... 
[2023-03-03 19:22:27,195][01413] Num frames 4400... [2023-03-03 19:22:27,309][01413] Num frames 4500... [2023-03-03 19:22:27,431][01413] Num frames 4600... [2023-03-03 19:22:27,543][01413] Num frames 4700... [2023-03-03 19:22:27,655][01413] Num frames 4800... [2023-03-03 19:22:27,771][01413] Num frames 4900... [2023-03-03 19:22:27,885][01413] Num frames 5000... [2023-03-03 19:22:28,004][01413] Num frames 5100... [2023-03-03 19:22:28,110][01413] Avg episode rewards: #0: 21.488, true rewards: #0: 10.288 [2023-03-03 19:22:28,111][01413] Avg episode reward: 21.488, avg true_objective: 10.288 [2023-03-03 19:22:28,180][01413] Num frames 5200... [2023-03-03 19:22:28,298][01413] Num frames 5300... [2023-03-03 19:22:28,413][01413] Num frames 5400... [2023-03-03 19:22:28,538][01413] Num frames 5500... [2023-03-03 19:22:28,652][01413] Num frames 5600... [2023-03-03 19:22:28,779][01413] Num frames 5700... [2023-03-03 19:22:28,894][01413] Avg episode rewards: #0: 19.753, true rewards: #0: 9.587 [2023-03-03 19:22:28,895][01413] Avg episode reward: 19.753, avg true_objective: 9.587 [2023-03-03 19:22:28,955][01413] Num frames 5800... [2023-03-03 19:22:29,074][01413] Num frames 5900... [2023-03-03 19:22:29,217][01413] Num frames 6000... [2023-03-03 19:22:29,347][01413] Num frames 6100... [2023-03-03 19:22:29,477][01413] Num frames 6200... [2023-03-03 19:22:29,604][01413] Num frames 6300... [2023-03-03 19:22:29,741][01413] Num frames 6400... [2023-03-03 19:22:29,868][01413] Num frames 6500... [2023-03-03 19:22:29,994][01413] Num frames 6600... [2023-03-03 19:22:30,128][01413] Num frames 6700... [2023-03-03 19:22:30,248][01413] Num frames 6800... [2023-03-03 19:22:30,368][01413] Num frames 6900... [2023-03-03 19:22:30,499][01413] Num frames 7000... [2023-03-03 19:22:30,622][01413] Num frames 7100... [2023-03-03 19:22:30,744][01413] Num frames 7200... [2023-03-03 19:22:30,869][01413] Num frames 7300... [2023-03-03 19:22:30,991][01413] Num frames 7400... [2023-03-03 19:22:31,200][01413] Num frames 7500... [2023-03-03 19:22:31,365][01413] Num frames 7600... [2023-03-03 19:22:31,529][01413] Num frames 7700... [2023-03-03 19:22:31,691][01413] Num frames 7800... [2023-03-03 19:22:31,837][01413] Avg episode rewards: #0: 24.646, true rewards: #0: 11.217 [2023-03-03 19:22:31,839][01413] Avg episode reward: 24.646, avg true_objective: 11.217 [2023-03-03 19:22:31,920][01413] Num frames 7900... [2023-03-03 19:22:32,079][01413] Num frames 8000... [2023-03-03 19:22:32,242][01413] Num frames 8100... [2023-03-03 19:22:32,410][01413] Num frames 8200... [2023-03-03 19:22:32,603][01413] Avg episode rewards: #0: 22.352, true rewards: #0: 10.353 [2023-03-03 19:22:32,610][01413] Avg episode reward: 22.352, avg true_objective: 10.353 [2023-03-03 19:22:32,642][01413] Num frames 8300... [2023-03-03 19:22:32,804][01413] Num frames 8400... [2023-03-03 19:22:32,968][01413] Num frames 8500... [2023-03-03 19:22:33,196][01413] Num frames 8600... [2023-03-03 19:22:33,356][01413] Num frames 8700... [2023-03-03 19:22:33,520][01413] Num frames 8800... [2023-03-03 19:22:33,687][01413] Num frames 8900... [2023-03-03 19:22:33,853][01413] Num frames 9000... [2023-03-03 19:22:34,024][01413] Num frames 9100... [2023-03-03 19:22:34,156][01413] Avg episode rewards: #0: 21.495, true rewards: #0: 10.162 [2023-03-03 19:22:34,158][01413] Avg episode reward: 21.495, avg true_objective: 10.162 [2023-03-03 19:22:34,250][01413] Num frames 9200... [2023-03-03 19:22:34,415][01413] Num frames 9300... [2023-03-03 19:22:34,581][01413] Num frames 9400... 
[2023-03-03 19:22:34,755][01413] Num frames 9500... [2023-03-03 19:22:34,898][01413] Num frames 9600... [2023-03-03 19:22:35,014][01413] Num frames 9700... [2023-03-03 19:22:35,156][01413] Num frames 9800... [2023-03-03 19:22:35,278][01413] Num frames 9900... [2023-03-03 19:22:35,393][01413] Num frames 10000... [2023-03-03 19:22:35,510][01413] Num frames 10100... [2023-03-03 19:22:35,627][01413] Num frames 10200... [2023-03-03 19:22:35,749][01413] Num frames 10300... [2023-03-03 19:22:35,865][01413] Num frames 10400... [2023-03-03 19:22:35,987][01413] Num frames 10500... [2023-03-03 19:22:36,106][01413] Num frames 10600... [2023-03-03 19:22:36,255][01413] Avg episode rewards: #0: 22.580, true rewards: #0: 10.680 [2023-03-03 19:22:36,257][01413] Avg episode reward: 22.580, avg true_objective: 10.680 [2023-03-03 19:23:36,785][01413] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-03-03 19:27:42,398][01413] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-03 19:27:42,400][01413] Overriding arg 'num_workers' with value 1 passed from command line [2023-03-03 19:27:42,402][01413] Adding new argument 'no_render'=True that is not in the saved config file! [2023-03-03 19:27:42,404][01413] Adding new argument 'save_video'=True that is not in the saved config file! [2023-03-03 19:27:42,406][01413] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-03-03 19:27:42,408][01413] Adding new argument 'video_name'=None that is not in the saved config file! [2023-03-03 19:27:42,409][01413] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-03-03 19:27:42,411][01413] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-03-03 19:27:42,412][01413] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-03-03 19:27:42,413][01413] Adding new argument 'hf_repository'='DiegoD616/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-03-03 19:27:42,414][01413] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-03-03 19:27:42,415][01413] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-03-03 19:27:42,416][01413] Adding new argument 'train_script'=None that is not in the saved config file! [2023-03-03 19:27:42,417][01413] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-03-03 19:27:42,418][01413] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-03 19:27:42,437][01413] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:27:42,441][01413] RunningMeanStd input shape: (1,) [2023-03-03 19:27:42,470][01413] ConvEncoder: input_channels=3 [2023-03-03 19:27:42,549][01413] Conv encoder output size: 512 [2023-03-03 19:27:42,551][01413] Policy head output size: 512 [2023-03-03 19:27:42,579][01413] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2023-03-03 19:27:43,336][01413] Num frames 100... [2023-03-03 19:27:43,496][01413] Num frames 200... [2023-03-03 19:27:43,656][01413] Num frames 300... [2023-03-03 19:27:43,828][01413] Num frames 400... [2023-03-03 19:27:43,992][01413] Num frames 500... [2023-03-03 19:27:44,162][01413] Num frames 600... [2023-03-03 19:27:44,320][01413] Num frames 700... [2023-03-03 19:27:44,481][01413] Num frames 800... 
[2023-03-03 19:27:44,686][01413] Avg episode rewards: #0: 16.960, true rewards: #0: 8.960 [2023-03-03 19:27:44,688][01413] Avg episode reward: 16.960, avg true_objective: 8.960 [2023-03-03 19:27:44,704][01413] Num frames 900... [2023-03-03 19:27:44,866][01413] Num frames 1000... [2023-03-03 19:27:45,026][01413] Num frames 1100... [2023-03-03 19:27:45,188][01413] Num frames 1200... [2023-03-03 19:27:45,326][01413] Avg episode rewards: #0: 10.740, true rewards: #0: 6.240 [2023-03-03 19:27:45,328][01413] Avg episode reward: 10.740, avg true_objective: 6.240 [2023-03-03 19:27:45,419][01413] Num frames 1300... [2023-03-03 19:27:45,590][01413] Num frames 1400... [2023-03-03 19:27:45,757][01413] Num frames 1500... [2023-03-03 19:27:45,892][01413] Avg episode rewards: #0: 8.490, true rewards: #0: 5.157 [2023-03-03 19:27:45,894][01413] Avg episode reward: 8.490, avg true_objective: 5.157 [2023-03-03 19:27:45,979][01413] Num frames 1600... [2023-03-03 19:27:46,094][01413] Num frames 1700... [2023-03-03 19:27:46,209][01413] Num frames 1800... [2023-03-03 19:27:46,322][01413] Num frames 1900... [2023-03-03 19:27:46,439][01413] Num frames 2000... [2023-03-03 19:27:46,552][01413] Num frames 2100... [2023-03-03 19:27:46,666][01413] Num frames 2200... [2023-03-03 19:27:46,794][01413] Num frames 2300... [2023-03-03 19:27:46,914][01413] Num frames 2400... [2023-03-03 19:27:47,028][01413] Num frames 2500... [2023-03-03 19:27:47,179][01413] Num frames 2600... [2023-03-03 19:27:47,349][01413] Num frames 2700... [2023-03-03 19:27:47,477][01413] Num frames 2800... [2023-03-03 19:27:47,596][01413] Num frames 2900... [2023-03-03 19:27:47,716][01413] Num frames 3000... [2023-03-03 19:27:47,828][01413] Num frames 3100... [2023-03-03 19:27:47,902][01413] Avg episode rewards: #0: 16.537, true rewards: #0: 7.787 [2023-03-03 19:27:47,904][01413] Avg episode reward: 16.537, avg true_objective: 7.787 [2023-03-03 19:27:48,009][01413] Num frames 3200... [2023-03-03 19:27:48,127][01413] Num frames 3300... [2023-03-03 19:27:48,249][01413] Num frames 3400... [2023-03-03 19:27:48,369][01413] Num frames 3500... [2023-03-03 19:27:48,494][01413] Num frames 3600... [2023-03-03 19:27:48,610][01413] Num frames 3700... [2023-03-03 19:27:48,725][01413] Num frames 3800... [2023-03-03 19:27:48,848][01413] Num frames 3900... [2023-03-03 19:27:48,974][01413] Num frames 4000... [2023-03-03 19:27:49,103][01413] Num frames 4100... [2023-03-03 19:27:49,217][01413] Num frames 4200... [2023-03-03 19:27:49,332][01413] Num frames 4300... [2023-03-03 19:27:49,450][01413] Num frames 4400... [2023-03-03 19:27:49,566][01413] Num frames 4500... [2023-03-03 19:27:49,682][01413] Num frames 4600... [2023-03-03 19:27:49,800][01413] Avg episode rewards: #0: 21.102, true rewards: #0: 9.302 [2023-03-03 19:27:49,801][01413] Avg episode reward: 21.102, avg true_objective: 9.302 [2023-03-03 19:27:49,865][01413] Num frames 4700... [2023-03-03 19:27:49,986][01413] Num frames 4800... [2023-03-03 19:27:50,111][01413] Num frames 4900... [2023-03-03 19:27:50,227][01413] Num frames 5000... [2023-03-03 19:27:50,341][01413] Num frames 5100... [2023-03-03 19:27:50,508][01413] Avg episode rewards: #0: 18.825, true rewards: #0: 8.658 [2023-03-03 19:27:50,510][01413] Avg episode reward: 18.825, avg true_objective: 8.658 [2023-03-03 19:27:50,520][01413] Num frames 5200... [2023-03-03 19:27:50,633][01413] Num frames 5300... [2023-03-03 19:27:50,749][01413] Num frames 5400... [2023-03-03 19:27:50,861][01413] Num frames 5500... [2023-03-03 19:27:50,990][01413] Num frames 5600... 
[2023-03-03 19:27:51,118][01413] Num frames 5700... [2023-03-03 19:27:51,231][01413] Num frames 5800... [2023-03-03 19:27:51,345][01413] Num frames 5900... [2023-03-03 19:27:51,461][01413] Num frames 6000... [2023-03-03 19:27:51,558][01413] Avg episode rewards: #0: 18.753, true rewards: #0: 8.610 [2023-03-03 19:27:51,559][01413] Avg episode reward: 18.753, avg true_objective: 8.610 [2023-03-03 19:27:51,650][01413] Num frames 6100... [2023-03-03 19:27:51,765][01413] Num frames 6200... [2023-03-03 19:27:51,877][01413] Num frames 6300... [2023-03-03 19:27:52,010][01413] Num frames 6400... [2023-03-03 19:27:52,126][01413] Num frames 6500... [2023-03-03 19:27:52,246][01413] Num frames 6600... [2023-03-03 19:27:52,362][01413] Num frames 6700... [2023-03-03 19:27:52,478][01413] Num frames 6800... [2023-03-03 19:27:52,597][01413] Num frames 6900... [2023-03-03 19:27:52,711][01413] Num frames 7000... [2023-03-03 19:27:52,843][01413] Avg episode rewards: #0: 19.064, true rewards: #0: 8.814 [2023-03-03 19:27:52,844][01413] Avg episode reward: 19.064, avg true_objective: 8.814 [2023-03-03 19:27:52,905][01413] Num frames 7100... [2023-03-03 19:27:53,029][01413] Num frames 7200... [2023-03-03 19:27:53,163][01413] Num frames 7300... [2023-03-03 19:27:53,286][01413] Num frames 7400... [2023-03-03 19:27:53,404][01413] Num frames 7500... [2023-03-03 19:27:53,553][01413] Num frames 7600... [2023-03-03 19:27:53,816][01413] Avg episode rewards: #0: 18.212, true rewards: #0: 8.546 [2023-03-03 19:27:53,819][01413] Avg episode reward: 18.212, avg true_objective: 8.546 [2023-03-03 19:27:53,846][01413] Num frames 7700... [2023-03-03 19:27:54,086][01413] Num frames 7800... [2023-03-03 19:27:54,275][01413] Num frames 7900... [2023-03-03 19:27:54,446][01413] Num frames 8000... [2023-03-03 19:27:54,669][01413] Num frames 8100... [2023-03-03 19:27:54,885][01413] Num frames 8200... [2023-03-03 19:27:55,110][01413] Num frames 8300... [2023-03-03 19:27:55,303][01413] Num frames 8400... [2023-03-03 19:27:55,498][01413] Num frames 8500... [2023-03-03 19:27:55,695][01413] Num frames 8600... [2023-03-03 19:27:55,922][01413] Num frames 8700... [2023-03-03 19:27:56,334][01413] Num frames 8800... [2023-03-03 19:27:56,664][01413] Num frames 8900... [2023-03-03 19:27:57,125][01413] Num frames 9000... [2023-03-03 19:27:57,440][01413] Avg episode rewards: #0: 19.858, true rewards: #0: 9.058 [2023-03-03 19:27:57,445][01413] Avg episode reward: 19.858, avg true_objective: 9.058 [2023-03-03 19:28:50,326][01413] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-03-03 19:29:01,897][01413] The model has been pushed to https://huggingface.co/DiegoD616/rl_course_vizdoom_health_gathering_supreme [2023-03-03 19:30:21,321][01413] Environment doom_basic already registered, overwriting... [2023-03-03 19:30:21,324][01413] Environment doom_two_colors_easy already registered, overwriting... [2023-03-03 19:30:21,326][01413] Environment doom_two_colors_hard already registered, overwriting... [2023-03-03 19:30:21,328][01413] Environment doom_dm already registered, overwriting... [2023-03-03 19:30:21,330][01413] Environment doom_dwango5 already registered, overwriting... [2023-03-03 19:30:21,336][01413] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-03-03 19:30:21,338][01413] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-03-03 19:30:21,339][01413] Environment doom_my_way_home already registered, overwriting... 
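Both evaluation passes above come from the same evaluation entry point with the overrides that were logged: the first pass only saved a replay video (push_to_hub=False), while the second pushed the model to the Hub under DiegoD616/rl_course_vizdoom_health_gathering_supreme. A hedged sketch of the usual invocation in the course notebook (parse_vizdoom_cfg is a notebook-defined helper and is assumed here, not verified against this exact notebook):

from sample_factory.enjoy import enjoy  # evaluation entry point in SF 2.x

cfg = parse_vizdoom_cfg(  # hypothetical helper building a cfg from CLI-style args
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_frames=100000",
        "--max_num_episodes=10",
        "--push_to_hub",
        "--hf_repository=DiegoD616/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)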
[2023-03-03 19:30:21,340][01413] Environment doom_deadly_corridor already registered, overwriting... [2023-03-03 19:30:21,341][01413] Environment doom_defend_the_center already registered, overwriting... [2023-03-03 19:30:21,342][01413] Environment doom_defend_the_line already registered, overwriting... [2023-03-03 19:30:21,348][01413] Environment doom_health_gathering already registered, overwriting... [2023-03-03 19:30:21,349][01413] Environment doom_health_gathering_supreme already registered, overwriting... [2023-03-03 19:30:21,350][01413] Environment doom_battle already registered, overwriting... [2023-03-03 19:30:21,351][01413] Environment doom_battle2 already registered, overwriting... [2023-03-03 19:30:21,352][01413] Environment doom_duel_bots already registered, overwriting... [2023-03-03 19:30:21,356][01413] Environment doom_deathmatch_bots already registered, overwriting... [2023-03-03 19:30:21,357][01413] Environment doom_duel already registered, overwriting... [2023-03-03 19:30:21,358][01413] Environment doom_deathmatch_full already registered, overwriting... [2023-03-03 19:30:21,361][01413] Environment doom_benchmark already registered, overwriting... [2023-03-03 19:30:21,362][01413] register_encoder_factory: [2023-03-03 19:30:21,411][01413] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-03 19:30:21,420][01413] Experiment dir /content/train_dir/default_experiment already exists! [2023-03-03 19:30:21,422][01413] Resuming existing experiment from /content/train_dir/default_experiment... [2023-03-03 19:30:21,428][01413] Weights and Biases integration disabled [2023-03-03 19:30:21,434][01413] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-03-03 19:30:24,200][01413] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=4000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True 
rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-03-03 19:30:24,202][01413] Saving configuration to /content/train_dir/default_experiment/config.json... [2023-03-03 19:30:24,206][01413] Rollout worker 0 uses device cpu [2023-03-03 19:30:24,209][01413] Rollout worker 1 uses device cpu [2023-03-03 19:30:24,210][01413] Rollout worker 2 uses device cpu [2023-03-03 19:30:24,212][01413] Rollout worker 3 uses device cpu [2023-03-03 19:30:24,214][01413] Rollout worker 4 uses device cpu [2023-03-03 19:30:24,215][01413] Rollout worker 5 uses device cpu [2023-03-03 19:30:24,217][01413] Rollout worker 6 uses device cpu [2023-03-03 19:30:24,219][01413] Rollout worker 7 uses device cpu [2023-03-03 19:30:24,334][01413] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:30:24,336][01413] InferenceWorker_p0-w0: min num requests: 2 [2023-03-03 19:30:24,374][01413] Starting all processes... [2023-03-03 19:30:24,375][01413] Starting process learner_proc0 [2023-03-03 19:30:24,510][01413] Starting all processes... 
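The "Resuming existing experiment" lines above, together with the "Overriding arg ... passed from command line" and "Adding new argument ... not in the saved config file!" messages elsewhere in this log, describe the resume rule: the saved config.json is the base, and command-line flags either override saved keys or are added with a warning. A minimal sketch of that merge (hypothetical code mirroring the logged messages, not the library's):

import json

def resume_config(path, cli_overrides):
    with open(path) as f:
        cfg = json.load(f)
    for key, value in cli_overrides.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value} that is not in the saved config file!")
        cfg[key] = value
    return cfg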
[2023-03-03 19:30:24,525][01413] Starting process inference_proc0-0 [2023-03-03 19:30:24,528][01413] Starting process rollout_proc0 [2023-03-03 19:30:24,529][01413] Starting process rollout_proc1 [2023-03-03 19:30:24,529][01413] Starting process rollout_proc2 [2023-03-03 19:30:24,529][01413] Starting process rollout_proc3 [2023-03-03 19:30:24,529][01413] Starting process rollout_proc4 [2023-03-03 19:30:24,529][01413] Starting process rollout_proc5 [2023-03-03 19:30:24,529][01413] Starting process rollout_proc6 [2023-03-03 19:30:24,529][01413] Starting process rollout_proc7 [2023-03-03 19:30:32,769][27260] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:30:32,771][27260] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-03-03 19:30:32,870][27260] Num visible devices: 1 [2023-03-03 19:30:32,901][27260] Starting seed is not provided [2023-03-03 19:30:32,902][27260] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:30:32,903][27260] Initializing actor-critic model on device cuda:0 [2023-03-03 19:30:32,904][27260] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:30:32,905][27260] RunningMeanStd input shape: (1,) [2023-03-03 19:30:33,102][27260] ConvEncoder: input_channels=3 [2023-03-03 19:30:34,178][27274] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:30:34,187][27274] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-03-03 19:30:34,323][27274] Num visible devices: 1 [2023-03-03 19:30:34,639][27260] Conv encoder output size: 512 [2023-03-03 19:30:34,639][27260] Policy head output size: 512 [2023-03-03 19:30:34,806][27260] Created Actor Critic model with architecture:
[2023-03-03 19:30:34,806][27260] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-03-03 19:30:34,918][27279] Worker 0 uses CPU cores [0] [2023-03-03 19:30:35,310][27282] Worker 2 uses CPU cores [0] [2023-03-03 19:30:35,703][27289] Worker 3 uses CPU cores [1] [2023-03-03 19:30:35,725][27283] Worker 1 uses CPU cores [1] [2023-03-03 19:30:35,868][27285] Worker 4 uses CPU cores [0] [2023-03-03 19:30:36,323][27296] Worker 6 uses CPU cores [0] [2023-03-03 19:30:36,323][27299] Worker 7 uses CPU cores [1] [2023-03-03 19:30:36,466][27297] Worker 5 uses CPU cores [1] [2023-03-03 19:30:40,760][27260] Using optimizer [2023-03-03 19:30:40,761][27260]
Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2023-03-03 19:30:40,793][27260] Loading model from checkpoint [2023-03-03 19:30:40,798][27260] Loaded experiment state at self.train_step=980, self.env_steps=4014080 [2023-03-03 19:30:40,798][27260] Initialized policy 0 weights for model version 980 [2023-03-03 19:30:40,806][27260] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:30:40,813][27260] LearnerWorker_p0 finished initialization! [2023-03-03 19:30:41,001][27274] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:30:41,003][27274] RunningMeanStd input shape: (1,) [2023-03-03 19:30:41,014][27274] ConvEncoder: input_channels=3 [2023-03-03 19:30:41,119][27274] Conv encoder output size: 512 [2023-03-03 19:30:41,119][27274] Policy head output size: 512 [2023-03-03 19:30:41,441][01413] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4014080. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-03 19:30:43,294][01413] Inference worker 0-0 is ready! [2023-03-03 19:30:43,296][01413] All inference workers are ready! Signal rollout workers to start! [2023-03-03 19:30:43,393][27283] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:30:43,395][27289] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:30:43,398][27299] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:30:43,397][27279] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:30:43,395][27282] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:30:43,399][27297] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:30:43,398][27285] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:30:43,393][27296] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:30:43,841][27289] Decorrelating experience for 0 frames... [2023-03-03 19:30:44,170][27289] Decorrelating experience for 32 frames... [2023-03-03 19:30:44,326][01413] Heartbeat connected on Batcher_0 [2023-03-03 19:30:44,329][01413] Heartbeat connected on LearnerWorker_p0 [2023-03-03 19:30:44,364][01413] Heartbeat connected on InferenceWorker_p0-w0 [2023-03-03 19:30:44,594][27289] Decorrelating experience for 64 frames... [2023-03-03 19:30:44,786][27282] Decorrelating experience for 0 frames... [2023-03-03 19:30:44,792][27279] Decorrelating experience for 0 frames... [2023-03-03 19:30:44,794][27285] Decorrelating experience for 0 frames... [2023-03-03 19:30:44,799][27296] Decorrelating experience for 0 frames... [2023-03-03 19:30:45,423][27299] Decorrelating experience for 0 frames... [2023-03-03 19:30:45,559][27297] Decorrelating experience for 0 frames... [2023-03-03 19:30:45,854][27279] Decorrelating experience for 32 frames... [2023-03-03 19:30:45,873][27296] Decorrelating experience for 32 frames... [2023-03-03 19:30:45,878][27285] Decorrelating experience for 32 frames... [2023-03-03 19:30:46,156][27299] Decorrelating experience for 32 frames... [2023-03-03 19:30:46,201][27283] Decorrelating experience for 0 frames... [2023-03-03 19:30:46,434][01413] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-03 19:30:46,927][27282] Decorrelating experience for 32 frames... [2023-03-03 19:30:46,936][27289] Decorrelating experience for 96 frames... 
[2023-03-03 19:30:46,983][27299] Decorrelating experience for 64 frames... [2023-03-03 19:30:47,195][01413] Heartbeat connected on RolloutWorker_w3 [2023-03-03 19:30:47,669][27285] Decorrelating experience for 64 frames... [2023-03-03 19:30:47,682][27279] Decorrelating experience for 64 frames... [2023-03-03 19:30:47,955][27296] Decorrelating experience for 64 frames... [2023-03-03 19:30:49,564][27297] Decorrelating experience for 32 frames... [2023-03-03 19:30:49,819][27299] Decorrelating experience for 96 frames... [2023-03-03 19:30:50,222][01413] Heartbeat connected on RolloutWorker_w7 [2023-03-03 19:30:50,485][27285] Decorrelating experience for 96 frames... [2023-03-03 19:30:50,536][27283] Decorrelating experience for 32 frames... [2023-03-03 19:30:50,678][27296] Decorrelating experience for 96 frames... [2023-03-03 19:30:50,965][01413] Heartbeat connected on RolloutWorker_w4 [2023-03-03 19:30:51,311][01413] Heartbeat connected on RolloutWorker_w6 [2023-03-03 19:30:51,434][01413] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 1.2. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-03 19:30:52,668][27282] Decorrelating experience for 64 frames... [2023-03-03 19:30:53,310][27297] Decorrelating experience for 64 frames... [2023-03-03 19:30:54,213][27279] Decorrelating experience for 96 frames... [2023-03-03 19:30:55,015][01413] Heartbeat connected on RolloutWorker_w0 [2023-03-03 19:30:55,056][27283] Decorrelating experience for 64 frames... [2023-03-03 19:30:56,350][27260] Signal inference workers to stop experience collection... [2023-03-03 19:30:56,355][27274] InferenceWorker_p0-w0: stopping experience collection [2023-03-03 19:30:56,434][01413] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 128.1. Samples: 1920. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-03 19:30:56,440][01413] Avg episode reward: [(0, '5.991')] [2023-03-03 19:30:56,669][27283] Decorrelating experience for 96 frames... [2023-03-03 19:30:56,722][27282] Decorrelating experience for 96 frames... [2023-03-03 19:30:56,756][01413] Heartbeat connected on RolloutWorker_w1 [2023-03-03 19:30:56,909][01413] Heartbeat connected on RolloutWorker_w2 [2023-03-03 19:30:57,107][27297] Decorrelating experience for 96 frames... [2023-03-03 19:30:57,166][01413] Heartbeat connected on RolloutWorker_w5 [2023-03-03 19:30:58,314][27260] Signal inference workers to resume experience collection... [2023-03-03 19:30:58,315][27274] InferenceWorker_p0-w0: resuming experience collection [2023-03-03 19:30:58,318][27260] Stopping Batcher_0... [2023-03-03 19:30:58,325][27260] Loop batcher_evt_loop terminating... [2023-03-03 19:30:58,318][01413] Component Batcher_0 stopped! [2023-03-03 19:30:58,392][27274] Weights refcount: 2 0 [2023-03-03 19:30:58,394][01413] Component InferenceWorker_p0-w0 stopped! [2023-03-03 19:30:58,400][27274] Stopping InferenceWorker_p0-w0... [2023-03-03 19:30:58,401][27274] Loop inference_proc0-0_evt_loop terminating... [2023-03-03 19:30:58,475][27289] Stopping RolloutWorker_w3... [2023-03-03 19:30:58,475][01413] Component RolloutWorker_w3 stopped! [2023-03-03 19:30:58,484][27297] Stopping RolloutWorker_w5... [2023-03-03 19:30:58,475][27289] Loop rollout_proc3_evt_loop terminating... [2023-03-03 19:30:58,484][01413] Component RolloutWorker_w5 stopped! [2023-03-03 19:30:58,491][27299] Stopping RolloutWorker_w7... [2023-03-03 19:30:58,495][27297] Loop rollout_proc5_evt_loop terminating... 
[2023-03-03 19:30:58,491][01413] Component RolloutWorker_w7 stopped! [2023-03-03 19:30:58,498][27299] Loop rollout_proc7_evt_loop terminating... [2023-03-03 19:30:58,507][01413] Component RolloutWorker_w4 stopped! [2023-03-03 19:30:58,511][27285] Stopping RolloutWorker_w4... [2023-03-03 19:30:58,514][01413] Component RolloutWorker_w0 stopped! [2023-03-03 19:30:58,521][27283] Stopping RolloutWorker_w1... [2023-03-03 19:30:58,524][27283] Loop rollout_proc1_evt_loop terminating... [2023-03-03 19:30:58,522][01413] Component RolloutWorker_w1 stopped! [2023-03-03 19:30:58,521][27279] Stopping RolloutWorker_w0... [2023-03-03 19:30:58,530][27285] Loop rollout_proc4_evt_loop terminating... [2023-03-03 19:30:58,533][01413] Component RolloutWorker_w6 stopped! [2023-03-03 19:30:58,537][27296] Stopping RolloutWorker_w6... [2023-03-03 19:30:58,542][01413] Component RolloutWorker_w2 stopped! [2023-03-03 19:30:58,546][27282] Stopping RolloutWorker_w2... [2023-03-03 19:30:58,531][27279] Loop rollout_proc0_evt_loop terminating... [2023-03-03 19:30:58,547][27282] Loop rollout_proc2_evt_loop terminating... [2023-03-03 19:30:58,551][27296] Loop rollout_proc6_evt_loop terminating... [2023-03-03 19:31:00,628][27260] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth... [2023-03-03 19:31:00,768][27260] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2023-03-03 19:31:00,771][27260] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth... [2023-03-03 19:31:00,915][27260] Stopping LearnerWorker_p0... [2023-03-03 19:31:00,917][27260] Loop learner_proc0_evt_loop terminating... [2023-03-03 19:31:00,916][01413] Component LearnerWorker_p0 stopped! [2023-03-03 19:31:00,923][01413] Waiting for process learner_proc0 to stop... [2023-03-03 19:31:01,971][01413] Waiting for process inference_proc0-0 to join... [2023-03-03 19:31:01,977][01413] Waiting for process rollout_proc0 to join... [2023-03-03 19:31:01,979][01413] Waiting for process rollout_proc1 to join... [2023-03-03 19:31:01,983][01413] Waiting for process rollout_proc2 to join... [2023-03-03 19:31:01,986][01413] Waiting for process rollout_proc3 to join... [2023-03-03 19:31:01,988][01413] Waiting for process rollout_proc4 to join... [2023-03-03 19:31:01,991][01413] Waiting for process rollout_proc5 to join... [2023-03-03 19:31:01,993][01413] Waiting for process rollout_proc6 to join... [2023-03-03 19:31:02,001][01413] Waiting for process rollout_proc7 to join... 
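Like the previous one, this resumed run shuts down after a single accumulated batch (8,192 frames, as the runner summary below confirms). The cause is visible in the configuration: the saved budget train_for_env_steps=4000000 is still in force, and the loaded checkpoint is already past it, so the learner saves and stops almost immediately. Only the later restart that overrides the budget to 8000000 can continue training:

train_for_env_steps = 4_000_000           # budget still in the saved config
for env_steps in (4_005_888, 4_014_080):  # counters at the two resumes above
    print(env_steps >= train_for_env_steps)  # True -> stop right after the first batch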
[2023-03-03 19:31:02,003][01413] Batcher 0 profile tree view:
batching: 0.0525, releasing_batches: 0.0005
[2023-03-03 19:31:02,006][01413] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 8.2867
update_model: 0.0384
  weight_update: 0.0150
one_step: 0.0231
  handle_policy_step: 4.5366
    deserialize: 0.0602, stack: 0.0136, obs_to_device_normalize: 0.3926, forward: 3.5731, send_messages: 0.1274
    prepare_outputs: 0.2802
      to_cpu: 0.1516
[2023-03-03 19:31:02,010][01413] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 6.0489
train: 1.3914
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0004, after_optimizer: 0.0141
  calculate_losses: 0.2152
    losses_init: 0.0000, forward_head: 0.1160, bptt_initial: 0.0694, tail: 0.0015, advantages_returns: 0.0010, losses: 0.0183
    bptt: 0.0085
      bptt_forward_core: 0.0084
  update: 1.1575
    clip: 0.0072
[2023-03-03 19:31:02,012][01413] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.2630, env_step: 0.8738, overhead: 0.0184, complete_rollouts: 0.0001
save_policy_outputs: 0.0678
  split_output_tensors: 0.0459
[2023-03-03 19:31:02,014][01413] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0012, enqueue_policy_requests: 1.8971, env_step: 3.5446, overhead: 0.1575, complete_rollouts: 0.0094
save_policy_outputs: 0.1102
  split_output_tensors: 0.0681
[2023-03-03 19:31:02,017][01413] Loop Runner_EvtLoop terminating... [2023-03-03 19:31:02,020][01413] Runner profile tree view: main_loop: 37.6467 [2023-03-03 19:31:02,022][01413] Collected {0: 4022272}, FPS: 217.6 [2023-03-03 19:31:02,068][01413] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-03 19:31:02,070][01413] Overriding arg 'num_workers' with value 1 passed from command line [2023-03-03 19:31:02,072][01413] Adding new argument 'no_render'=True that is not in the saved config file! [2023-03-03 19:31:02,074][01413] Adding new argument 'save_video'=True that is not in the saved config file! [2023-03-03 19:31:02,075][01413] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-03-03 19:31:02,076][01413] Adding new argument 'video_name'=None that is not in the saved config file! [2023-03-03 19:31:02,077][01413] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-03-03 19:31:02,078][01413] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-03-03 19:31:02,080][01413] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-03-03 19:31:02,081][01413] Adding new argument 'hf_repository'='DiegoD616/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-03-03 19:31:02,082][01413] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-03-03 19:31:02,083][01413] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-03-03 19:31:02,084][01413] Adding new argument 'train_script'=None that is not in the saved config file! [2023-03-03 19:31:02,085][01413] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-03-03 19:31:02,086][01413] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-03 19:31:02,110][01413] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:31:02,112][01413] RunningMeanStd input shape: (1,) [2023-03-03 19:31:02,135][01413] ConvEncoder: input_channels=3 [2023-03-03 19:31:02,181][01413] Conv encoder output size: 512 [2023-03-03 19:31:02,182][01413] Policy head output size: 512 [2023-03-03 19:31:02,205][01413] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth... [2023-03-03 19:31:02,778][01413] Num frames 100... [2023-03-03 19:31:02,898][01413] Num frames 200... [2023-03-03 19:31:03,013][01413] Num frames 300... [2023-03-03 19:31:03,128][01413] Num frames 400... [2023-03-03 19:31:03,257][01413] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2023-03-03 19:31:03,259][01413] Avg episode reward: 5.480, avg true_objective: 4.480 [2023-03-03 19:31:03,322][01413] Num frames 500... [2023-03-03 19:31:03,440][01413] Num frames 600... [2023-03-03 19:31:03,557][01413] Num frames 700... [2023-03-03 19:31:03,673][01413] Num frames 800... [2023-03-03 19:31:03,835][01413] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2023-03-03 19:31:03,838][01413] Avg episode reward: 5.480, avg true_objective: 4.480 [2023-03-03 19:31:03,848][01413] Num frames 900... [2023-03-03 19:31:03,967][01413] Num frames 1000... [2023-03-03 19:31:04,083][01413] Num frames 1100... [2023-03-03 19:31:04,213][01413] Num frames 1200... [2023-03-03 19:31:04,333][01413] Num frames 1300... [2023-03-03 19:31:04,452][01413] Num frames 1400... [2023-03-03 19:31:04,589][01413] Avg episode rewards: #0: 6.573, true rewards: #0: 4.907 [2023-03-03 19:31:04,591][01413] Avg episode reward: 6.573, avg true_objective: 4.907 [2023-03-03 19:31:04,627][01413] Num frames 1500... [2023-03-03 19:31:04,744][01413] Num frames 1600... [2023-03-03 19:31:04,867][01413] Num frames 1700... [2023-03-03 19:31:04,983][01413] Num frames 1800... [2023-03-03 19:31:05,105][01413] Num frames 1900... [2023-03-03 19:31:05,219][01413] Num frames 2000... [2023-03-03 19:31:05,295][01413] Avg episode rewards: #0: 6.790, true rewards: #0: 5.040 [2023-03-03 19:31:05,297][01413] Avg episode reward: 6.790, avg true_objective: 5.040 [2023-03-03 19:31:05,408][01413] Num frames 2100... [2023-03-03 19:31:05,532][01413] Num frames 2200... [2023-03-03 19:31:05,651][01413] Num frames 2300... [2023-03-03 19:31:05,775][01413] Num frames 2400... [2023-03-03 19:31:05,891][01413] Num frames 2500... [2023-03-03 19:31:06,015][01413] Num frames 2600... [2023-03-03 19:31:06,186][01413] Num frames 2700... [2023-03-03 19:31:06,381][01413] Avg episode rewards: #0: 8.336, true rewards: #0: 5.536 [2023-03-03 19:31:06,384][01413] Avg episode reward: 8.336, avg true_objective: 5.536 [2023-03-03 19:31:06,445][01413] Num frames 2800... [2023-03-03 19:31:06,615][01413] Num frames 2900... [2023-03-03 19:31:06,782][01413] Num frames 3000... [2023-03-03 19:31:06,964][01413] Num frames 3100... [2023-03-03 19:31:07,135][01413] Num frames 3200... [2023-03-03 19:31:07,220][01413] Avg episode rewards: #0: 7.860, true rewards: #0: 5.360 [2023-03-03 19:31:07,222][01413] Avg episode reward: 7.860, avg true_objective: 5.360 [2023-03-03 19:31:07,374][01413] Num frames 3300... [2023-03-03 19:31:07,580][01413] Num frames 3400... [2023-03-03 19:31:07,743][01413] Num frames 3500... [2023-03-03 19:31:07,908][01413] Num frames 3600... [2023-03-03 19:31:08,079][01413] Num frames 3700... 
[2023-03-03 19:31:08,254][01413] Num frames 3800... [2023-03-03 19:31:08,445][01413] Num frames 3900... [2023-03-03 19:31:08,612][01413] Num frames 4000... [2023-03-03 19:31:08,772][01413] Num frames 4100... [2023-03-03 19:31:08,943][01413] Num frames 4200... [2023-03-03 19:31:09,012][01413] Avg episode rewards: #0: 9.583, true rewards: #0: 6.011 [2023-03-03 19:31:09,014][01413] Avg episode reward: 9.583, avg true_objective: 6.011 [2023-03-03 19:31:09,170][01413] Num frames 4300... [2023-03-03 19:31:09,335][01413] Num frames 4400... [2023-03-03 19:31:09,532][01413] Num frames 4500... [2023-03-03 19:31:09,702][01413] Num frames 4600... [2023-03-03 19:31:09,871][01413] Num frames 4700... [2023-03-03 19:31:10,051][01413] Num frames 4800... [2023-03-03 19:31:10,204][01413] Num frames 4900... [2023-03-03 19:31:10,318][01413] Avg episode rewards: #0: 9.930, true rewards: #0: 6.180 [2023-03-03 19:31:10,319][01413] Avg episode reward: 9.930, avg true_objective: 6.180 [2023-03-03 19:31:10,396][01413] Num frames 5000... [2023-03-03 19:31:10,528][01413] Num frames 5100... [2023-03-03 19:31:10,656][01413] Num frames 5200... [2023-03-03 19:31:10,791][01413] Num frames 5300... [2023-03-03 19:31:10,917][01413] Num frames 5400... [2023-03-03 19:31:11,044][01413] Num frames 5500... [2023-03-03 19:31:11,218][01413] Avg episode rewards: #0: 10.217, true rewards: #0: 6.217 [2023-03-03 19:31:11,220][01413] Avg episode reward: 10.217, avg true_objective: 6.217 [2023-03-03 19:31:11,230][01413] Num frames 5600... [2023-03-03 19:31:11,346][01413] Num frames 5700... [2023-03-03 19:31:11,471][01413] Num frames 5800... [2023-03-03 19:31:11,587][01413] Num frames 5900... [2023-03-03 19:31:11,711][01413] Num frames 6000... [2023-03-03 19:31:11,827][01413] Num frames 6100... [2023-03-03 19:31:11,948][01413] Num frames 6200... [2023-03-03 19:31:12,083][01413] Avg episode rewards: #0: 10.267, true rewards: #0: 6.267 [2023-03-03 19:31:12,084][01413] Avg episode reward: 10.267, avg true_objective: 6.267 [2023-03-03 19:31:47,158][01413] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-03-03 19:31:58,934][01413] The model has been pushed to https://huggingface.co/DiegoD616/rl_course_vizdoom_health_gathering_supreme [2023-03-03 19:45:17,290][01413] Environment doom_basic already registered, overwriting... [2023-03-03 19:45:17,292][01413] Environment doom_two_colors_easy already registered, overwriting... [2023-03-03 19:45:17,294][01413] Environment doom_two_colors_hard already registered, overwriting... [2023-03-03 19:45:17,295][01413] Environment doom_dm already registered, overwriting... [2023-03-03 19:45:17,297][01413] Environment doom_dwango5 already registered, overwriting... [2023-03-03 19:45:17,298][01413] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-03-03 19:45:17,300][01413] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-03-03 19:45:17,302][01413] Environment doom_my_way_home already registered, overwriting... [2023-03-03 19:45:17,304][01413] Environment doom_deadly_corridor already registered, overwriting... [2023-03-03 19:45:17,305][01413] Environment doom_defend_the_center already registered, overwriting... [2023-03-03 19:45:17,307][01413] Environment doom_defend_the_line already registered, overwriting... [2023-03-03 19:45:17,308][01413] Environment doom_health_gathering already registered, overwriting... 
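The "Avg episode rewards" lines printed during evaluation are running means over the episodes completed so far, with "true rewards" being the environment's unshaped objective (true_objective) rather than the shaped training reward. For the evaluation just above, the first three averages 5.480, 5.480, 6.573 imply per-episode rewards of 5.480, 5.480 and roughly 8.76 (the third value is inferred from the averages, not logged directly):

rewards, total = [5.480, 5.480, 8.759], 0.0
for n, r in enumerate(rewards, start=1):
    total += r
    print(f"Avg episode rewards: #0: {total / n:.3f}")  # 5.480, 5.480, 6.573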
[2023-03-03 19:45:17,310][01413] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-03-03 19:45:17,313][01413] Environment doom_battle already registered, overwriting...
[2023-03-03 19:45:17,315][01413] Environment doom_battle2 already registered, overwriting...
[2023-03-03 19:45:17,316][01413] Environment doom_duel_bots already registered, overwriting...
[2023-03-03 19:45:17,317][01413] Environment doom_deathmatch_bots already registered, overwriting...
[2023-03-03 19:45:17,319][01413] Environment doom_duel already registered, overwriting...
[2023-03-03 19:45:17,321][01413] Environment doom_deathmatch_full already registered, overwriting...
[2023-03-03 19:45:17,322][01413] Environment doom_benchmark already registered, overwriting...
[2023-03-03 19:45:17,326][01413] register_encoder_factory:
[2023-03-03 19:45:17,357][01413] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-03-03 19:45:17,359][01413] Overriding arg 'train_for_env_steps' with value 8000000 passed from command line
[2023-03-03 19:45:17,365][01413] Experiment dir /content/train_dir/default_experiment already exists!
[2023-03-03 19:45:17,366][01413] Resuming existing experiment from /content/train_dir/default_experiment...
[2023-03-03 19:45:17,367][01413] Weights and Biases integration disabled
[2023-03-03 19:45:17,373][01413] Environment var CUDA_VISIBLE_DEVICES is 0
[2023-03-03 19:45:19,328][01413] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=8000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2023-03-03 19:45:19,332][01413] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-03-03 19:45:19,338][01413] Rollout worker 0 uses device cpu
[2023-03-03 19:45:19,341][01413] Rollout worker 1 uses device cpu
[2023-03-03 19:45:19,344][01413] Rollout worker 2 uses device cpu
[2023-03-03 19:45:19,345][01413] Rollout worker 3 uses device cpu
[2023-03-03 19:45:19,349][01413] Rollout worker 4 uses device cpu
[2023-03-03 19:45:19,351][01413] Rollout worker 5 uses device cpu
[2023-03-03 19:45:19,352][01413] Rollout worker 6 uses device cpu
[2023-03-03 19:45:19,354][01413] Rollout worker 7 uses device cpu
[2023-03-03 19:45:19,468][01413] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-03 19:45:19,470][01413] InferenceWorker_p0-w0: min num requests: 2
[2023-03-03 19:45:19,501][01413] Starting all processes...
[2023-03-03 19:45:19,503][01413] Starting process learner_proc0
[2023-03-03 19:45:19,640][01413] Starting all processes...
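The restart above illustrates restart_behavior=resume: the saved config.json is reloaded wholesale, and only values passed explicitly on the new command line win. That is why train_for_env_steps is now 8000000 while the recorded command_line and cli_args still carry the original run's 4000000. A minimal sketch of that precedence (path taken from the log; the override dict stands in for the new command-line arguments):

```python
import json

# Load the saved experiment configuration from the resumed experiment dir.
with open("/content/train_dir/default_experiment/config.json") as f:
    cfg = json.load(f)

# Explicit command-line values take precedence over saved ones; every key
# not mentioned keeps its stored value.
cfg.update({"train_for_env_steps": 8_000_000})
assert cfg["train_for_env_steps"] == 8_000_000
```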
[2023-03-03 19:45:19,650][01413] Starting process inference_proc0-0 [2023-03-03 19:45:19,653][01413] Starting process rollout_proc0 [2023-03-03 19:45:19,654][01413] Starting process rollout_proc1 [2023-03-03 19:45:19,654][01413] Starting process rollout_proc2 [2023-03-03 19:45:19,654][01413] Starting process rollout_proc3 [2023-03-03 19:45:19,654][01413] Starting process rollout_proc4 [2023-03-03 19:45:19,654][01413] Starting process rollout_proc5 [2023-03-03 19:45:19,654][01413] Starting process rollout_proc6 [2023-03-03 19:45:19,654][01413] Starting process rollout_proc7 [2023-03-03 19:45:29,687][31274] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:45:29,687][31274] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-03-03 19:45:29,751][31274] Num visible devices: 1 [2023-03-03 19:45:29,793][31274] Starting seed is not provided [2023-03-03 19:45:29,794][31274] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:45:29,794][31274] Initializing actor-critic model on device cuda:0 [2023-03-03 19:45:29,795][31274] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:45:29,796][31274] RunningMeanStd input shape: (1,) [2023-03-03 19:45:29,919][31274] ConvEncoder: input_channels=3 [2023-03-03 19:45:30,861][31288] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:45:30,865][31288] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-03-03 19:45:30,872][31274] Conv encoder output size: 512 [2023-03-03 19:45:30,872][31274] Policy head output size: 512 [2023-03-03 19:45:30,950][31288] Num visible devices: 1 [2023-03-03 19:45:30,962][31274] Created Actor Critic model with architecture: [2023-03-03 19:45:30,973][31274] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-03-03 19:45:31,334][31289] Worker 1 uses CPU cores [1] [2023-03-03 19:45:31,509][31290] Worker 2 uses CPU cores [0] [2023-03-03 19:45:31,634][31301] Worker 3 uses CPU cores [1] [2023-03-03 19:45:31,705][31295] Worker 5 uses CPU cores [1] [2023-03-03 19:45:32,060][31292] Worker 0 uses CPU cores [0] [2023-03-03 19:45:32,201][31305] Worker 7 uses CPU cores [1] [2023-03-03 19:45:32,266][31303] Worker 4 uses CPU cores [0] [2023-03-03 19:45:32,394][31307] Worker 6 uses CPU cores [0] [2023-03-03 19:45:34,246][31274] Using optimizer [2023-03-03 19:45:34,247][31274] 
Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth... [2023-03-03 19:45:34,279][31274] Loading model from checkpoint [2023-03-03 19:45:34,283][31274] Loaded experiment state at self.train_step=982, self.env_steps=4022272 [2023-03-03 19:45:34,284][31274] Initialized policy 0 weights for model version 982 [2023-03-03 19:45:34,286][31274] LearnerWorker_p0 finished initialization! [2023-03-03 19:45:34,287][31274] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-03 19:45:34,482][31288] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 19:45:34,484][31288] RunningMeanStd input shape: (1,) [2023-03-03 19:45:34,500][31288] ConvEncoder: input_channels=3 [2023-03-03 19:45:34,597][31288] Conv encoder output size: 512 [2023-03-03 19:45:34,597][31288] Policy head output size: 512 [2023-03-03 19:45:36,836][01413] Inference worker 0-0 is ready! [2023-03-03 19:45:36,838][01413] All inference workers are ready! Signal rollout workers to start! [2023-03-03 19:45:36,938][31290] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:45:36,940][31303] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:45:36,941][31292] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:45:36,936][31307] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:45:36,941][31289] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:45:36,947][31305] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:45:36,947][31301] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:45:36,943][31295] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-03 19:45:37,373][01413] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4022272. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-03 19:45:37,762][31305] Decorrelating experience for 0 frames... [2023-03-03 19:45:37,766][31301] Decorrelating experience for 0 frames... [2023-03-03 19:45:38,239][31290] Decorrelating experience for 0 frames... [2023-03-03 19:45:38,242][31307] Decorrelating experience for 0 frames... [2023-03-03 19:45:38,247][31292] Decorrelating experience for 0 frames... [2023-03-03 19:45:38,924][31295] Decorrelating experience for 0 frames... [2023-03-03 19:45:38,931][31305] Decorrelating experience for 32 frames... [2023-03-03 19:45:38,964][31303] Decorrelating experience for 0 frames... [2023-03-03 19:45:39,455][31307] Decorrelating experience for 32 frames... [2023-03-03 19:45:39,466][01413] Heartbeat connected on LearnerWorker_p0 [2023-03-03 19:45:39,475][01413] Heartbeat connected on Batcher_0 [2023-03-03 19:45:39,511][01413] Heartbeat connected on InferenceWorker_p0-w0 [2023-03-03 19:45:39,872][31301] Decorrelating experience for 32 frames... [2023-03-03 19:45:40,767][31295] Decorrelating experience for 32 frames... [2023-03-03 19:45:40,807][31289] Decorrelating experience for 0 frames... [2023-03-03 19:45:40,928][31292] Decorrelating experience for 32 frames... [2023-03-03 19:45:40,977][31303] Decorrelating experience for 32 frames... [2023-03-03 19:45:41,124][31305] Decorrelating experience for 64 frames... [2023-03-03 19:45:41,660][31290] Decorrelating experience for 32 frames... [2023-03-03 19:45:41,679][31307] Decorrelating experience for 64 frames... [2023-03-03 19:45:41,954][31301] Decorrelating experience for 64 frames... 
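The "Decorrelating experience for N frames..." entries above show the rollout warm-up: with rollout=32 and num_envs_per_worker=4, each env split on a worker is advanced a different multiple of 32 frames (0, 32, 64, 96) before collection begins, so trajectories arrive out of phase rather than in lockstep. A hedged sketch of the idea, not Sample Factory's actual code; the random-action policy here is an assumption for illustration:

```python
def decorrelate(env, offset_frames):
    """Advance an env by offset_frames throwaway steps before collection."""
    env.reset()
    for _ in range(offset_frames):
        # Assumed placeholder behaviour: random actions, frames discarded;
        # only the resulting phase offset between envs matters.
        _, _, done, _ = env.step(env.action_space.sample())
        if done:
            env.reset()

# Splits 0..3 get offsets 0, 32, 64, 96 -- the frame counts in the log:
# for i, env in enumerate(worker_envs):
#     decorrelate(env, offset_frames=32 * i)
```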
[2023-03-03 19:45:42,373][01413] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-03 19:45:42,452][31289] Decorrelating experience for 32 frames... [2023-03-03 19:45:42,641][31295] Decorrelating experience for 64 frames... [2023-03-03 19:45:43,309][31303] Decorrelating experience for 64 frames... [2023-03-03 19:45:43,393][31301] Decorrelating experience for 96 frames... [2023-03-03 19:45:43,570][31290] Decorrelating experience for 64 frames... [2023-03-03 19:45:43,684][01413] Heartbeat connected on RolloutWorker_w3 [2023-03-03 19:45:44,373][31289] Decorrelating experience for 64 frames... [2023-03-03 19:45:44,457][31295] Decorrelating experience for 96 frames... [2023-03-03 19:45:44,792][31307] Decorrelating experience for 96 frames... [2023-03-03 19:45:44,841][01413] Heartbeat connected on RolloutWorker_w5 [2023-03-03 19:45:45,126][01413] Heartbeat connected on RolloutWorker_w6 [2023-03-03 19:45:45,682][31292] Decorrelating experience for 64 frames... [2023-03-03 19:45:45,798][31303] Decorrelating experience for 96 frames... [2023-03-03 19:45:46,121][01413] Heartbeat connected on RolloutWorker_w4 [2023-03-03 19:45:46,143][31290] Decorrelating experience for 96 frames... [2023-03-03 19:45:46,540][31289] Decorrelating experience for 96 frames... [2023-03-03 19:45:46,574][01413] Heartbeat connected on RolloutWorker_w2 [2023-03-03 19:45:46,817][01413] Heartbeat connected on RolloutWorker_w1 [2023-03-03 19:45:47,375][01413] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 3.2. Samples: 32. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-03 19:45:47,377][01413] Avg episode reward: [(0, '2.383')] [2023-03-03 19:45:47,837][31305] Decorrelating experience for 96 frames... [2023-03-03 19:45:48,179][01413] Heartbeat connected on RolloutWorker_w7 [2023-03-03 19:45:49,333][31292] Decorrelating experience for 96 frames... [2023-03-03 19:45:49,506][31274] Signal inference workers to stop experience collection... [2023-03-03 19:45:49,515][31288] InferenceWorker_p0-w0: stopping experience collection [2023-03-03 19:45:49,632][01413] Heartbeat connected on RolloutWorker_w0 [2023-03-03 19:45:51,563][31274] Signal inference workers to resume experience collection... [2023-03-03 19:45:51,566][31288] InferenceWorker_p0-w0: resuming experience collection [2023-03-03 19:45:52,373][01413] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4026368. Throughput: 0: 172.5. Samples: 2588. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-03-03 19:45:52,377][01413] Avg episode reward: [(0, '4.211')] [2023-03-03 19:45:57,374][01413] Fps is (10 sec: 2457.7, 60 sec: 1228.7, 300 sec: 1228.7). Total num frames: 4046848. Throughput: 0: 262.5. Samples: 5250. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-03-03 19:45:57,380][01413] Avg episode reward: [(0, '7.949')] [2023-03-03 19:46:02,131][31288] Updated weights for policy 0, policy_version 992 (0.0391) [2023-03-03 19:46:02,374][01413] Fps is (10 sec: 3686.3, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 4063232. Throughput: 0: 396.2. Samples: 9906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:46:02,381][01413] Avg episode reward: [(0, '9.996')] [2023-03-03 19:46:07,373][01413] Fps is (10 sec: 3277.2, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 4079616. Throughput: 0: 481.9. Samples: 14456. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:46:07,380][01413] Avg episode reward: [(0, '11.058')] [2023-03-03 19:46:12,373][01413] Fps is (10 sec: 3686.5, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 4100096. Throughput: 0: 511.9. Samples: 17916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:46:12,381][01413] Avg episode reward: [(0, '14.439')] [2023-03-03 19:46:12,521][31288] Updated weights for policy 0, policy_version 1002 (0.0034) [2023-03-03 19:46:17,373][01413] Fps is (10 sec: 4505.5, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 4124672. Throughput: 0: 619.5. Samples: 24780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:46:17,381][01413] Avg episode reward: [(0, '17.110')] [2023-03-03 19:46:22,374][01413] Fps is (10 sec: 3686.4, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 4136960. Throughput: 0: 651.9. Samples: 29334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:46:22,380][01413] Avg episode reward: [(0, '18.664')] [2023-03-03 19:46:24,399][31288] Updated weights for policy 0, policy_version 1012 (0.0019) [2023-03-03 19:46:27,373][01413] Fps is (10 sec: 2867.3, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 4153344. Throughput: 0: 701.0. Samples: 31546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:46:27,376][01413] Avg episode reward: [(0, '18.435')] [2023-03-03 19:46:32,373][01413] Fps is (10 sec: 3686.4, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 4173824. Throughput: 0: 835.9. Samples: 37646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:46:32,375][01413] Avg episode reward: [(0, '20.128')] [2023-03-03 19:46:32,398][31274] Saving new best policy, reward=20.128! [2023-03-03 19:46:34,339][31288] Updated weights for policy 0, policy_version 1022 (0.0015) [2023-03-03 19:46:37,378][01413] Fps is (10 sec: 4094.1, 60 sec: 2867.0, 300 sec: 2867.0). Total num frames: 4194304. Throughput: 0: 926.6. Samples: 44290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:46:37,386][01413] Avg episode reward: [(0, '19.312')] [2023-03-03 19:46:42,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2898.7). Total num frames: 4210688. Throughput: 0: 914.5. Samples: 46402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:46:42,376][01413] Avg episode reward: [(0, '18.616')] [2023-03-03 19:46:46,967][31288] Updated weights for policy 0, policy_version 1032 (0.0013) [2023-03-03 19:46:47,373][01413] Fps is (10 sec: 3278.3, 60 sec: 3413.4, 300 sec: 2925.7). Total num frames: 4227072. Throughput: 0: 907.8. Samples: 50756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:46:47,375][01413] Avg episode reward: [(0, '18.286')] [2023-03-03 19:46:52,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3058.3). Total num frames: 4251648. Throughput: 0: 956.0. Samples: 57478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:46:52,375][01413] Avg episode reward: [(0, '18.613')] [2023-03-03 19:46:55,806][31288] Updated weights for policy 0, policy_version 1042 (0.0012) [2023-03-03 19:46:57,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3123.2). Total num frames: 4272128. Throughput: 0: 956.5. Samples: 60960. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:46:57,376][01413] Avg episode reward: [(0, '17.342')] [2023-03-03 19:47:02,377][01413] Fps is (10 sec: 3275.6, 60 sec: 3686.2, 300 sec: 3083.9). Total num frames: 4284416. Throughput: 0: 904.8. Samples: 65498. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:47:02,383][01413] Avg episode reward: [(0, '17.712')] [2023-03-03 19:47:07,373][01413] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3094.8). Total num frames: 4300800. Throughput: 0: 906.3. Samples: 70116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:47:07,376][01413] Avg episode reward: [(0, '18.748')] [2023-03-03 19:47:08,962][31288] Updated weights for policy 0, policy_version 1052 (0.0015) [2023-03-03 19:47:12,373][01413] Fps is (10 sec: 3687.7, 60 sec: 3686.4, 300 sec: 3147.5). Total num frames: 4321280. Throughput: 0: 931.2. Samples: 73450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:47:12,378][01413] Avg episode reward: [(0, '19.959')] [2023-03-03 19:47:17,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3235.8). Total num frames: 4345856. Throughput: 0: 949.6. Samples: 80378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:47:17,378][01413] Avg episode reward: [(0, '21.546')] [2023-03-03 19:47:17,391][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001061_4345856.pth... [2023-03-03 19:47:17,648][31274] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth [2023-03-03 19:47:17,656][31274] Saving new best policy, reward=21.546! [2023-03-03 19:47:18,813][31288] Updated weights for policy 0, policy_version 1062 (0.0023) [2023-03-03 19:47:22,375][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3198.8). Total num frames: 4358144. Throughput: 0: 895.5. Samples: 84582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:47:22,377][01413] Avg episode reward: [(0, '22.196')] [2023-03-03 19:47:22,379][31274] Saving new best policy, reward=22.196! [2023-03-03 19:47:27,373][01413] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3202.3). Total num frames: 4374528. Throughput: 0: 894.7. Samples: 86664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:47:27,379][01413] Avg episode reward: [(0, '23.490')] [2023-03-03 19:47:27,397][31274] Saving new best policy, reward=23.490! [2023-03-03 19:47:30,891][31288] Updated weights for policy 0, policy_version 1072 (0.0013) [2023-03-03 19:47:32,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3241.2). Total num frames: 4395008. Throughput: 0: 936.3. Samples: 92888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:47:32,375][01413] Avg episode reward: [(0, '22.502')] [2023-03-03 19:47:37,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.7, 300 sec: 3276.8). Total num frames: 4415488. Throughput: 0: 933.5. Samples: 99484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-03-03 19:47:37,380][01413] Avg episode reward: [(0, '20.226')] [2023-03-03 19:47:41,681][31288] Updated weights for policy 0, policy_version 1082 (0.0012) [2023-03-03 19:47:42,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 4431872. Throughput: 0: 904.1. Samples: 101646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:47:42,376][01413] Avg episode reward: [(0, '19.049')] [2023-03-03 19:47:47,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 4448256. Throughput: 0: 901.0. Samples: 106042. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:47:47,381][01413] Avg episode reward: [(0, '18.787')] [2023-03-03 19:47:52,142][31288] Updated weights for policy 0, policy_version 1092 (0.0020) [2023-03-03 19:47:52,374][01413] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3337.5). Total num frames: 4472832. Throughput: 0: 951.4. Samples: 112928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:47:52,381][01413] Avg episode reward: [(0, '17.582')] [2023-03-03 19:47:57,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3364.6). Total num frames: 4493312. Throughput: 0: 953.2. Samples: 116346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:47:57,378][01413] Avg episode reward: [(0, '17.934')] [2023-03-03 19:48:02,373][01413] Fps is (10 sec: 3276.9, 60 sec: 3686.6, 300 sec: 3333.3). Total num frames: 4505600. Throughput: 0: 901.3. Samples: 120936. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:48:02,376][01413] Avg episode reward: [(0, '18.122')] [2023-03-03 19:48:04,400][31288] Updated weights for policy 0, policy_version 1102 (0.0011) [2023-03-03 19:48:07,373][01413] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3331.4). Total num frames: 4521984. Throughput: 0: 915.5. Samples: 125778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:48:07,376][01413] Avg episode reward: [(0, '18.201')] [2023-03-03 19:48:12,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3382.5). Total num frames: 4546560. Throughput: 0: 946.2. Samples: 129242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:48:12,380][01413] Avg episode reward: [(0, '18.143')] [2023-03-03 19:48:13,793][31288] Updated weights for policy 0, policy_version 1112 (0.0011) [2023-03-03 19:48:17,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3404.8). Total num frames: 4567040. Throughput: 0: 959.9. Samples: 136084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:48:17,378][01413] Avg episode reward: [(0, '19.915')] [2023-03-03 19:48:22,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3400.9). Total num frames: 4583424. Throughput: 0: 910.8. Samples: 140470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:48:22,385][01413] Avg episode reward: [(0, '20.909')] [2023-03-03 19:48:26,057][31288] Updated weights for policy 0, policy_version 1122 (0.0015) [2023-03-03 19:48:27,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3397.3). Total num frames: 4599808. Throughput: 0: 912.0. Samples: 142684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:48:27,379][01413] Avg episode reward: [(0, '21.912')] [2023-03-03 19:48:32,373][01413] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3440.6). Total num frames: 4624384. Throughput: 0: 962.6. Samples: 149358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:48:32,376][01413] Avg episode reward: [(0, '21.839')] [2023-03-03 19:48:35,034][31288] Updated weights for policy 0, policy_version 1132 (0.0024) [2023-03-03 19:48:37,377][01413] Fps is (10 sec: 4094.6, 60 sec: 3754.4, 300 sec: 3436.0). Total num frames: 4640768. Throughput: 0: 946.6. Samples: 155526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:48:37,381][01413] Avg episode reward: [(0, '22.141')] [2023-03-03 19:48:42,375][01413] Fps is (10 sec: 3276.2, 60 sec: 3754.5, 300 sec: 3431.7). Total num frames: 4657152. Throughput: 0: 918.3. Samples: 157672. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:48:42,378][01413] Avg episode reward: [(0, '22.507')] [2023-03-03 19:48:47,373][01413] Fps is (10 sec: 3277.9, 60 sec: 3754.7, 300 sec: 3427.7). Total num frames: 4673536. Throughput: 0: 920.5. Samples: 162358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:48:47,380][01413] Avg episode reward: [(0, '22.119')] [2023-03-03 19:48:47,701][31288] Updated weights for policy 0, policy_version 1142 (0.0015) [2023-03-03 19:48:52,377][01413] Fps is (10 sec: 4095.3, 60 sec: 3754.5, 300 sec: 3465.8). Total num frames: 4698112. Throughput: 0: 966.9. Samples: 169294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:48:52,379][01413] Avg episode reward: [(0, '20.985')] [2023-03-03 19:48:57,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3461.1). Total num frames: 4714496. Throughput: 0: 964.6. Samples: 172648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:48:57,379][01413] Avg episode reward: [(0, '21.688')] [2023-03-03 19:48:57,416][31288] Updated weights for policy 0, policy_version 1152 (0.0026) [2023-03-03 19:49:02,374][01413] Fps is (10 sec: 3277.7, 60 sec: 3754.6, 300 sec: 3456.6). Total num frames: 4730880. Throughput: 0: 914.1. Samples: 177218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-03-03 19:49:02,382][01413] Avg episode reward: [(0, '20.497')] [2023-03-03 19:49:07,374][01413] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3452.3). Total num frames: 4747264. Throughput: 0: 922.5. Samples: 181982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:49:07,376][01413] Avg episode reward: [(0, '20.568')] [2023-03-03 19:49:09,593][31288] Updated weights for policy 0, policy_version 1162 (0.0015) [2023-03-03 19:49:12,373][01413] Fps is (10 sec: 4096.4, 60 sec: 3754.7, 300 sec: 3486.4). Total num frames: 4771840. Throughput: 0: 949.3. Samples: 185402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:49:12,376][01413] Avg episode reward: [(0, '21.859')] [2023-03-03 19:49:17,373][01413] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3500.2). Total num frames: 4792320. Throughput: 0: 949.0. Samples: 192062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:49:17,378][01413] Avg episode reward: [(0, '22.431')] [2023-03-03 19:49:17,398][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001170_4792320.pth... [2023-03-03 19:49:17,635][31274] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth [2023-03-03 19:49:20,289][31288] Updated weights for policy 0, policy_version 1172 (0.0011) [2023-03-03 19:49:22,374][01413] Fps is (10 sec: 3276.5, 60 sec: 3686.3, 300 sec: 3477.0). Total num frames: 4804608. Throughput: 0: 907.7. Samples: 196368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:49:22,379][01413] Avg episode reward: [(0, '22.739')] [2023-03-03 19:49:27,373][01413] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3490.5). Total num frames: 4825088. Throughput: 0: 909.1. Samples: 198578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:49:27,382][01413] Avg episode reward: [(0, '22.757')] [2023-03-03 19:49:30,962][31288] Updated weights for policy 0, policy_version 1182 (0.0013) [2023-03-03 19:49:32,373][01413] Fps is (10 sec: 4096.3, 60 sec: 3686.4, 300 sec: 3503.4). Total num frames: 4845568. Throughput: 0: 955.3. Samples: 205348. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:49:32,381][01413] Avg episode reward: [(0, '22.798')] [2023-03-03 19:49:37,375][01413] Fps is (10 sec: 4095.2, 60 sec: 3754.8, 300 sec: 3515.7). Total num frames: 4866048. Throughput: 0: 933.8. Samples: 211314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:49:37,378][01413] Avg episode reward: [(0, '21.857')] [2023-03-03 19:49:42,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3494.1). Total num frames: 4878336. Throughput: 0: 905.9. Samples: 213414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:49:42,378][01413] Avg episode reward: [(0, '22.887')] [2023-03-03 19:49:43,005][31288] Updated weights for policy 0, policy_version 1192 (0.0027) [2023-03-03 19:49:47,373][01413] Fps is (10 sec: 3277.5, 60 sec: 3754.7, 300 sec: 3506.2). Total num frames: 4898816. Throughput: 0: 908.8. Samples: 218114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:49:47,379][01413] Avg episode reward: [(0, '22.158')] [2023-03-03 19:49:52,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3517.7). Total num frames: 4919296. Throughput: 0: 955.9. Samples: 224998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:49:52,376][01413] Avg episode reward: [(0, '21.571')] [2023-03-03 19:49:52,555][31288] Updated weights for policy 0, policy_version 1202 (0.0013) [2023-03-03 19:49:57,373][01413] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3528.9). Total num frames: 4939776. Throughput: 0: 955.8. Samples: 228412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:49:57,376][01413] Avg episode reward: [(0, '20.792')] [2023-03-03 19:50:02,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3508.6). Total num frames: 4952064. Throughput: 0: 903.5. Samples: 232718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:50:02,377][01413] Avg episode reward: [(0, '21.205')] [2023-03-03 19:50:05,357][31288] Updated weights for policy 0, policy_version 1212 (0.0032) [2023-03-03 19:50:07,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3519.5). Total num frames: 4972544. Throughput: 0: 923.2. Samples: 237910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:50:07,376][01413] Avg episode reward: [(0, '18.888')] [2023-03-03 19:50:12,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3530.0). Total num frames: 4993024. Throughput: 0: 949.2. Samples: 241294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:50:12,381][01413] Avg episode reward: [(0, '19.790')] [2023-03-03 19:50:14,295][31288] Updated weights for policy 0, policy_version 1222 (0.0012) [2023-03-03 19:50:17,376][01413] Fps is (10 sec: 4095.0, 60 sec: 3686.2, 300 sec: 3540.1). Total num frames: 5013504. Throughput: 0: 941.2. Samples: 247706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:50:17,378][01413] Avg episode reward: [(0, '18.567')] [2023-03-03 19:50:22,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3521.1). Total num frames: 5025792. Throughput: 0: 903.4. Samples: 251966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:50:22,376][01413] Avg episode reward: [(0, '19.201')] [2023-03-03 19:50:26,802][31288] Updated weights for policy 0, policy_version 1232 (0.0023) [2023-03-03 19:50:27,373][01413] Fps is (10 sec: 3277.6, 60 sec: 3686.4, 300 sec: 3531.0). Total num frames: 5046272. Throughput: 0: 911.8. Samples: 254446. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:50:27,382][01413] Avg episode reward: [(0, '20.518')] [2023-03-03 19:50:32,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3554.5). Total num frames: 5070848. Throughput: 0: 962.4. Samples: 261422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:50:32,376][01413] Avg episode reward: [(0, '21.374')] [2023-03-03 19:50:36,801][31288] Updated weights for policy 0, policy_version 1242 (0.0014) [2023-03-03 19:50:37,377][01413] Fps is (10 sec: 4094.3, 60 sec: 3686.3, 300 sec: 3610.0). Total num frames: 5087232. Throughput: 0: 932.3. Samples: 266956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:50:37,380][01413] Avg episode reward: [(0, '21.680')] [2023-03-03 19:50:42,376][01413] Fps is (10 sec: 2866.5, 60 sec: 3686.2, 300 sec: 3651.7). Total num frames: 5099520. Throughput: 0: 903.4. Samples: 269068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:50:42,378][01413] Avg episode reward: [(0, '22.377')] [2023-03-03 19:50:47,373][01413] Fps is (10 sec: 3687.9, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5124096. Throughput: 0: 924.4. Samples: 274314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:50:47,376][01413] Avg episode reward: [(0, '23.924')] [2023-03-03 19:50:47,386][31274] Saving new best policy, reward=23.924! [2023-03-03 19:50:48,261][31288] Updated weights for policy 0, policy_version 1252 (0.0018) [2023-03-03 19:50:52,373][01413] Fps is (10 sec: 4506.7, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5144576. Throughput: 0: 960.1. Samples: 281114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:50:52,376][01413] Avg episode reward: [(0, '24.254')] [2023-03-03 19:50:52,380][31274] Saving new best policy, reward=24.254! [2023-03-03 19:50:57,374][01413] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 5160960. Throughput: 0: 949.9. Samples: 284038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:50:57,381][01413] Avg episode reward: [(0, '23.195')] [2023-03-03 19:50:59,280][31288] Updated weights for policy 0, policy_version 1262 (0.0022) [2023-03-03 19:51:02,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5177344. Throughput: 0: 902.9. Samples: 288336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:51:02,378][01413] Avg episode reward: [(0, '23.338')] [2023-03-03 19:51:07,373][01413] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5197824. Throughput: 0: 933.6. Samples: 293980. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:51:07,376][01413] Avg episode reward: [(0, '22.612')] [2023-03-03 19:51:10,041][31288] Updated weights for policy 0, policy_version 1272 (0.0014) [2023-03-03 19:51:12,378][01413] Fps is (10 sec: 4093.9, 60 sec: 3754.3, 300 sec: 3707.2). Total num frames: 5218304. Throughput: 0: 955.9. Samples: 297468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:51:12,381][01413] Avg episode reward: [(0, '23.452')] [2023-03-03 19:51:17,374][01413] Fps is (10 sec: 3686.3, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 5234688. Throughput: 0: 932.1. Samples: 303366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:51:17,379][01413] Avg episode reward: [(0, '23.168')] [2023-03-03 19:51:17,393][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001278_5234688.pth... 
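The checkpoint names in this log follow checkpoint_{train_step:09d}_{env_steps}.pth, where env_steps works out to train_step * batch_size * env_frameskip (1024 * 4 = 4096 env frames per train step; 982 * 4096 = 4,022,272 and 1278 * 4096 = 5,234,688, matching the files above). With keep_checkpoints=2, each save is followed by removal of the oldest file, as the "Removing ..." entries show. A hedged sketch of that rotation, not the library's actual code:

```python
from pathlib import Path

def save_and_rotate(ckpt_dir: Path, train_step: int, keep: int = 2) -> None:
    env_steps = train_step * 1024 * 4  # batch_size * env_frameskip (this run)
    path = ckpt_dir / f"checkpoint_{train_step:09d}_{env_steps}.pth"
    path.touch()  # stand-in for torch.save(checkpoint_state, path)
    # Zero-padded train_step makes lexicographic order chronological, so
    # everything except the newest `keep` files can be dropped.
    for stale in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep]:
        stale.unlink()
```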
[2023-03-03 19:51:17,609][31274] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001061_4345856.pth [2023-03-03 19:51:22,374][01413] Fps is (10 sec: 2868.3, 60 sec: 3686.3, 300 sec: 3707.2). Total num frames: 5246976. Throughput: 0: 902.1. Samples: 307550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:51:22,383][01413] Avg episode reward: [(0, '22.537')] [2023-03-03 19:51:22,509][31288] Updated weights for policy 0, policy_version 1282 (0.0036) [2023-03-03 19:51:27,373][01413] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5271552. Throughput: 0: 917.3. Samples: 310346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:51:27,376][01413] Avg episode reward: [(0, '22.706')] [2023-03-03 19:51:31,540][31288] Updated weights for policy 0, policy_version 1292 (0.0017) [2023-03-03 19:51:32,374][01413] Fps is (10 sec: 4506.0, 60 sec: 3686.4, 300 sec: 3721.2). Total num frames: 5292032. Throughput: 0: 953.4. Samples: 317216. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:51:32,376][01413] Avg episode reward: [(0, '22.752')] [2023-03-03 19:51:37,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3721.1). Total num frames: 5308416. Throughput: 0: 922.8. Samples: 322642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:51:37,377][01413] Avg episode reward: [(0, '20.749')] [2023-03-03 19:51:42,376][01413] Fps is (10 sec: 3276.1, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5324800. Throughput: 0: 904.3. Samples: 324734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:51:42,379][01413] Avg episode reward: [(0, '20.255')] [2023-03-03 19:51:44,312][31288] Updated weights for policy 0, policy_version 1302 (0.0015) [2023-03-03 19:51:47,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 5345280. Throughput: 0: 928.4. Samples: 330114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:51:47,376][01413] Avg episode reward: [(0, '19.318')] [2023-03-03 19:51:52,373][01413] Fps is (10 sec: 4506.7, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5369856. Throughput: 0: 956.1. Samples: 337006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:51:52,376][01413] Avg episode reward: [(0, '20.479')] [2023-03-03 19:51:53,105][31288] Updated weights for policy 0, policy_version 1312 (0.0021) [2023-03-03 19:51:57,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 5386240. Throughput: 0: 941.7. Samples: 339838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:51:57,382][01413] Avg episode reward: [(0, '20.161')] [2023-03-03 19:52:02,374][01413] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 5398528. Throughput: 0: 907.0. Samples: 344182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:52:02,381][01413] Avg episode reward: [(0, '20.933')] [2023-03-03 19:52:05,664][31288] Updated weights for policy 0, policy_version 1322 (0.0021) [2023-03-03 19:52:07,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 5419008. Throughput: 0: 947.0. Samples: 350162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:52:07,375][01413] Avg episode reward: [(0, '21.391')] [2023-03-03 19:52:12,373][01413] Fps is (10 sec: 4505.7, 60 sec: 3755.0, 300 sec: 3721.1). Total num frames: 5443584. Throughput: 0: 961.2. Samples: 353602. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:52:12,376][01413] Avg episode reward: [(0, '21.678')] [2023-03-03 19:52:15,474][31288] Updated weights for policy 0, policy_version 1332 (0.0022) [2023-03-03 19:52:17,375][01413] Fps is (10 sec: 4095.3, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 5459968. Throughput: 0: 934.8. Samples: 359282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:52:17,381][01413] Avg episode reward: [(0, '20.310')] [2023-03-03 19:52:22,373][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5472256. Throughput: 0: 910.5. Samples: 363616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:52:22,377][01413] Avg episode reward: [(0, '20.521')] [2023-03-03 19:52:26,933][31288] Updated weights for policy 0, policy_version 1342 (0.0026) [2023-03-03 19:52:27,373][01413] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 5496832. Throughput: 0: 931.8. Samples: 366662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:52:27,375][01413] Avg episode reward: [(0, '19.121')] [2023-03-03 19:52:32,373][01413] Fps is (10 sec: 4915.3, 60 sec: 3823.0, 300 sec: 3748.9). Total num frames: 5521408. Throughput: 0: 964.8. Samples: 373530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:52:32,376][01413] Avg episode reward: [(0, '18.033')] [2023-03-03 19:52:37,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 5533696. Throughput: 0: 924.8. Samples: 378622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-03 19:52:37,377][01413] Avg episode reward: [(0, '18.473')] [2023-03-03 19:52:37,937][31288] Updated weights for policy 0, policy_version 1352 (0.0012) [2023-03-03 19:52:42,373][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 5550080. Throughput: 0: 909.7. Samples: 380776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-03 19:52:42,379][01413] Avg episode reward: [(0, '19.130')] [2023-03-03 19:52:47,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5570560. Throughput: 0: 940.4. Samples: 386498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:52:47,378][01413] Avg episode reward: [(0, '20.734')] [2023-03-03 19:52:48,409][31288] Updated weights for policy 0, policy_version 1362 (0.0012) [2023-03-03 19:52:52,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 5595136. Throughput: 0: 961.4. Samples: 393426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-03 19:52:52,376][01413] Avg episode reward: [(0, '20.200')] [2023-03-03 19:52:57,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 5607424. Throughput: 0: 937.6. Samples: 395792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:52:57,379][01413] Avg episode reward: [(0, '21.815')] [2023-03-03 19:53:00,628][31288] Updated weights for policy 0, policy_version 1372 (0.0014) [2023-03-03 19:53:02,373][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 5623808. Throughput: 0: 907.1. Samples: 400100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:53:02,376][01413] Avg episode reward: [(0, '21.861')] [2023-03-03 19:53:07,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5644288. Throughput: 0: 948.8. Samples: 406310. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:53:07,375][01413] Avg episode reward: [(0, '23.316')] [2023-03-03 19:53:10,210][31288] Updated weights for policy 0, policy_version 1382 (0.0025) [2023-03-03 19:53:12,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 5668864. Throughput: 0: 956.6. Samples: 409710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:53:12,376][01413] Avg episode reward: [(0, '22.374')] [2023-03-03 19:53:17,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 5681152. Throughput: 0: 920.1. Samples: 414936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:53:17,378][01413] Avg episode reward: [(0, '21.913')] [2023-03-03 19:53:17,394][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001387_5681152.pth... [2023-03-03 19:53:17,599][31274] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001170_4792320.pth [2023-03-03 19:53:22,373][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5697536. Throughput: 0: 904.0. Samples: 419302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:53:22,376][01413] Avg episode reward: [(0, '22.118')] [2023-03-03 19:53:23,015][31288] Updated weights for policy 0, policy_version 1392 (0.0012) [2023-03-03 19:53:27,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5722112. Throughput: 0: 931.6. Samples: 422696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:53:27,376][01413] Avg episode reward: [(0, '22.263')] [2023-03-03 19:53:31,700][31288] Updated weights for policy 0, policy_version 1402 (0.0018) [2023-03-03 19:53:32,375][01413] Fps is (10 sec: 4504.8, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 5742592. Throughput: 0: 958.3. Samples: 429622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:53:32,377][01413] Avg episode reward: [(0, '20.572')] [2023-03-03 19:53:37,382][01413] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3721.1). Total num frames: 5754880. Throughput: 0: 907.5. Samples: 434264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:53:37,388][01413] Avg episode reward: [(0, '20.029')] [2023-03-03 19:53:42,373][01413] Fps is (10 sec: 2867.7, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 5771264. Throughput: 0: 902.5. Samples: 436404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:53:42,381][01413] Avg episode reward: [(0, '20.540')] [2023-03-03 19:53:44,425][31288] Updated weights for policy 0, policy_version 1412 (0.0012) [2023-03-03 19:53:47,373][01413] Fps is (10 sec: 4096.8, 60 sec: 3754.7, 300 sec: 3721.2). Total num frames: 5795840. Throughput: 0: 946.2. Samples: 442678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:53:47,380][01413] Avg episode reward: [(0, '21.443')] [2023-03-03 19:53:52,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 5816320. Throughput: 0: 957.0. Samples: 449374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:53:52,383][01413] Avg episode reward: [(0, '20.595')] [2023-03-03 19:53:54,255][31288] Updated weights for policy 0, policy_version 1422 (0.0021) [2023-03-03 19:53:57,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 5832704. Throughput: 0: 928.8. Samples: 451504. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:53:57,378][01413] Avg episode reward: [(0, '20.476')] [2023-03-03 19:54:02,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 5849088. Throughput: 0: 907.5. Samples: 455772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:54:02,376][01413] Avg episode reward: [(0, '20.823')] [2023-03-03 19:54:05,810][31288] Updated weights for policy 0, policy_version 1432 (0.0017) [2023-03-03 19:54:07,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5869568. Throughput: 0: 957.4. Samples: 462384. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:54:07,379][01413] Avg episode reward: [(0, '20.989')] [2023-03-03 19:54:12,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 5890048. Throughput: 0: 954.4. Samples: 465642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:54:12,381][01413] Avg episode reward: [(0, '20.680')] [2023-03-03 19:54:17,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 5902336. Throughput: 0: 908.6. Samples: 470506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:54:17,380][01413] Avg episode reward: [(0, '21.805')] [2023-03-03 19:54:17,614][31288] Updated weights for policy 0, policy_version 1442 (0.0023) [2023-03-03 19:54:22,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5922816. Throughput: 0: 913.9. Samples: 475386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:54:22,378][01413] Avg episode reward: [(0, '22.036')] [2023-03-03 19:54:27,290][31288] Updated weights for policy 0, policy_version 1452 (0.0019) [2023-03-03 19:54:27,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 5947392. Throughput: 0: 942.8. Samples: 478830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:54:27,376][01413] Avg episode reward: [(0, '22.279')] [2023-03-03 19:54:32,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 5963776. Throughput: 0: 958.0. Samples: 485788. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:54:32,377][01413] Avg episode reward: [(0, '22.319')] [2023-03-03 19:54:37,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 5980160. Throughput: 0: 905.4. Samples: 490118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:54:37,376][01413] Avg episode reward: [(0, '22.772')] [2023-03-03 19:54:39,590][31288] Updated weights for policy 0, policy_version 1462 (0.0023) [2023-03-03 19:54:42,374][01413] Fps is (10 sec: 3276.5, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 5996544. Throughput: 0: 903.5. Samples: 492160. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-03 19:54:42,379][01413] Avg episode reward: [(0, '23.219')] [2023-03-03 19:54:47,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6021120. Throughput: 0: 954.9. Samples: 498744. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-03 19:54:47,376][01413] Avg episode reward: [(0, '23.721')] [2023-03-03 19:54:48,905][31288] Updated weights for policy 0, policy_version 1472 (0.0014) [2023-03-03 19:54:52,375][01413] Fps is (10 sec: 4505.2, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 6041600. Throughput: 0: 946.3. Samples: 504970. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:54:52,377][01413] Avg episode reward: [(0, '23.668')] [2023-03-03 19:54:57,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 6053888. Throughput: 0: 921.6. Samples: 507114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:54:57,375][01413] Avg episode reward: [(0, '23.324')] [2023-03-03 19:55:01,637][31288] Updated weights for policy 0, policy_version 1482 (0.0012) [2023-03-03 19:55:02,374][01413] Fps is (10 sec: 2867.6, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6070272. Throughput: 0: 912.9. Samples: 511588. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:55:02,381][01413] Avg episode reward: [(0, '21.741')] [2023-03-03 19:55:07,373][01413] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6094848. Throughput: 0: 954.9. Samples: 518358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:55:07,380][01413] Avg episode reward: [(0, '20.774')] [2023-03-03 19:55:11,335][31288] Updated weights for policy 0, policy_version 1492 (0.0019) [2023-03-03 19:55:12,373][01413] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6111232. Throughput: 0: 952.0. Samples: 521672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:55:12,379][01413] Avg episode reward: [(0, '19.197')] [2023-03-03 19:55:17,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6127616. Throughput: 0: 895.8. Samples: 526100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:55:17,379][01413] Avg episode reward: [(0, '19.311')] [2023-03-03 19:55:17,392][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001496_6127616.pth... [2023-03-03 19:55:17,556][31274] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001278_5234688.pth [2023-03-03 19:55:22,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6144000. Throughput: 0: 911.1. Samples: 531118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:55:22,381][01413] Avg episode reward: [(0, '20.909')] [2023-03-03 19:55:23,452][31288] Updated weights for policy 0, policy_version 1502 (0.0015) [2023-03-03 19:55:27,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6168576. Throughput: 0: 941.8. Samples: 534538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:55:27,381][01413] Avg episode reward: [(0, '22.000')] [2023-03-03 19:55:32,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.2). Total num frames: 6184960. Throughput: 0: 940.6. Samples: 541070. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:55:32,378][01413] Avg episode reward: [(0, '21.784')] [2023-03-03 19:55:34,109][31288] Updated weights for policy 0, policy_version 1512 (0.0011) [2023-03-03 19:55:37,381][01413] Fps is (10 sec: 3274.3, 60 sec: 3685.9, 300 sec: 3734.9). Total num frames: 6201344. Throughput: 0: 895.7. Samples: 545282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:55:37,383][01413] Avg episode reward: [(0, '21.816')] [2023-03-03 19:55:42,374][01413] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 6217728. Throughput: 0: 894.7. Samples: 547374. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:55:42,381][01413] Avg episode reward: [(0, '22.915')] [2023-03-03 19:55:45,236][31288] Updated weights for policy 0, policy_version 1522 (0.0020) [2023-03-03 19:55:47,373][01413] Fps is (10 sec: 4099.1, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6242304. Throughput: 0: 948.6. Samples: 554276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:55:47,379][01413] Avg episode reward: [(0, '22.251')] [2023-03-03 19:55:52,373][01413] Fps is (10 sec: 4096.1, 60 sec: 3618.2, 300 sec: 3721.1). Total num frames: 6258688. Throughput: 0: 930.4. Samples: 560226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:55:52,379][01413] Avg episode reward: [(0, '21.706')] [2023-03-03 19:55:57,017][31288] Updated weights for policy 0, policy_version 1532 (0.0012) [2023-03-03 19:55:57,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6275072. Throughput: 0: 903.9. Samples: 562346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:55:57,380][01413] Avg episode reward: [(0, '22.830')] [2023-03-03 19:56:02,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6295552. Throughput: 0: 914.7. Samples: 567260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:56:02,376][01413] Avg episode reward: [(0, '22.739')] [2023-03-03 19:56:06,829][31288] Updated weights for policy 0, policy_version 1542 (0.0015) [2023-03-03 19:56:07,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.2). Total num frames: 6316032. Throughput: 0: 955.8. Samples: 574128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:56:07,376][01413] Avg episode reward: [(0, '22.411')] [2023-03-03 19:56:12,378][01413] Fps is (10 sec: 4094.1, 60 sec: 3754.4, 300 sec: 3734.9). Total num frames: 6336512. Throughput: 0: 951.2. Samples: 577348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:56:12,384][01413] Avg episode reward: [(0, '21.846')] [2023-03-03 19:56:17,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 6348800. Throughput: 0: 903.2. Samples: 581712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:56:17,376][01413] Avg episode reward: [(0, '21.887')] [2023-03-03 19:56:19,354][31288] Updated weights for policy 0, policy_version 1552 (0.0012) [2023-03-03 19:56:22,373][01413] Fps is (10 sec: 3278.3, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6369280. Throughput: 0: 932.2. Samples: 587224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:56:22,381][01413] Avg episode reward: [(0, '21.997')] [2023-03-03 19:56:27,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6393856. Throughput: 0: 961.8. Samples: 590656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:56:27,381][01413] Avg episode reward: [(0, '22.596')] [2023-03-03 19:56:28,219][31288] Updated weights for policy 0, policy_version 1562 (0.0017) [2023-03-03 19:56:32,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6410240. Throughput: 0: 944.7. Samples: 596786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:56:32,379][01413] Avg episode reward: [(0, '21.915')] [2023-03-03 19:56:37,375][01413] Fps is (10 sec: 2866.8, 60 sec: 3686.8, 300 sec: 3721.1). Total num frames: 6422528. Throughput: 0: 907.6. Samples: 601068. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-03 19:56:37,379][01413] Avg episode reward: [(0, '23.013')] [2023-03-03 19:56:40,698][31288] Updated weights for policy 0, policy_version 1572 (0.0018) [2023-03-03 19:56:42,379][01413] Fps is (10 sec: 3275.0, 60 sec: 3754.3, 300 sec: 3721.0). Total num frames: 6443008. Throughput: 0: 918.5. Samples: 603682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:56:42,383][01413] Avg episode reward: [(0, '23.571')] [2023-03-03 19:56:47,373][01413] Fps is (10 sec: 4506.2, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6467584. Throughput: 0: 961.4. Samples: 610522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:56:47,381][01413] Avg episode reward: [(0, '21.621')] [2023-03-03 19:56:50,273][31288] Updated weights for policy 0, policy_version 1582 (0.0011) [2023-03-03 19:56:52,373][01413] Fps is (10 sec: 4098.3, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6483968. Throughput: 0: 931.4. Samples: 616040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:56:52,381][01413] Avg episode reward: [(0, '21.391')] [2023-03-03 19:56:57,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6500352. Throughput: 0: 907.6. Samples: 618184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:56:57,376][01413] Avg episode reward: [(0, '20.623')] [2023-03-03 19:57:02,073][31288] Updated weights for policy 0, policy_version 1592 (0.0020) [2023-03-03 19:57:02,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6520832. Throughput: 0: 932.0. Samples: 623650. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:57:02,375][01413] Avg episode reward: [(0, '20.141')] [2023-03-03 19:57:07,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6541312. Throughput: 0: 963.2. Samples: 630566. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:57:07,378][01413] Avg episode reward: [(0, '20.278')] [2023-03-03 19:57:12,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3721.1). Total num frames: 6557696. Throughput: 0: 949.7. Samples: 633392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:57:12,377][01413] Avg episode reward: [(0, '21.457')] [2023-03-03 19:57:12,807][31288] Updated weights for policy 0, policy_version 1602 (0.0030) [2023-03-03 19:57:17,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6574080. Throughput: 0: 906.8. Samples: 637594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:57:17,377][01413] Avg episode reward: [(0, '22.956')] [2023-03-03 19:57:17,399][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001605_6574080.pth... [2023-03-03 19:57:17,604][31274] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001387_5681152.pth [2023-03-03 19:57:22,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6594560. Throughput: 0: 940.2. Samples: 643374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:57:22,376][01413] Avg episode reward: [(0, '22.366')] [2023-03-03 19:57:23,858][31288] Updated weights for policy 0, policy_version 1612 (0.0023) [2023-03-03 19:57:27,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 6615040. Throughput: 0: 959.1. Samples: 646834. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:57:27,379][01413] Avg episode reward: [(0, '22.631')] [2023-03-03 19:57:32,377][01413] Fps is (10 sec: 4094.6, 60 sec: 3754.4, 300 sec: 3735.0). Total num frames: 6635520. Throughput: 0: 937.5. Samples: 652712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:57:32,382][01413] Avg episode reward: [(0, '21.819')] [2023-03-03 19:57:35,290][31288] Updated weights for policy 0, policy_version 1622 (0.0038) [2023-03-03 19:57:37,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 6647808. Throughput: 0: 907.3. Samples: 656868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:57:37,376][01413] Avg episode reward: [(0, '21.235')] [2023-03-03 19:57:42,373][01413] Fps is (10 sec: 3277.9, 60 sec: 3755.0, 300 sec: 3721.1). Total num frames: 6668288. Throughput: 0: 923.5. Samples: 659740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:57:42,376][01413] Avg episode reward: [(0, '20.124')] [2023-03-03 19:57:45,644][31288] Updated weights for policy 0, policy_version 1632 (0.0022) [2023-03-03 19:57:47,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 6688768. Throughput: 0: 952.0. Samples: 666488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:57:47,376][01413] Avg episode reward: [(0, '20.434')] [2023-03-03 19:57:52,374][01413] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6705152. Throughput: 0: 916.5. Samples: 671810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:57:52,376][01413] Avg episode reward: [(0, '21.189')] [2023-03-03 19:57:57,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6721536. Throughput: 0: 902.4. Samples: 674002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 19:57:57,375][01413] Avg episode reward: [(0, '22.005')] [2023-03-03 19:57:58,008][31288] Updated weights for policy 0, policy_version 1642 (0.0019) [2023-03-03 19:58:02,373][01413] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6742016. Throughput: 0: 936.7. Samples: 679746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:58:02,376][01413] Avg episode reward: [(0, '21.392')] [2023-03-03 19:58:07,018][31288] Updated weights for policy 0, policy_version 1652 (0.0014) [2023-03-03 19:58:07,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6766592. Throughput: 0: 960.1. Samples: 686580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:58:07,376][01413] Avg episode reward: [(0, '22.923')] [2023-03-03 19:58:12,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 6782976. Throughput: 0: 939.4. Samples: 689106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:58:12,376][01413] Avg episode reward: [(0, '23.691')] [2023-03-03 19:58:17,374][01413] Fps is (10 sec: 2866.9, 60 sec: 3686.3, 300 sec: 3721.1). Total num frames: 6795264. Throughput: 0: 904.7. Samples: 693422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:58:17,377][01413] Avg episode reward: [(0, '24.230')] [2023-03-03 19:58:19,600][31288] Updated weights for policy 0, policy_version 1662 (0.0030) [2023-03-03 19:58:22,373][01413] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6819840. Throughput: 0: 946.0. Samples: 699440. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:58:22,378][01413] Avg episode reward: [(0, '22.704')] [2023-03-03 19:58:27,374][01413] Fps is (10 sec: 4506.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6840320. Throughput: 0: 958.7. Samples: 702880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:58:27,376][01413] Avg episode reward: [(0, '22.850')] [2023-03-03 19:58:29,243][31288] Updated weights for policy 0, policy_version 1672 (0.0014) [2023-03-03 19:58:32,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3735.0). Total num frames: 6856704. Throughput: 0: 928.0. Samples: 708250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:58:32,379][01413] Avg episode reward: [(0, '22.226')] [2023-03-03 19:58:37,373][01413] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 6868992. Throughput: 0: 906.9. Samples: 712620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:58:37,376][01413] Avg episode reward: [(0, '21.483')] [2023-03-03 19:58:41,143][31288] Updated weights for policy 0, policy_version 1682 (0.0021) [2023-03-03 19:58:42,380][01413] Fps is (10 sec: 3684.1, 60 sec: 3754.3, 300 sec: 3721.0). Total num frames: 6893568. Throughput: 0: 933.6. Samples: 716018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:58:42,383][01413] Avg episode reward: [(0, '20.164')] [2023-03-03 19:58:47,377][01413] Fps is (10 sec: 4504.0, 60 sec: 3754.5, 300 sec: 3721.1). Total num frames: 6914048. Throughput: 0: 956.1. Samples: 722774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 19:58:47,379][01413] Avg episode reward: [(0, '21.454')] [2023-03-03 19:58:51,787][31288] Updated weights for policy 0, policy_version 1692 (0.0021) [2023-03-03 19:58:52,373][01413] Fps is (10 sec: 3688.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6930432. Throughput: 0: 911.4. Samples: 727592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:58:52,380][01413] Avg episode reward: [(0, '21.499')] [2023-03-03 19:58:57,373][01413] Fps is (10 sec: 3277.9, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6946816. Throughput: 0: 903.5. Samples: 729762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 19:58:57,375][01413] Avg episode reward: [(0, '21.691')] [2023-03-03 19:59:02,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 6967296. Throughput: 0: 942.5. Samples: 735834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:59:02,381][01413] Avg episode reward: [(0, '22.568')] [2023-03-03 19:59:02,604][31288] Updated weights for policy 0, policy_version 1702 (0.0015) [2023-03-03 19:59:07,376][01413] Fps is (10 sec: 4504.5, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 6991872. Throughput: 0: 961.1. Samples: 742690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:59:07,381][01413] Avg episode reward: [(0, '23.148')] [2023-03-03 19:59:12,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 7004160. Throughput: 0: 932.0. Samples: 744818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:59:12,375][01413] Avg episode reward: [(0, '23.696')] [2023-03-03 19:59:14,513][31288] Updated weights for policy 0, policy_version 1712 (0.0014) [2023-03-03 19:59:17,374][01413] Fps is (10 sec: 2867.7, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7020544. Throughput: 0: 906.5. Samples: 749044. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:59:17,376][01413] Avg episode reward: [(0, '23.813')] [2023-03-03 19:59:17,393][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001714_7020544.pth... [2023-03-03 19:59:17,523][31274] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001496_6127616.pth [2023-03-03 19:59:22,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 7041024. Throughput: 0: 951.6. Samples: 755440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:59:22,379][01413] Avg episode reward: [(0, '24.171')] [2023-03-03 19:59:24,430][31288] Updated weights for policy 0, policy_version 1722 (0.0026) [2023-03-03 19:59:27,373][01413] Fps is (10 sec: 4505.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 7065600. Throughput: 0: 953.0. Samples: 758898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 19:59:27,375][01413] Avg episode reward: [(0, '24.763')] [2023-03-03 19:59:27,392][31274] Saving new best policy, reward=24.763! [2023-03-03 19:59:32,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7077888. Throughput: 0: 914.6. Samples: 763928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:59:32,378][01413] Avg episode reward: [(0, '24.351')] [2023-03-03 19:59:36,904][31288] Updated weights for policy 0, policy_version 1732 (0.0011) [2023-03-03 19:59:37,373][01413] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7094272. Throughput: 0: 910.5. Samples: 768566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:59:37,380][01413] Avg episode reward: [(0, '24.241')] [2023-03-03 19:59:42,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3755.1, 300 sec: 3721.1). Total num frames: 7118848. Throughput: 0: 938.8. Samples: 772008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:59:42,380][01413] Avg episode reward: [(0, '23.245')] [2023-03-03 19:59:45,838][31288] Updated weights for policy 0, policy_version 1742 (0.0012) [2023-03-03 19:59:47,373][01413] Fps is (10 sec: 4505.5, 60 sec: 3754.9, 300 sec: 3721.1). Total num frames: 7139328. Throughput: 0: 953.3. Samples: 778732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:59:47,376][01413] Avg episode reward: [(0, '22.603')] [2023-03-03 19:59:52,373][01413] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7151616. Throughput: 0: 902.4. Samples: 783296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 19:59:52,379][01413] Avg episode reward: [(0, '20.854')] [2023-03-03 19:59:57,373][01413] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7168000. Throughput: 0: 904.4. Samples: 785514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 19:59:57,375][01413] Avg episode reward: [(0, '21.911')] [2023-03-03 19:59:58,319][31288] Updated weights for policy 0, policy_version 1752 (0.0031) [2023-03-03 20:00:02,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7192576. Throughput: 0: 953.2. Samples: 791936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:00:02,377][01413] Avg episode reward: [(0, '22.127')] [2023-03-03 20:00:07,379][01413] Fps is (10 sec: 4503.1, 60 sec: 3686.2, 300 sec: 3734.9). Total num frames: 7213056. Throughput: 0: 953.7. Samples: 798360. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 20:00:07,388][01413] Avg episode reward: [(0, '22.419')] [2023-03-03 20:00:08,260][31288] Updated weights for policy 0, policy_version 1762 (0.0023) [2023-03-03 20:00:12,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7225344. Throughput: 0: 923.4. Samples: 800452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 20:00:12,377][01413] Avg episode reward: [(0, '22.472')] [2023-03-03 20:00:17,373][01413] Fps is (10 sec: 3278.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 7245824. Throughput: 0: 909.1. Samples: 804836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:00:17,375][01413] Avg episode reward: [(0, '24.526')] [2023-03-03 20:00:19,941][31288] Updated weights for policy 0, policy_version 1772 (0.0015) [2023-03-03 20:00:22,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7266304. Throughput: 0: 958.5. Samples: 811698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:00:22,376][01413] Avg episode reward: [(0, '24.459')] [2023-03-03 20:00:27,375][01413] Fps is (10 sec: 4095.2, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 7286784. Throughput: 0: 959.1. Samples: 815168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 20:00:27,380][01413] Avg episode reward: [(0, '23.767')] [2023-03-03 20:00:30,850][31288] Updated weights for policy 0, policy_version 1782 (0.0014) [2023-03-03 20:00:32,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.1). Total num frames: 7303168. Throughput: 0: 913.1. Samples: 819820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 20:00:32,380][01413] Avg episode reward: [(0, '23.585')] [2023-03-03 20:00:37,373][01413] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 7319552. Throughput: 0: 917.7. Samples: 824594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:00:37,378][01413] Avg episode reward: [(0, '23.161')] [2023-03-03 20:00:41,610][31288] Updated weights for policy 0, policy_version 1792 (0.0013) [2023-03-03 20:00:42,373][01413] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7340032. Throughput: 0: 943.6. Samples: 827974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 20:00:42,376][01413] Avg episode reward: [(0, '20.985')] [2023-03-03 20:00:47,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 7360512. Throughput: 0: 950.7. Samples: 834716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:00:47,379][01413] Avg episode reward: [(0, '21.506')] [2023-03-03 20:00:52,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7372800. Throughput: 0: 902.4. Samples: 838962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:00:52,378][01413] Avg episode reward: [(0, '21.643')] [2023-03-03 20:00:53,862][31288] Updated weights for policy 0, policy_version 1802 (0.0013) [2023-03-03 20:00:57,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7393280. Throughput: 0: 904.7. Samples: 841162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:00:57,376][01413] Avg episode reward: [(0, '23.047')] [2023-03-03 20:01:02,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 7417856. Throughput: 0: 956.0. Samples: 847858. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 20:01:02,376][01413] Avg episode reward: [(0, '23.665')] [2023-03-03 20:01:03,362][31288] Updated weights for policy 0, policy_version 1812 (0.0028) [2023-03-03 20:01:07,374][01413] Fps is (10 sec: 4095.9, 60 sec: 3686.7, 300 sec: 3721.2). Total num frames: 7434240. Throughput: 0: 937.3. Samples: 853878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 20:01:07,376][01413] Avg episode reward: [(0, '24.574')] [2023-03-03 20:01:12,377][01413] Fps is (10 sec: 3275.7, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 7450624. Throughput: 0: 908.2. Samples: 856040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 20:01:12,382][01413] Avg episode reward: [(0, '25.501')] [2023-03-03 20:01:12,384][31274] Saving new best policy, reward=25.501! [2023-03-03 20:01:16,048][31288] Updated weights for policy 0, policy_version 1822 (0.0020) [2023-03-03 20:01:17,373][01413] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7467008. Throughput: 0: 909.0. Samples: 860726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-03-03 20:01:17,376][01413] Avg episode reward: [(0, '26.192')] [2023-03-03 20:01:17,396][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001823_7467008.pth... [2023-03-03 20:01:17,520][31274] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001605_6574080.pth [2023-03-03 20:01:17,535][31274] Saving new best policy, reward=26.192! [2023-03-03 20:01:22,373][01413] Fps is (10 sec: 3687.6, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 7487488. Throughput: 0: 948.2. Samples: 867264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 20:01:22,382][01413] Avg episode reward: [(0, '27.486')] [2023-03-03 20:01:22,386][31274] Saving new best policy, reward=27.486! [2023-03-03 20:01:25,545][31288] Updated weights for policy 0, policy_version 1832 (0.0018) [2023-03-03 20:01:27,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 7507968. Throughput: 0: 946.9. Samples: 870584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:01:27,379][01413] Avg episode reward: [(0, '26.795')] [2023-03-03 20:01:32,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 7520256. Throughput: 0: 893.8. Samples: 874938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:01:32,378][01413] Avg episode reward: [(0, '26.003')] [2023-03-03 20:01:37,374][01413] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3721.2). Total num frames: 7540736. Throughput: 0: 917.7. Samples: 880260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 20:01:37,376][01413] Avg episode reward: [(0, '25.474')] [2023-03-03 20:01:37,738][31288] Updated weights for policy 0, policy_version 1842 (0.0016) [2023-03-03 20:01:42,374][01413] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7565312. Throughput: 0: 944.1. Samples: 883646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 20:01:42,376][01413] Avg episode reward: [(0, '23.813')] [2023-03-03 20:01:47,373][01413] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7581696. Throughput: 0: 935.6. Samples: 889958. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-03-03 20:01:47,377][01413] Avg episode reward: [(0, '21.933')] [2023-03-03 20:01:48,368][31288] Updated weights for policy 0, policy_version 1852 (0.0011) [2023-03-03 20:01:52,373][01413] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 7593984. Throughput: 0: 895.7. Samples: 894186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:01:52,382][01413] Avg episode reward: [(0, '21.517')] [2023-03-03 20:01:57,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 7614464. Throughput: 0: 900.5. Samples: 896560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:01:57,375][01413] Avg episode reward: [(0, '21.283')] [2023-03-03 20:01:59,300][31288] Updated weights for policy 0, policy_version 1862 (0.0014) [2023-03-03 20:02:02,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7639040. Throughput: 0: 949.0. Samples: 903430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:02:02,379][01413] Avg episode reward: [(0, '20.522')] [2023-03-03 20:02:07,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7655424. Throughput: 0: 932.4. Samples: 909224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:02:07,377][01413] Avg episode reward: [(0, '20.814')] [2023-03-03 20:02:10,746][31288] Updated weights for policy 0, policy_version 1872 (0.0011) [2023-03-03 20:02:12,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3721.1). Total num frames: 7671808. Throughput: 0: 905.7. Samples: 911340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 20:02:12,383][01413] Avg episode reward: [(0, '21.090')] [2023-03-03 20:02:17,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 7688192. Throughput: 0: 921.8. Samples: 916418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-03 20:02:17,378][01413] Avg episode reward: [(0, '20.688')] [2023-03-03 20:02:20,939][31288] Updated weights for policy 0, policy_version 1882 (0.0012) [2023-03-03 20:02:22,373][01413] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7712768. Throughput: 0: 952.5. Samples: 923124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:02:22,375][01413] Avg episode reward: [(0, '21.192')] [2023-03-03 20:02:27,379][01413] Fps is (10 sec: 4093.8, 60 sec: 3686.1, 300 sec: 3707.2). Total num frames: 7729152. Throughput: 0: 947.9. Samples: 926306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:02:27,382][01413] Avg episode reward: [(0, '21.256')] [2023-03-03 20:02:32,374][01413] Fps is (10 sec: 3276.5, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 7745536. Throughput: 0: 902.2. Samples: 930560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:02:32,382][01413] Avg episode reward: [(0, '22.728')] [2023-03-03 20:02:33,284][31288] Updated weights for policy 0, policy_version 1892 (0.0012) [2023-03-03 20:02:37,373][01413] Fps is (10 sec: 3688.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7766016. Throughput: 0: 934.2. Samples: 936224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:02:37,376][01413] Avg episode reward: [(0, '23.700')] [2023-03-03 20:02:42,373][01413] Fps is (10 sec: 4096.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7786496. Throughput: 0: 956.8. Samples: 939618. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:02:42,376][01413] Avg episode reward: [(0, '23.802')] [2023-03-03 20:02:42,700][31288] Updated weights for policy 0, policy_version 1902 (0.0012) [2023-03-03 20:02:47,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7802880. Throughput: 0: 939.1. Samples: 945690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:02:47,377][01413] Avg episode reward: [(0, '24.214')] [2023-03-03 20:02:52,373][01413] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7819264. Throughput: 0: 904.5. Samples: 949928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:02:52,378][01413] Avg episode reward: [(0, '24.109')] [2023-03-03 20:02:55,294][31288] Updated weights for policy 0, policy_version 1912 (0.0018) [2023-03-03 20:02:57,373][01413] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7839744. Throughput: 0: 917.0. Samples: 952606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:02:57,376][01413] Avg episode reward: [(0, '22.676')] [2023-03-03 20:03:02,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7864320. Throughput: 0: 958.2. Samples: 959536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-03 20:03:02,375][01413] Avg episode reward: [(0, '22.678')] [2023-03-03 20:03:04,240][31288] Updated weights for policy 0, policy_version 1922 (0.0012) [2023-03-03 20:03:07,378][01413] Fps is (10 sec: 4094.1, 60 sec: 3754.4, 300 sec: 3721.1). Total num frames: 7880704. Throughput: 0: 931.8. Samples: 965060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-03 20:03:07,381][01413] Avg episode reward: [(0, '22.777')] [2023-03-03 20:03:12,373][01413] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7892992. Throughput: 0: 908.2. Samples: 967168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:03:12,376][01413] Avg episode reward: [(0, '22.636')] [2023-03-03 20:03:16,603][31288] Updated weights for policy 0, policy_version 1932 (0.0017) [2023-03-03 20:03:17,373][01413] Fps is (10 sec: 3278.3, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 7913472. Throughput: 0: 932.1. Samples: 972504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:03:17,375][01413] Avg episode reward: [(0, '22.187')] [2023-03-03 20:03:17,415][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001933_7917568.pth... [2023-03-03 20:03:17,568][31274] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001714_7020544.pth [2023-03-03 20:03:22,373][01413] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 7938048. Throughput: 0: 955.6. Samples: 979228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:03:22,379][01413] Avg episode reward: [(0, '23.314')] [2023-03-03 20:03:27,119][31288] Updated weights for policy 0, policy_version 1942 (0.0012) [2023-03-03 20:03:27,374][01413] Fps is (10 sec: 4095.8, 60 sec: 3755.0, 300 sec: 3721.1). Total num frames: 7954432. Throughput: 0: 941.5. Samples: 981984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-03 20:03:27,379][01413] Avg episode reward: [(0, '23.626')] [2023-03-03 20:03:32,373][01413] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 7966720. Throughput: 0: 900.9. Samples: 986230. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-03 20:03:32,375][01413] Avg episode reward: [(0, '22.957')] [2023-03-03 20:03:37,373][01413] Fps is (10 sec: 3277.0, 60 sec: 3686.4, 300 sec: 3707.3). Total num frames: 7987200. Throughput: 0: 941.9. Samples: 992314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-03 20:03:37,382][01413] Avg episode reward: [(0, '23.124')] [2023-03-03 20:03:38,258][31288] Updated weights for policy 0, policy_version 1952 (0.0027) [2023-03-03 20:03:40,771][31274] Stopping Batcher_0... [2023-03-03 20:03:40,771][31274] Loop batcher_evt_loop terminating... [2023-03-03 20:03:40,779][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2023-03-03 20:03:40,782][01413] Component Batcher_0 stopped! [2023-03-03 20:03:40,819][31288] Weights refcount: 2 0 [2023-03-03 20:03:40,836][31288] Stopping InferenceWorker_p0-w0... [2023-03-03 20:03:40,839][31288] Loop inference_proc0-0_evt_loop terminating... [2023-03-03 20:03:40,847][01413] Component InferenceWorker_p0-w0 stopped! [2023-03-03 20:03:40,894][31274] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001823_7467008.pth [2023-03-03 20:03:40,906][31274] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2023-03-03 20:03:41,030][01413] Component RolloutWorker_w3 stopped! [2023-03-03 20:03:41,035][01413] Component RolloutWorker_w2 stopped! [2023-03-03 20:03:41,038][31301] Stopping RolloutWorker_w3... [2023-03-03 20:03:41,039][31301] Loop rollout_proc3_evt_loop terminating... [2023-03-03 20:03:41,035][31290] Stopping RolloutWorker_w2... [2023-03-03 20:03:41,042][31295] Stopping RolloutWorker_w5... [2023-03-03 20:03:41,043][31295] Loop rollout_proc5_evt_loop terminating... [2023-03-03 20:03:41,044][01413] Component RolloutWorker_w5 stopped! [2023-03-03 20:03:41,049][31305] Stopping RolloutWorker_w7... [2023-03-03 20:03:41,049][01413] Component RolloutWorker_w7 stopped! [2023-03-03 20:03:41,051][31289] Stopping RolloutWorker_w1... [2023-03-03 20:03:41,050][31305] Loop rollout_proc7_evt_loop terminating... [2023-03-03 20:03:41,052][31289] Loop rollout_proc1_evt_loop terminating... [2023-03-03 20:03:41,052][01413] Component RolloutWorker_w1 stopped! [2023-03-03 20:03:41,063][01413] Component RolloutWorker_w4 stopped! [2023-03-03 20:03:41,076][01413] Component RolloutWorker_w6 stopped! [2023-03-03 20:03:41,077][31307] Stopping RolloutWorker_w6... [2023-03-03 20:03:41,063][31303] Stopping RolloutWorker_w4... [2023-03-03 20:03:41,042][31290] Loop rollout_proc2_evt_loop terminating... [2023-03-03 20:03:41,087][31307] Loop rollout_proc6_evt_loop terminating... [2023-03-03 20:03:41,090][31303] Loop rollout_proc4_evt_loop terminating... [2023-03-03 20:03:41,123][01413] Component RolloutWorker_w0 stopped! [2023-03-03 20:03:41,131][31292] Stopping RolloutWorker_w0... [2023-03-03 20:03:41,136][01413] Component LearnerWorker_p0 stopped! [2023-03-03 20:03:41,141][01413] Waiting for process learner_proc0 to stop... [2023-03-03 20:03:41,150][31274] Stopping LearnerWorker_p0... [2023-03-03 20:03:41,151][31274] Loop learner_proc0_evt_loop terminating... [2023-03-03 20:03:41,134][31292] Loop rollout_proc0_evt_loop terminating... [2023-03-03 20:03:44,026][01413] Waiting for process inference_proc0-0 to join... [2023-03-03 20:03:44,104][01413] Waiting for process rollout_proc0 to join... [2023-03-03 20:03:44,475][01413] Waiting for process rollout_proc1 to join... 
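The training stream above interleaves throughput entries ("Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...). Total num frames: ...") with reward entries ("Avg episode reward: [(0, '...')]") every five seconds, plus periodic checkpoint rotation. A learning curve can be recovered from such a log with the standard library alone; this is a minimal sketch assuming the entry shapes seen here and a log file named sf_log.txt (the file name is an assumption, not something this run wrote).

import re

# Entry shapes assumed from the log above; "sf_log.txt" is a placeholder path.
FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.]+), 60 sec: ([\d.]+), 300 sec: ([\d.]+)\)\. "
    r"Total num frames: (\d+)"
)
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '(-?[\d.]+)'\)\]")

with open("sf_log.txt") as f:
    text = f.read()

frames = [int(m.group(4)) for m in FPS_RE.finditer(text)]        # x axis
rewards = [float(m.group(1)) for m in REWARD_RE.finditer(text)]  # y axis

# Each reward entry follows its Fps entry, so zipping pairs them up.
curve = list(zip(frames, rewards))
print(f"{len(curve)} points, last: frames={curve[-1][0]}, reward={curve[-1][1]:.3f}")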
[2023-03-03 20:03:44,481][01413] Waiting for process rollout_proc2 to join...
[2023-03-03 20:03:44,484][01413] Waiting for process rollout_proc3 to join...
[2023-03-03 20:03:44,485][01413] Waiting for process rollout_proc4 to join...
[2023-03-03 20:03:44,486][01413] Waiting for process rollout_proc5 to join...
[2023-03-03 20:03:44,490][01413] Waiting for process rollout_proc6 to join...
[2023-03-03 20:03:44,491][01413] Waiting for process rollout_proc7 to join...
[2023-03-03 20:03:44,492][01413] Batcher 0 profile tree view:
batching: 24.9004, releasing_batches: 0.0299
[2023-03-03 20:03:44,493][01413] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 544.2616
update_model: 7.8380
  weight_update: 0.0021
one_step: 0.0079
  handle_policy_step: 491.4267
    deserialize: 14.6421, stack: 3.1257, obs_to_device_normalize: 112.2818, forward: 232.6898, send_messages: 25.7646
    prepare_outputs: 77.3226
      to_cpu: 47.2944
[2023-03-03 20:03:44,495][01413] Learner 0 profile tree view:
misc: 0.0063, prepare_batch: 15.6551
train: 79.1597
  epoch_init: 0.0058, minibatch_init: 0.0065, losses_postprocess: 0.5649, kl_divergence: 0.6406, after_optimizer: 3.0828
  calculate_losses: 26.3801
    losses_init: 0.0034, forward_head: 1.7898, bptt_initial: 17.3692, tail: 1.0037, advantages_returns: 0.2421, losses: 3.4803
    bptt: 2.1497
      bptt_forward_core: 2.0374
  update: 47.7564
    clip: 1.3649
[2023-03-03 20:03:44,497][01413] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4120, enqueue_policy_requests: 147.7827, env_step: 805.4733, overhead: 21.4961, complete_rollouts: 7.2092
save_policy_outputs: 20.6140
  split_output_tensors: 10.0397
[2023-03-03 20:03:44,498][01413] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3894, enqueue_policy_requests: 150.0804, env_step: 805.3314, overhead: 21.9034, complete_rollouts: 6.9242
save_policy_outputs: 20.9730
  split_output_tensors: 9.9927
[2023-03-03 20:03:44,502][01413] Loop Runner_EvtLoop terminating...
[2023-03-03 20:03:44,503][01413] Runner profile tree view:
main_loop: 1105.0020
[2023-03-03 20:03:44,504][01413] Collected {0: 8007680}, FPS: 3606.7
[2023-03-03 20:03:44,591][01413] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-03-03 20:03:44,593][01413] Overriding arg 'num_workers' with value 1 passed from command line
[2023-03-03 20:03:44,595][01413] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-03-03 20:03:44,597][01413] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-03-03 20:03:44,599][01413] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-03-03 20:03:44,601][01413] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-03-03 20:03:44,602][01413] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-03-03 20:03:44,604][01413] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-03-03 20:03:44,605][01413] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-03-03 20:03:44,607][01413] Adding new argument 'hf_repository'='DiegoD616/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-03-03 20:03:44,608][01413] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-03-03 20:03:44,609][01413] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
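The configuration entries here (the override listing continues just below) show how evaluation reuses the training run's saved config.json: keys present in the file can be overridden from the command line, while keys the file lacks are added with a warning. A minimal sketch of that merge behaviour, as a generic reimplementation rather than Sample Factory's actual code:

import json

def load_cfg_with_overrides(saved_path, overrides):
    """Merge command-line overrides into a saved config, mirroring the
    'Overriding arg ...' / 'Adding new argument ...' messages above."""
    with open(saved_path) as f:
        cfg = json.load(f)
    for key, value in overrides.items():
        if key not in cfg:
            print(f"Adding new argument {key!r}={value!r} that is not in the saved config file!")
        elif cfg[key] != value:
            print(f"Overriding arg {key!r} with value {value!r} passed from command line")
        cfg[key] = value
    return cfg

cfg = load_cfg_with_overrides(
    "/content/train_dir/default_experiment/config.json",
    {"num_workers": 1, "no_render": True, "save_video": True, "max_num_episodes": 10},
)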
[2023-03-03 20:03:44,610][01413] Adding new argument 'train_script'=None that is not in the saved config file! [2023-03-03 20:03:44,611][01413] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-03-03 20:03:44,612][01413] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-03 20:03:44,645][01413] RunningMeanStd input shape: (3, 72, 128) [2023-03-03 20:03:44,649][01413] RunningMeanStd input shape: (1,) [2023-03-03 20:03:44,673][01413] ConvEncoder: input_channels=3 [2023-03-03 20:03:44,870][01413] Conv encoder output size: 512 [2023-03-03 20:03:44,872][01413] Policy head output size: 512 [2023-03-03 20:03:44,955][01413] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2023-03-03 20:03:46,050][01413] Num frames 100... [2023-03-03 20:03:46,220][01413] Num frames 200... [2023-03-03 20:03:46,401][01413] Num frames 300... [2023-03-03 20:03:46,573][01413] Num frames 400... [2023-03-03 20:03:46,741][01413] Num frames 500... [2023-03-03 20:03:46,857][01413] Num frames 600... [2023-03-03 20:03:46,983][01413] Num frames 700... [2023-03-03 20:03:47,143][01413] Avg episode rewards: #0: 18.890, true rewards: #0: 7.890 [2023-03-03 20:03:47,145][01413] Avg episode reward: 18.890, avg true_objective: 7.890 [2023-03-03 20:03:47,161][01413] Num frames 800... [2023-03-03 20:03:47,273][01413] Num frames 900... [2023-03-03 20:03:47,387][01413] Num frames 1000... [2023-03-03 20:03:47,503][01413] Num frames 1100... [2023-03-03 20:03:47,624][01413] Num frames 1200... [2023-03-03 20:03:47,723][01413] Avg episode rewards: #0: 12.185, true rewards: #0: 6.185 [2023-03-03 20:03:47,725][01413] Avg episode reward: 12.185, avg true_objective: 6.185 [2023-03-03 20:03:47,801][01413] Num frames 1300... [2023-03-03 20:03:47,923][01413] Num frames 1400... [2023-03-03 20:03:48,043][01413] Num frames 1500... [2023-03-03 20:03:48,171][01413] Num frames 1600... [2023-03-03 20:03:48,284][01413] Num frames 1700... [2023-03-03 20:03:48,412][01413] Num frames 1800... [2023-03-03 20:03:48,541][01413] Num frames 1900... [2023-03-03 20:03:48,682][01413] Num frames 2000... [2023-03-03 20:03:48,804][01413] Num frames 2100... [2023-03-03 20:03:48,934][01413] Num frames 2200... [2023-03-03 20:03:49,070][01413] Avg episode rewards: #0: 14.203, true rewards: #0: 7.537 [2023-03-03 20:03:49,072][01413] Avg episode reward: 14.203, avg true_objective: 7.537 [2023-03-03 20:03:49,134][01413] Num frames 2300... [2023-03-03 20:03:49,256][01413] Num frames 2400... [2023-03-03 20:03:49,377][01413] Num frames 2500... [2023-03-03 20:03:49,501][01413] Num frames 2600... [2023-03-03 20:03:49,627][01413] Num frames 2700... [2023-03-03 20:03:49,750][01413] Num frames 2800... [2023-03-03 20:03:49,868][01413] Num frames 2900... [2023-03-03 20:03:49,995][01413] Num frames 3000... [2023-03-03 20:03:50,113][01413] Num frames 3100... [2023-03-03 20:03:50,238][01413] Num frames 3200... [2023-03-03 20:03:50,355][01413] Num frames 3300... [2023-03-03 20:03:50,473][01413] Num frames 3400... [2023-03-03 20:03:50,581][01413] Avg episode rewards: #0: 17.360, true rewards: #0: 8.610 [2023-03-03 20:03:50,583][01413] Avg episode reward: 17.360, avg true_objective: 8.610 [2023-03-03 20:03:50,660][01413] Num frames 3500... [2023-03-03 20:03:50,774][01413] Num frames 3600... [2023-03-03 20:03:50,890][01413] Num frames 3700... [2023-03-03 20:03:51,012][01413] Num frames 3800... [2023-03-03 20:03:51,137][01413] Num frames 3900... 
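Before the episodes above begin, the policy network is rebuilt with the same architecture and restored from the newest checkpoint ("Loading state from checkpoint ... checkpoint_000001955_8007680.pth"); the file name encodes the policy version (1955) and the total environment frames (8007680). Outside the enjoy script, such a checkpoint can be inspected directly with PyTorch; the key layout printed below depends on the Sample Factory version, so treat it as an assumption:

import torch

ckpt_path = ("/content/train_dir/default_experiment/checkpoint_p0/"
             "checkpoint_000001955_8007680.pth")
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Typically holds the model state dict plus bookkeeping counters;
# exact key names vary between Sample Factory versions.
for key, value in checkpoint.items():
    print(key, type(value).__name__)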
[2023-03-03 20:03:51,254][01413] Num frames 4000... [2023-03-03 20:03:51,379][01413] Avg episode rewards: #0: 16.324, true rewards: #0: 8.124 [2023-03-03 20:03:51,381][01413] Avg episode reward: 16.324, avg true_objective: 8.124 [2023-03-03 20:03:51,432][01413] Num frames 4100... [2023-03-03 20:03:51,547][01413] Num frames 4200... [2023-03-03 20:03:51,668][01413] Num frames 4300... [2023-03-03 20:03:51,784][01413] Num frames 4400... [2023-03-03 20:03:51,900][01413] Num frames 4500... [2023-03-03 20:03:52,022][01413] Num frames 4600... [2023-03-03 20:03:52,141][01413] Num frames 4700... [2023-03-03 20:03:52,264][01413] Num frames 4800... [2023-03-03 20:03:52,380][01413] Num frames 4900... [2023-03-03 20:03:52,499][01413] Num frames 5000... [2023-03-03 20:03:52,621][01413] Num frames 5100... [2023-03-03 20:03:52,735][01413] Avg episode rewards: #0: 17.917, true rewards: #0: 8.583 [2023-03-03 20:03:52,737][01413] Avg episode reward: 17.917, avg true_objective: 8.583 [2023-03-03 20:03:52,802][01413] Num frames 5200... [2023-03-03 20:03:52,916][01413] Num frames 5300... [2023-03-03 20:03:53,044][01413] Num frames 5400... [2023-03-03 20:03:53,160][01413] Num frames 5500... [2023-03-03 20:03:53,277][01413] Num frames 5600... [2023-03-03 20:03:53,395][01413] Num frames 5700... [2023-03-03 20:03:53,520][01413] Num frames 5800... [2023-03-03 20:03:53,636][01413] Num frames 5900... [2023-03-03 20:03:53,753][01413] Num frames 6000... [2023-03-03 20:03:53,874][01413] Num frames 6100... [2023-03-03 20:03:53,990][01413] Num frames 6200... [2023-03-03 20:03:54,114][01413] Num frames 6300... [2023-03-03 20:03:54,235][01413] Num frames 6400... [2023-03-03 20:03:54,350][01413] Num frames 6500... [2023-03-03 20:03:54,468][01413] Num frames 6600... [2023-03-03 20:03:54,589][01413] Num frames 6700... [2023-03-03 20:03:54,708][01413] Num frames 6800... [2023-03-03 20:03:54,825][01413] Num frames 6900... [2023-03-03 20:03:54,945][01413] Num frames 7000... [2023-03-03 20:03:55,122][01413] Avg episode rewards: #0: 21.849, true rewards: #0: 10.134 [2023-03-03 20:03:55,123][01413] Avg episode reward: 21.849, avg true_objective: 10.134 [2023-03-03 20:03:55,135][01413] Num frames 7100... [2023-03-03 20:03:55,249][01413] Num frames 7200... [2023-03-03 20:03:55,364][01413] Num frames 7300... [2023-03-03 20:03:55,490][01413] Num frames 7400... [2023-03-03 20:03:55,603][01413] Num frames 7500... [2023-03-03 20:03:55,718][01413] Num frames 7600... [2023-03-03 20:03:55,836][01413] Num frames 7700... [2023-03-03 20:03:55,950][01413] Num frames 7800... [2023-03-03 20:03:56,077][01413] Num frames 7900... [2023-03-03 20:03:56,175][01413] Avg episode rewards: #0: 22.046, true rewards: #0: 9.921 [2023-03-03 20:03:56,177][01413] Avg episode reward: 22.046, avg true_objective: 9.921 [2023-03-03 20:03:56,258][01413] Num frames 8000... [2023-03-03 20:03:56,371][01413] Num frames 8100... [2023-03-03 20:03:56,489][01413] Num frames 8200... [2023-03-03 20:03:56,605][01413] Num frames 8300... [2023-03-03 20:03:56,728][01413] Num frames 8400... [2023-03-03 20:03:56,900][01413] Num frames 8500... [2023-03-03 20:03:57,087][01413] Num frames 8600... [2023-03-03 20:03:57,252][01413] Num frames 8700... [2023-03-03 20:03:57,360][01413] Avg episode rewards: #0: 21.472, true rewards: #0: 9.694 [2023-03-03 20:03:57,365][01413] Avg episode reward: 21.472, avg true_objective: 9.694 [2023-03-03 20:03:57,492][01413] Num frames 8800... [2023-03-03 20:03:57,651][01413] Num frames 8900... [2023-03-03 20:03:57,809][01413] Num frames 9000... 
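The "Avg episode rewards" entries in this evaluation are cumulative means over the episodes finished so far, so individual episode scores can be recovered by differencing: r_n = n * avg_n - (n - 1) * avg_{n-1}. A short sketch applying this to the nine averages printed up to this point:

# Cumulative averages of the shaped reward, episodes 1-9, from the log above.
avgs = [18.890, 12.185, 14.203, 17.360, 16.324, 17.917, 21.849, 22.046, 21.472]

per_episode = []
prev = 0.0
for n, avg in enumerate(avgs, start=1):
    per_episode.append(n * avg - (n - 1) * prev)
    prev = avg

print([round(r, 2) for r in per_episode])
# Episode 2 scored about 5.48 and episode 7 about 45.44, which explains the
# swings in the running average. The paired "true rewards" values are the raw
# environment objective, tracked as a cumulative mean in the same way.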
[2023-03-03 20:03:57,969][01413] Num frames 9100... [2023-03-03 20:03:58,157][01413] Num frames 9200... [2023-03-03 20:03:58,318][01413] Num frames 9300... [2023-03-03 20:03:58,480][01413] Num frames 9400... [2023-03-03 20:03:58,661][01413] Num frames 9500... [2023-03-03 20:03:58,827][01413] Num frames 9600... [2023-03-03 20:03:58,996][01413] Num frames 9700... [2023-03-03 20:03:59,174][01413] Num frames 9800... [2023-03-03 20:03:59,343][01413] Num frames 9900... [2023-03-03 20:03:59,522][01413] Num frames 10000... [2023-03-03 20:03:59,695][01413] Num frames 10100... [2023-03-03 20:03:59,865][01413] Num frames 10200... [2023-03-03 20:04:00,037][01413] Num frames 10300... [2023-03-03 20:04:00,215][01413] Num frames 10400... [2023-03-03 20:04:00,337][01413] Num frames 10500... [2023-03-03 20:04:00,461][01413] Num frames 10600... [2023-03-03 20:04:00,578][01413] Num frames 10700... [2023-03-03 20:04:00,708][01413] Num frames 10800... [2023-03-03 20:04:00,794][01413] Avg episode rewards: #0: 24.425, true rewards: #0: 10.825 [2023-03-03 20:04:00,796][01413] Avg episode reward: 24.425, avg true_objective: 10.825 [2023-03-03 20:05:02,415][01413] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
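With 'push_to_hub'=True and 'hf_repository' set in the overridden config, the evaluation above is followed by uploading the experiment directory (config, checkpoints, and the replay.mp4 just saved) to the Hugging Face Hub. Sample Factory ships its own helper for this step; a minimal sketch using huggingface_hub directly instead:

from huggingface_hub import HfApi

repo_id = "DiegoD616/rl_course_vizdoom_health_gathering_supreme"

api = HfApi()
# Create the model repo if it does not exist yet, then push the whole
# experiment directory so the replay video and checkpoints are browsable.
api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path="/content/train_dir/default_experiment",
    repo_id=repo_id,
    repo_type="model",
)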