[2024-09-29 18:04:46,328][00189] Saving configuration to /content/train_dir/samplefactory-vizdoom-v1/config.json...
[2024-09-29 18:04:46,333][00189] Rollout worker 0 uses device cpu
[2024-09-29 18:04:46,336][00189] Rollout worker 1 uses device cpu
[2024-09-29 18:04:46,338][00189] Rollout worker 2 uses device cpu
[2024-09-29 18:04:46,340][00189] Rollout worker 3 uses device cpu
[2024-09-29 18:04:46,341][00189] Rollout worker 4 uses device cpu
[2024-09-29 18:04:46,342][00189] Rollout worker 5 uses device cpu
[2024-09-29 18:04:46,343][00189] Rollout worker 6 uses device cpu
[2024-09-29 18:04:46,344][00189] Rollout worker 7 uses device cpu
[2024-09-29 18:04:46,498][00189] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:04:46,501][00189] InferenceWorker_p0-w0: min num requests: 2
[2024-09-29 18:04:46,533][00189] Starting all processes...
[2024-09-29 18:04:46,535][00189] Starting process learner_proc0
[2024-09-29 18:04:46,583][00189] Starting all processes...
[2024-09-29 18:04:46,592][00189] Starting process inference_proc0-0
[2024-09-29 18:04:46,593][00189] Starting process rollout_proc0
[2024-09-29 18:04:46,596][00189] Starting process rollout_proc1
[2024-09-29 18:04:46,596][00189] Starting process rollout_proc2
[2024-09-29 18:04:46,596][00189] Starting process rollout_proc3
[2024-09-29 18:04:46,597][00189] Starting process rollout_proc4
[2024-09-29 18:04:46,597][00189] Starting process rollout_proc5
[2024-09-29 18:04:46,597][00189] Starting process rollout_proc6
[2024-09-29 18:04:46,597][00189] Starting process rollout_proc7
[2024-09-29 18:04:58,066][03808] Worker 3 uses CPU cores [1]
[2024-09-29 18:04:58,099][03804] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:04:58,099][03804] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-29 18:04:58,134][03791] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:04:58,134][03791] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-29 18:04:58,202][03791] Num visible devices: 1
[2024-09-29 18:04:58,214][03804] Num visible devices: 1
[2024-09-29 18:04:58,236][03791] Starting seed is not provided
[2024-09-29 18:04:58,236][03791] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:04:58,237][03791] Initializing actor-critic model on device cuda:0
[2024-09-29 18:04:58,237][03791] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:04:58,242][03791] RunningMeanStd input shape: (1,)
[2024-09-29 18:04:58,310][03791] ConvEncoder: input_channels=3
[2024-09-29 18:04:58,355][03807] Worker 2 uses CPU cores [0]
[2024-09-29 18:04:58,526][03809] Worker 4 uses CPU cores [0]
[2024-09-29 18:04:58,533][03805] Worker 0 uses CPU cores [0]
[2024-09-29 18:04:58,730][03811] Worker 6 uses CPU cores [0]
[2024-09-29 18:04:58,743][03806] Worker 1 uses CPU cores [1]
[2024-09-29 18:04:58,761][03810] Worker 5 uses CPU cores [1]
[2024-09-29 18:04:58,798][03812] Worker 7 uses CPU cores [1]
[2024-09-29 18:04:58,888][03791] Conv encoder output size: 512
[2024-09-29 18:04:58,889][03791] Policy head output size: 512
[2024-09-29 18:04:58,904][03791] Created Actor Critic model with architecture:
[2024-09-29 18:04:58,905][03791] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-29 18:05:03,184][03791] Using optimizer
[2024-09-29 18:05:03,186][03791] No checkpoints found
[2024-09-29 18:05:03,186][03791] Did not load from checkpoint, starting from scratch!
[2024-09-29 18:05:03,186][03791] Initialized policy 0 weights for model version 0
[2024-09-29 18:05:03,194][03791] LearnerWorker_p0 finished initialization!
[2024-09-29 18:05:03,196][03791] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:05:03,312][03804] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:05:03,313][03804] RunningMeanStd input shape: (1,)
[2024-09-29 18:05:03,330][03804] ConvEncoder: input_channels=3
[2024-09-29 18:05:03,456][03804] Conv encoder output size: 512
[2024-09-29 18:05:03,456][03804] Policy head output size: 512
[2024-09-29 18:05:05,039][00189] Inference worker 0-0 is ready!
[2024-09-29 18:05:05,040][00189] All inference workers are ready! Signal rollout workers to start!
[2024-09-29 18:05:05,151][03805] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:05:05,172][03811] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:05:05,191][03807] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:05:05,193][03809] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:05:05,198][03808] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:05:05,201][03806] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:05:05,197][03810] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:05:05,236][03812] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:05:06,169][03811] Decorrelating experience for 0 frames...
[2024-09-29 18:05:06,173][03807] Decorrelating experience for 0 frames...
[2024-09-29 18:05:06,489][00189] Heartbeat connected on Batcher_0
[2024-09-29 18:05:06,495][00189] Heartbeat connected on LearnerWorker_p0
[2024-09-29 18:05:06,527][00189] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-29 18:05:06,952][03806] Decorrelating experience for 0 frames...
[2024-09-29 18:05:06,957][03808] Decorrelating experience for 0 frames...
[2024-09-29 18:05:06,958][03810] Decorrelating experience for 0 frames...
[2024-09-29 18:05:06,978][03812] Decorrelating experience for 0 frames...
[2024-09-29 18:05:07,513][03807] Decorrelating experience for 32 frames...
[2024-09-29 18:05:08,115][00189] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:05:08,455][03811] Decorrelating experience for 32 frames...
[2024-09-29 18:05:08,850][03806] Decorrelating experience for 32 frames...
[2024-09-29 18:05:08,849][03810] Decorrelating experience for 32 frames...
[2024-09-29 18:05:08,861][03808] Decorrelating experience for 32 frames...
[2024-09-29 18:05:08,857][03812] Decorrelating experience for 32 frames...
[2024-09-29 18:05:09,197][03809] Decorrelating experience for 0 frames...
[2024-09-29 18:05:10,757][03811] Decorrelating experience for 64 frames...
[2024-09-29 18:05:10,916][03806] Decorrelating experience for 64 frames...
[2024-09-29 18:05:10,923][03808] Decorrelating experience for 64 frames...
[2024-09-29 18:05:10,932][03810] Decorrelating experience for 64 frames...
[2024-09-29 18:05:10,992][03807] Decorrelating experience for 64 frames...
[2024-09-29 18:05:11,376][03809] Decorrelating experience for 32 frames...
[2024-09-29 18:05:11,415][03805] Decorrelating experience for 0 frames...
[2024-09-29 18:05:12,070][03812] Decorrelating experience for 64 frames...
[2024-09-29 18:05:12,201][03806] Decorrelating experience for 96 frames...
[2024-09-29 18:05:12,265][03811] Decorrelating experience for 96 frames...
[2024-09-29 18:05:12,335][00189] Heartbeat connected on RolloutWorker_w1
[2024-09-29 18:05:12,450][00189] Heartbeat connected on RolloutWorker_w6
[2024-09-29 18:05:12,509][03807] Decorrelating experience for 96 frames...
[2024-09-29 18:05:12,663][00189] Heartbeat connected on RolloutWorker_w2
[2024-09-29 18:05:12,900][03805] Decorrelating experience for 32 frames...
[2024-09-29 18:05:13,034][03812] Decorrelating experience for 96 frames...
[2024-09-29 18:05:13,115][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:05:13,238][00189] Heartbeat connected on RolloutWorker_w7
[2024-09-29 18:05:13,452][03810] Decorrelating experience for 96 frames...
[2024-09-29 18:05:13,570][00189] Heartbeat connected on RolloutWorker_w5
[2024-09-29 18:05:13,673][03809] Decorrelating experience for 64 frames...
[2024-09-29 18:05:13,862][03805] Decorrelating experience for 64 frames...
[2024-09-29 18:05:14,311][03808] Decorrelating experience for 96 frames...
[2024-09-29 18:05:14,474][00189] Heartbeat connected on RolloutWorker_w3
[2024-09-29 18:05:14,526][03809] Decorrelating experience for 96 frames...
[2024-09-29 18:05:14,627][00189] Heartbeat connected on RolloutWorker_w4
[2024-09-29 18:05:14,683][03805] Decorrelating experience for 96 frames...
[2024-09-29 18:05:14,746][00189] Heartbeat connected on RolloutWorker_w0
[2024-09-29 18:05:17,781][03791] Signal inference workers to stop experience collection...
[2024-09-29 18:05:17,793][03804] InferenceWorker_p0-w0: stopping experience collection
[2024-09-29 18:05:18,115][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 172.0. Samples: 1720. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:05:18,118][00189] Avg episode reward: [(0, '1.792')]
[2024-09-29 18:05:20,012][03791] Signal inference workers to resume experience collection...
[2024-09-29 18:05:20,013][03804] InferenceWorker_p0-w0: resuming experience collection
[2024-09-29 18:05:23,120][00189] Fps is (10 sec: 1228.1, 60 sec: 818.9, 300 sec: 818.9). Total num frames: 12288. Throughput: 0: 280.6. Samples: 4210. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-29 18:05:23,124][00189] Avg episode reward: [(0, '3.027')]
[2024-09-29 18:05:23,395][03791] Stopping Batcher_0...
[2024-09-29 18:05:23,395][03791] Loop batcher_evt_loop terminating...
[2024-09-29 18:05:23,396][00189] Component Batcher_0 stopped!
[2024-09-29 18:05:23,401][03791] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000004_16384.pth...
[2024-09-29 18:05:23,504][03804] Weights refcount: 2 0
[2024-09-29 18:05:23,507][03804] Stopping InferenceWorker_p0-w0...
[2024-09-29 18:05:23,512][03804] Loop inference_proc0-0_evt_loop terminating...
[2024-09-29 18:05:23,508][00189] Component InferenceWorker_p0-w0 stopped!
[2024-09-29 18:05:23,575][03791] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000004_16384.pth...
[2024-09-29 18:05:23,850][00189] Component LearnerWorker_p0 stopped!
[2024-09-29 18:05:23,853][03791] Stopping LearnerWorker_p0...
[2024-09-29 18:05:23,854][03791] Loop learner_proc0_evt_loop terminating...
[2024-09-29 18:05:23,956][00189] Component RolloutWorker_w1 stopped!
[2024-09-29 18:05:23,962][03806] Stopping RolloutWorker_w1...
[2024-09-29 18:05:23,963][03806] Loop rollout_proc1_evt_loop terminating...
[2024-09-29 18:05:23,984][00189] Component RolloutWorker_w5 stopped!
[2024-09-29 18:05:23,991][03810] Stopping RolloutWorker_w5...
[2024-09-29 18:05:23,991][03810] Loop rollout_proc5_evt_loop terminating...
[2024-09-29 18:05:24,022][00189] Component RolloutWorker_w3 stopped!
[2024-09-29 18:05:24,024][03808] Stopping RolloutWorker_w3...
[2024-09-29 18:05:24,026][03808] Loop rollout_proc3_evt_loop terminating...
[2024-09-29 18:05:24,045][00189] Component RolloutWorker_w7 stopped!
[2024-09-29 18:05:24,047][03812] Stopping RolloutWorker_w7...
[2024-09-29 18:05:24,047][03812] Loop rollout_proc7_evt_loop terminating...
[2024-09-29 18:05:24,203][03807] Stopping RolloutWorker_w2...
[2024-09-29 18:05:24,203][03807] Loop rollout_proc2_evt_loop terminating...
[2024-09-29 18:05:24,204][00189] Component RolloutWorker_w2 stopped!
[2024-09-29 18:05:24,209][03805] Stopping RolloutWorker_w0...
[2024-09-29 18:05:24,209][03805] Loop rollout_proc0_evt_loop terminating...
[2024-09-29 18:05:24,209][00189] Component RolloutWorker_w0 stopped!
[2024-09-29 18:05:24,304][00189] Component RolloutWorker_w4 stopped!
[2024-09-29 18:05:24,309][03809] Stopping RolloutWorker_w4...
[2024-09-29 18:05:24,326][00189] Component RolloutWorker_w6 stopped!
[2024-09-29 18:05:24,331][00189] Waiting for process learner_proc0 to stop...
[2024-09-29 18:05:24,336][03811] Stopping RolloutWorker_w6...
[2024-09-29 18:05:24,339][03809] Loop rollout_proc4_evt_loop terminating...
[2024-09-29 18:05:24,345][03811] Loop rollout_proc6_evt_loop terminating...
[2024-09-29 18:05:26,049][00189] Waiting for process inference_proc0-0 to join...
[2024-09-29 18:05:26,496][00189] Waiting for process rollout_proc0 to join...
[2024-09-29 18:05:27,921][00189] Waiting for process rollout_proc1 to join...
[2024-09-29 18:05:27,962][00189] Waiting for process rollout_proc2 to join...
[2024-09-29 18:05:27,967][00189] Waiting for process rollout_proc3 to join...
[2024-09-29 18:05:27,971][00189] Waiting for process rollout_proc4 to join...
[2024-09-29 18:05:27,974][00189] Waiting for process rollout_proc5 to join...
[2024-09-29 18:05:27,978][00189] Waiting for process rollout_proc6 to join...
[2024-09-29 18:05:27,982][00189] Waiting for process rollout_proc7 to join...
[2024-09-29 18:05:27,984][00189] Batcher 0 profile tree view:
batching: 0.0787, releasing_batches: 0.0005
[2024-09-29 18:05:27,986][00189] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 9.8122
update_model: 0.0320
  weight_update: 0.0017
one_step: 0.0160
  handle_policy_step: 5.6634
    deserialize: 0.0776, stack: 0.0176, obs_to_device_normalize: 0.5598, forward: 4.3490, send_messages: 0.1291
    prepare_outputs: 0.3908
      to_cpu: 0.2227
[2024-09-29 18:05:27,988][00189] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 4.0240
train: 1.5797
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0007, kl_divergence: 0.0011, after_optimizer: 0.0791
  calculate_losses: 0.3553
    losses_init: 0.0000, forward_head: 0.2001, bptt_initial: 0.1092, tail: 0.0030, advantages_returns: 0.0032, losses: 0.0209
    bptt: 0.0181
      bptt_forward_core: 0.0180
  update: 1.1266
    clip: 0.0068
[2024-09-29 18:05:27,992][00189] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0010, enqueue_policy_requests: 0.8160, env_step: 4.0551, overhead: 0.0584, complete_rollouts: 0.0385
save_policy_outputs: 0.1534
  split_output_tensors: 0.0530
[2024-09-29 18:05:27,993][00189] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0011, enqueue_policy_requests: 0.8219, env_step: 3.8835, overhead: 0.0819, complete_rollouts: 0.0257
save_policy_outputs: 0.1694
  split_output_tensors: 0.0419
[2024-09-29 18:05:27,997][00189] Loop Runner_EvtLoop terminating...
[2024-09-29 18:05:27,998][00189] Runner profile tree view:
main_loop: 41.4651
[2024-09-29 18:05:28,001][00189] Collected {0: 16384}, FPS: 395.1
[2024-09-29 18:06:48,149][00189] Environment doom_basic already registered, overwriting...
[2024-09-29 18:06:48,152][00189] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-29 18:06:48,154][00189] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-29 18:06:48,156][00189] Environment doom_dm already registered, overwriting...
[2024-09-29 18:06:48,161][00189] Environment doom_dwango5 already registered, overwriting...
[2024-09-29 18:06:48,165][00189] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-29 18:06:48,166][00189] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-29 18:06:48,171][00189] Environment doom_my_way_home already registered, overwriting...
[2024-09-29 18:06:48,173][00189] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-29 18:06:48,174][00189] Environment doom_defend_the_center already registered, overwriting...
[2024-09-29 18:06:48,178][00189] Environment doom_defend_the_line already registered, overwriting...
[2024-09-29 18:06:48,180][00189] Environment doom_health_gathering already registered, overwriting...
[2024-09-29 18:06:48,184][00189] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-29 18:06:48,186][00189] Environment doom_battle already registered, overwriting...
[2024-09-29 18:06:48,191][00189] Environment doom_battle2 already registered, overwriting...
[2024-09-29 18:06:48,192][00189] Environment doom_duel_bots already registered, overwriting...
[2024-09-29 18:06:48,196][00189] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-29 18:06:48,198][00189] Environment doom_duel already registered, overwriting...
[2024-09-29 18:06:48,201][00189] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-29 18:06:48,204][00189] Environment doom_benchmark already registered, overwriting...
[2024-09-29 18:06:48,206][00189] register_encoder_factory:
[2024-09-29 18:06:48,245][00189] Loading existing experiment configuration from /content/train_dir/samplefactory-vizdoom-v1/config.json
[2024-09-29 18:06:48,246][00189] Overriding arg 'train_for_env_steps' with value 1000000 passed from command line
[2024-09-29 18:06:48,256][00189] Experiment dir /content/train_dir/samplefactory-vizdoom-v1 already exists!
[2024-09-29 18:06:48,260][00189] Resuming existing experiment from /content/train_dir/samplefactory-vizdoom-v1...
[2024-09-29 18:06:48,263][00189] Weights and Biases integration disabled
[2024-09-29 18:06:48,284][00189] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-09-29 18:06:50,287][00189] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=samplefactory-vizdoom-v1
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=1000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000 --experiment=samplefactory-vizdoom-v1 --restart_behavior=resume
cli_args={'env': 'doom_health_gathering_supreme', 'experiment': 'samplefactory-vizdoom-v1', 'restart_behavior': 'resume', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-29 18:06:50,290][00189] Saving configuration to /content/train_dir/samplefactory-vizdoom-v1/config.json...
[2024-09-29 18:06:50,296][00189] Rollout worker 0 uses device cpu
[2024-09-29 18:06:50,297][00189] Rollout worker 1 uses device cpu
[2024-09-29 18:06:50,302][00189] Rollout worker 2 uses device cpu
[2024-09-29 18:06:50,302][00189] Rollout worker 3 uses device cpu
[2024-09-29 18:06:50,303][00189] Rollout worker 4 uses device cpu
[2024-09-29 18:06:50,304][00189] Rollout worker 5 uses device cpu
[2024-09-29 18:06:50,305][00189] Rollout worker 6 uses device cpu
[2024-09-29 18:06:50,307][00189] Rollout worker 7 uses device cpu
[2024-09-29 18:06:50,452][00189] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:06:50,453][00189] InferenceWorker_p0-w0: min num requests: 2
[2024-09-29 18:06:50,489][00189] Starting all processes...
[2024-09-29 18:06:50,490][00189] Starting process learner_proc0
[2024-09-29 18:06:50,539][00189] Starting all processes...
[2024-09-29 18:06:50,546][00189] Starting process inference_proc0-0
[2024-09-29 18:06:50,548][00189] Starting process rollout_proc2
[2024-09-29 18:06:50,548][00189] Starting process rollout_proc1
[2024-09-29 18:06:50,546][00189] Starting process rollout_proc0
[2024-09-29 18:06:50,549][00189] Starting process rollout_proc3
[2024-09-29 18:06:50,549][00189] Starting process rollout_proc4
[2024-09-29 18:06:50,549][00189] Starting process rollout_proc5
[2024-09-29 18:06:50,549][00189] Starting process rollout_proc6
[2024-09-29 18:06:50,549][00189] Starting process rollout_proc7
[2024-09-29 18:07:01,348][07564] Worker 7 uses CPU cores [1]
[2024-09-29 18:07:01,965][07560] Worker 4 uses CPU cores [0]
[2024-09-29 18:07:01,992][07557] Worker 2 uses CPU cores [0]
[2024-09-29 18:07:02,161][07543] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:07:02,161][07543] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-29 18:07:02,177][07559] Worker 0 uses CPU cores [0]
[2024-09-29 18:07:02,202][07543] Num visible devices: 1
[2024-09-29 18:07:02,227][07543] Starting seed is not provided
[2024-09-29 18:07:02,228][07543] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:07:02,228][07543] Initializing actor-critic model on device cuda:0
[2024-09-29 18:07:02,229][07543] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:07:02,231][07543] RunningMeanStd input shape: (1,)
[2024-09-29 18:07:02,281][07543] ConvEncoder: input_channels=3
[2024-09-29 18:07:02,310][07563] Worker 5 uses CPU cores [1]
[2024-09-29 18:07:02,321][07558] Worker 1 uses CPU cores [1]
[2024-09-29 18:07:02,347][07562] Worker 6 uses CPU cores [0]
[2024-09-29 18:07:02,397][07556] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:07:02,397][07556] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-29 18:07:02,444][07556] Num visible devices: 1
[2024-09-29 18:07:02,466][07561] Worker 3 uses CPU cores [1]
[2024-09-29 18:07:02,527][07543] Conv encoder output size: 512
[2024-09-29 18:07:02,528][07543] Policy head output size: 512
[2024-09-29 18:07:02,542][07543] Created Actor Critic model with architecture:
[2024-09-29 18:07:02,543][07543] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-29 18:07:04,106][07543] Using optimizer
[2024-09-29 18:07:04,108][07543] Loading state from checkpoint /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000004_16384.pth...
[2024-09-29 18:07:04,146][07543] Loading model from checkpoint
[2024-09-29 18:07:04,151][07543] Loaded experiment state at self.train_step=4, self.env_steps=16384
[2024-09-29 18:07:04,151][07543] Initialized policy 0 weights for model version 4
[2024-09-29 18:07:04,156][07543] LearnerWorker_p0 finished initialization!
[2024-09-29 18:07:04,158][07543] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:07:04,272][07556] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:07:04,274][07556] RunningMeanStd input shape: (1,)
[2024-09-29 18:07:04,291][07556] ConvEncoder: input_channels=3
[2024-09-29 18:07:04,393][07556] Conv encoder output size: 512
[2024-09-29 18:07:04,394][07556] Policy head output size: 512
[2024-09-29 18:07:05,792][00189] Inference worker 0-0 is ready!
[2024-09-29 18:07:05,794][00189] All inference workers are ready! Signal rollout workers to start!
[2024-09-29 18:07:05,868][07563] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:07:05,867][07562] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:07:05,872][07558] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:07:05,871][07557] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:07:05,870][07559] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:07:05,872][07560] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:07:05,871][07564] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:07:05,874][07561] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:07:07,076][07559] Decorrelating experience for 0 frames...
[2024-09-29 18:07:07,079][07562] Decorrelating experience for 0 frames...
[2024-09-29 18:07:07,082][07560] Decorrelating experience for 0 frames...
[2024-09-29 18:07:07,090][07558] Decorrelating experience for 0 frames...
[2024-09-29 18:07:07,093][07561] Decorrelating experience for 0 frames...
[2024-09-29 18:07:07,097][07564] Decorrelating experience for 0 frames...
[2024-09-29 18:07:07,773][07562] Decorrelating experience for 32 frames...
[2024-09-29 18:07:07,863][07561] Decorrelating experience for 32 frames...
[2024-09-29 18:07:07,872][07558] Decorrelating experience for 32 frames...
[2024-09-29 18:07:07,881][07559] Decorrelating experience for 32 frames...
[2024-09-29 18:07:08,281][00189] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 16384. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:07:08,980][07562] Decorrelating experience for 64 frames...
[2024-09-29 18:07:09,063][07560] Decorrelating experience for 32 frames...
[2024-09-29 18:07:09,078][07563] Decorrelating experience for 0 frames...
[2024-09-29 18:07:09,161][07558] Decorrelating experience for 64 frames...
[2024-09-29 18:07:09,168][07561] Decorrelating experience for 64 frames...
[2024-09-29 18:07:09,187][07559] Decorrelating experience for 64 frames...
[2024-09-29 18:07:09,967][07563] Decorrelating experience for 32 frames...
[2024-09-29 18:07:10,089][07561] Decorrelating experience for 96 frames...
[2024-09-29 18:07:10,245][07557] Decorrelating experience for 0 frames...
[2024-09-29 18:07:10,354][07560] Decorrelating experience for 64 frames...
[2024-09-29 18:07:10,417][07559] Decorrelating experience for 96 frames...
[2024-09-29 18:07:10,444][00189] Heartbeat connected on Batcher_0
[2024-09-29 18:07:10,449][00189] Heartbeat connected on LearnerWorker_p0
[2024-09-29 18:07:10,473][00189] Heartbeat connected on RolloutWorker_w3
[2024-09-29 18:07:10,570][00189] Heartbeat connected on RolloutWorker_w0
[2024-09-29 18:07:10,926][00189] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-29 18:07:10,948][07557] Decorrelating experience for 32 frames...
[2024-09-29 18:07:11,896][07564] Decorrelating experience for 32 frames...
[2024-09-29 18:07:12,059][07563] Decorrelating experience for 64 frames...
[2024-09-29 18:07:13,042][07557] Decorrelating experience for 64 frames...
[2024-09-29 18:07:13,281][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16384. Throughput: 0: 110.4. Samples: 552. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:07:13,284][00189] Avg episode reward: [(0, '2.382')]
[2024-09-29 18:07:15,472][07562] Decorrelating experience for 96 frames...
[2024-09-29 18:07:15,847][07558] Decorrelating experience for 96 frames...
[2024-09-29 18:07:16,135][00189] Heartbeat connected on RolloutWorker_w6
[2024-09-29 18:07:16,152][07563] Decorrelating experience for 96 frames...
[2024-09-29 18:07:16,529][00189] Heartbeat connected on RolloutWorker_w1
[2024-09-29 18:07:16,829][00189] Heartbeat connected on RolloutWorker_w5
[2024-09-29 18:07:17,416][07557] Decorrelating experience for 96 frames...
[2024-09-29 18:07:17,592][07564] Decorrelating experience for 64 frames...
[2024-09-29 18:07:18,012][00189] Heartbeat connected on RolloutWorker_w2
[2024-09-29 18:07:18,281][00189] Fps is (10 sec: 409.6, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 20480. Throughput: 0: 228.4. Samples: 2284. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-29 18:07:18,283][00189] Avg episode reward: [(0, '3.423')]
[2024-09-29 18:07:19,377][07560] Decorrelating experience for 96 frames...
[2024-09-29 18:07:19,990][00189] Heartbeat connected on RolloutWorker_w4
[2024-09-29 18:07:20,215][07543] Signal inference workers to stop experience collection...
[2024-09-29 18:07:20,234][07556] InferenceWorker_p0-w0: stopping experience collection
[2024-09-29 18:07:20,384][07564] Decorrelating experience for 96 frames...
[2024-09-29 18:07:20,480][00189] Heartbeat connected on RolloutWorker_w7
[2024-09-29 18:07:20,594][07543] Signal inference workers to resume experience collection...
[2024-09-29 18:07:20,596][07556] InferenceWorker_p0-w0: resuming experience collection
[2024-09-29 18:07:23,282][00189] Fps is (10 sec: 2048.0, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 36864. Throughput: 0: 243.6. Samples: 3654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:07:23,287][00189] Avg episode reward: [(0, '3.633')]
[2024-09-29 18:07:27,992][07556] Updated weights for policy 0, policy_version 14 (0.0026)
[2024-09-29 18:07:28,281][00189] Fps is (10 sec: 3686.4, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 57344. Throughput: 0: 496.2. Samples: 9924. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:07:28,283][00189] Avg episode reward: [(0, '4.081')]
[2024-09-29 18:07:33,281][00189] Fps is (10 sec: 3276.9, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 69632. Throughput: 0: 576.8. Samples: 14420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:07:33,284][00189] Avg episode reward: [(0, '4.152')]
[2024-09-29 18:07:38,281][00189] Fps is (10 sec: 2867.2, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 86016. Throughput: 0: 553.9. Samples: 16616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:07:38,283][00189] Avg episode reward: [(0, '4.297')]
[2024-09-29 18:07:40,137][07556] Updated weights for policy 0, policy_version 24 (0.0017)
[2024-09-29 18:07:43,281][00189] Fps is (10 sec: 4095.8, 60 sec: 2691.6, 300 sec: 2691.6). Total num frames: 110592. Throughput: 0: 656.0. Samples: 22960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:07:43,283][00189] Avg episode reward: [(0, '4.402')]
[2024-09-29 18:07:43,291][07543] Saving new best policy, reward=4.402!
[2024-09-29 18:07:48,282][00189] Fps is (10 sec: 4505.0, 60 sec: 2867.1, 300 sec: 2867.1). Total num frames: 131072. Throughput: 0: 729.9. Samples: 29198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:07:48,287][00189] Avg episode reward: [(0, '4.393')]
[2024-09-29 18:07:50,562][07556] Updated weights for policy 0, policy_version 34 (0.0012)
[2024-09-29 18:07:53,281][00189] Fps is (10 sec: 3276.9, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 143360. Throughput: 0: 694.3. Samples: 31244. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 18:07:53,283][00189] Avg episode reward: [(0, '4.383')]
[2024-09-29 18:07:58,281][00189] Fps is (10 sec: 3277.2, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 163840. Throughput: 0: 799.9. Samples: 36550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:07:58,286][00189] Avg episode reward: [(0, '4.368')]
[2024-09-29 18:08:01,389][07556] Updated weights for policy 0, policy_version 44 (0.0036)
[2024-09-29 18:08:03,281][00189] Fps is (10 sec: 4505.6, 60 sec: 3127.9, 300 sec: 3127.9). Total num frames: 188416. Throughput: 0: 907.9. Samples: 43138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:08:03,287][00189] Avg episode reward: [(0, '4.433')]
[2024-09-29 18:08:03,292][07543] Saving new best policy, reward=4.433!
[2024-09-29 18:08:08,282][00189] Fps is (10 sec: 3686.0, 60 sec: 3071.9, 300 sec: 3071.9). Total num frames: 200704. Throughput: 0: 936.6. Samples: 45804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:08:08,284][00189] Avg episode reward: [(0, '4.449')]
[2024-09-29 18:08:08,292][07543] Saving new best policy, reward=4.449!
[2024-09-29 18:08:13,281][00189] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3087.7). Total num frames: 217088. Throughput: 0: 886.6. Samples: 49822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:08:13,284][00189] Avg episode reward: [(0, '4.358')]
[2024-09-29 18:08:13,572][07556] Updated weights for policy 0, policy_version 54 (0.0021)
[2024-09-29 18:08:18,281][00189] Fps is (10 sec: 4096.6, 60 sec: 3686.4, 300 sec: 3218.3). Total num frames: 241664. Throughput: 0: 939.2. Samples: 56686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:08:18,283][00189] Avg episode reward: [(0, '4.435')]
[2024-09-29 18:08:23,056][07556] Updated weights for policy 0, policy_version 64 (0.0017)
[2024-09-29 18:08:23,281][00189] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 262144.
Throughput: 0: 965.4. Samples: 60058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:08:23,284][00189] Avg episode reward: [(0, '4.567')] [2024-09-29 18:08:23,294][07543] Saving new best policy, reward=4.567! [2024-09-29 18:08:28,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3225.6). Total num frames: 274432. Throughput: 0: 922.0. Samples: 64448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:08:28,287][00189] Avg episode reward: [(0, '4.665')] [2024-09-29 18:08:28,295][07543] Saving new best policy, reward=4.665! [2024-09-29 18:08:33,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 294912. Throughput: 0: 909.5. Samples: 70124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:08:33,283][00189] Avg episode reward: [(0, '4.593')] [2024-09-29 18:08:34,753][07556] Updated weights for policy 0, policy_version 74 (0.0024) [2024-09-29 18:08:38,281][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3322.3). Total num frames: 315392. Throughput: 0: 938.3. Samples: 73468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:08:38,286][00189] Avg episode reward: [(0, '4.491')] [2024-09-29 18:08:43,281][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3319.9). Total num frames: 331776. Throughput: 0: 946.2. Samples: 79128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:08:43,288][00189] Avg episode reward: [(0, '4.252')] [2024-09-29 18:08:46,758][07556] Updated weights for policy 0, policy_version 84 (0.0015) [2024-09-29 18:08:48,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3317.8). Total num frames: 348160. Throughput: 0: 904.8. Samples: 83852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:08:48,287][00189] Avg episode reward: [(0, '4.240')] [2024-09-29 18:08:48,298][07543] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000085_348160.pth... 
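The checkpoint path saved above encodes two numbers: the policy version (train step) and the cumulative env-step count, i.e. `checkpoint_000000085_348160.pth` is policy version 85 at 348160 env frames. A small helper can recover both from a log or a directory listing; this is an illustrative sketch, not part of Sample Factory itself:

```python
import re


def parse_checkpoint_name(path: str) -> tuple[int, int]:
    """Extract (policy_version, env_steps) from a checkpoint filename
    shaped like checkpoint_000000085_348160.pth (hypothetical helper,
    based only on the naming pattern visible in this log)."""
    m = re.search(r"checkpoint_(\d+)_(\d+)\.pth$", path)
    if m is None:
        raise ValueError(f"not a checkpoint path: {path!r}")
    return int(m.group(1)), int(m.group(2))


# The checkpoint saved in the log above: version 85 after 348160 env frames.
version, env_steps = parse_checkpoint_name(
    "/content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000085_348160.pth"
)
```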
[2024-09-29 18:08:53,281][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3393.8). Total num frames: 372736. Throughput: 0: 919.1. Samples: 87162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:08:53,288][00189] Avg episode reward: [(0, '4.371')] [2024-09-29 18:08:55,795][07556] Updated weights for policy 0, policy_version 94 (0.0015) [2024-09-29 18:08:58,281][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3388.5). Total num frames: 389120. Throughput: 0: 975.1. Samples: 93702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:08:58,285][00189] Avg episode reward: [(0, '4.539')] [2024-09-29 18:09:03,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3383.7). Total num frames: 405504. Throughput: 0: 916.5. Samples: 97930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:09:03,282][00189] Avg episode reward: [(0, '4.393')] [2024-09-29 18:09:07,768][07556] Updated weights for policy 0, policy_version 104 (0.0021) [2024-09-29 18:09:08,281][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3413.3). Total num frames: 425984. Throughput: 0: 906.3. Samples: 100840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:09:08,287][00189] Avg episode reward: [(0, '4.433')] [2024-09-29 18:09:13,281][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3473.4). Total num frames: 450560. Throughput: 0: 961.2. Samples: 107702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:09:13,282][00189] Avg episode reward: [(0, '4.671')] [2024-09-29 18:09:13,290][07543] Saving new best policy, reward=4.671! [2024-09-29 18:09:18,246][07556] Updated weights for policy 0, policy_version 114 (0.0017) [2024-09-29 18:09:18,282][00189] Fps is (10 sec: 4095.4, 60 sec: 3754.6, 300 sec: 3465.8). Total num frames: 466944. Throughput: 0: 948.7. Samples: 112816. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:09:18,288][00189] Avg episode reward: [(0, '4.523')] [2024-09-29 18:09:23,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3458.8). Total num frames: 483328. Throughput: 0: 922.4. Samples: 114978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:09:23,283][00189] Avg episode reward: [(0, '4.409')] [2024-09-29 18:09:28,287][00189] Fps is (10 sec: 3684.5, 60 sec: 3822.5, 300 sec: 3481.4). Total num frames: 503808. Throughput: 0: 938.3. Samples: 121356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:09:28,289][00189] Avg episode reward: [(0, '4.493')] [2024-09-29 18:09:28,687][07556] Updated weights for policy 0, policy_version 124 (0.0013) [2024-09-29 18:09:33,281][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3502.8). Total num frames: 524288. Throughput: 0: 970.0. Samples: 127502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:09:33,285][00189] Avg episode reward: [(0, '4.612')] [2024-09-29 18:09:38,281][00189] Fps is (10 sec: 3278.9, 60 sec: 3686.4, 300 sec: 3467.9). Total num frames: 536576. Throughput: 0: 943.2. Samples: 129608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:09:38,287][00189] Avg episode reward: [(0, '4.724')] [2024-09-29 18:09:38,295][07543] Saving new best policy, reward=4.724! [2024-09-29 18:09:40,834][07556] Updated weights for policy 0, policy_version 134 (0.0017) [2024-09-29 18:09:43,283][00189] Fps is (10 sec: 3276.0, 60 sec: 3754.5, 300 sec: 3488.2). Total num frames: 557056. Throughput: 0: 914.8. Samples: 134872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:09:43,290][00189] Avg episode reward: [(0, '4.643')] [2024-09-29 18:09:48,281][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3532.8). Total num frames: 581632. Throughput: 0: 971.9. Samples: 141664. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:09:48,287][00189] Avg episode reward: [(0, '4.480')] [2024-09-29 18:09:50,139][07556] Updated weights for policy 0, policy_version 144 (0.0019) [2024-09-29 18:09:53,284][00189] Fps is (10 sec: 4096.1, 60 sec: 3754.5, 300 sec: 3525.0). Total num frames: 598016. Throughput: 0: 965.7. Samples: 144300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 18:09:53,289][00189] Avg episode reward: [(0, '4.501')] [2024-09-29 18:09:58,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3517.7). Total num frames: 614400. Throughput: 0: 908.5. Samples: 148586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:09:58,287][00189] Avg episode reward: [(0, '4.639')] [2024-09-29 18:10:01,998][07556] Updated weights for policy 0, policy_version 154 (0.0034) [2024-09-29 18:10:03,281][00189] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3534.3). Total num frames: 634880. Throughput: 0: 942.0. Samples: 155204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:10:03,282][00189] Avg episode reward: [(0, '4.560')] [2024-09-29 18:10:08,281][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 655360. Throughput: 0: 968.7. Samples: 158568. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 18:10:08,287][00189] Avg episode reward: [(0, '4.561')] [2024-09-29 18:10:13,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3520.3). Total num frames: 667648. Throughput: 0: 926.1. Samples: 163026. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:10:13,287][00189] Avg episode reward: [(0, '4.581')] [2024-09-29 18:10:13,695][07556] Updated weights for policy 0, policy_version 164 (0.0030) [2024-09-29 18:10:18,282][00189] Fps is (10 sec: 3276.3, 60 sec: 3686.4, 300 sec: 3535.5). Total num frames: 688128. Throughput: 0: 918.1. Samples: 168818. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:10:18,285][00189] Avg episode reward: [(0, '4.593')] [2024-09-29 18:10:22,991][07556] Updated weights for policy 0, policy_version 174 (0.0023) [2024-09-29 18:10:23,281][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3570.9). Total num frames: 712704. Throughput: 0: 947.6. Samples: 172252. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-29 18:10:23,283][00189] Avg episode reward: [(0, '4.692')] [2024-09-29 18:10:28,294][00189] Fps is (10 sec: 4091.4, 60 sec: 3754.3, 300 sec: 3563.3). Total num frames: 729088. Throughput: 0: 952.8. Samples: 177760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:10:28,301][00189] Avg episode reward: [(0, '4.719')] [2024-09-29 18:10:33,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3556.5). Total num frames: 745472. Throughput: 0: 908.0. Samples: 182522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:10:33,283][00189] Avg episode reward: [(0, '4.704')] [2024-09-29 18:10:34,973][07556] Updated weights for policy 0, policy_version 184 (0.0020) [2024-09-29 18:10:38,281][00189] Fps is (10 sec: 3691.1, 60 sec: 3822.9, 300 sec: 3569.4). Total num frames: 765952. Throughput: 0: 925.2. Samples: 185934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:10:38,283][00189] Avg episode reward: [(0, '4.556')] [2024-09-29 18:10:43,281][00189] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3581.6). Total num frames: 786432. Throughput: 0: 974.6. Samples: 192444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:10:43,287][00189] Avg episode reward: [(0, '4.652')] [2024-09-29 18:10:45,803][07556] Updated weights for policy 0, policy_version 194 (0.0014) [2024-09-29 18:10:48,281][00189] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3556.1). Total num frames: 798720. Throughput: 0: 922.0. Samples: 196694. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 18:10:48,284][00189] Avg episode reward: [(0, '4.617')] [2024-09-29 18:10:48,305][07543] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000195_798720.pth... [2024-09-29 18:10:48,553][07543] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000004_16384.pth [2024-09-29 18:10:53,281][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3586.3). Total num frames: 823296. Throughput: 0: 912.2. Samples: 199616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:10:53,283][00189] Avg episode reward: [(0, '4.790')] [2024-09-29 18:10:53,293][07543] Saving new best policy, reward=4.790! [2024-09-29 18:10:56,113][07556] Updated weights for policy 0, policy_version 204 (0.0013) [2024-09-29 18:10:58,281][00189] Fps is (10 sec: 4505.8, 60 sec: 3822.9, 300 sec: 3597.4). Total num frames: 843776. Throughput: 0: 961.6. Samples: 206298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:10:58,287][00189] Avg episode reward: [(0, '4.719')] [2024-09-29 18:11:03,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3573.1). Total num frames: 856064. Throughput: 0: 941.2. Samples: 211172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:11:03,283][00189] Avg episode reward: [(0, '4.591')] [2024-09-29 18:11:07,915][07556] Updated weights for policy 0, policy_version 214 (0.0025) [2024-09-29 18:11:08,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3584.0). Total num frames: 876544. Throughput: 0: 913.3. Samples: 213352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:11:08,283][00189] Avg episode reward: [(0, '4.803')] [2024-09-29 18:11:08,294][07543] Saving new best policy, reward=4.803! [2024-09-29 18:11:13,281][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3594.4). Total num frames: 897024. Throughput: 0: 934.3. Samples: 219792. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:11:13,293][00189] Avg episode reward: [(0, '4.743')] [2024-09-29 18:11:17,994][07556] Updated weights for policy 0, policy_version 224 (0.0013) [2024-09-29 18:11:18,285][00189] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3604.5). Total num frames: 917504. Throughput: 0: 964.4. Samples: 225918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:11:18,290][00189] Avg episode reward: [(0, '4.681')] [2024-09-29 18:11:23,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.0). Total num frames: 929792. Throughput: 0: 934.6. Samples: 227992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:11:23,285][00189] Avg episode reward: [(0, '4.798')] [2024-09-29 18:11:28,281][00189] Fps is (10 sec: 3276.8, 60 sec: 3687.2, 300 sec: 3591.9). Total num frames: 950272. Throughput: 0: 910.8. Samples: 233430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:11:28,287][00189] Avg episode reward: [(0, '4.952')] [2024-09-29 18:11:28,300][07543] Saving new best policy, reward=4.952! [2024-09-29 18:11:29,319][07556] Updated weights for policy 0, policy_version 234 (0.0014) [2024-09-29 18:11:33,281][00189] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3616.8). Total num frames: 974848. Throughput: 0: 965.4. Samples: 240138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:11:33,285][00189] Avg episode reward: [(0, '4.852')] [2024-09-29 18:11:38,286][00189] Fps is (10 sec: 3684.6, 60 sec: 3686.1, 300 sec: 3595.3). Total num frames: 987136. Throughput: 0: 952.7. Samples: 242494. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:11:38,287][00189] Avg episode reward: [(0, '4.816')] [2024-09-29 18:11:41,287][07556] Updated weights for policy 0, policy_version 244 (0.0022) [2024-09-29 18:11:43,080][00189] Component Batcher_0 stopped! [2024-09-29 18:11:43,080][07543] Stopping Batcher_0... [2024-09-29 18:11:43,083][07543] Loop batcher_evt_loop terminating... 
[2024-09-29 18:11:43,087][07543] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000246_1007616.pth... [2024-09-29 18:11:43,138][07556] Weights refcount: 2 0 [2024-09-29 18:11:43,149][00189] Component InferenceWorker_p0-w0 stopped! [2024-09-29 18:11:43,152][07556] Stopping InferenceWorker_p0-w0... [2024-09-29 18:11:43,153][07556] Loop inference_proc0-0_evt_loop terminating... [2024-09-29 18:11:43,213][07543] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000085_348160.pth [2024-09-29 18:11:43,227][07543] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000246_1007616.pth... [2024-09-29 18:11:43,387][00189] Component LearnerWorker_p0 stopped! [2024-09-29 18:11:43,387][07543] Stopping LearnerWorker_p0... [2024-09-29 18:11:43,396][07543] Loop learner_proc0_evt_loop terminating... [2024-09-29 18:11:43,449][07559] Stopping RolloutWorker_w0... [2024-09-29 18:11:43,448][00189] Component RolloutWorker_w0 stopped! [2024-09-29 18:11:43,450][07559] Loop rollout_proc0_evt_loop terminating... [2024-09-29 18:11:43,456][00189] Component RolloutWorker_w2 stopped! [2024-09-29 18:11:43,456][07557] Stopping RolloutWorker_w2... [2024-09-29 18:11:43,458][07557] Loop rollout_proc2_evt_loop terminating... [2024-09-29 18:11:43,464][00189] Component RolloutWorker_w6 stopped! [2024-09-29 18:11:43,464][07562] Stopping RolloutWorker_w6... [2024-09-29 18:11:43,466][07562] Loop rollout_proc6_evt_loop terminating... [2024-09-29 18:11:43,518][00189] Component RolloutWorker_w4 stopped! [2024-09-29 18:11:43,518][07560] Stopping RolloutWorker_w4... [2024-09-29 18:11:43,531][07560] Loop rollout_proc4_evt_loop terminating... [2024-09-29 18:11:43,638][07563] Stopping RolloutWorker_w5... [2024-09-29 18:11:43,638][00189] Component RolloutWorker_w5 stopped! [2024-09-29 18:11:43,639][07563] Loop rollout_proc5_evt_loop terminating... [2024-09-29 18:11:43,661][00189] Component RolloutWorker_w1 stopped! 
[2024-09-29 18:11:43,661][07558] Stopping RolloutWorker_w1... [2024-09-29 18:11:43,668][00189] Component RolloutWorker_w3 stopped! [2024-09-29 18:11:43,666][07558] Loop rollout_proc1_evt_loop terminating... [2024-09-29 18:11:43,668][07561] Stopping RolloutWorker_w3... [2024-09-29 18:11:43,686][07561] Loop rollout_proc3_evt_loop terminating... [2024-09-29 18:11:43,689][00189] Component RolloutWorker_w7 stopped! [2024-09-29 18:11:43,689][07564] Stopping RolloutWorker_w7... [2024-09-29 18:11:43,690][00189] Waiting for process learner_proc0 to stop... [2024-09-29 18:11:43,707][07564] Loop rollout_proc7_evt_loop terminating... [2024-09-29 18:11:44,799][00189] Waiting for process inference_proc0-0 to join... [2024-09-29 18:11:44,869][00189] Waiting for process rollout_proc0 to join... [2024-09-29 18:11:46,245][00189] Waiting for process rollout_proc1 to join... [2024-09-29 18:11:46,249][00189] Waiting for process rollout_proc2 to join... [2024-09-29 18:11:46,254][00189] Waiting for process rollout_proc3 to join... [2024-09-29 18:11:46,258][00189] Waiting for process rollout_proc4 to join... [2024-09-29 18:11:46,263][00189] Waiting for process rollout_proc5 to join... [2024-09-29 18:11:46,268][00189] Waiting for process rollout_proc6 to join... [2024-09-29 18:11:46,271][00189] Waiting for process rollout_proc7 to join... 
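The periodic status entries above all follow one fixed shape (`Fps is (10 sec: X, 60 sec: Y, 300 sec: Z). Total num frames: N. Throughput: 0: T. Samples: S.`), so throughput curves can be pulled out of a saved log with a single regex. A sketch, assuming exactly the format shown in this log:

```python
import re

# Matches the status lines emitted during training; `nan` appears in the
# very first report before any frames are collected.
FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.]+|nan), 60 sec: ([\d.]+|nan), 300 sec: ([\d.]+|nan)\)\. "
    r"Total num frames: (\d+)\. Throughput: 0: ([\d.]+|nan)\. Samples: (\d+)\."
)


def parse_fps_line(line: str):
    """Return (fps_10s, fps_60s, fps_300s, total_frames, throughput, samples)
    for a status line, or None for any other log line."""
    m = FPS_RE.search(line)
    if m is None:
        return None
    fps10, fps60, fps300, frames, thr, samples = m.groups()
    return (float(fps10), float(fps60), float(fps300),
            int(frames), float(thr), int(samples))
```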
[2024-09-29 18:11:46,276][00189] Batcher 0 profile tree view:
batching: 6.6854, releasing_batches: 0.0071
[2024-09-29 18:11:46,279][00189] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0040
  wait_policy_total: 124.2264
update_model: 1.8470
  weight_update: 0.0022
one_step: 0.0027
  handle_policy_step: 139.9538
    deserialize: 3.6917, stack: 0.7440, obs_to_device_normalize: 29.2683, forward: 70.3915, send_messages: 6.7209
    prepare_outputs: 21.9718
      to_cpu: 13.5312
[2024-09-29 18:11:46,282][00189] Learner 0 profile tree view:
misc: 0.0013, prepare_batch: 7.7213
train: 20.1748
  epoch_init: 0.0013, minibatch_init: 0.0016, losses_postprocess: 0.1348, kl_divergence: 0.1211, after_optimizer: 0.9527
  calculate_losses: 6.4110
    losses_init: 0.0046, forward_head: 0.5808, bptt_initial: 3.9899, tail: 0.3308, advantages_returns: 0.0824, losses: 0.7235
    bptt: 0.5978
      bptt_forward_core: 0.5812
  update: 12.3863
    clip: 0.3863
[2024-09-29 18:11:46,283][00189] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0759, enqueue_policy_requests: 31.4690, env_step: 210.3667, overhead: 3.6807, complete_rollouts: 2.1235
save_policy_outputs: 6.5771
  split_output_tensors: 2.1776
[2024-09-29 18:11:46,285][00189] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0831, enqueue_policy_requests: 29.3873, env_step: 204.6449, overhead: 3.5443, complete_rollouts: 1.5031
save_policy_outputs: 6.7584
  split_output_tensors: 2.1056
[2024-09-29 18:11:46,287][00189] Loop Runner_EvtLoop terminating...
[2024-09-29 18:11:46,289][00189] Runner profile tree view:
main_loop: 295.7994
[2024-09-29 18:11:46,290][00189] Collected {0: 1007616}, FPS: 3351.0
[2024-09-29 18:12:57,108][00189] Environment doom_basic already registered, overwriting...
[2024-09-29 18:12:57,110][00189] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-29 18:12:57,112][00189] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-29 18:12:57,114][00189] Environment doom_dm already registered, overwriting...
[2024-09-29 18:12:57,115][00189] Environment doom_dwango5 already registered, overwriting...
[2024-09-29 18:12:57,117][00189] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-29 18:12:57,118][00189] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-29 18:12:57,121][00189] Environment doom_my_way_home already registered, overwriting...
[2024-09-29 18:12:57,122][00189] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-29 18:12:57,123][00189] Environment doom_defend_the_center already registered, overwriting...
[2024-09-29 18:12:57,124][00189] Environment doom_defend_the_line already registered, overwriting...
[2024-09-29 18:12:57,127][00189] Environment doom_health_gathering already registered, overwriting...
[2024-09-29 18:12:57,128][00189] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-29 18:12:57,131][00189] Environment doom_battle already registered, overwriting...
[2024-09-29 18:12:57,133][00189] Environment doom_battle2 already registered, overwriting...
[2024-09-29 18:12:57,134][00189] Environment doom_duel_bots already registered, overwriting...
[2024-09-29 18:12:57,135][00189] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-29 18:12:57,136][00189] Environment doom_duel already registered, overwriting...
[2024-09-29 18:12:57,138][00189] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-29 18:12:57,139][00189] Environment doom_benchmark already registered, overwriting...
[2024-09-29 18:12:57,140][00189] register_encoder_factory:
[2024-09-29 18:12:57,155][00189] Loading existing experiment configuration from /content/train_dir/samplefactory-vizdoom-v1/config.json
[2024-09-29 18:12:57,158][00189] Overriding arg 'train_for_env_steps' with value 4000000 passed from command line
[2024-09-29 18:12:57,165][00189] Experiment dir /content/train_dir/samplefactory-vizdoom-v1 already exists!
[2024-09-29 18:12:57,166][00189] Resuming existing experiment from /content/train_dir/samplefactory-vizdoom-v1...
[2024-09-29 18:12:57,167][00189] Weights and Biases integration disabled
[2024-09-29 18:12:57,172][00189] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-09-29 18:12:58,768][00189] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=samplefactory-vizdoom-v1
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000 --experiment=samplefactory-vizdoom-v1 --restart_behavior=resume
cli_args={'env': 'doom_health_gathering_supreme', 'experiment': 'samplefactory-vizdoom-v1', 'restart_behavior': 'resume', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-29 18:12:58,771][00189] Saving configuration to
/content/train_dir/samplefactory-vizdoom-v1/config.json... [2024-09-29 18:12:58,775][00189] Rollout worker 0 uses device cpu [2024-09-29 18:12:58,777][00189] Rollout worker 1 uses device cpu [2024-09-29 18:12:58,780][00189] Rollout worker 2 uses device cpu [2024-09-29 18:12:58,781][00189] Rollout worker 3 uses device cpu [2024-09-29 18:12:58,782][00189] Rollout worker 4 uses device cpu [2024-09-29 18:12:58,783][00189] Rollout worker 5 uses device cpu [2024-09-29 18:12:58,784][00189] Rollout worker 6 uses device cpu [2024-09-29 18:12:58,786][00189] Rollout worker 7 uses device cpu [2024-09-29 18:12:58,929][00189] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-29 18:12:58,931][00189] InferenceWorker_p0-w0: min num requests: 2 [2024-09-29 18:12:58,966][00189] Starting all processes... [2024-09-29 18:12:58,967][00189] Starting process learner_proc0 [2024-09-29 18:12:59,016][00189] Starting all processes... [2024-09-29 18:12:59,022][00189] Starting process inference_proc0-0 [2024-09-29 18:12:59,022][00189] Starting process rollout_proc0 [2024-09-29 18:12:59,040][00189] Starting process rollout_proc1 [2024-09-29 18:12:59,040][00189] Starting process rollout_proc2 [2024-09-29 18:12:59,042][00189] Starting process rollout_proc3 [2024-09-29 18:12:59,042][00189] Starting process rollout_proc4 [2024-09-29 18:12:59,042][00189] Starting process rollout_proc5 [2024-09-29 18:12:59,044][00189] Starting process rollout_proc6 [2024-09-29 18:12:59,044][00189] Starting process rollout_proc7 [2024-09-29 18:13:10,386][09285] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-29 18:13:10,390][09285] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-09-29 18:13:10,432][09303] Worker 3 uses CPU cores [1] [2024-09-29 18:13:10,460][09285] Num visible devices: 1 [2024-09-29 18:13:10,487][09285] Starting seed is not provided [2024-09-29 18:13:10,487][09285] Using GPUs [0] for process 0 (actually maps to GPUs [0]) 
[2024-09-29 18:13:10,488][09285] Initializing actor-critic model on device cuda:0
[2024-09-29 18:13:10,489][09285] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:13:10,490][09285] RunningMeanStd input shape: (1,)
[2024-09-29 18:13:10,528][09306] Worker 7 uses CPU cores [1]
[2024-09-29 18:13:10,544][09299] Worker 0 uses CPU cores [0]
[2024-09-29 18:13:10,562][09285] ConvEncoder: input_channels=3
[2024-09-29 18:13:10,660][09298] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:13:10,665][09298] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-29 18:13:10,732][09305] Worker 6 uses CPU cores [0]
[2024-09-29 18:13:10,742][09304] Worker 5 uses CPU cores [1]
[2024-09-29 18:13:10,746][09298] Num visible devices: 1
[2024-09-29 18:13:10,763][09301] Worker 2 uses CPU cores [0]
[2024-09-29 18:13:10,777][09300] Worker 1 uses CPU cores [1]
[2024-09-29 18:13:10,790][09302] Worker 4 uses CPU cores [0]
[2024-09-29 18:13:10,880][09285] Conv encoder output size: 512
[2024-09-29 18:13:10,880][09285] Policy head output size: 512
[2024-09-29 18:13:10,896][09285] Created Actor Critic model with architecture:
[2024-09-29 18:13:10,896][09285] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-29 18:13:12,433][09285] Using optimizer
[2024-09-29 18:13:12,434][09285] Loading state from checkpoint /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000246_1007616.pth...
[2024-09-29 18:13:12,465][09285] Loading model from checkpoint
[2024-09-29 18:13:12,470][09285] Loaded experiment state at self.train_step=246, self.env_steps=1007616
[2024-09-29 18:13:12,470][09285] Initialized policy 0 weights for model version 246
[2024-09-29 18:13:12,473][09285] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:13:12,481][09285] LearnerWorker_p0 finished initialization!
[2024-09-29 18:13:12,570][09298] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:13:12,571][09298] RunningMeanStd input shape: (1,)
[2024-09-29 18:13:12,590][09298] ConvEncoder: input_channels=3
[2024-09-29 18:13:12,691][09298] Conv encoder output size: 512
[2024-09-29 18:13:12,691][09298] Policy head output size: 512
[2024-09-29 18:13:14,305][00189] Inference worker 0-0 is ready!
[2024-09-29 18:13:14,307][00189] All inference workers are ready! Signal rollout workers to start!
[2024-09-29 18:13:14,388][09302] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:13:14,385][09305] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:13:14,389][09301] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:13:14,394][09299] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:13:14,473][09306] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:13:14,462][09303] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:13:14,464][09304] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:13:14,484][09300] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:13:15,597][09305] Decorrelating experience for 0 frames...
[2024-09-29 18:13:15,607][09299] Decorrelating experience for 0 frames...
[2024-09-29 18:13:16,096][09306] Decorrelating experience for 0 frames...
[2024-09-29 18:13:16,098][09303] Decorrelating experience for 0 frames...
[2024-09-29 18:13:16,101][09304] Decorrelating experience for 0 frames...
[2024-09-29 18:13:16,772][09301] Decorrelating experience for 0 frames...
[2024-09-29 18:13:17,172][00189] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1007616. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:13:17,448][09299] Decorrelating experience for 32 frames...
[2024-09-29 18:13:17,451][09305] Decorrelating experience for 32 frames...
[2024-09-29 18:13:17,511][09306] Decorrelating experience for 32 frames...
[2024-09-29 18:13:17,527][09303] Decorrelating experience for 32 frames...
[2024-09-29 18:13:18,467][09304] Decorrelating experience for 32 frames...
[2024-09-29 18:13:18,913][09301] Decorrelating experience for 32 frames...
[2024-09-29 18:13:18,921][00189] Heartbeat connected on Batcher_0
[2024-09-29 18:13:18,926][00189] Heartbeat connected on LearnerWorker_p0
[2024-09-29 18:13:18,956][00189] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-29 18:13:19,057][09299] Decorrelating experience for 64 frames...
[2024-09-29 18:13:19,170][09300] Decorrelating experience for 0 frames...
[2024-09-29 18:13:19,280][09306] Decorrelating experience for 64 frames...
[2024-09-29 18:13:19,333][09302] Decorrelating experience for 0 frames...
[2024-09-29 18:13:19,779][09304] Decorrelating experience for 64 frames...
[2024-09-29 18:13:20,014][09303] Decorrelating experience for 64 frames...
[2024-09-29 18:13:20,395][09301] Decorrelating experience for 64 frames...
[2024-09-29 18:13:20,417][09302] Decorrelating experience for 32 frames...
[2024-09-29 18:13:20,528][09305] Decorrelating experience for 64 frames...
[2024-09-29 18:13:20,965][09304] Decorrelating experience for 96 frames...
[2024-09-29 18:13:21,172][00189] Heartbeat connected on RolloutWorker_w5
[2024-09-29 18:13:21,309][09306] Decorrelating experience for 96 frames...
[2024-09-29 18:13:21,409][09303] Decorrelating experience for 96 frames...
[2024-09-29 18:13:21,502][09302] Decorrelating experience for 64 frames...
[2024-09-29 18:13:21,511][09305] Decorrelating experience for 96 frames...
[2024-09-29 18:13:21,549][00189] Heartbeat connected on RolloutWorker_w7
[2024-09-29 18:13:21,687][00189] Heartbeat connected on RolloutWorker_w3
[2024-09-29 18:13:21,784][00189] Heartbeat connected on RolloutWorker_w6
[2024-09-29 18:13:22,172][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1007616. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:13:22,445][09300] Decorrelating experience for 32 frames...
[2024-09-29 18:13:23,935][09301] Decorrelating experience for 96 frames...
[2024-09-29 18:13:24,295][00189] Heartbeat connected on RolloutWorker_w2
[2024-09-29 18:13:24,294][09302] Decorrelating experience for 96 frames...
[2024-09-29 18:13:24,832][00189] Heartbeat connected on RolloutWorker_w4
[2024-09-29 18:13:25,476][09299] Decorrelating experience for 96 frames...
[2024-09-29 18:13:25,852][09300] Decorrelating experience for 64 frames...
[2024-09-29 18:13:26,026][09285] Signal inference workers to stop experience collection...
[2024-09-29 18:13:26,043][09298] InferenceWorker_p0-w0: stopping experience collection
[2024-09-29 18:13:26,070][00189] Heartbeat connected on RolloutWorker_w0
[2024-09-29 18:13:26,423][09300] Decorrelating experience for 96 frames...
[2024-09-29 18:13:26,485][00189] Heartbeat connected on RolloutWorker_w1
[2024-09-29 18:13:26,611][09285] Signal inference workers to resume experience collection...
[2024-09-29 18:13:26,612][09298] InferenceWorker_p0-w0: resuming experience collection
[2024-09-29 18:13:27,172][00189] Fps is (10 sec: 409.6, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 1011712. Throughput: 0: 237.2. Samples: 2372. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-29 18:13:27,178][00189] Avg episode reward: [(0, '3.380')]
[2024-09-29 18:13:32,172][00189] Fps is (10 sec: 2047.9, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 1028096. Throughput: 0: 271.6. Samples: 4074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:13:32,177][00189] Avg episode reward: [(0, '4.027')]
[2024-09-29 18:13:37,174][00189] Fps is (10 sec: 2457.0, 60 sec: 1433.5, 300 sec: 1433.5). Total num frames: 1036288. Throughput: 0: 388.5. Samples: 7770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:13:37,179][00189] Avg episode reward: [(0, '4.581')]
[2024-09-29 18:13:39,119][09298] Updated weights for policy 0, policy_version 256 (0.0387)
[2024-09-29 18:13:42,172][00189] Fps is (10 sec: 3276.9, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 1060864. Throughput: 0: 537.2. Samples: 13430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:13:42,174][00189] Avg episode reward: [(0, '5.311')]
[2024-09-29 18:13:42,184][09285] Saving new best policy, reward=5.311!
[2024-09-29 18:13:47,172][00189] Fps is (10 sec: 4506.6, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 1081344. Throughput: 0: 556.0. Samples: 16680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:13:47,179][00189] Avg episode reward: [(0, '5.552')]
[2024-09-29 18:13:47,186][09285] Saving new best policy, reward=5.552!
[2024-09-29 18:13:49,224][09298] Updated weights for policy 0, policy_version 266 (0.0016)
[2024-09-29 18:13:52,172][00189] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 1093632. Throughput: 0: 627.5. Samples: 21962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:13:52,174][00189] Avg episode reward: [(0, '5.455')]
[2024-09-29 18:13:57,172][00189] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 1114112. Throughput: 0: 670.0. Samples: 26798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:13:57,179][00189] Avg episode reward: [(0, '5.209')]
[2024-09-29 18:14:00,470][09298] Updated weights for policy 0, policy_version 276 (0.0016)
[2024-09-29 18:14:02,172][00189] Fps is (10 sec: 4505.6, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 1138688. Throughput: 0: 670.8. Samples: 30186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:14:02,179][00189] Avg episode reward: [(0, '5.397')]
[2024-09-29 18:14:07,172][00189] Fps is (10 sec: 4096.0, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 1155072. Throughput: 0: 816.8. Samples: 36756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-29 18:14:07,177][00189] Avg episode reward: [(0, '5.580')]
[2024-09-29 18:14:07,188][09285] Saving new best policy, reward=5.580!
[2024-09-29 18:14:12,172][00189] Fps is (10 sec: 2867.1, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 1167360. Throughput: 0: 853.2. Samples: 40764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:14:12,177][00189] Avg episode reward: [(0, '5.805')]
[2024-09-29 18:14:12,179][09285] Saving new best policy, reward=5.805!
[2024-09-29 18:14:12,548][09298] Updated weights for policy 0, policy_version 286 (0.0025)
[2024-09-29 18:14:17,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 1191936. Throughput: 0: 879.2. Samples: 43638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:14:17,179][00189] Avg episode reward: [(0, '6.150')]
[2024-09-29 18:14:17,191][09285] Saving new best policy, reward=6.150!
[2024-09-29 18:14:21,589][09298] Updated weights for policy 0, policy_version 296 (0.0013)
[2024-09-29 18:14:22,172][00189] Fps is (10 sec: 4505.7, 60 sec: 3413.3, 300 sec: 3150.8). Total num frames: 1212416. Throughput: 0: 947.1. Samples: 50386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:14:22,178][00189] Avg episode reward: [(0, '5.959')]
[2024-09-29 18:14:27,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3159.8). Total num frames: 1228800. Throughput: 0: 932.1. Samples: 55374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:14:27,177][00189] Avg episode reward: [(0, '5.920')]
[2024-09-29 18:14:32,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3167.6). Total num frames: 1245184. Throughput: 0: 907.7. Samples: 57526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:14:32,179][00189] Avg episode reward: [(0, '5.893')]
[2024-09-29 18:14:33,352][09298] Updated weights for policy 0, policy_version 306 (0.0018)
[2024-09-29 18:14:37,174][00189] Fps is (10 sec: 4095.1, 60 sec: 3891.2, 300 sec: 3276.7). Total num frames: 1269760. Throughput: 0: 938.5. Samples: 64198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:14:37,181][00189] Avg episode reward: [(0, '5.968')]
[2024-09-29 18:14:42,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 1286144. Throughput: 0: 963.2. Samples: 70142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:14:42,175][00189] Avg episode reward: [(0, '6.542')]
[2024-09-29 18:14:42,178][09285] Saving new best policy, reward=6.542!
[2024-09-29 18:14:44,316][09298] Updated weights for policy 0, policy_version 316 (0.0024)
[2024-09-29 18:14:47,172][00189] Fps is (10 sec: 3277.6, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 1302528. Throughput: 0: 932.5. Samples: 72150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:14:47,174][00189] Avg episode reward: [(0, '6.936')]
[2024-09-29 18:14:47,188][09285] Saving new best policy, reward=6.936!
[2024-09-29 18:14:52,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3319.9). Total num frames: 1323008. Throughput: 0: 908.0. Samples: 77614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:14:52,175][00189] Avg episode reward: [(0, '7.766')]
[2024-09-29 18:14:52,176][09285] Saving new best policy, reward=7.766!
[2024-09-29 18:14:54,831][09298] Updated weights for policy 0, policy_version 326 (0.0012)
[2024-09-29 18:14:57,176][00189] Fps is (10 sec: 4094.2, 60 sec: 3822.7, 300 sec: 3358.6). Total num frames: 1343488. Throughput: 0: 967.2. Samples: 84292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:14:57,182][00189] Avg episode reward: [(0, '7.587')]
[2024-09-29 18:14:57,202][09285] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000328_1343488.pth...
[2024-09-29 18:14:57,365][09285] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000195_798720.pth
[2024-09-29 18:15:02,181][00189] Fps is (10 sec: 3273.8, 60 sec: 3617.6, 300 sec: 3315.5). Total num frames: 1355776. Throughput: 0: 955.3. Samples: 86636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:15:02,187][00189] Avg episode reward: [(0, '7.413')]
[2024-09-29 18:15:06,850][09298] Updated weights for policy 0, policy_version 336 (0.0012)
[2024-09-29 18:15:07,172][00189] Fps is (10 sec: 3278.2, 60 sec: 3686.4, 300 sec: 3351.3). Total num frames: 1376256. Throughput: 0: 906.1. Samples: 91162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:15:07,174][00189] Avg episode reward: [(0, '7.207')]
[2024-09-29 18:15:12,172][00189] Fps is (10 sec: 4099.8, 60 sec: 3822.9, 300 sec: 3383.7). Total num frames: 1396736. Throughput: 0: 944.3. Samples: 97866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:15:12,178][00189] Avg episode reward: [(0, '6.842')]
[2024-09-29 18:15:16,808][09298] Updated weights for policy 0, policy_version 346 (0.0019)
[2024-09-29 18:15:17,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3413.3). Total num frames: 1417216. Throughput: 0: 970.4. Samples: 101196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:15:17,178][00189] Avg episode reward: [(0, '6.760')]
[2024-09-29 18:15:22,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3375.1). Total num frames: 1429504. Throughput: 0: 919.2. Samples: 105562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:15:22,176][00189] Avg episode reward: [(0, '7.711')]
[2024-09-29 18:15:27,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3434.3). Total num frames: 1454080. Throughput: 0: 926.8. Samples: 111850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:15:27,177][00189] Avg episode reward: [(0, '7.851')]
[2024-09-29 18:15:27,187][09285] Saving new best policy, reward=7.851!
[2024-09-29 18:15:27,581][09298] Updated weights for policy 0, policy_version 356 (0.0017)
[2024-09-29 18:15:32,176][00189] Fps is (10 sec: 4503.6, 60 sec: 3822.7, 300 sec: 3458.7). Total num frames: 1474560. Throughput: 0: 955.6. Samples: 115156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:15:32,181][00189] Avg episode reward: [(0, '8.089')]
[2024-09-29 18:15:32,187][09285] Saving new best policy, reward=8.089!
[2024-09-29 18:15:37,173][00189] Fps is (10 sec: 3685.9, 60 sec: 3686.5, 300 sec: 3452.3). Total num frames: 1490944. Throughput: 0: 948.8. Samples: 120312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:15:37,176][00189] Avg episode reward: [(0, '7.643')]
[2024-09-29 18:15:39,669][09298] Updated weights for policy 0, policy_version 366 (0.0020)
[2024-09-29 18:15:42,172][00189] Fps is (10 sec: 3278.1, 60 sec: 3686.4, 300 sec: 3446.3). Total num frames: 1507328. Throughput: 0: 914.0. Samples: 125418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:15:42,175][00189] Avg episode reward: [(0, '7.793')]
[2024-09-29 18:15:47,172][00189] Fps is (10 sec: 4096.6, 60 sec: 3822.9, 300 sec: 3495.3). Total num frames: 1531904. Throughput: 0: 935.3. Samples: 128716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:15:47,179][00189] Avg episode reward: [(0, '8.282')]
[2024-09-29 18:15:47,187][09285] Saving new best policy, reward=8.282!
[2024-09-29 18:15:48,666][09298] Updated weights for policy 0, policy_version 376 (0.0019)
[2024-09-29 18:15:52,172][00189] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3488.2). Total num frames: 1548288. Throughput: 0: 976.9. Samples: 135124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:15:52,175][00189] Avg episode reward: [(0, '7.818')]
[2024-09-29 18:15:57,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3481.6). Total num frames: 1564672. Throughput: 0: 923.5. Samples: 139424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:15:57,176][00189] Avg episode reward: [(0, '8.011')]
[2024-09-29 18:16:00,274][09298] Updated weights for policy 0, policy_version 386 (0.0022)
[2024-09-29 18:16:02,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.8, 300 sec: 3525.0). Total num frames: 1589248. Throughput: 0: 924.7. Samples: 142806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:16:02,174][00189] Avg episode reward: [(0, '8.009')]
[2024-09-29 18:16:07,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3541.8). Total num frames: 1609728. Throughput: 0: 980.4. Samples: 149680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:16:07,176][00189] Avg episode reward: [(0, '8.221')]
[2024-09-29 18:16:10,871][09298] Updated weights for policy 0, policy_version 396 (0.0030)
[2024-09-29 18:16:12,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3510.9). Total num frames: 1622016. Throughput: 0: 946.4. Samples: 154436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:16:12,174][00189] Avg episode reward: [(0, '8.268')]
[2024-09-29 18:16:17,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3527.1). Total num frames: 1642496. Throughput: 0: 925.3. Samples: 156790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:16:17,178][00189] Avg episode reward: [(0, '8.791')]
[2024-09-29 18:16:17,188][09285] Saving new best policy, reward=8.791!
[2024-09-29 18:16:21,228][09298] Updated weights for policy 0, policy_version 406 (0.0013)
[2024-09-29 18:16:22,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3564.6). Total num frames: 1667072. Throughput: 0: 958.0. Samples: 163422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:16:22,181][00189] Avg episode reward: [(0, '9.316')]
[2024-09-29 18:16:22,190][09285] Saving new best policy, reward=9.316!
[2024-09-29 18:16:27,173][00189] Fps is (10 sec: 4095.7, 60 sec: 3822.9, 300 sec: 3557.0). Total num frames: 1683456. Throughput: 0: 971.5. Samples: 169136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-29 18:16:27,177][00189] Avg episode reward: [(0, '9.245')]
[2024-09-29 18:16:32,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3549.9). Total num frames: 1699840. Throughput: 0: 943.4. Samples: 171168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:16:32,178][00189] Avg episode reward: [(0, '8.640')]
[2024-09-29 18:16:32,996][09298] Updated weights for policy 0, policy_version 416 (0.0019)
[2024-09-29 18:16:37,172][00189] Fps is (10 sec: 3686.6, 60 sec: 3823.0, 300 sec: 3563.5). Total num frames: 1720320. Throughput: 0: 931.4. Samples: 177036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:16:37,181][00189] Avg episode reward: [(0, '8.147')]
[2024-09-29 18:16:42,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3576.5). Total num frames: 1740800. Throughput: 0: 972.0. Samples: 183162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:16:42,176][00189] Avg episode reward: [(0, '8.984')]
[2024-09-29 18:16:43,763][09298] Updated weights for policy 0, policy_version 426 (0.0017)
[2024-09-29 18:16:47,173][00189] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3549.9). Total num frames: 1753088. Throughput: 0: 940.4. Samples: 185126. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:16:47,175][00189] Avg episode reward: [(0, '8.904')]
[2024-09-29 18:16:52,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3562.6). Total num frames: 1773568. Throughput: 0: 897.4. Samples: 190062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:16:52,174][00189] Avg episode reward: [(0, '9.555')]
[2024-09-29 18:16:52,181][09285] Saving new best policy, reward=9.555!
[2024-09-29 18:16:54,903][09298] Updated weights for policy 0, policy_version 436 (0.0024)
[2024-09-29 18:16:57,172][00189] Fps is (10 sec: 4096.3, 60 sec: 3822.9, 300 sec: 3574.7). Total num frames: 1794048. Throughput: 0: 941.3. Samples: 196796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:16:57,175][00189] Avg episode reward: [(0, '9.641')]
[2024-09-29 18:16:57,182][09285] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000438_1794048.pth...
[2024-09-29 18:16:57,309][09285] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000246_1007616.pth
[2024-09-29 18:16:57,328][09285] Saving new best policy, reward=9.641!
[2024-09-29 18:17:02,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3568.1). Total num frames: 1810432. Throughput: 0: 952.0. Samples: 199632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:17:02,180][00189] Avg episode reward: [(0, '10.314')]
[2024-09-29 18:17:02,191][09285] Saving new best policy, reward=10.314!
[2024-09-29 18:17:07,093][09298] Updated weights for policy 0, policy_version 446 (0.0024)
[2024-09-29 18:17:07,172][00189] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3561.7). Total num frames: 1826816. Throughput: 0: 895.8. Samples: 203734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:17:07,177][00189] Avg episode reward: [(0, '11.558')]
[2024-09-29 18:17:07,187][09285] Saving new best policy, reward=11.558!
[2024-09-29 18:17:12,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3573.1). Total num frames: 1847296. Throughput: 0: 902.0. Samples: 209724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:17:12,174][00189] Avg episode reward: [(0, '11.598')]
[2024-09-29 18:17:12,181][09285] Saving new best policy, reward=11.598!
[2024-09-29 18:17:17,172][00189] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3566.9). Total num frames: 1863680. Throughput: 0: 926.4. Samples: 212856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:17:17,179][00189] Avg episode reward: [(0, '11.211')]
[2024-09-29 18:17:17,601][09298] Updated weights for policy 0, policy_version 456 (0.0023)
[2024-09-29 18:17:22,172][00189] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3544.3). Total num frames: 1875968. Throughput: 0: 897.5. Samples: 217422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:17:22,179][00189] Avg episode reward: [(0, '11.569')]
[2024-09-29 18:17:27,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3555.3). Total num frames: 1896448. Throughput: 0: 877.6. Samples: 222654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:17:27,179][00189] Avg episode reward: [(0, '11.185')]
[2024-09-29 18:17:29,311][09298] Updated weights for policy 0, policy_version 466 (0.0012)
[2024-09-29 18:17:32,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3582.0). Total num frames: 1921024. Throughput: 0: 902.1. Samples: 225718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:17:32,175][00189] Avg episode reward: [(0, '11.254')]
[2024-09-29 18:17:37,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3560.4). Total num frames: 1933312. Throughput: 0: 918.9. Samples: 231414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:17:37,180][00189] Avg episode reward: [(0, '12.200')]
[2024-09-29 18:17:37,195][09285] Saving new best policy, reward=12.200!
[2024-09-29 18:17:41,867][09298] Updated weights for policy 0, policy_version 476 (0.0015)
[2024-09-29 18:17:42,172][00189] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3555.0). Total num frames: 1949696. Throughput: 0: 859.2. Samples: 235462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:17:42,179][00189] Avg episode reward: [(0, '11.765')]
[2024-09-29 18:17:47,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3565.0). Total num frames: 1970176. Throughput: 0: 871.2. Samples: 238838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:17:47,174][00189] Avg episode reward: [(0, '12.663')]
[2024-09-29 18:17:47,188][09285] Saving new best policy, reward=12.663!
[2024-09-29 18:17:51,427][09298] Updated weights for policy 0, policy_version 486 (0.0015)
[2024-09-29 18:17:52,176][00189] Fps is (10 sec: 4094.2, 60 sec: 3617.9, 300 sec: 3574.6). Total num frames: 1990656. Throughput: 0: 923.7. Samples: 245302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:17:52,179][00189] Avg episode reward: [(0, '13.120')]
[2024-09-29 18:17:52,181][09285] Saving new best policy, reward=13.120!
[2024-09-29 18:17:57,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3554.7). Total num frames: 2002944. Throughput: 0: 883.0. Samples: 249458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:17:57,178][00189] Avg episode reward: [(0, '12.865')]
[2024-09-29 18:18:02,172][00189] Fps is (10 sec: 3278.2, 60 sec: 3549.9, 300 sec: 3564.2). Total num frames: 2023424. Throughput: 0: 874.8. Samples: 252220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:18:02,178][00189] Avg episode reward: [(0, '12.603')]
[2024-09-29 18:18:03,076][09298] Updated weights for policy 0, policy_version 496 (0.0012)
[2024-09-29 18:18:07,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3587.5). Total num frames: 2048000. Throughput: 0: 926.1. Samples: 259096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:18:07,177][00189] Avg episode reward: [(0, '13.190')]
[2024-09-29 18:18:07,188][09285] Saving new best policy, reward=13.190!
[2024-09-29 18:18:12,173][00189] Fps is (10 sec: 4095.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 2064384. Throughput: 0: 927.0. Samples: 264370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:18:12,180][00189] Avg episode reward: [(0, '13.374')]
[2024-09-29 18:18:12,186][09285] Saving new best policy, reward=13.374!
[2024-09-29 18:18:15,134][09298] Updated weights for policy 0, policy_version 506 (0.0026)
[2024-09-29 18:18:17,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2080768. Throughput: 0: 902.2. Samples: 266318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:18:17,175][00189] Avg episode reward: [(0, '14.395')]
[2024-09-29 18:18:17,183][09285] Saving new best policy, reward=14.395!
[2024-09-29 18:18:22,172][00189] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2101248. Throughput: 0: 913.8. Samples: 272536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:18:22,175][00189] Avg episode reward: [(0, '14.509')]
[2024-09-29 18:18:22,179][09285] Saving new best policy, reward=14.509!
[2024-09-29 18:18:24,456][09298] Updated weights for policy 0, policy_version 516 (0.0016)
[2024-09-29 18:18:27,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2121728. Throughput: 0: 962.8. Samples: 278790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:18:27,174][00189] Avg episode reward: [(0, '15.344')]
[2024-09-29 18:18:27,186][09285] Saving new best policy, reward=15.344!
[2024-09-29 18:18:32,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2134016. Throughput: 0: 931.8. Samples: 280770. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:18:32,176][00189] Avg episode reward: [(0, '14.856')]
[2024-09-29 18:18:36,346][09298] Updated weights for policy 0, policy_version 526 (0.0013)
[2024-09-29 18:18:37,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2154496. Throughput: 0: 905.3. Samples: 286036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:18:37,178][00189] Avg episode reward: [(0, '14.113')]
[2024-09-29 18:18:42,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2179072. Throughput: 0: 962.3. Samples: 292760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:18:42,174][00189] Avg episode reward: [(0, '14.682')]
[2024-09-29 18:18:47,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2191360. Throughput: 0: 955.3. Samples: 295208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:18:47,180][00189] Avg episode reward: [(0, '15.286')]
[2024-09-29 18:18:48,025][09298] Updated weights for policy 0, policy_version 536 (0.0021)
[2024-09-29 18:18:52,172][00189] Fps is (10 sec: 2867.2, 60 sec: 3618.4, 300 sec: 3707.2). Total num frames: 2207744. Throughput: 0: 897.4. Samples: 299480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:18:52,175][00189] Avg episode reward: [(0, '15.820')]
[2024-09-29 18:18:52,179][09285] Saving new best policy, reward=15.820!
[2024-09-29 18:18:57,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 2232320. Throughput: 0: 926.2. Samples: 306050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:18:57,175][00189] Avg episode reward: [(0, '15.565')]
[2024-09-29 18:18:57,195][09285] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000545_2232320.pth...
[2024-09-29 18:18:57,346][09285] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000328_1343488.pth
[2024-09-29 18:18:57,998][09298] Updated weights for policy 0, policy_version 546 (0.0018)
[2024-09-29 18:19:02,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2248704. Throughput: 0: 954.0. Samples: 309246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:19:02,174][00189] Avg episode reward: [(0, '14.944')]
[2024-09-29 18:19:07,176][00189] Fps is (10 sec: 2866.2, 60 sec: 3549.6, 300 sec: 3707.2). Total num frames: 2260992. Throughput: 0: 908.6. Samples: 313426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:19:07,181][00189] Avg episode reward: [(0, '15.213')]
[2024-09-29 18:19:10,161][09298] Updated weights for policy 0, policy_version 556 (0.0012)
[2024-09-29 18:19:12,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2285568. Throughput: 0: 899.8. Samples: 319280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:19:12,178][00189] Avg episode reward: [(0, '14.297')]
[2024-09-29 18:19:17,172][00189] Fps is (10 sec: 4507.2, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2306048. Throughput: 0: 927.5. Samples: 322506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:19:17,179][00189] Avg episode reward: [(0, '13.956')]
[2024-09-29 18:19:20,751][09298] Updated weights for policy 0, policy_version 566 (0.0012)
[2024-09-29 18:19:22,176][00189] Fps is (10 sec: 3275.4, 60 sec: 3617.9, 300 sec: 3693.3). Total num frames: 2318336. Throughput: 0: 930.1. Samples: 327896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:19:22,180][00189] Avg episode reward: [(0, '15.435')]
[2024-09-29 18:19:27,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 2338816. Throughput: 0: 891.2. Samples: 332866. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:19:27,179][00189] Avg episode reward: [(0, '15.332')]
[2024-09-29 18:19:31,236][09298] Updated weights for policy 0, policy_version 576 (0.0017)
[2024-09-29 18:19:32,172][00189] Fps is (10 sec: 4097.8, 60 sec: 3754.7, 300 sec: 3693.4). Total num frames: 2359296. Throughput: 0: 914.0. Samples: 336336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:19:32,179][00189] Avg episode reward: [(0, '16.185')]
[2024-09-29 18:19:32,182][09285] Saving new best policy, reward=16.185!
[2024-09-29 18:19:37,174][00189] Fps is (10 sec: 4095.1, 60 sec: 3754.5, 300 sec: 3707.2). Total num frames: 2379776. Throughput: 0: 965.2. Samples: 342914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:19:37,177][00189] Avg episode reward: [(0, '16.579')]
[2024-09-29 18:19:37,191][09285] Saving new best policy, reward=16.579!
[2024-09-29 18:19:42,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 2392064. Throughput: 0: 911.4. Samples: 347062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:19:42,175][00189] Avg episode reward: [(0, '16.252')]
[2024-09-29 18:19:43,370][09298] Updated weights for policy 0, policy_version 586 (0.0022)
[2024-09-29 18:19:47,172][00189] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2416640. Throughput: 0: 904.9. Samples: 349968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:19:47,174][00189] Avg episode reward: [(0, '15.869')]
[2024-09-29 18:19:52,055][09298] Updated weights for policy 0, policy_version 596 (0.0021)
[2024-09-29 18:19:52,176][00189] Fps is (10 sec: 4913.4, 60 sec: 3891.0, 300 sec: 3721.1). Total num frames: 2441216. Throughput: 0: 966.4. Samples: 356912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:19:52,178][00189] Avg episode reward: [(0, '15.826')]
[2024-09-29 18:19:57,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.2). Total num frames: 2453504. Throughput: 0: 946.0. Samples: 361852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:19:57,184][00189] Avg episode reward: [(0, '16.416')]
[2024-09-29 18:20:02,172][00189] Fps is (10 sec: 3278.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2473984. Throughput: 0: 923.0. Samples: 364040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:20:02,175][00189] Avg episode reward: [(0, '16.529')]
[2024-09-29 18:20:03,967][09298] Updated weights for policy 0, policy_version 606 (0.0013)
[2024-09-29 18:20:07,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3721.1). Total num frames: 2494464. Throughput: 0: 955.3. Samples: 370882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:20:07,181][00189] Avg episode reward: [(0, '16.978')]
[2024-09-29 18:20:07,192][09285] Saving new best policy, reward=16.978!
[2024-09-29 18:20:12,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2510848. Throughput: 0: 977.1. Samples: 376836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:20:12,176][00189] Avg episode reward: [(0, '17.613')]
[2024-09-29 18:20:12,265][09285] Saving new best policy, reward=17.613!
[2024-09-29 18:20:15,439][09298] Updated weights for policy 0, policy_version 616 (0.0014)
[2024-09-29 18:20:17,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2527232. Throughput: 0: 942.1. Samples: 378730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:20:17,175][00189] Avg episode reward: [(0, '18.977')]
[2024-09-29 18:20:17,186][09285] Saving new best policy, reward=18.977!
[2024-09-29 18:20:22,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3707.2). Total num frames: 2547712. Throughput: 0: 917.8. Samples: 384214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:20:22,174][00189] Avg episode reward: [(0, '19.215')]
[2024-09-29 18:20:22,178][09285] Saving new best policy, reward=19.215!
[2024-09-29 18:20:25,346][09298] Updated weights for policy 0, policy_version 626 (0.0029)
[2024-09-29 18:20:27,173][00189] Fps is (10 sec: 4505.2, 60 sec: 3891.1, 300 sec: 3721.2). Total num frames: 2572288. Throughput: 0: 972.0. Samples: 390804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:20:27,178][00189] Avg episode reward: [(0, '18.171')]
[2024-09-29 18:20:32,174][00189] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3707.2). Total num frames: 2584576. Throughput: 0: 963.8. Samples: 393340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:20:32,177][00189] Avg episode reward: [(0, '19.233')]
[2024-09-29 18:20:32,183][09285] Saving new best policy, reward=19.233!
[2024-09-29 18:20:37,076][09298] Updated weights for policy 0, policy_version 636 (0.0013)
[2024-09-29 18:20:37,172][00189] Fps is (10 sec: 3277.1, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 2605056. Throughput: 0: 909.4. Samples: 397834.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:20:37,176][00189] Avg episode reward: [(0, '19.325')] [2024-09-29 18:20:37,185][09285] Saving new best policy, reward=19.325! [2024-09-29 18:20:42,172][00189] Fps is (10 sec: 4096.9, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 2625536. Throughput: 0: 948.4. Samples: 404528. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:20:42,177][00189] Avg episode reward: [(0, '18.305')] [2024-09-29 18:20:47,173][00189] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 2641920. Throughput: 0: 971.1. Samples: 407740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:20:47,175][00189] Avg episode reward: [(0, '18.742')] [2024-09-29 18:20:47,519][09298] Updated weights for policy 0, policy_version 646 (0.0026) [2024-09-29 18:20:52,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3707.2). Total num frames: 2658304. Throughput: 0: 912.6. Samples: 411948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:20:52,174][00189] Avg episode reward: [(0, '19.793')] [2024-09-29 18:20:52,183][09285] Saving new best policy, reward=19.793! [2024-09-29 18:20:57,172][00189] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2678784. Throughput: 0: 916.2. Samples: 418066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:20:57,178][00189] Avg episode reward: [(0, '19.354')] [2024-09-29 18:20:57,189][09285] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000654_2678784.pth... [2024-09-29 18:20:57,331][09285] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000438_1794048.pth [2024-09-29 18:20:58,553][09298] Updated weights for policy 0, policy_version 656 (0.0013) [2024-09-29 18:21:02,174][00189] Fps is (10 sec: 4095.3, 60 sec: 3754.6, 300 sec: 3693.3). Total num frames: 2699264. Throughput: 0: 947.7. Samples: 421378. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:21:02,179][00189] Avg episode reward: [(0, '18.290')] [2024-09-29 18:21:07,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2715648. Throughput: 0: 942.8. Samples: 426640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:21:07,174][00189] Avg episode reward: [(0, '17.481')] [2024-09-29 18:21:10,285][09298] Updated weights for policy 0, policy_version 666 (0.0015) [2024-09-29 18:21:12,176][00189] Fps is (10 sec: 3685.7, 60 sec: 3754.4, 300 sec: 3707.2). Total num frames: 2736128. Throughput: 0: 910.0. Samples: 431758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:21:12,178][00189] Avg episode reward: [(0, '17.532')] [2024-09-29 18:21:17,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 2756608. Throughput: 0: 925.6. Samples: 434988. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:21:17,174][00189] Avg episode reward: [(0, '17.721')] [2024-09-29 18:21:19,667][09298] Updated weights for policy 0, policy_version 676 (0.0017) [2024-09-29 18:21:22,172][00189] Fps is (10 sec: 3687.7, 60 sec: 3754.6, 300 sec: 3693.3). Total num frames: 2772992. Throughput: 0: 966.6. Samples: 441330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:21:22,177][00189] Avg episode reward: [(0, '16.767')] [2024-09-29 18:21:27,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3693.3). Total num frames: 2789376. Throughput: 0: 913.3. Samples: 445628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:21:27,180][00189] Avg episode reward: [(0, '17.407')] [2024-09-29 18:21:31,284][09298] Updated weights for policy 0, policy_version 686 (0.0016) [2024-09-29 18:21:32,172][00189] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3693.3). Total num frames: 2809856. Throughput: 0: 915.0. Samples: 448916. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:21:32,179][00189] Avg episode reward: [(0, '19.204')] [2024-09-29 18:21:37,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 2834432. Throughput: 0: 969.6. Samples: 455580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:21:37,178][00189] Avg episode reward: [(0, '19.231')] [2024-09-29 18:21:42,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2846720. Throughput: 0: 936.5. Samples: 460210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:21:42,183][00189] Avg episode reward: [(0, '20.099')] [2024-09-29 18:21:42,185][09285] Saving new best policy, reward=20.099! [2024-09-29 18:21:42,963][09298] Updated weights for policy 0, policy_version 696 (0.0013) [2024-09-29 18:21:47,172][00189] Fps is (10 sec: 2867.2, 60 sec: 3686.5, 300 sec: 3693.3). Total num frames: 2863104. Throughput: 0: 909.4. Samples: 462300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:21:47,175][00189] Avg episode reward: [(0, '20.154')] [2024-09-29 18:21:47,209][09285] Saving new best policy, reward=20.154! [2024-09-29 18:21:52,172][00189] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 2887680. Throughput: 0: 938.0. Samples: 468852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:21:52,180][00189] Avg episode reward: [(0, '21.664')] [2024-09-29 18:21:52,183][09285] Saving new best policy, reward=21.664! [2024-09-29 18:21:52,793][09298] Updated weights for policy 0, policy_version 706 (0.0022) [2024-09-29 18:21:57,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2904064. Throughput: 0: 949.1. Samples: 474466. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:21:57,175][00189] Avg episode reward: [(0, '21.183')] [2024-09-29 18:22:02,172][00189] Fps is (10 sec: 3276.9, 60 sec: 3686.5, 300 sec: 3707.2). Total num frames: 2920448. 
Throughput: 0: 923.0. Samples: 476524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:22:02,178][00189] Avg episode reward: [(0, '21.014')] [2024-09-29 18:22:04,868][09298] Updated weights for policy 0, policy_version 716 (0.0026) [2024-09-29 18:22:07,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2940928. Throughput: 0: 915.6. Samples: 482534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:22:07,174][00189] Avg episode reward: [(0, '19.644')] [2024-09-29 18:22:12,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.9, 300 sec: 3721.1). Total num frames: 2961408. Throughput: 0: 973.7. Samples: 489444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:22:12,180][00189] Avg episode reward: [(0, '20.013')] [2024-09-29 18:22:15,261][09298] Updated weights for policy 0, policy_version 726 (0.0034) [2024-09-29 18:22:17,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2977792. Throughput: 0: 948.3. Samples: 491588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-29 18:22:17,174][00189] Avg episode reward: [(0, '19.886')] [2024-09-29 18:22:22,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2998272. Throughput: 0: 910.4. Samples: 496548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:22:22,176][00189] Avg episode reward: [(0, '20.973')] [2024-09-29 18:22:25,235][09298] Updated weights for policy 0, policy_version 736 (0.0014) [2024-09-29 18:22:27,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 3022848. Throughput: 0: 960.5. Samples: 503434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:22:27,180][00189] Avg episode reward: [(0, '21.142')] [2024-09-29 18:22:32,172][00189] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3039232. Throughput: 0: 986.4. Samples: 506686. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:22:32,178][00189] Avg episode reward: [(0, '20.719')] [2024-09-29 18:22:37,026][09298] Updated weights for policy 0, policy_version 746 (0.0019) [2024-09-29 18:22:37,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3055616. Throughput: 0: 935.7. Samples: 510960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:22:37,177][00189] Avg episode reward: [(0, '21.573')] [2024-09-29 18:22:42,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3076096. Throughput: 0: 958.2. Samples: 517584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:22:42,180][00189] Avg episode reward: [(0, '21.529')] [2024-09-29 18:22:46,008][09298] Updated weights for policy 0, policy_version 756 (0.0021) [2024-09-29 18:22:47,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3096576. Throughput: 0: 988.7. Samples: 521016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:22:47,182][00189] Avg episode reward: [(0, '20.659')] [2024-09-29 18:22:52,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3112960. Throughput: 0: 963.3. Samples: 525884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:22:52,174][00189] Avg episode reward: [(0, '22.557')] [2024-09-29 18:22:52,181][09285] Saving new best policy, reward=22.557! [2024-09-29 18:22:57,172][00189] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3133440. Throughput: 0: 931.7. Samples: 531370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:22:57,178][00189] Avg episode reward: [(0, '22.175')] [2024-09-29 18:22:57,191][09285] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000765_3133440.pth... 
[2024-09-29 18:22:57,332][09285] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000545_2232320.pth [2024-09-29 18:22:57,867][09298] Updated weights for policy 0, policy_version 766 (0.0018) [2024-09-29 18:23:02,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3153920. Throughput: 0: 957.2. Samples: 534664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:23:02,176][00189] Avg episode reward: [(0, '21.850')] [2024-09-29 18:23:07,176][00189] Fps is (10 sec: 3684.9, 60 sec: 3822.7, 300 sec: 3748.8). Total num frames: 3170304. Throughput: 0: 985.7. Samples: 540910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:23:07,181][00189] Avg episode reward: [(0, '21.460')] [2024-09-29 18:23:08,691][09298] Updated weights for policy 0, policy_version 776 (0.0017) [2024-09-29 18:23:12,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3186688. Throughput: 0: 931.1. Samples: 545334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:23:12,177][00189] Avg episode reward: [(0, '22.203')] [2024-09-29 18:23:17,172][00189] Fps is (10 sec: 4097.7, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3211264. Throughput: 0: 936.4. Samples: 548826. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 18:23:17,174][00189] Avg episode reward: [(0, '19.598')] [2024-09-29 18:23:18,484][09298] Updated weights for policy 0, policy_version 786 (0.0020) [2024-09-29 18:23:22,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3231744. Throughput: 0: 994.1. Samples: 555696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:23:22,180][00189] Avg episode reward: [(0, '20.711')] [2024-09-29 18:23:27,175][00189] Fps is (10 sec: 3685.5, 60 sec: 3754.5, 300 sec: 3776.6). Total num frames: 3248128. Throughput: 0: 948.6. Samples: 560274. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:23:27,180][00189] Avg episode reward: [(0, '20.761')] [2024-09-29 18:23:30,195][09298] Updated weights for policy 0, policy_version 796 (0.0019) [2024-09-29 18:23:32,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 3268608. Throughput: 0: 926.8. Samples: 562722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:23:32,175][00189] Avg episode reward: [(0, '22.254')] [2024-09-29 18:23:37,172][00189] Fps is (10 sec: 4097.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3289088. Throughput: 0: 974.8. Samples: 569750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:23:37,174][00189] Avg episode reward: [(0, '21.223')] [2024-09-29 18:23:39,455][09298] Updated weights for policy 0, policy_version 806 (0.0013) [2024-09-29 18:23:42,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3305472. Throughput: 0: 978.4. Samples: 575396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:23:42,179][00189] Avg episode reward: [(0, '22.058')] [2024-09-29 18:23:47,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3321856. Throughput: 0: 950.9. Samples: 577456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:23:47,175][00189] Avg episode reward: [(0, '22.724')] [2024-09-29 18:23:47,184][09285] Saving new best policy, reward=22.724! [2024-09-29 18:23:50,848][09298] Updated weights for policy 0, policy_version 816 (0.0014) [2024-09-29 18:23:52,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3346432. Throughput: 0: 948.2. Samples: 583574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:23:52,174][00189] Avg episode reward: [(0, '22.379')] [2024-09-29 18:23:57,174][00189] Fps is (10 sec: 4504.6, 60 sec: 3891.1, 300 sec: 3790.5). Total num frames: 3366912. Throughput: 0: 997.2. Samples: 590210. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:23:57,178][00189] Avg episode reward: [(0, '22.982')] [2024-09-29 18:23:57,189][09285] Saving new best policy, reward=22.982! [2024-09-29 18:24:02,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 3379200. Throughput: 0: 964.8. Samples: 592240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:24:02,175][00189] Avg episode reward: [(0, '22.220')] [2024-09-29 18:24:02,541][09298] Updated weights for policy 0, policy_version 826 (0.0018) [2024-09-29 18:24:07,172][00189] Fps is (10 sec: 3687.2, 60 sec: 3891.5, 300 sec: 3790.5). Total num frames: 3403776. Throughput: 0: 932.4. Samples: 597656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:24:07,183][00189] Avg episode reward: [(0, '21.927')] [2024-09-29 18:24:11,624][09298] Updated weights for policy 0, policy_version 836 (0.0021) [2024-09-29 18:24:12,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 3424256. Throughput: 0: 983.2. Samples: 604516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:24:12,180][00189] Avg episode reward: [(0, '19.700')] [2024-09-29 18:24:17,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.5). Total num frames: 3440640. Throughput: 0: 988.1. Samples: 607188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:24:17,175][00189] Avg episode reward: [(0, '19.537')] [2024-09-29 18:24:22,172][00189] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3457024. Throughput: 0: 923.3. Samples: 611298. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:24:22,176][00189] Avg episode reward: [(0, '19.348')] [2024-09-29 18:24:23,449][09298] Updated weights for policy 0, policy_version 846 (0.0021) [2024-09-29 18:24:27,172][00189] Fps is (10 sec: 4096.1, 60 sec: 3891.4, 300 sec: 3804.4). Total num frames: 3481600. Throughput: 0: 948.2. Samples: 618064. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-29 18:24:27,176][00189] Avg episode reward: [(0, '20.940')] [2024-09-29 18:24:32,174][00189] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3790.5). Total num frames: 3497984. Throughput: 0: 978.0. Samples: 621468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:24:32,179][00189] Avg episode reward: [(0, '20.920')] [2024-09-29 18:24:34,441][09298] Updated weights for policy 0, policy_version 856 (0.0012) [2024-09-29 18:24:37,172][00189] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3510272. Throughput: 0: 940.2. Samples: 625884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:24:37,176][00189] Avg episode reward: [(0, '21.373')] [2024-09-29 18:24:42,173][00189] Fps is (10 sec: 3686.8, 60 sec: 3822.8, 300 sec: 3790.5). Total num frames: 3534848. Throughput: 0: 917.8. Samples: 631512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 18:24:42,179][00189] Avg episode reward: [(0, '21.456')] [2024-09-29 18:24:44,858][09298] Updated weights for policy 0, policy_version 866 (0.0015) [2024-09-29 18:24:47,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3555328. Throughput: 0: 944.0. Samples: 634722. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 18:24:47,174][00189] Avg episode reward: [(0, '20.223')] [2024-09-29 18:24:52,172][00189] Fps is (10 sec: 3277.2, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3567616. Throughput: 0: 938.9. Samples: 639908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:24:52,177][00189] Avg episode reward: [(0, '21.463')] [2024-09-29 18:24:57,172][00189] Fps is (10 sec: 2867.2, 60 sec: 3618.3, 300 sec: 3762.8). Total num frames: 3584000. Throughput: 0: 892.3. Samples: 644668. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:24:57,176][00189] Avg episode reward: [(0, '20.683')] [2024-09-29 18:24:57,187][09285] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000875_3584000.pth... [2024-09-29 18:24:57,310][09285] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000654_2678784.pth [2024-09-29 18:24:57,415][09298] Updated weights for policy 0, policy_version 876 (0.0016) [2024-09-29 18:25:02,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3608576. Throughput: 0: 902.1. Samples: 647782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:25:02,174][00189] Avg episode reward: [(0, '20.688')] [2024-09-29 18:25:07,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3624960. Throughput: 0: 954.0. Samples: 654230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:25:07,175][00189] Avg episode reward: [(0, '21.041')] [2024-09-29 18:25:07,933][09298] Updated weights for policy 0, policy_version 886 (0.0012) [2024-09-29 18:25:12,172][00189] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 3637248. Throughput: 0: 894.0. Samples: 658294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:25:12,176][00189] Avg episode reward: [(0, '21.671')] [2024-09-29 18:25:17,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3661824. Throughput: 0: 880.9. Samples: 661108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:25:17,181][00189] Avg episode reward: [(0, '22.201')] [2024-09-29 18:25:19,171][09298] Updated weights for policy 0, policy_version 896 (0.0019) [2024-09-29 18:25:22,172][00189] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3682304. Throughput: 0: 923.6. Samples: 667446. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:25:22,175][00189] Avg episode reward: [(0, '22.400')] [2024-09-29 18:25:27,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 3694592. Throughput: 0: 901.8. Samples: 672092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:25:27,175][00189] Avg episode reward: [(0, '21.917')] [2024-09-29 18:25:31,547][09298] Updated weights for policy 0, policy_version 906 (0.0013) [2024-09-29 18:25:32,172][00189] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3748.9). Total num frames: 3710976. Throughput: 0: 873.8. Samples: 674042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:25:32,175][00189] Avg episode reward: [(0, '21.734')] [2024-09-29 18:25:37,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3731456. Throughput: 0: 898.6. Samples: 680346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:25:37,175][00189] Avg episode reward: [(0, '19.304')] [2024-09-29 18:25:42,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3748.9). Total num frames: 3747840. Throughput: 0: 917.5. Samples: 685956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:25:42,176][00189] Avg episode reward: [(0, '19.102')] [2024-09-29 18:25:42,433][09298] Updated weights for policy 0, policy_version 916 (0.0026) [2024-09-29 18:25:47,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3748.9). Total num frames: 3764224. Throughput: 0: 891.8. Samples: 687914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:25:47,174][00189] Avg episode reward: [(0, '18.917')] [2024-09-29 18:25:52,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3784704. Throughput: 0: 869.9. Samples: 693374. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:25:52,177][00189] Avg episode reward: [(0, '19.407')] [2024-09-29 18:25:53,368][09298] Updated weights for policy 0, policy_version 926 (0.0019) [2024-09-29 18:25:57,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3805184. Throughput: 0: 928.0. Samples: 700056. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:25:57,179][00189] Avg episode reward: [(0, '18.734')] [2024-09-29 18:26:02,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3748.9). Total num frames: 3821568. Throughput: 0: 915.0. Samples: 702282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:26:02,177][00189] Avg episode reward: [(0, '21.530')] [2024-09-29 18:26:05,545][09298] Updated weights for policy 0, policy_version 936 (0.0019) [2024-09-29 18:26:07,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 3837952. Throughput: 0: 878.5. Samples: 706978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:26:07,174][00189] Avg episode reward: [(0, '23.183')] [2024-09-29 18:26:07,181][09285] Saving new best policy, reward=23.183! [2024-09-29 18:26:12,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3862528. Throughput: 0: 922.1. Samples: 713588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:26:12,174][00189] Avg episode reward: [(0, '21.873')] [2024-09-29 18:26:15,099][09298] Updated weights for policy 0, policy_version 946 (0.0021) [2024-09-29 18:26:17,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3878912. Throughput: 0: 951.3. Samples: 716852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:26:17,177][00189] Avg episode reward: [(0, '21.572')] [2024-09-29 18:26:22,172][00189] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3735.0). Total num frames: 3891200. Throughput: 0: 900.4. Samples: 720866. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:26:22,179][00189] Avg episode reward: [(0, '21.786')] [2024-09-29 18:26:26,893][09298] Updated weights for policy 0, policy_version 956 (0.0022) [2024-09-29 18:26:27,172][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3915776. Throughput: 0: 911.4. Samples: 726968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:26:27,177][00189] Avg episode reward: [(0, '20.644')] [2024-09-29 18:26:32,172][00189] Fps is (10 sec: 4505.4, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 3936256. Throughput: 0: 941.8. Samples: 730296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:26:32,178][00189] Avg episode reward: [(0, '20.165')] [2024-09-29 18:26:37,172][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3948544. Throughput: 0: 930.4. Samples: 735240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:26:37,178][00189] Avg episode reward: [(0, '20.602')] [2024-09-29 18:26:39,130][09298] Updated weights for policy 0, policy_version 966 (0.0013) [2024-09-29 18:26:42,172][00189] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3969024. Throughput: 0: 890.1. Samples: 740112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:26:42,179][00189] Avg episode reward: [(0, '20.886')] [2024-09-29 18:26:47,172][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3989504. Throughput: 0: 911.6. Samples: 743306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:26:47,176][00189] Avg episode reward: [(0, '21.062')] [2024-09-29 18:26:48,762][09298] Updated weights for policy 0, policy_version 976 (0.0018) [2024-09-29 18:26:51,807][09285] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-29 18:26:51,818][00189] Component Batcher_0 stopped! [2024-09-29 18:26:51,808][09285] Stopping Batcher_0... 
[2024-09-29 18:26:51,826][09285] Loop batcher_evt_loop terminating... [2024-09-29 18:26:51,918][09298] Weights refcount: 2 0 [2024-09-29 18:26:51,932][09298] Stopping InferenceWorker_p0-w0... [2024-09-29 18:26:51,934][09298] Loop inference_proc0-0_evt_loop terminating... [2024-09-29 18:26:51,932][00189] Component InferenceWorker_p0-w0 stopped! [2024-09-29 18:26:51,974][09285] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000765_3133440.pth [2024-09-29 18:26:51,992][09285] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-29 18:26:52,228][00189] Component LearnerWorker_p0 stopped! [2024-09-29 18:26:52,234][09285] Stopping LearnerWorker_p0... [2024-09-29 18:26:52,234][09285] Loop learner_proc0_evt_loop terminating... [2024-09-29 18:26:52,586][00189] Component RolloutWorker_w5 stopped! [2024-09-29 18:26:52,590][09304] Stopping RolloutWorker_w5... [2024-09-29 18:26:52,600][09304] Loop rollout_proc5_evt_loop terminating... [2024-09-29 18:26:52,632][00189] Component RolloutWorker_w7 stopped! [2024-09-29 18:26:52,637][09306] Stopping RolloutWorker_w7... [2024-09-29 18:26:52,638][09306] Loop rollout_proc7_evt_loop terminating... [2024-09-29 18:26:52,644][00189] Component RolloutWorker_w3 stopped! [2024-09-29 18:26:52,649][09303] Stopping RolloutWorker_w3... [2024-09-29 18:26:52,649][09303] Loop rollout_proc3_evt_loop terminating... [2024-09-29 18:26:52,670][09305] Stopping RolloutWorker_w6... [2024-09-29 18:26:52,670][00189] Component RolloutWorker_w6 stopped! [2024-09-29 18:26:52,692][09305] Loop rollout_proc6_evt_loop terminating... [2024-09-29 18:26:52,736][09301] Stopping RolloutWorker_w2... [2024-09-29 18:26:52,736][00189] Component RolloutWorker_w2 stopped! [2024-09-29 18:26:52,754][09300] Stopping RolloutWorker_w1... [2024-09-29 18:26:52,757][09300] Loop rollout_proc1_evt_loop terminating... [2024-09-29 18:26:52,759][09302] Stopping RolloutWorker_w4... 
[2024-09-29 18:26:52,758][00189] Component RolloutWorker_w4 stopped! [2024-09-29 18:26:52,737][09301] Loop rollout_proc2_evt_loop terminating... [2024-09-29 18:26:52,759][09302] Loop rollout_proc4_evt_loop terminating... [2024-09-29 18:26:52,763][00189] Component RolloutWorker_w1 stopped! [2024-09-29 18:26:52,855][00189] Component RolloutWorker_w0 stopped! [2024-09-29 18:26:52,862][00189] Waiting for process learner_proc0 to stop... [2024-09-29 18:26:52,855][09299] Stopping RolloutWorker_w0... [2024-09-29 18:26:52,876][09299] Loop rollout_proc0_evt_loop terminating... [2024-09-29 18:26:54,362][00189] Waiting for process inference_proc0-0 to join... [2024-09-29 18:26:55,064][00189] Waiting for process rollout_proc0 to join... [2024-09-29 18:26:56,603][00189] Waiting for process rollout_proc1 to join... [2024-09-29 18:26:56,613][00189] Waiting for process rollout_proc2 to join... [2024-09-29 18:26:56,622][00189] Waiting for process rollout_proc3 to join... [2024-09-29 18:26:56,626][00189] Waiting for process rollout_proc4 to join... [2024-09-29 18:26:56,631][00189] Waiting for process rollout_proc5 to join... [2024-09-29 18:26:56,633][00189] Waiting for process rollout_proc6 to join... [2024-09-29 18:26:56,639][00189] Waiting for process rollout_proc7 to join... 
[2024-09-29 18:26:56,643][00189] Batcher 0 profile tree view:
batching: 19.6931, releasing_batches: 0.0199
[2024-09-29 18:26:56,644][00189] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 355.5071
update_model: 5.8446
  weight_update: 0.0026
one_step: 0.0138
  handle_policy_step: 422.5302
    deserialize: 11.6689, stack: 2.2359, obs_to_device_normalize: 87.9662, forward: 210.4179, send_messages: 21.9223
    prepare_outputs: 66.1545
      to_cpu: 41.0552
[2024-09-29 18:26:56,646][00189] Learner 0 profile tree view:
misc: 0.0039, prepare_batch: 12.4317
train: 58.3512
  epoch_init: 0.0042, minibatch_init: 0.0063, losses_postprocess: 0.3944, kl_divergence: 0.4331, after_optimizer: 2.8642
  calculate_losses: 18.4718
    losses_init: 0.0027, forward_head: 1.4964, bptt_initial: 11.6154, tail: 0.8862, advantages_returns: 0.2929, losses: 2.1791
    bptt: 1.7036
      bptt_forward_core: 1.6219
  update: 35.6881
    clip: 1.0770
[2024-09-29 18:26:56,648][00189] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2581, enqueue_policy_requests: 86.7064, env_step: 631.4459, overhead: 10.8465, complete_rollouts: 5.9156
save_policy_outputs: 19.4015
  split_output_tensors: 6.9140
[2024-09-29 18:26:56,649][00189] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1940, enqueue_policy_requests: 90.6917, env_step: 630.7164, overhead: 10.6671, complete_rollouts: 5.5359
save_policy_outputs: 18.7177
  split_output_tensors: 6.6255
[2024-09-29 18:26:56,652][00189] Loop Runner_EvtLoop terminating...
[2024-09-29 18:26:56,653][00189] Runner profile tree view:
main_loop: 837.6879
[2024-09-29 18:26:56,655][00189] Collected {0: 4005888}, FPS: 3579.2
[2024-09-29 18:27:58,834][00189] Environment doom_basic already registered, overwriting...
[2024-09-29 18:27:58,837][00189] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-29 18:27:58,839][00189] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-29 18:27:58,840][00189] Environment doom_dm already registered, overwriting...
[2024-09-29 18:27:58,841][00189] Environment doom_dwango5 already registered, overwriting...
[2024-09-29 18:27:58,843][00189] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-29 18:27:58,844][00189] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-29 18:27:58,845][00189] Environment doom_my_way_home already registered, overwriting...
[2024-09-29 18:27:58,847][00189] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-29 18:27:58,848][00189] Environment doom_defend_the_center already registered, overwriting...
[2024-09-29 18:27:58,850][00189] Environment doom_defend_the_line already registered, overwriting...
[2024-09-29 18:27:58,852][00189] Environment doom_health_gathering already registered, overwriting...
[2024-09-29 18:27:58,853][00189] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-29 18:27:58,855][00189] Environment doom_battle already registered, overwriting...
[2024-09-29 18:27:58,856][00189] Environment doom_battle2 already registered, overwriting...
[2024-09-29 18:27:58,858][00189] Environment doom_duel_bots already registered, overwriting...
[2024-09-29 18:27:58,860][00189] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-29 18:27:58,862][00189] Environment doom_duel already registered, overwriting...
[2024-09-29 18:27:58,863][00189] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-29 18:27:58,865][00189] Environment doom_benchmark already registered, overwriting...
[2024-09-29 18:27:58,867][00189] register_encoder_factory:
[2024-09-29 18:27:58,889][00189] Loading existing experiment configuration from /content/train_dir/samplefactory-vizdoom-v1/config.json
[2024-09-29 18:27:58,891][00189] Overriding arg 'train_for_env_steps' with value 6000000 passed from command line
[2024-09-29 18:27:58,898][00189] Experiment dir /content/train_dir/samplefactory-vizdoom-v1 already exists!
[2024-09-29 18:27:58,899][00189] Resuming existing experiment from /content/train_dir/samplefactory-vizdoom-v1...
[2024-09-29 18:27:58,902][00189] Weights and Biases integration disabled
[2024-09-29 18:27:58,905][00189] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-09-29 18:28:00,450][00189] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=samplefactory-vizdoom-v1
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=6000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000 --experiment=samplefactory-vizdoom-v1 --restart_behavior=resume
cli_args={'env': 'doom_health_gathering_supreme', 'experiment': 'samplefactory-vizdoom-v1', 'restart_behavior': 'resume', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-29 18:28:00,453][00189] Saving configuration to /content/train_dir/samplefactory-vizdoom-v1/config.json...
[2024-09-29 18:28:00,455][00189] Rollout worker 0 uses device cpu
[2024-09-29 18:28:00,457][00189] Rollout worker 1 uses device cpu
[2024-09-29 18:28:00,458][00189] Rollout worker 2 uses device cpu
[2024-09-29 18:28:00,459][00189] Rollout worker 3 uses device cpu
[2024-09-29 18:28:00,461][00189] Rollout worker 4 uses device cpu
[2024-09-29 18:28:00,462][00189] Rollout worker 5 uses device cpu
[2024-09-29 18:28:00,463][00189] Rollout worker 6 uses device cpu
[2024-09-29 18:28:00,464][00189] Rollout worker 7 uses device cpu
[2024-09-29 18:28:00,619][00189] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:28:00,621][00189] InferenceWorker_p0-w0: min num requests: 2
[2024-09-29 18:28:00,656][00189] Starting all processes...
[2024-09-29 18:28:00,658][00189] Starting process learner_proc0
[2024-09-29 18:28:00,706][00189] Starting all processes...
[2024-09-29 18:28:00,713][00189] Starting process inference_proc0-0
[2024-09-29 18:28:00,713][00189] Starting process rollout_proc0
[2024-09-29 18:28:00,715][00189] Starting process rollout_proc1
[2024-09-29 18:28:00,716][00189] Starting process rollout_proc2
[2024-09-29 18:28:00,716][00189] Starting process rollout_proc3
[2024-09-29 18:28:00,716][00189] Starting process rollout_proc4
[2024-09-29 18:28:00,716][00189] Starting process rollout_proc5
[2024-09-29 18:28:00,716][00189] Starting process rollout_proc6
[2024-09-29 18:28:00,716][00189] Starting process rollout_proc7
[2024-09-29 18:28:10,841][13231] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:28:10,857][13231] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-29 18:28:10,928][13231] Num visible devices: 1
[2024-09-29 18:28:10,969][13231] Starting seed is not provided
[2024-09-29 18:28:10,970][13231] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:28:10,971][13231] Initializing actor-critic model on device cuda:0
[2024-09-29 18:28:10,972][13231] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:28:10,973][13231] RunningMeanStd input shape: (1,)
[2024-09-29 18:28:11,089][13231] ConvEncoder: input_channels=3
[2024-09-29 18:28:11,954][13247] Worker 2 uses CPU cores [0]
[2024-09-29 18:28:12,018][13244] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:28:12,020][13244] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-29 18:28:12,107][13244] Num visible devices: 1
[2024-09-29 18:28:12,126][13231] Conv encoder output size: 512
[2024-09-29 18:28:12,127][13231] Policy head output size: 512
[2024-09-29 18:28:12,202][13250] Worker 5 uses CPU cores [1]
[2024-09-29 18:28:12,221][13231] Created Actor Critic model with architecture:
[2024-09-29 18:28:12,223][13231] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-29 18:28:12,227][13251] Worker 7 uses CPU cores [1]
[2024-09-29 18:28:12,292][13246] Worker 1 uses CPU cores [1]
[2024-09-29 18:28:12,316][13245] Worker 0 uses CPU cores [0]
[2024-09-29 18:28:12,369][13249] Worker 4 uses CPU cores [0]
[2024-09-29 18:28:12,382][13252] Worker 6 uses CPU cores [0]
[2024-09-29 18:28:12,388][13248] Worker 3 uses CPU cores [1]
[2024-09-29 18:28:13,860][13231] Using optimizer
[2024-09-29 18:28:13,860][13231] Loading state from checkpoint /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-29 18:28:13,891][13231] Loading model from checkpoint
[2024-09-29 18:28:13,896][13231] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2024-09-29 18:28:13,896][13231] Initialized policy 0 weights for model version 978
[2024-09-29 18:28:13,899][13231] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:28:13,905][00189] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:28:13,906][13231] LearnerWorker_p0 finished initialization!
[2024-09-29 18:28:14,000][13244] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:28:14,002][13244] RunningMeanStd input shape: (1,)
[2024-09-29 18:28:14,018][13244] ConvEncoder: input_channels=3
[2024-09-29 18:28:14,122][13244] Conv encoder output size: 512
[2024-09-29 18:28:14,122][13244] Policy head output size: 512
[2024-09-29 18:28:15,553][00189] Inference worker 0-0 is ready!
[2024-09-29 18:28:15,555][00189] All inference workers are ready! Signal rollout workers to start!
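The checkpoint names in this log encode both the learner's train step and the environment-step count, e.g. `checkpoint_000000978_4005888.pth` matches `self.train_step=978, self.env_steps=4005888`. A quick sanity-check sketch (not Sample Factory's actual code; it assumes each train step consumes one `batch_size=1024` batch and that env steps are counted with `env_frameskip=4`, both values taken from the configuration dump above):

```python
# Hedged sketch: relate the checkpoint filenames in this log to config values.
# Assumption: env_steps = train_step * batch_size * env_frameskip
# (batch_size=1024 and env_frameskip=4 come from the printed configuration).

def env_steps_for(train_step: int, batch_size: int = 1024, frameskip: int = 4) -> int:
    """Environment frames accumulated after `train_step` optimizer steps."""
    return train_step * batch_size * frameskip

def checkpoint_name(train_step: int) -> str:
    """Mirror the checkpoint_<step>_<env_steps>.pth naming seen in the log."""
    return f"checkpoint_{train_step:09d}_{env_steps_for(train_step)}.pth"
```

Under this assumption, `checkpoint_name(978)` reproduces `checkpoint_000000978_4005888.pth`, and the later checkpoints at steps 1063 and 1176 likewise match the frame counts 4354048 and 4816896 that appear further down.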
[2024-09-29 18:28:15,625][13249] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:28:15,628][13251] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:28:15,631][13250] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:28:15,633][13248] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:28:15,635][13246] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:28:15,631][13252] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:28:15,628][13245] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:28:15,633][13247] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:28:16,105][13247] Decorrelating experience for 0 frames...
[2024-09-29 18:28:16,902][13248] Decorrelating experience for 0 frames...
[2024-09-29 18:28:16,900][13250] Decorrelating experience for 0 frames...
[2024-09-29 18:28:16,902][13251] Decorrelating experience for 0 frames...
[2024-09-29 18:28:17,123][13247] Decorrelating experience for 32 frames...
[2024-09-29 18:28:17,783][13245] Decorrelating experience for 0 frames...
[2024-09-29 18:28:18,222][13250] Decorrelating experience for 32 frames...
[2024-09-29 18:28:18,226][13248] Decorrelating experience for 32 frames...
[2024-09-29 18:28:18,231][13246] Decorrelating experience for 0 frames...
[2024-09-29 18:28:18,909][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:28:19,308][13252] Decorrelating experience for 0 frames...
[2024-09-29 18:28:19,316][13245] Decorrelating experience for 32 frames...
[2024-09-29 18:28:19,774][13251] Decorrelating experience for 32 frames...
[2024-09-29 18:28:19,789][13246] Decorrelating experience for 32 frames...
[2024-09-29 18:28:20,056][13248] Decorrelating experience for 64 frames...
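Each rollout worker above reports decorrelating for 0, 32, 64, and 96 frames, i.e. multiples of `rollout=32` across its `num_envs_per_worker=4` environments. A minimal sketch of that staggered schedule under the assumption that the offset is simply `env_index * rollout` (the exact Sample Factory formula may differ):

```python
# Hedged sketch of the staggered decorrelation offsets visible in the log.
# Assumption: each of a worker's environments skips env_index * rollout frames
# before contributing experience, with rollout=32 and 4 envs per worker
# (both values from the configuration dump above).

def decorrelation_frames(num_envs_per_worker: int = 4, rollout: int = 32) -> list[int]:
    """Per-environment decorrelation offsets for one rollout worker."""
    return [env_idx * rollout for env_idx in range(num_envs_per_worker)]
```

With the logged config this yields the 0/32/64/96 sequence each worker prints, which desynchronizes the environments so the learner does not receive eight identical trajectories at once.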
[2024-09-29 18:28:20,611][00189] Heartbeat connected on Batcher_0
[2024-09-29 18:28:20,616][00189] Heartbeat connected on LearnerWorker_p0
[2024-09-29 18:28:20,657][00189] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-29 18:28:20,732][13252] Decorrelating experience for 32 frames...
[2024-09-29 18:28:21,469][13250] Decorrelating experience for 64 frames...
[2024-09-29 18:28:21,641][13245] Decorrelating experience for 64 frames...
[2024-09-29 18:28:21,642][13249] Decorrelating experience for 0 frames...
[2024-09-29 18:28:22,456][13251] Decorrelating experience for 64 frames...
[2024-09-29 18:28:22,600][13248] Decorrelating experience for 96 frames...
[2024-09-29 18:28:23,010][00189] Heartbeat connected on RolloutWorker_w3
[2024-09-29 18:28:23,098][13247] Decorrelating experience for 64 frames...
[2024-09-29 18:28:23,290][13249] Decorrelating experience for 32 frames...
[2024-09-29 18:28:23,491][13246] Decorrelating experience for 64 frames...
[2024-09-29 18:28:23,906][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:28:24,478][13252] Decorrelating experience for 64 frames...
[2024-09-29 18:28:24,538][13250] Decorrelating experience for 96 frames...
[2024-09-29 18:28:24,612][13251] Decorrelating experience for 96 frames...
[2024-09-29 18:28:24,846][00189] Heartbeat connected on RolloutWorker_w5
[2024-09-29 18:28:24,946][13245] Decorrelating experience for 96 frames...
[2024-09-29 18:28:24,954][00189] Heartbeat connected on RolloutWorker_w7
[2024-09-29 18:28:25,034][13247] Decorrelating experience for 96 frames...
[2024-09-29 18:28:25,137][00189] Heartbeat connected on RolloutWorker_w0
[2024-09-29 18:28:25,237][00189] Heartbeat connected on RolloutWorker_w2
[2024-09-29 18:28:25,745][13246] Decorrelating experience for 96 frames...
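The "Fps is (10 sec: ...)" figures in the entries that follow are consistent with a sliding-window rate over the reported "Total num frames" samples, up to timestamp jitter. For example, totals of 4005888 at 18:28:28 and 4042752 at 18:28:38 give the logged 3686.4. A hedged sketch of that arithmetic (not Sample Factory's actual implementation):

```python
# Hedged sketch: the 10-second FPS window implied by the log's frame totals.
# Assumption: fps = (frames_now - frames_then) / window_sec, with frame totals
# sampled from the "Total num frames" fields of two reports 10 s apart.

def window_fps(frames_then: int, frames_now: int, window_sec: float = 10.0) -> float:
    """Frames-per-second over a fixed reporting window."""
    return (frames_now - frames_then) / window_sec
```

The 60-second and 300-second figures in the same entries behave the same way over longer windows, which is why they ramp up slowly from zero after the resume.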
[2024-09-29 18:28:26,130][00189] Heartbeat connected on RolloutWorker_w1
[2024-09-29 18:28:27,091][13249] Decorrelating experience for 64 frames...
[2024-09-29 18:28:27,959][13231] Signal inference workers to stop experience collection...
[2024-09-29 18:28:27,976][13244] InferenceWorker_p0-w0: stopping experience collection
[2024-09-29 18:28:28,281][13252] Decorrelating experience for 96 frames...
[2024-09-29 18:28:28,367][00189] Heartbeat connected on RolloutWorker_w6
[2024-09-29 18:28:28,644][13249] Decorrelating experience for 96 frames...
[2024-09-29 18:28:28,705][00189] Heartbeat connected on RolloutWorker_w4
[2024-09-29 18:28:28,905][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 143.9. Samples: 2158. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:28:28,907][00189] Avg episode reward: [(0, '4.341')]
[2024-09-29 18:28:28,938][13231] Signal inference workers to resume experience collection...
[2024-09-29 18:28:28,939][13244] InferenceWorker_p0-w0: resuming experience collection
[2024-09-29 18:28:33,909][00189] Fps is (10 sec: 2047.2, 60 sec: 1023.8, 300 sec: 1023.8). Total num frames: 4026368. Throughput: 0: 325.1. Samples: 6504. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-09-29 18:28:33,911][00189] Avg episode reward: [(0, '8.938')]
[2024-09-29 18:28:38,905][00189] Fps is (10 sec: 3686.4, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 4042752. Throughput: 0: 344.7. Samples: 8618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:28:38,907][00189] Avg episode reward: [(0, '11.106')]
[2024-09-29 18:28:39,479][13244] Updated weights for policy 0, policy_version 988 (0.0354)
[2024-09-29 18:28:43,905][00189] Fps is (10 sec: 3687.8, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 4063232. Throughput: 0: 465.1. Samples: 13952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:28:43,914][00189] Avg episode reward: [(0, '15.921')]
[2024-09-29 18:28:48,786][13244] Updated weights for policy 0, policy_version 998 (0.0017)
[2024-09-29 18:28:48,906][00189] Fps is (10 sec: 4505.3, 60 sec: 2340.5, 300 sec: 2340.5). Total num frames: 4087808. Throughput: 0: 597.1. Samples: 20898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:28:48,910][00189] Avg episode reward: [(0, '17.340')]
[2024-09-29 18:28:53,906][00189] Fps is (10 sec: 4095.7, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 4104192. Throughput: 0: 595.4. Samples: 23818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:28:53,910][00189] Avg episode reward: [(0, '19.349')]
[2024-09-29 18:28:58,906][00189] Fps is (10 sec: 3277.0, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 4120576. Throughput: 0: 622.4. Samples: 28006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:28:58,908][00189] Avg episode reward: [(0, '20.545')]
[2024-09-29 18:29:00,657][13244] Updated weights for policy 0, policy_version 1008 (0.0022)
[2024-09-29 18:29:03,905][00189] Fps is (10 sec: 3686.7, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 4141056. Throughput: 0: 770.4. Samples: 34664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:29:03,914][00189] Avg episode reward: [(0, '22.721')]
[2024-09-29 18:29:08,905][00189] Fps is (10 sec: 4096.0, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 4161536. Throughput: 0: 848.9. Samples: 38200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:29:08,911][00189] Avg episode reward: [(0, '23.361')]
[2024-09-29 18:29:08,986][13231] Saving new best policy, reward=23.361!
[2024-09-29 18:29:10,504][13244] Updated weights for policy 0, policy_version 1018 (0.0014)
[2024-09-29 18:29:13,907][00189] Fps is (10 sec: 3685.8, 60 sec: 2867.1, 300 sec: 2867.1). Total num frames: 4177920. Throughput: 0: 908.4. Samples: 43036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:29:13,913][00189] Avg episode reward: [(0, '23.305')]
[2024-09-29 18:29:18,906][00189] Fps is (10 sec: 3686.4, 60 sec: 3208.7, 300 sec: 2961.7). Total num frames: 4198400. Throughput: 0: 938.0. Samples: 48712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:29:18,908][00189] Avg episode reward: [(0, '21.361')]
[2024-09-29 18:29:21,233][13244] Updated weights for policy 0, policy_version 1028 (0.0013)
[2024-09-29 18:29:23,905][00189] Fps is (10 sec: 4096.7, 60 sec: 3549.9, 300 sec: 3042.7). Total num frames: 4218880. Throughput: 0: 967.5. Samples: 52154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:29:23,910][00189] Avg episode reward: [(0, '20.075')]
[2024-09-29 18:29:28,907][00189] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3112.9). Total num frames: 4239360. Throughput: 0: 979.6. Samples: 58038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:29:28,912][00189] Avg episode reward: [(0, '20.068')]
[2024-09-29 18:29:32,946][13244] Updated weights for policy 0, policy_version 1038 (0.0014)
[2024-09-29 18:29:33,905][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3123.2). Total num frames: 4255744. Throughput: 0: 928.4. Samples: 62676. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:29:33,910][00189] Avg episode reward: [(0, '19.734')]
[2024-09-29 18:29:38,905][00189] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3180.4). Total num frames: 4276224. Throughput: 0: 940.4. Samples: 66134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:29:38,908][00189] Avg episode reward: [(0, '19.924')]
[2024-09-29 18:29:41,904][13244] Updated weights for policy 0, policy_version 1048 (0.0017)
[2024-09-29 18:29:43,908][00189] Fps is (10 sec: 4094.8, 60 sec: 3891.0, 300 sec: 3231.2). Total num frames: 4296704. Throughput: 0: 1000.5. Samples: 73030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:29:43,911][00189] Avg episode reward: [(0, '21.159')]
[2024-09-29 18:29:48,906][00189] Fps is (10 sec: 3686.1, 60 sec: 3754.7, 300 sec: 3233.7). Total num frames: 4313088. Throughput: 0: 949.0. Samples: 77370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:29:48,909][00189] Avg episode reward: [(0, '21.765')]
[2024-09-29 18:29:53,426][13244] Updated weights for policy 0, policy_version 1058 (0.0039)
[2024-09-29 18:29:53,905][00189] Fps is (10 sec: 3687.5, 60 sec: 3823.0, 300 sec: 3276.8). Total num frames: 4333568. Throughput: 0: 935.7. Samples: 80306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:29:53,912][00189] Avg episode reward: [(0, '22.343')]
[2024-09-29 18:29:58,905][00189] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3315.8). Total num frames: 4354048. Throughput: 0: 977.1. Samples: 87002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:29:58,907][00189] Avg episode reward: [(0, '23.799')]
[2024-09-29 18:29:58,918][13231] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001063_4354048.pth...
[2024-09-29 18:29:59,054][13231] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000875_3584000.pth
[2024-09-29 18:29:59,070][13231] Saving new best policy, reward=23.799!
[2024-09-29 18:30:03,908][00189] Fps is (10 sec: 3685.3, 60 sec: 3822.7, 300 sec: 3313.9). Total num frames: 4370432. Throughput: 0: 964.8. Samples: 92130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:30:03,911][00189] Avg episode reward: [(0, '23.078')]
[2024-09-29 18:30:04,461][13244] Updated weights for policy 0, policy_version 1068 (0.0020)
[2024-09-29 18:30:08,906][00189] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3312.4). Total num frames: 4386816. Throughput: 0: 936.3. Samples: 94286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:30:08,916][00189] Avg episode reward: [(0, '22.970')]
[2024-09-29 18:30:13,905][00189] Fps is (10 sec: 4097.2, 60 sec: 3891.3, 300 sec: 3379.2). Total num frames: 4411392. Throughput: 0: 952.1. Samples: 100880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:30:13,911][00189] Avg episode reward: [(0, '20.163')]
[2024-09-29 18:30:14,453][13244] Updated weights for policy 0, policy_version 1078 (0.0015)
[2024-09-29 18:30:18,906][00189] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3407.9). Total num frames: 4431872. Throughput: 0: 991.4. Samples: 107288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:30:18,919][00189] Avg episode reward: [(0, '21.235')]
[2024-09-29 18:30:23,909][00189] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3402.7). Total num frames: 4448256. Throughput: 0: 961.1. Samples: 109386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:30:23,914][00189] Avg episode reward: [(0, '19.902')]
[2024-09-29 18:30:25,914][13244] Updated weights for policy 0, policy_version 1088 (0.0019)
[2024-09-29 18:30:28,906][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3428.5). Total num frames: 4468736. Throughput: 0: 932.2. Samples: 114978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:30:28,911][00189] Avg episode reward: [(0, '20.408')]
[2024-09-29 18:30:33,905][00189] Fps is (10 sec: 4097.4, 60 sec: 3891.2, 300 sec: 3452.3). Total num frames: 4489216. Throughput: 0: 988.7. Samples: 121862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:30:33,909][00189] Avg episode reward: [(0, '20.761')]
[2024-09-29 18:30:35,360][13244] Updated weights for policy 0, policy_version 1098 (0.0021)
[2024-09-29 18:30:38,905][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3446.3). Total num frames: 4505600. Throughput: 0: 982.2. Samples: 124504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-29 18:30:38,909][00189] Avg episode reward: [(0, '20.979')]
[2024-09-29 18:30:43,905][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3440.6). Total num frames: 4521984. Throughput: 0: 935.2. Samples: 129088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:30:43,908][00189] Avg episode reward: [(0, '22.297')]
[2024-09-29 18:30:46,626][13244] Updated weights for policy 0, policy_version 1108 (0.0013)
[2024-09-29 18:30:48,906][00189] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3488.2). Total num frames: 4546560. Throughput: 0: 973.7. Samples: 135944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:30:48,908][00189] Avg episode reward: [(0, '23.658')]
[2024-09-29 18:30:53,905][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3507.2). Total num frames: 4567040. Throughput: 0: 1003.0. Samples: 139420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:30:53,910][00189] Avg episode reward: [(0, '22.987')]
[2024-09-29 18:30:57,923][13244] Updated weights for policy 0, policy_version 1118 (0.0012)
[2024-09-29 18:30:58,905][00189] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3475.4). Total num frames: 4579328. Throughput: 0: 951.1. Samples: 143680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:30:58,914][00189] Avg episode reward: [(0, '23.471')]
[2024-09-29 18:31:03,906][00189] Fps is (10 sec: 3686.3, 60 sec: 3891.4, 300 sec: 3517.7). Total num frames: 4603904. Throughput: 0: 948.1. Samples: 149954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:31:03,911][00189] Avg episode reward: [(0, '25.177')]
[2024-09-29 18:31:03,917][13231] Saving new best policy, reward=25.177!
[2024-09-29 18:31:07,231][13244] Updated weights for policy 0, policy_version 1128 (0.0012)
[2024-09-29 18:31:08,906][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3534.3). Total num frames: 4624384. Throughput: 0: 975.6. Samples: 153286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:31:08,914][00189] Avg episode reward: [(0, '25.867')]
[2024-09-29 18:31:08,929][13231] Saving new best policy, reward=25.867!
[2024-09-29 18:31:13,906][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3527.1). Total num frames: 4640768. Throughput: 0: 968.0. Samples: 158538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:31:13,908][00189] Avg episode reward: [(0, '25.597')]
[2024-09-29 18:31:18,906][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.6, 300 sec: 3520.3). Total num frames: 4657152. Throughput: 0: 930.8. Samples: 163748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:31:18,912][00189] Avg episode reward: [(0, '24.208')]
[2024-09-29 18:31:19,096][13244] Updated weights for policy 0, policy_version 1138 (0.0012)
[2024-09-29 18:31:23,906][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3557.0). Total num frames: 4681728. Throughput: 0: 949.6. Samples: 167234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:31:23,913][00189] Avg episode reward: [(0, '24.773')]
[2024-09-29 18:31:28,900][13244] Updated weights for policy 0, policy_version 1148 (0.0025)
[2024-09-29 18:31:28,905][00189] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3570.9). Total num frames: 4702208. Throughput: 0: 991.9. Samples: 173724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:31:28,913][00189] Avg episode reward: [(0, '22.741')]
[2024-09-29 18:31:33,906][00189] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3543.0). Total num frames: 4714496. Throughput: 0: 936.4. Samples: 178084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:31:33,912][00189] Avg episode reward: [(0, '21.422')]
[2024-09-29 18:31:38,906][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3576.5). Total num frames: 4739072. Throughput: 0: 931.2. Samples: 181322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:31:38,907][00189] Avg episode reward: [(0, '20.096')]
[2024-09-29 18:31:39,587][13244] Updated weights for policy 0, policy_version 1158 (0.0017)
[2024-09-29 18:31:43,905][00189] Fps is (10 sec: 4506.0, 60 sec: 3959.5, 300 sec: 3588.9). Total num frames: 4759552. Throughput: 0: 990.9. Samples: 188270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:31:43,910][00189] Avg episode reward: [(0, '19.655')]
[2024-09-29 18:31:48,906][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3581.6). Total num frames: 4775936. Throughput: 0: 963.8. Samples: 193324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:31:48,908][00189] Avg episode reward: [(0, '20.521')]
[2024-09-29 18:31:51,254][13244] Updated weights for policy 0, policy_version 1168 (0.0026)
[2024-09-29 18:31:53,907][00189] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3593.3). Total num frames: 4796416. Throughput: 0: 938.2. Samples: 195506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:31:53,910][00189] Avg episode reward: [(0, '20.185')]
[2024-09-29 18:31:58,906][00189] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3604.5). Total num frames: 4816896. Throughput: 0: 973.8. Samples: 202360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:31:58,908][00189] Avg episode reward: [(0, '21.042')]
[2024-09-29 18:31:58,918][13231] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001176_4816896.pth...
[2024-09-29 18:31:59,099][13231] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000000978_4005888.pth
[2024-09-29 18:32:00,336][13244] Updated weights for policy 0, policy_version 1178 (0.0028)
[2024-09-29 18:32:03,907][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.8, 300 sec: 3597.3). Total num frames: 4833280. Throughput: 0: 987.3. Samples: 208176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:32:03,917][00189] Avg episode reward: [(0, '21.844')]
[2024-09-29 18:32:08,906][00189] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3590.5). Total num frames: 4849664. Throughput: 0: 955.9. Samples: 210250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:32:08,911][00189] Avg episode reward: [(0, '22.125')]
[2024-09-29 18:32:11,976][13244] Updated weights for policy 0, policy_version 1188 (0.0022)
[2024-09-29 18:32:13,905][00189] Fps is (10 sec: 4096.8, 60 sec: 3891.2, 300 sec: 3618.1). Total num frames: 4874240. Throughput: 0: 946.3. Samples: 216308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:32:13,913][00189] Avg episode reward: [(0, '22.055')]
[2024-09-29 18:32:18,908][00189] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3627.9). Total num frames: 4894720. Throughput: 0: 1002.7. Samples: 223206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:32:18,914][00189] Avg episode reward: [(0, '23.280')]
[2024-09-29 18:32:21,888][13244] Updated weights for policy 0, policy_version 1198 (0.0021)
[2024-09-29 18:32:23,906][00189] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3620.8). Total num frames: 4911104. Throughput: 0: 982.1. Samples: 225518. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:32:23,916][00189] Avg episode reward: [(0, '23.471')]
[2024-09-29 18:32:28,906][00189] Fps is (10 sec: 3277.5, 60 sec: 3754.7, 300 sec: 3614.1). Total num frames: 4927488. Throughput: 0: 934.6. Samples: 230326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:32:28,908][00189] Avg episode reward: [(0, '22.759')]
[2024-09-29 18:32:32,743][13244] Updated weights for policy 0, policy_version 1208 (0.0019)
[2024-09-29 18:32:33,906][00189] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3639.1). Total num frames: 4952064. Throughput: 0: 973.7. Samples: 237140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:32:33,908][00189] Avg episode reward: [(0, '21.727')]
[2024-09-29 18:32:38,906][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3632.3). Total num frames: 4968448. Throughput: 0: 998.7. Samples: 240446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:32:38,910][00189] Avg episode reward: [(0, '22.302')]
[2024-09-29 18:32:43,906][00189] Fps is (10 sec: 3276.9, 60 sec: 3754.6, 300 sec: 3625.7). Total num frames: 4984832. Throughput: 0: 939.6. Samples: 244640. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-29 18:32:43,908][00189] Avg episode reward: [(0, '21.087')]
[2024-09-29 18:32:44,523][13244] Updated weights for policy 0, policy_version 1218 (0.0012)
[2024-09-29 18:32:48,905][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3649.2). Total num frames: 5009408. Throughput: 0: 960.1. Samples: 251378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:32:48,910][00189] Avg episode reward: [(0, '23.140')]
[2024-09-29 18:32:53,123][13244] Updated weights for policy 0, policy_version 1228 (0.0017)
[2024-09-29 18:32:53,906][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3657.1). Total num frames: 5029888. Throughput: 0: 988.7. Samples: 254740. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:32:53,908][00189] Avg episode reward: [(0, '23.503')]
[2024-09-29 18:32:58,905][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3636.1). Total num frames: 5042176. Throughput: 0: 964.2. Samples: 259696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:32:58,913][00189] Avg episode reward: [(0, '25.604')]
[2024-09-29 18:33:03,906][00189] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3644.0). Total num frames: 5062656. Throughput: 0: 934.6. Samples: 265262.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:33:03,911][00189] Avg episode reward: [(0, '24.668')] [2024-09-29 18:33:05,118][13244] Updated weights for policy 0, policy_version 1238 (0.0012) [2024-09-29 18:33:08,905][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3665.6). Total num frames: 5087232. Throughput: 0: 960.7. Samples: 268750. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-09-29 18:33:08,911][00189] Avg episode reward: [(0, '24.746')] [2024-09-29 18:33:13,906][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 5103616. Throughput: 0: 989.6. Samples: 274856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:33:13,911][00189] Avg episode reward: [(0, '24.967')] [2024-09-29 18:33:16,008][13244] Updated weights for policy 0, policy_version 1248 (0.0012) [2024-09-29 18:33:18,905][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 5120000. Throughput: 0: 940.4. Samples: 279458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:33:18,908][00189] Avg episode reward: [(0, '24.087')] [2024-09-29 18:33:23,905][00189] Fps is (10 sec: 4096.1, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 5144576. Throughput: 0: 946.1. Samples: 283020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:33:23,910][00189] Avg episode reward: [(0, '24.579')] [2024-09-29 18:33:25,350][13244] Updated weights for policy 0, policy_version 1258 (0.0020) [2024-09-29 18:33:28,905][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5165056. Throughput: 0: 1005.3. Samples: 289880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:33:28,909][00189] Avg episode reward: [(0, '25.113')] [2024-09-29 18:33:33,912][00189] Fps is (10 sec: 3684.0, 60 sec: 3822.6, 300 sec: 3859.9). Total num frames: 5181440. Throughput: 0: 952.9. Samples: 294264. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:33:33,915][00189] Avg episode reward: [(0, '25.673')] [2024-09-29 18:33:37,094][13244] Updated weights for policy 0, policy_version 1268 (0.0033) [2024-09-29 18:33:38,905][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5197824. Throughput: 0: 937.9. Samples: 296944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:33:38,911][00189] Avg episode reward: [(0, '25.412')] [2024-09-29 18:33:43,905][00189] Fps is (10 sec: 4098.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 5222400. Throughput: 0: 979.6. Samples: 303778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:33:43,909][00189] Avg episode reward: [(0, '24.869')] [2024-09-29 18:33:46,817][13244] Updated weights for policy 0, policy_version 1278 (0.0029) [2024-09-29 18:33:48,909][00189] Fps is (10 sec: 4094.4, 60 sec: 3822.7, 300 sec: 3846.0). Total num frames: 5238784. Throughput: 0: 972.1. Samples: 309012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:33:48,912][00189] Avg episode reward: [(0, '24.541')] [2024-09-29 18:33:53,905][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 5255168. Throughput: 0: 941.4. Samples: 311112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:33:53,908][00189] Avg episode reward: [(0, '22.495')] [2024-09-29 18:33:57,852][13244] Updated weights for policy 0, policy_version 1288 (0.0014) [2024-09-29 18:33:58,906][00189] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5279744. Throughput: 0: 951.6. Samples: 317678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:33:58,907][00189] Avg episode reward: [(0, '22.467')] [2024-09-29 18:33:58,921][13231] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001289_5279744.pth... 
[2024-09-29 18:33:59,059][13231] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001063_4354048.pth
[2024-09-29 18:34:03,912][00189] Fps is (10 sec: 4093.2, 60 sec: 3890.8, 300 sec: 3846.0). Total num frames: 5296128. Throughput: 0: 985.5. Samples: 323812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:34:03,915][00189] Avg episode reward: [(0, '21.512')]
[2024-09-29 18:34:08,906][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 5312512. Throughput: 0: 952.8. Samples: 325894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:34:08,910][00189] Avg episode reward: [(0, '21.152')]
[2024-09-29 18:34:09,840][13244] Updated weights for policy 0, policy_version 1298 (0.0039)
[2024-09-29 18:34:13,905][00189] Fps is (10 sec: 3688.9, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 5332992. Throughput: 0: 921.2. Samples: 331336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:34:13,908][00189] Avg episode reward: [(0, '20.818')]
[2024-09-29 18:34:18,905][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5353472. Throughput: 0: 973.9. Samples: 338084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:34:18,913][00189] Avg episode reward: [(0, '22.256')]
[2024-09-29 18:34:18,957][13244] Updated weights for policy 0, policy_version 1308 (0.0023)
[2024-09-29 18:34:23,909][00189] Fps is (10 sec: 3685.0, 60 sec: 3754.4, 300 sec: 3832.2). Total num frames: 5369856. Throughput: 0: 971.9. Samples: 340684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:34:23,916][00189] Avg episode reward: [(0, '22.391')]
[2024-09-29 18:34:28,906][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 5386240. Throughput: 0: 912.8. Samples: 344854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:34:28,912][00189] Avg episode reward: [(0, '21.384')]
[2024-09-29 18:34:31,250][13244] Updated weights for policy 0, policy_version 1318 (0.0016)
[2024-09-29 18:34:33,905][00189] Fps is (10 sec: 4097.5, 60 sec: 3823.3, 300 sec: 3846.1). Total num frames: 5410816. Throughput: 0: 944.3. Samples: 351504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:34:33,911][00189] Avg episode reward: [(0, '23.156')]
[2024-09-29 18:34:38,905][00189] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 5427200. Throughput: 0: 973.2. Samples: 354906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:34:38,911][00189] Avg episode reward: [(0, '23.583')]
[2024-09-29 18:34:41,825][13244] Updated weights for policy 0, policy_version 1328 (0.0019)
[2024-09-29 18:34:43,905][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 5443584. Throughput: 0: 928.4. Samples: 359458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:34:43,910][00189] Avg episode reward: [(0, '24.308')]
[2024-09-29 18:34:48,905][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3832.2). Total num frames: 5464064. Throughput: 0: 922.4. Samples: 365316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:34:48,908][00189] Avg episode reward: [(0, '22.778')]
[2024-09-29 18:34:51,984][13244] Updated weights for policy 0, policy_version 1338 (0.0012)
[2024-09-29 18:34:53,905][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5488640. Throughput: 0: 951.9. Samples: 368730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:34:53,908][00189] Avg episode reward: [(0, '23.025')]
[2024-09-29 18:34:58,909][00189] Fps is (10 sec: 3685.0, 60 sec: 3686.2, 300 sec: 3832.2). Total num frames: 5500928. Throughput: 0: 952.4. Samples: 374196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:34:58,912][00189] Avg episode reward: [(0, '21.606')]
[2024-09-29 18:35:03,905][00189] Fps is (10 sec: 2867.2, 60 sec: 3686.8, 300 sec: 3832.2). Total num frames: 5517312. Throughput: 0: 908.8. Samples: 378980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:35:03,913][00189] Avg episode reward: [(0, '21.145')]
[2024-09-29 18:35:04,134][13244] Updated weights for policy 0, policy_version 1348 (0.0031)
[2024-09-29 18:35:08,905][00189] Fps is (10 sec: 4097.5, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 5541888. Throughput: 0: 928.5. Samples: 382462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:35:08,908][00189] Avg episode reward: [(0, '22.057')]
[2024-09-29 18:35:13,466][13244] Updated weights for policy 0, policy_version 1358 (0.0018)
[2024-09-29 18:35:13,905][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 5562368. Throughput: 0: 983.3. Samples: 389102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:35:13,910][00189] Avg episode reward: [(0, '22.943')]
[2024-09-29 18:35:18,905][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 5574656. Throughput: 0: 930.8. Samples: 393388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:35:18,910][00189] Avg episode reward: [(0, '23.771')]
[2024-09-29 18:35:23,905][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3818.3). Total num frames: 5595136. Throughput: 0: 919.1. Samples: 396266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:35:23,910][00189] Avg episode reward: [(0, '25.155')]
[2024-09-29 18:35:24,957][13244] Updated weights for policy 0, policy_version 1368 (0.0013)
[2024-09-29 18:35:28,906][00189] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 5619712. Throughput: 0: 966.0. Samples: 402928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:35:28,914][00189] Avg episode reward: [(0, '25.659')]
[2024-09-29 18:35:33,908][00189] Fps is (10 sec: 3685.3, 60 sec: 3686.2, 300 sec: 3818.3). Total num frames: 5632000. Throughput: 0: 946.1. Samples: 407892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:35:33,915][00189] Avg episode reward: [(0, '26.437')]
[2024-09-29 18:35:33,955][13231] Saving new best policy, reward=26.437!
[2024-09-29 18:35:36,788][13244] Updated weights for policy 0, policy_version 1378 (0.0013)
[2024-09-29 18:35:38,905][00189] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5652480. Throughput: 0: 917.6. Samples: 410022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:35:38,912][00189] Avg episode reward: [(0, '25.599')]
[2024-09-29 18:35:43,905][00189] Fps is (10 sec: 4097.2, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 5672960. Throughput: 0: 943.4. Samples: 416644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:35:43,909][00189] Avg episode reward: [(0, '26.865')]
[2024-09-29 18:35:43,914][13231] Saving new best policy, reward=26.865!
[2024-09-29 18:35:45,857][13244] Updated weights for policy 0, policy_version 1388 (0.0016)
[2024-09-29 18:35:48,905][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 5693440. Throughput: 0: 976.7. Samples: 422932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:35:48,908][00189] Avg episode reward: [(0, '26.728')]
[2024-09-29 18:35:53,905][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 5709824. Throughput: 0: 946.3. Samples: 425044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:35:53,908][00189] Avg episode reward: [(0, '26.864')]
[2024-09-29 18:35:57,356][13244] Updated weights for policy 0, policy_version 1398 (0.0017)
[2024-09-29 18:35:58,905][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3818.3). Total num frames: 5730304. Throughput: 0: 928.1. Samples: 430868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:35:58,912][00189] Avg episode reward: [(0, '26.869')]
[2024-09-29 18:35:58,923][13231] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001399_5730304.pth...
[2024-09-29 18:35:59,056][13231] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001176_4816896.pth
[2024-09-29 18:35:59,076][13231] Saving new best policy, reward=26.869!
[2024-09-29 18:36:03,905][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 5750784. Throughput: 0: 978.0. Samples: 437396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:36:03,912][00189] Avg episode reward: [(0, '28.524')]
[2024-09-29 18:36:03,990][13231] Saving new best policy, reward=28.524!
[2024-09-29 18:36:08,246][13244] Updated weights for policy 0, policy_version 1408 (0.0028)
[2024-09-29 18:36:08,905][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 5767168. Throughput: 0: 966.1. Samples: 439742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:36:08,910][00189] Avg episode reward: [(0, '28.948')]
[2024-09-29 18:36:08,924][13231] Saving new best policy, reward=28.948!
[2024-09-29 18:36:13,905][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 5783552. Throughput: 0: 920.7. Samples: 444358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:36:13,912][00189] Avg episode reward: [(0, '30.230')]
[2024-09-29 18:36:13,917][13231] Saving new best policy, reward=30.230!
[2024-09-29 18:36:18,519][13244] Updated weights for policy 0, policy_version 1418 (0.0022)
[2024-09-29 18:36:18,905][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 5808128. Throughput: 0: 962.2. Samples: 451188. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:36:18,908][00189] Avg episode reward: [(0, '27.436')]
[2024-09-29 18:36:23,908][00189] Fps is (10 sec: 4095.2, 60 sec: 3822.8, 300 sec: 3804.4). Total num frames: 5824512. Throughput: 0: 992.0. Samples: 454666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:36:23,913][00189] Avg episode reward: [(0, '28.014')]
[2024-09-29 18:36:28,905][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 5840896. Throughput: 0: 937.7. Samples: 458840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:36:28,908][00189] Avg episode reward: [(0, '28.298')]
[2024-09-29 18:36:30,115][13244] Updated weights for policy 0, policy_version 1428 (0.0036)
[2024-09-29 18:36:33,905][00189] Fps is (10 sec: 4096.9, 60 sec: 3891.4, 300 sec: 3818.3). Total num frames: 5865472. Throughput: 0: 937.7. Samples: 465130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:36:33,908][00189] Avg episode reward: [(0, '27.566')]
[2024-09-29 18:36:38,906][00189] Fps is (10 sec: 4505.3, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 5885952. Throughput: 0: 968.6. Samples: 468630. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:36:38,909][00189] Avg episode reward: [(0, '24.180')]
[2024-09-29 18:36:39,388][13244] Updated weights for policy 0, policy_version 1438 (0.0027)
[2024-09-29 18:36:43,906][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 5898240. Throughput: 0: 954.6. Samples: 473826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:36:43,911][00189] Avg episode reward: [(0, '23.657')]
[2024-09-29 18:36:48,906][00189] Fps is (10 sec: 3276.9, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 5918720. Throughput: 0: 925.9. Samples: 479060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:36:48,912][00189] Avg episode reward: [(0, '25.211')]
[2024-09-29 18:36:51,245][13244] Updated weights for policy 0, policy_version 1448 (0.0012)
[2024-09-29 18:36:53,906][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 5939200. Throughput: 0: 946.1. Samples: 482318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:36:53,911][00189] Avg episode reward: [(0, '24.362')]
[2024-09-29 18:36:58,905][00189] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 5959680. Throughput: 0: 981.6. Samples: 488528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:36:58,912][00189] Avg episode reward: [(0, '23.120')]
[2024-09-29 18:37:02,461][13244] Updated weights for policy 0, policy_version 1458 (0.0031)
[2024-09-29 18:37:03,906][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 5976064. Throughput: 0: 924.7. Samples: 492800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:37:03,908][00189] Avg episode reward: [(0, '22.941')]
[2024-09-29 18:37:08,906][00189] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 5996544. Throughput: 0: 923.8. Samples: 496236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:37:08,911][00189] Avg episode reward: [(0, '24.412')]
[2024-09-29 18:37:09,939][13231] Stopping Batcher_0...
[2024-09-29 18:37:09,939][00189] Component Batcher_0 stopped!
[2024-09-29 18:37:09,941][13231] Loop batcher_evt_loop terminating...
[2024-09-29 18:37:09,944][13231] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001466_6004736.pth...
[2024-09-29 18:37:09,996][13244] Weights refcount: 2 0
[2024-09-29 18:37:10,001][00189] Component InferenceWorker_p0-w0 stopped!
[2024-09-29 18:37:10,005][13244] Stopping InferenceWorker_p0-w0...
[2024-09-29 18:37:10,006][13244] Loop inference_proc0-0_evt_loop terminating...
[2024-09-29 18:37:10,086][13231] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001289_5279744.pth
[2024-09-29 18:37:10,101][13231] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001466_6004736.pth...
[2024-09-29 18:37:10,273][00189] Component LearnerWorker_p0 stopped!
[2024-09-29 18:37:10,280][13231] Stopping LearnerWorker_p0...
[2024-09-29 18:37:10,281][13231] Loop learner_proc0_evt_loop terminating...
[2024-09-29 18:37:10,323][13247] Stopping RolloutWorker_w2...
[2024-09-29 18:37:10,323][00189] Component RolloutWorker_w2 stopped!
[2024-09-29 18:37:10,329][13247] Loop rollout_proc2_evt_loop terminating...
[2024-09-29 18:37:10,339][13249] Stopping RolloutWorker_w4...
[2024-09-29 18:37:10,339][00189] Component RolloutWorker_w4 stopped!
[2024-09-29 18:37:10,342][13249] Loop rollout_proc4_evt_loop terminating...
[2024-09-29 18:37:10,359][13252] Stopping RolloutWorker_w6...
[2024-09-29 18:37:10,359][00189] Component RolloutWorker_w6 stopped!
[2024-09-29 18:37:10,365][13252] Loop rollout_proc6_evt_loop terminating...
[2024-09-29 18:37:10,385][13245] Stopping RolloutWorker_w0...
[2024-09-29 18:37:10,385][00189] Component RolloutWorker_w0 stopped!
[2024-09-29 18:37:10,392][13245] Loop rollout_proc0_evt_loop terminating...
[2024-09-29 18:37:10,512][00189] Component RolloutWorker_w5 stopped!
[2024-09-29 18:37:10,518][13250] Stopping RolloutWorker_w5...
[2024-09-29 18:37:10,518][13250] Loop rollout_proc5_evt_loop terminating...
[2024-09-29 18:37:10,527][00189] Component RolloutWorker_w7 stopped!
[2024-09-29 18:37:10,533][13251] Stopping RolloutWorker_w7...
[2024-09-29 18:37:10,533][13251] Loop rollout_proc7_evt_loop terminating...
[2024-09-29 18:37:10,573][00189] Component RolloutWorker_w1 stopped!
[2024-09-29 18:37:10,581][13246] Stopping RolloutWorker_w1...
[2024-09-29 18:37:10,582][13246] Loop rollout_proc1_evt_loop terminating...
[2024-09-29 18:37:10,619][00189] Component RolloutWorker_w3 stopped!
[2024-09-29 18:37:10,626][00189] Waiting for process learner_proc0 to stop...
[2024-09-29 18:37:10,632][13248] Stopping RolloutWorker_w3...
[2024-09-29 18:37:10,632][13248] Loop rollout_proc3_evt_loop terminating...
[2024-09-29 18:37:11,704][00189] Waiting for process inference_proc0-0 to join...
[2024-09-29 18:37:11,710][00189] Waiting for process rollout_proc0 to join...
[2024-09-29 18:37:12,993][00189] Waiting for process rollout_proc1 to join...
[2024-09-29 18:37:13,130][00189] Waiting for process rollout_proc2 to join...
[2024-09-29 18:37:13,139][00189] Waiting for process rollout_proc3 to join...
[2024-09-29 18:37:13,142][00189] Waiting for process rollout_proc4 to join...
[2024-09-29 18:37:13,146][00189] Waiting for process rollout_proc5 to join...
[2024-09-29 18:37:13,150][00189] Waiting for process rollout_proc6 to join...
[2024-09-29 18:37:13,154][00189] Waiting for process rollout_proc7 to join...
[2024-09-29 18:37:13,158][00189] Batcher 0 profile tree view:
batching: 13.3814, releasing_batches: 0.0123
[2024-09-29 18:37:13,159][00189] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 234.9399
update_model: 4.0083
  weight_update: 0.0021
one_step: 0.0105
  handle_policy_step: 273.6050
    deserialize: 7.2268, stack: 1.4695, obs_to_device_normalize: 56.2989, forward: 136.7025, send_messages: 14.3004
    prepare_outputs: 43.5609
      to_cpu: 27.3862
[2024-09-29 18:37:13,161][00189] Learner 0 profile tree view:
misc: 0.0027, prepare_batch: 9.1955
train: 38.3956
  epoch_init: 0.0075, minibatch_init: 0.0080, losses_postprocess: 0.3124, kl_divergence: 0.2967, after_optimizer: 1.5934
  calculate_losses: 12.0283
    losses_init: 0.0039, forward_head: 1.0154, bptt_initial: 7.5387, tail: 0.6592, advantages_returns: 0.1716, losses: 1.3827
    bptt: 1.0981
      bptt_forward_core: 1.0310
  update: 23.8179
    clip: 0.7491
[2024-09-29 18:37:13,163][00189] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1977, enqueue_policy_requests: 57.8925, env_step: 410.4342, overhead: 6.6927, complete_rollouts: 3.5523
save_policy_outputs: 12.1283
  split_output_tensors: 4.1940
[2024-09-29 18:37:13,164][00189] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1803, enqueue_policy_requests: 56.5854, env_step: 412.3422, overhead: 6.9324, complete_rollouts: 3.3981
save_policy_outputs: 11.8552
  split_output_tensors: 4.1133
[2024-09-29 18:37:13,165][00189] Loop Runner_EvtLoop terminating...
[2024-09-29 18:37:13,166][00189] Runner profile tree view:
main_loop: 552.5109
[2024-09-29 18:37:13,168][00189] Collected {0: 6004736}, FPS: 3617.8
[2024-09-29 18:39:42,091][00189] Environment doom_basic already registered, overwriting...
[2024-09-29 18:39:42,094][00189] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-29 18:39:42,095][00189] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-29 18:39:42,098][00189] Environment doom_dm already registered, overwriting...
[2024-09-29 18:39:42,099][00189] Environment doom_dwango5 already registered, overwriting...
[2024-09-29 18:39:42,103][00189] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-29 18:39:42,104][00189] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-29 18:39:42,105][00189] Environment doom_my_way_home already registered, overwriting...
[2024-09-29 18:39:42,106][00189] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-29 18:39:42,106][00189] Environment doom_defend_the_center already registered, overwriting...
[2024-09-29 18:39:42,107][00189] Environment doom_defend_the_line already registered, overwriting...
[2024-09-29 18:39:42,111][00189] Environment doom_health_gathering already registered, overwriting...
[2024-09-29 18:39:42,112][00189] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-29 18:39:42,115][00189] Environment doom_battle already registered, overwriting...
[2024-09-29 18:39:42,116][00189] Environment doom_battle2 already registered, overwriting...
[2024-09-29 18:39:42,117][00189] Environment doom_duel_bots already registered, overwriting...
[2024-09-29 18:39:42,118][00189] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-29 18:39:42,119][00189] Environment doom_duel already registered, overwriting...
[2024-09-29 18:39:42,120][00189] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-29 18:39:42,121][00189] Environment doom_benchmark already registered, overwriting...
[2024-09-29 18:39:42,126][00189] register_encoder_factory:
[2024-09-29 18:39:42,157][00189] Loading existing experiment configuration from /content/train_dir/samplefactory-vizdoom-v1/config.json
[2024-09-29 18:39:42,159][00189] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line
[2024-09-29 18:39:42,166][00189] Experiment dir /content/train_dir/samplefactory-vizdoom-v1 already exists!
[2024-09-29 18:39:42,168][00189] Resuming existing experiment from /content/train_dir/samplefactory-vizdoom-v1...
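The resume entries above show the saved config.json being loaded and then `train_for_env_steps` being overridden by the command line. A minimal sketch of that merge order, with the dictionaries reduced to the values visible in this log (the merge logic itself is an illustrative assumption, not Sample Factory's actual code):

```python
# Illustrative sketch of resume behavior seen in the log: load the saved
# experiment config, then let explicitly passed CLI args win.
# Values are taken from the logged cli_args and the "Overriding arg" line.
saved_cfg = {
    "env": "doom_health_gathering_supreme",
    "num_workers": 8,
    "num_envs_per_worker": 4,
    "train_for_env_steps": 10_000,  # value stored by the earlier run
}
cli_overrides = {"train_for_env_steps": 10_000_000}  # passed on resume

# Later keys win, so the CLI override replaces the stored value.
cfg = {**saved_cfg, **cli_overrides}
print(cfg["train_for_env_steps"])  # 10000000
```

This matches the logged outcome: the run resumes with the stored worker/env settings but trains toward the new 10,000,000-step target.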
[2024-09-29 18:39:42,169][00189] Weights and Biases integration disabled
[2024-09-29 18:39:42,176][00189] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-09-29 18:39:44,379][00189] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=samplefactory-vizdoom-v1
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000 --experiment=samplefactory-vizdoom-v1 --restart_behavior=resume
cli_args={'env': 'doom_health_gathering_supreme', 'experiment': 'samplefactory-vizdoom-v1', 'restart_behavior': 'resume', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-29 18:39:44,384][00189] Saving configuration to /content/train_dir/samplefactory-vizdoom-v1/config.json...
[2024-09-29 18:39:44,388][00189] Rollout worker 0 uses device cpu
[2024-09-29 18:39:44,390][00189] Rollout worker 1 uses device cpu
[2024-09-29 18:39:44,392][00189] Rollout worker 2 uses device cpu
[2024-09-29 18:39:44,394][00189] Rollout worker 3 uses device cpu
[2024-09-29 18:39:44,395][00189] Rollout worker 4 uses device cpu
[2024-09-29 18:39:44,401][00189] Rollout worker 5 uses device cpu
[2024-09-29 18:39:44,402][00189] Rollout worker 6 uses device cpu
[2024-09-29 18:39:44,404][00189] Rollout worker 7 uses device cpu
[2024-09-29 18:39:44,596][00189] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:39:44,597][00189] InferenceWorker_p0-w0: min num requests: 2
[2024-09-29 18:39:44,632][00189] Starting all processes...
[2024-09-29 18:39:44,634][00189] Starting process learner_proc0
[2024-09-29 18:39:44,682][00189] Starting all processes...
[2024-09-29 18:39:44,689][00189] Starting process inference_proc0-0
[2024-09-29 18:39:44,689][00189] Starting process rollout_proc0
[2024-09-29 18:39:44,691][00189] Starting process rollout_proc1
[2024-09-29 18:39:44,693][00189] Starting process rollout_proc2
[2024-09-29 18:39:44,694][00189] Starting process rollout_proc3
[2024-09-29 18:39:44,694][00189] Starting process rollout_proc4
[2024-09-29 18:39:44,694][00189] Starting process rollout_proc5
[2024-09-29 18:39:44,694][00189] Starting process rollout_proc6
[2024-09-29 18:39:44,694][00189] Starting process rollout_proc7
[2024-09-29 18:39:54,211][16336] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:39:54,221][16336] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-29 18:39:54,288][16336] Num visible devices: 1
[2024-09-29 18:39:54,332][16336] Starting seed is not provided
[2024-09-29 18:39:54,333][16336] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:39:54,334][16336] Initializing actor-critic model on device cuda:0
[2024-09-29 18:39:54,335][16336] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:39:54,336][16336] RunningMeanStd input shape: (1,)
[2024-09-29 18:39:54,374][16353] Worker 3 uses CPU cores [1]
[2024-09-29 18:39:54,405][16336] ConvEncoder: input_channels=3
[2024-09-29 18:39:54,417][16357] Worker 7 uses CPU cores [1]
[2024-09-29 18:39:54,480][16356] Worker 6 uses CPU cores [0]
[2024-09-29 18:39:54,643][16350] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:39:54,643][16350] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-29 18:39:54,737][16350] Num visible devices: 1
[2024-09-29 18:39:54,770][16351] Worker 1 uses CPU cores [1]
[2024-09-29 18:39:54,850][16349] Worker 0 uses CPU cores [0]
[2024-09-29 18:39:54,891][16355] Worker 5 uses CPU cores [1]
[2024-09-29 18:39:54,892][16352] Worker 2 uses CPU cores [0]
[2024-09-29 18:39:54,899][16354] Worker 4 uses CPU cores [0]
[2024-09-29 18:39:55,003][16336] Conv encoder output size: 512
[2024-09-29 18:39:55,004][16336] Policy head output size: 512
[2024-09-29 18:39:55,030][16336] Created Actor Critic model with architecture:
[2024-09-29 18:39:55,031][16336] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-29 18:39:57,141][16336] Using optimizer
[2024-09-29 18:39:57,142][16336] Loading state from checkpoint /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001466_6004736.pth...
[2024-09-29 18:39:57,188][16336] Loading model from checkpoint
[2024-09-29 18:39:57,195][16336] Loaded experiment state at self.train_step=1466, self.env_steps=6004736
[2024-09-29 18:39:57,196][16336] Initialized policy 0 weights for model version 1466
[2024-09-29 18:39:57,200][16336] LearnerWorker_p0 finished initialization!
[2024-09-29 18:39:57,202][16336] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:39:57,324][16350] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:39:57,327][16350] RunningMeanStd input shape: (1,)
[2024-09-29 18:39:57,379][16350] ConvEncoder: input_channels=3
[2024-09-29 18:39:57,545][16350] Conv encoder output size: 512
[2024-09-29 18:39:57,546][16350] Policy head output size: 512
[2024-09-29 18:39:59,079][00189] Inference worker 0-0 is ready!
[2024-09-29 18:39:59,080][00189] All inference workers are ready! Signal rollout workers to start!
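The `RunningMeanStd input shape` lines above refer to the running observation/return normalizers that the model builds for each input. A minimal sketch of such a running mean/std tracker (plain Python with a Welford-style batch merge; an illustration only, not Sample Factory's actual in-place tensor implementation):

```python
class RunningMeanStd:
    """Track a running mean and variance over a stream of scalar batches."""

    def __init__(self, epsilon: float = 1e-4):
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon  # avoids division by zero before the first update

    def update(self, batch):
        batch_mean = sum(batch) / len(batch)
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / len(batch)
        batch_count = len(batch)

        delta = batch_mean - self.mean
        total = self.count + batch_count

        # Chan et al. parallel-variance merge of the two moment estimates.
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta**2 * self.count * batch_count / total

        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, x):
        return (x - self.mean) / (self.var**0.5 + 1e-8)


rms = RunningMeanStd()
for batch in ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]):
    rms.update(batch)
# After both batches the tracked mean is close to the stream mean of 3.5.
```

In the log the same idea is applied per-element to `(3, 72, 128)` observations and `(1,)` returns rather than to scalars.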
[2024-09-29 18:39:59,158][16357] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:39:59,159][16353] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:39:59,159][16355] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:39:59,152][16351] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:39:59,155][16352] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:39:59,161][16349] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:39:59,164][16356] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:39:59,163][16354] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:40:00,362][16351] Decorrelating experience for 0 frames...
[2024-09-29 18:40:00,358][16353] Decorrelating experience for 0 frames...
[2024-09-29 18:40:00,362][16357] Decorrelating experience for 0 frames...
[2024-09-29 18:40:00,368][16349] Decorrelating experience for 0 frames...
[2024-09-29 18:40:00,371][16352] Decorrelating experience for 0 frames...
[2024-09-29 18:40:00,378][16354] Decorrelating experience for 0 frames...
[2024-09-29 18:40:01,478][16353] Decorrelating experience for 32 frames...
[2024-09-29 18:40:01,478][16355] Decorrelating experience for 0 frames...
[2024-09-29 18:40:01,481][16357] Decorrelating experience for 32 frames...
[2024-09-29 18:40:01,488][16352] Decorrelating experience for 32 frames...
[2024-09-29 18:40:01,497][16354] Decorrelating experience for 32 frames...
[2024-09-29 18:40:01,584][16356] Decorrelating experience for 0 frames...
[2024-09-29 18:40:02,176][00189] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 6004736. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:40:02,378][16355] Decorrelating experience for 32 frames...
[2024-09-29 18:40:02,386][16351] Decorrelating experience for 32 frames...
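The `Fps is (10 sec: …, 60 sec: …, 300 sec: …)` status lines in this log report frame throughput averaged over sliding windows of the frame counter. A minimal sketch of that kind of windowed rate estimate (a hypothetical helper, not Sample Factory code):

```python
def windowed_fps(samples, window):
    """samples: list of (timestamp_sec, total_frames) pairs, oldest first.
    Returns frames/sec over roughly the last `window` seconds."""
    now_t, now_frames = samples[-1]
    past_t, past_frames = samples[0]
    # Walk forward to the oldest sample still inside the window.
    for t, frames in samples:
        if now_t - t <= window:
            past_t, past_frames = t, frames
            break
    if now_t == past_t:
        return float("nan")  # not enough history yet
    return (now_frames - past_frames) / (now_t - past_t)


# Synthetic counter: 4000 frames/sec for 20 seconds, sampled every 5 s.
history = [(t, 4000 * t) for t in range(0, 25, 5)]
fps_10 = windowed_fps(history, 10)    # rate over the last ~10 seconds
fps_300 = windowed_fps(history, 300)  # window exceeds history: full average
```

With only a single sample the helper returns `nan`, which mirrors the very first `Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan)` record in the log.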
[2024-09-29 18:40:02,718][16349] Decorrelating experience for 32 frames...
[2024-09-29 18:40:02,783][16356] Decorrelating experience for 32 frames...
[2024-09-29 18:40:02,869][16354] Decorrelating experience for 64 frames...
[2024-09-29 18:40:02,926][16355] Decorrelating experience for 64 frames...
[2024-09-29 18:40:03,889][16357] Decorrelating experience for 64 frames...
[2024-09-29 18:40:04,081][16349] Decorrelating experience for 64 frames...
[2024-09-29 18:40:04,105][16351] Decorrelating experience for 64 frames...
[2024-09-29 18:40:04,134][16356] Decorrelating experience for 64 frames...
[2024-09-29 18:40:04,148][16355] Decorrelating experience for 96 frames...
[2024-09-29 18:40:04,204][16354] Decorrelating experience for 96 frames...
[2024-09-29 18:40:04,590][00189] Heartbeat connected on Batcher_0
[2024-09-29 18:40:04,593][00189] Heartbeat connected on LearnerWorker_p0
[2024-09-29 18:40:04,625][00189] Heartbeat connected on RolloutWorker_w5
[2024-09-29 18:40:04,627][00189] Heartbeat connected on RolloutWorker_w4
[2024-09-29 18:40:05,339][00189] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-29 18:40:05,807][16357] Decorrelating experience for 96 frames...
[2024-09-29 18:40:06,121][00189] Heartbeat connected on RolloutWorker_w7
[2024-09-29 18:40:06,221][16351] Decorrelating experience for 96 frames...
[2024-09-29 18:40:06,567][16352] Decorrelating experience for 64 frames...
[2024-09-29 18:40:06,758][00189] Heartbeat connected on RolloutWorker_w1
[2024-09-29 18:40:06,989][16349] Decorrelating experience for 96 frames...
[2024-09-29 18:40:07,074][16356] Decorrelating experience for 96 frames...
[2024-09-29 18:40:07,176][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 6004736. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:40:07,181][00189] Avg episode reward: [(0, '1.570')]
[2024-09-29 18:40:07,264][00189] Heartbeat connected on RolloutWorker_w0
[2024-09-29 18:40:07,360][00189] Heartbeat connected on RolloutWorker_w6
[2024-09-29 18:40:10,572][16336] Signal inference workers to stop experience collection...
[2024-09-29 18:40:10,581][16350] InferenceWorker_p0-w0: stopping experience collection
[2024-09-29 18:40:10,609][16352] Decorrelating experience for 96 frames...
[2024-09-29 18:40:10,745][16353] Decorrelating experience for 64 frames...
[2024-09-29 18:40:11,006][00189] Heartbeat connected on RolloutWorker_w2
[2024-09-29 18:40:11,150][16336] Signal inference workers to resume experience collection...
[2024-09-29 18:40:11,154][16350] InferenceWorker_p0-w0: resuming experience collection
[2024-09-29 18:40:12,177][00189] Fps is (10 sec: 409.5, 60 sec: 409.5, 300 sec: 409.5). Total num frames: 6008832. Throughput: 0: 240.8. Samples: 2408. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-29 18:40:12,184][00189] Avg episode reward: [(0, '5.276')]
[2024-09-29 18:40:13,826][16353] Decorrelating experience for 96 frames...
[2024-09-29 18:40:14,554][00189] Heartbeat connected on RolloutWorker_w3
[2024-09-29 18:40:17,176][00189] Fps is (10 sec: 2457.6, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 6029312. Throughput: 0: 440.5. Samples: 6608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:40:17,178][00189] Avg episode reward: [(0, '8.379')]
[2024-09-29 18:40:20,631][16350] Updated weights for policy 0, policy_version 1476 (0.0021)
[2024-09-29 18:40:22,176][00189] Fps is (10 sec: 4096.5, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 6049792. Throughput: 0: 503.5. Samples: 10070. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:40:22,178][00189] Avg episode reward: [(0, '15.457')]
[2024-09-29 18:40:27,176][00189] Fps is (10 sec: 4096.0, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 6070272. Throughput: 0: 646.3. Samples: 16158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:40:27,179][00189] Avg episode reward: [(0, '19.823')]
[2024-09-29 18:40:32,176][00189] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 6082560. Throughput: 0: 679.6. Samples: 20388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:40:32,178][00189] Avg episode reward: [(0, '21.571')]
[2024-09-29 18:40:32,441][16350] Updated weights for policy 0, policy_version 1486 (0.0012)
[2024-09-29 18:40:37,176][00189] Fps is (10 sec: 3686.5, 60 sec: 2925.7, 300 sec: 2925.7). Total num frames: 6107136. Throughput: 0: 682.7. Samples: 23894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:40:37,179][00189] Avg episode reward: [(0, '22.501')]
[2024-09-29 18:40:41,498][16350] Updated weights for policy 0, policy_version 1496 (0.0012)
[2024-09-29 18:40:42,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 6127616. Throughput: 0: 765.6. Samples: 30624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:40:42,183][00189] Avg episode reward: [(0, '25.114')]
[2024-09-29 18:40:47,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 6139904. Throughput: 0: 783.4. Samples: 35252. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:40:47,183][00189] Avg episode reward: [(0, '26.556')]
[2024-09-29 18:40:52,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3113.0, 300 sec: 3113.0). Total num frames: 6160384. Throughput: 0: 844.5. Samples: 38016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:40:52,184][00189] Avg episode reward: [(0, '26.885')]
[2024-09-29 18:40:53,248][16350] Updated weights for policy 0, policy_version 1506 (0.0019)
[2024-09-29 18:40:57,176][00189] Fps is (10 sec: 4505.5, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 6184960. Throughput: 0: 943.4. Samples: 44860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:40:57,179][00189] Avg episode reward: [(0, '27.305')]
[2024-09-29 18:41:02,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 6201344. Throughput: 0: 969.9. Samples: 50254. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:41:02,179][00189] Avg episode reward: [(0, '27.355')]
[2024-09-29 18:41:04,167][16350] Updated weights for policy 0, policy_version 1516 (0.0017)
[2024-09-29 18:41:07,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 6217728. Throughput: 0: 938.7. Samples: 52310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:41:07,178][00189] Avg episode reward: [(0, '27.007')]
[2024-09-29 18:41:12,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3335.3). Total num frames: 6238208. Throughput: 0: 944.1. Samples: 58644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:41:12,182][00189] Avg episode reward: [(0, '25.870')]
[2024-09-29 18:41:14,033][16350] Updated weights for policy 0, policy_version 1526 (0.0013)
[2024-09-29 18:41:17,176][00189] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3440.6). Total num frames: 6262784. Throughput: 0: 993.3. Samples: 65088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:41:17,179][00189] Avg episode reward: [(0, '25.626')]
[2024-09-29 18:41:22,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3379.2). Total num frames: 6275072. Throughput: 0: 961.3. Samples: 67152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:41:22,180][00189] Avg episode reward: [(0, '26.808')]
[2024-09-29 18:41:25,624][16350] Updated weights for policy 0, policy_version 1536 (0.0017)
[2024-09-29 18:41:27,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3421.4). Total num frames: 6295552. Throughput: 0: 937.2. Samples: 72796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:41:27,182][00189] Avg episode reward: [(0, '26.159')]
[2024-09-29 18:41:32,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3504.4). Total num frames: 6320128. Throughput: 0: 990.0. Samples: 79800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:41:32,181][00189] Avg episode reward: [(0, '26.406')]
[2024-09-29 18:41:35,059][16350] Updated weights for policy 0, policy_version 1546 (0.0013)
[2024-09-29 18:41:37,178][00189] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3492.3). Total num frames: 6336512. Throughput: 0: 988.2. Samples: 82486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:41:37,181][00189] Avg episode reward: [(0, '25.943')]
[2024-09-29 18:41:42,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3481.6). Total num frames: 6352896. Throughput: 0: 934.0. Samples: 86892. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:41:42,180][00189] Avg episode reward: [(0, '25.743')]
[2024-09-29 18:41:42,187][16336] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001551_6352896.pth...
[2024-09-29 18:41:42,299][16336] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001399_5730304.pth
[2024-09-29 18:41:46,187][16350] Updated weights for policy 0, policy_version 1556 (0.0015)
[2024-09-29 18:41:47,176][00189] Fps is (10 sec: 4096.9, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 6377472. Throughput: 0: 966.3. Samples: 93738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:41:47,182][00189] Avg episode reward: [(0, '26.284')]
[2024-09-29 18:41:52,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3537.5). Total num frames: 6393856. Throughput: 0: 995.9. Samples: 97124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:41:52,178][00189] Avg episode reward: [(0, '25.524')]
[2024-09-29 18:41:57,179][00189] Fps is (10 sec: 3275.7, 60 sec: 3754.5, 300 sec: 3526.0). Total num frames: 6410240. Throughput: 0: 953.9. Samples: 101574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:41:57,185][00189] Avg episode reward: [(0, '24.702')]
[2024-09-29 18:41:57,990][16350] Updated weights for policy 0, policy_version 1566 (0.0012)
[2024-09-29 18:42:02,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 6430720. Throughput: 0: 944.2. Samples: 107576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:42:02,179][00189] Avg episode reward: [(0, '23.693')]
[2024-09-29 18:42:06,934][16350] Updated weights for policy 0, policy_version 1576 (0.0012)
[2024-09-29 18:42:07,176][00189] Fps is (10 sec: 4507.0, 60 sec: 3959.5, 300 sec: 3604.5). Total num frames: 6455296. Throughput: 0: 974.9. Samples: 111024. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:42:07,179][00189] Avg episode reward: [(0, '24.300')]
[2024-09-29 18:42:12,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3560.4). Total num frames: 6467584. Throughput: 0: 967.3. Samples: 116326. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-09-29 18:42:12,181][00189] Avg episode reward: [(0, '25.213')]
[2024-09-29 18:42:17,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3580.2). Total num frames: 6488064. Throughput: 0: 922.8. Samples: 121326. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 18:42:17,179][00189] Avg episode reward: [(0, '25.454')]
[2024-09-29 18:42:18,749][16350] Updated weights for policy 0, policy_version 1586 (0.0028)
[2024-09-29 18:42:22,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3598.6). Total num frames: 6508544. Throughput: 0: 940.5. Samples: 124808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:42:22,178][00189] Avg episode reward: [(0, '24.257')]
[2024-09-29 18:42:27,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3615.8). Total num frames: 6529024. Throughput: 0: 988.8. Samples: 131390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:42:27,182][00189] Avg episode reward: [(0, '24.903')]
[2024-09-29 18:42:29,290][16350] Updated weights for policy 0, policy_version 1596 (0.0025)
[2024-09-29 18:42:32,177][00189] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3604.5). Total num frames: 6545408. Throughput: 0: 933.2. Samples: 135734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:42:32,185][00189] Avg episode reward: [(0, '25.949')]
[2024-09-29 18:42:37,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3620.3). Total num frames: 6565888. Throughput: 0: 929.9. Samples: 138968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:42:37,179][00189] Avg episode reward: [(0, '26.631')]
[2024-09-29 18:42:39,274][16350] Updated weights for policy 0, policy_version 1606 (0.0013)
[2024-09-29 18:42:42,176][00189] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3660.8). Total num frames: 6590464. Throughput: 0: 983.4. Samples: 145826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:42:42,181][00189] Avg episode reward: [(0, '25.265')]
[2024-09-29 18:42:47,176][00189] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3624.3). Total num frames: 6602752. Throughput: 0: 957.5. Samples: 150664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:42:47,178][00189] Avg episode reward: [(0, '24.282')]
[2024-09-29 18:42:51,000][16350] Updated weights for policy 0, policy_version 1616 (0.0012)
[2024-09-29 18:42:52,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3638.2). Total num frames: 6623232. Throughput: 0: 932.9. Samples: 153004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:42:52,181][00189] Avg episode reward: [(0, '24.907')]
[2024-09-29 18:42:57,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3651.3). Total num frames: 6643712. Throughput: 0: 972.1. Samples: 160072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-29 18:42:57,179][00189] Avg episode reward: [(0, '24.744')]
[2024-09-29 18:43:00,031][16350] Updated weights for policy 0, policy_version 1626 (0.0021)
[2024-09-29 18:43:02,177][00189] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3663.6). Total num frames: 6664192. Throughput: 0: 992.8. Samples: 166004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:43:02,179][00189] Avg episode reward: [(0, '24.818')]
[2024-09-29 18:43:07,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3631.0). Total num frames: 6676480. Throughput: 0: 961.1. Samples: 168056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:43:07,182][00189] Avg episode reward: [(0, '24.585')]
[2024-09-29 18:43:11,689][16350] Updated weights for policy 0, policy_version 1636 (0.0025)
[2024-09-29 18:43:12,176][00189] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3664.8). Total num frames: 6701056. Throughput: 0: 944.9. Samples: 173910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:43:12,183][00189] Avg episode reward: [(0, '24.946')]
[2024-09-29 18:43:17,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3675.9). Total num frames: 6721536. Throughput: 0: 997.0. Samples: 180600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:43:17,183][00189] Avg episode reward: [(0, '26.794')]
[2024-09-29 18:43:22,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3665.9). Total num frames: 6737920. Throughput: 0: 972.2. Samples: 182716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:43:22,180][00189] Avg episode reward: [(0, '26.784')]
[2024-09-29 18:43:23,269][16350] Updated weights for policy 0, policy_version 1646 (0.0023)
[2024-09-29 18:43:27,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3656.4). Total num frames: 6754304. Throughput: 0: 930.7. Samples: 187706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:43:27,183][00189] Avg episode reward: [(0, '27.424')]
[2024-09-29 18:43:32,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3686.4). Total num frames: 6778880. Throughput: 0: 977.5. Samples: 194652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:43:32,178][00189] Avg episode reward: [(0, '27.413')]
[2024-09-29 18:43:32,584][16350] Updated weights for policy 0, policy_version 1656 (0.0019)
[2024-09-29 18:43:37,176][00189] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3676.9). Total num frames: 6795264. Throughput: 0: 993.7. Samples: 197720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:43:37,180][00189] Avg episode reward: [(0, '28.565')]
[2024-09-29 18:43:42,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3667.8). Total num frames: 6811648. Throughput: 0: 930.8. Samples: 201960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:43:42,178][00189] Avg episode reward: [(0, '28.174')]
[2024-09-29 18:43:42,186][16336] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001663_6811648.pth...
[2024-09-29 18:43:42,310][16336] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001466_6004736.pth
[2024-09-29 18:43:44,371][16350] Updated weights for policy 0, policy_version 1666 (0.0019)
[2024-09-29 18:43:47,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3695.5). Total num frames: 6836224. Throughput: 0: 945.3. Samples: 208542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:43:47,179][00189] Avg episode reward: [(0, '26.709')]
[2024-09-29 18:43:52,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3704.2). Total num frames: 6856704. Throughput: 0: 976.5. Samples: 211998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:43:52,178][00189] Avg episode reward: [(0, '26.484')]
[2024-09-29 18:43:53,940][16350] Updated weights for policy 0, policy_version 1676 (0.0019)
[2024-09-29 18:43:57,179][00189] Fps is (10 sec: 3685.2, 60 sec: 3822.7, 300 sec: 3695.1). Total num frames: 6873088. Throughput: 0: 960.6. Samples: 217142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:43:57,182][00189] Avg episode reward: [(0, '25.617')]
[2024-09-29 18:44:02,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3686.4). Total num frames: 6889472. Throughput: 0: 937.4. Samples: 222782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-29 18:44:02,183][00189] Avg episode reward: [(0, '25.001')]
[2024-09-29 18:44:04,797][16350] Updated weights for policy 0, policy_version 1686 (0.0021)
[2024-09-29 18:44:07,176][00189] Fps is (10 sec: 4097.3, 60 sec: 3959.5, 300 sec: 3711.5). Total num frames: 6914048. Throughput: 0: 968.8. Samples: 226314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:44:07,183][00189] Avg episode reward: [(0, '24.754')]
[2024-09-29 18:44:12,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3719.2). Total num frames: 6934528. Throughput: 0: 995.8. Samples: 232516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:44:12,180][00189] Avg episode reward: [(0, '24.552')]
[2024-09-29 18:44:16,029][16350] Updated weights for policy 0, policy_version 1696 (0.0012)
[2024-09-29 18:44:17,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3710.5). Total num frames: 6950912. Throughput: 0: 943.8. Samples: 237124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:44:17,178][00189] Avg episode reward: [(0, '24.193')]
[2024-09-29 18:44:22,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3717.9). Total num frames: 6971392. Throughput: 0: 953.7. Samples: 240638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:44:22,178][00189] Avg episode reward: [(0, '24.408')]
[2024-09-29 18:44:25,017][16350] Updated weights for policy 0, policy_version 1706 (0.0015)
[2024-09-29 18:44:27,176][00189] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3740.5). Total num frames: 6995968. Throughput: 0: 1016.0. Samples: 247682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:44:27,178][00189] Avg episode reward: [(0, '23.594')]
[2024-09-29 18:44:32,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3716.7). Total num frames: 7008256. Throughput: 0: 972.3. Samples: 252294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:44:32,181][00189] Avg episode reward: [(0, '22.724')]
[2024-09-29 18:44:36,527][16350] Updated weights for policy 0, policy_version 1716 (0.0022)
[2024-09-29 18:44:37,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3723.6). Total num frames: 7028736. Throughput: 0: 954.0. Samples: 254926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:44:37,179][00189] Avg episode reward: [(0, '22.434')]
[2024-09-29 18:44:42,176][00189] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3744.9). Total num frames: 7053312. Throughput: 0: 998.2. Samples: 262056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:44:42,178][00189] Avg episode reward: [(0, '23.880')]
[2024-09-29 18:44:46,163][16350] Updated weights for policy 0, policy_version 1726 (0.0017)
[2024-09-29 18:44:47,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3736.7). Total num frames: 7069696. Throughput: 0: 994.4. Samples: 267530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:44:47,178][00189] Avg episode reward: [(0, '24.124')]
[2024-09-29 18:44:52,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3728.8). Total num frames: 7086080. Throughput: 0: 964.4. Samples: 269714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:44:52,183][00189] Avg episode reward: [(0, '24.343')]
[2024-09-29 18:44:56,986][16350] Updated weights for policy 0, policy_version 1736 (0.0012)
[2024-09-29 18:44:57,176][00189] Fps is (10 sec: 4096.1, 60 sec: 3959.7, 300 sec: 3748.9). Total num frames: 7110656. Throughput: 0: 967.4. Samples: 276050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:44:57,179][00189] Avg episode reward: [(0, '25.424')]
[2024-09-29 18:45:02,182][00189] Fps is (10 sec: 4502.8, 60 sec: 4027.3, 300 sec: 3818.2). Total num frames: 7131136. Throughput: 0: 1007.6. Samples: 282474. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:45:02,186][00189] Avg episode reward: [(0, '25.404')]
[2024-09-29 18:45:07,184][00189] Fps is (10 sec: 3276.1, 60 sec: 3822.8, 300 sec: 3846.1). Total num frames: 7143424. Throughput: 0: 975.2. Samples: 284522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:45:07,189][00189] Avg episode reward: [(0, '26.140')]
[2024-09-29 18:45:08,896][16350] Updated weights for policy 0, policy_version 1746 (0.0021)
[2024-09-29 18:45:12,176][00189] Fps is (10 sec: 3278.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7163904. Throughput: 0: 936.9. Samples: 289844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:45:12,179][00189] Avg episode reward: [(0, '25.827')]
[2024-09-29 18:45:17,176][00189] Fps is (10 sec: 4506.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7188480. Throughput: 0: 987.4. Samples: 296726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:45:17,178][00189] Avg episode reward: [(0, '25.264')]
[2024-09-29 18:45:17,837][16350] Updated weights for policy 0, policy_version 1756 (0.0021)
[2024-09-29 18:45:22,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 7204864. Throughput: 0: 991.7. Samples: 299554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:45:22,179][00189] Avg episode reward: [(0, '26.815')]
[2024-09-29 18:45:27,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 7221248. Throughput: 0: 936.6. Samples: 304202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:45:27,178][00189] Avg episode reward: [(0, '26.314')]
[2024-09-29 18:45:29,259][16350] Updated weights for policy 0, policy_version 1766 (0.0012)
[2024-09-29 18:45:32,176][00189] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7245824. Throughput: 0: 968.3. Samples: 311104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:45:32,178][00189] Avg episode reward: [(0, '28.272')]
[2024-09-29 18:45:37,182][00189] Fps is (10 sec: 4502.8, 60 sec: 3959.1, 300 sec: 3859.9). Total num frames: 7266304. Throughput: 0: 995.3. Samples: 314508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:45:37,185][00189] Avg episode reward: [(0, '27.641')]
[2024-09-29 18:45:39,783][16350] Updated weights for policy 0, policy_version 1776 (0.0012)
[2024-09-29 18:45:42,178][00189] Fps is (10 sec: 3276.1, 60 sec: 3754.5, 300 sec: 3859.9). Total num frames: 7278592. Throughput: 0: 949.0. Samples: 318758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:45:42,181][00189] Avg episode reward: [(0, '27.163')]
[2024-09-29 18:45:42,195][16336] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001777_7278592.pth...
[2024-09-29 18:45:42,380][16336] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001551_6352896.pth
[2024-09-29 18:45:47,176][00189] Fps is (10 sec: 3278.7, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 7299072. Throughput: 0: 936.6. Samples: 324616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:45:47,179][00189] Avg episode reward: [(0, '27.220')]
[2024-09-29 18:45:50,275][16350] Updated weights for policy 0, policy_version 1786 (0.0012)
[2024-09-29 18:45:52,180][00189] Fps is (10 sec: 4504.9, 60 sec: 3959.2, 300 sec: 3859.9). Total num frames: 7323648. Throughput: 0: 968.3. Samples: 328098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:45:52,182][00189] Avg episode reward: [(0, '26.780')]
[2024-09-29 18:45:57,177][00189] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 7335936. Throughput: 0: 970.3. Samples: 333510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:45:57,179][00189] Avg episode reward: [(0, '26.489')]
[2024-09-29 18:46:02,068][16350] Updated weights for policy 0, policy_version 1796 (0.0017)
[2024-09-29 18:46:02,176][00189] Fps is (10 sec: 3278.0, 60 sec: 3755.0, 300 sec: 3860.0). Total num frames: 7356416. Throughput: 0: 929.2. Samples: 338540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:46:02,179][00189] Avg episode reward: [(0, '27.219')]
[2024-09-29 18:46:07,176][00189] Fps is (10 sec: 4096.3, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 7376896. Throughput: 0: 941.6. Samples: 341926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:46:07,180][00189] Avg episode reward: [(0, '26.938')]
[2024-09-29 18:46:11,075][16350] Updated weights for policy 0, policy_version 1806 (0.0016)
[2024-09-29 18:46:12,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 7397376. Throughput: 0: 988.4. Samples: 348678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:46:12,182][00189] Avg episode reward: [(0, '27.717')]
[2024-09-29 18:46:17,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 7409664. Throughput: 0: 927.5. Samples: 352840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:46:17,184][00189] Avg episode reward: [(0, '27.802')]
[2024-09-29 18:46:22,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 7434240. Throughput: 0: 922.3. Samples: 356008. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-29 18:46:22,181][00189] Avg episode reward: [(0, '27.819')]
[2024-09-29 18:46:22,864][16350] Updated weights for policy 0, policy_version 1816 (0.0023)
[2024-09-29 18:46:27,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 7454720. Throughput: 0: 982.2. Samples: 362954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:46:27,180][00189] Avg episode reward: [(0, '30.037')]
[2024-09-29 18:46:32,190][00189] Fps is (10 sec: 4090.3, 60 sec: 3822.0, 300 sec: 3859.8). Total num frames: 7475200. Throughput: 0: 964.6. Samples: 368036. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:46:32,195][00189] Avg episode reward: [(0, '29.792')]
[2024-09-29 18:46:33,595][16350] Updated weights for policy 0, policy_version 1826 (0.0013)
[2024-09-29 18:46:37,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3755.0, 300 sec: 3860.0). Total num frames: 7491584. Throughput: 0: 937.1. Samples: 370262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:46:37,178][00189] Avg episode reward: [(0, '27.544')]
[2024-09-29 18:46:42,176][00189] Fps is (10 sec: 4101.7, 60 sec: 3959.6, 300 sec: 3860.0). Total num frames: 7516160. Throughput: 0: 974.3. Samples: 377352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:46:42,183][00189] Avg episode reward: [(0, '26.265')]
[2024-09-29 18:46:42,998][16350] Updated weights for policy 0, policy_version 1836 (0.0013)
[2024-09-29 18:46:47,178][00189] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 7532544. Throughput: 0: 997.1. Samples: 383410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:46:47,184][00189] Avg episode reward: [(0, '27.717')]
[2024-09-29 18:46:52,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3860.0). Total num frames: 7548928. Throughput: 0: 968.3. Samples: 385498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:46:52,181][00189] Avg episode reward: [(0, '26.228')]
[2024-09-29 18:46:54,750][16350] Updated weights for policy 0, policy_version 1846 (0.0012)
[2024-09-29 18:46:57,176][00189] Fps is (10 sec: 3687.0, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 7569408. Throughput: 0: 948.2. Samples: 391346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:46:57,180][00189] Avg episode reward: [(0, '24.913')]
[2024-09-29 18:47:02,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7593984. Throughput: 0: 1006.8. Samples: 398144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:47:02,179][00189] Avg episode reward: [(0, '25.178')]
[2024-09-29 18:47:04,570][16350] Updated weights for policy 0, policy_version 1856 (0.0023)
[2024-09-29 18:47:07,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 7606272. Throughput: 0: 986.8. Samples: 400414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:47:07,181][00189] Avg episode reward: [(0, '25.674')]
[2024-09-29 18:47:12,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 7626752. Throughput: 0: 939.4. Samples: 405226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:47:12,183][00189] Avg episode reward: [(0, '26.403')]
[2024-09-29 18:47:15,494][16350] Updated weights for policy 0, policy_version 1866 (0.0015)
[2024-09-29 18:47:17,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7647232. Throughput: 0: 979.4. Samples: 412096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:47:17,179][00189] Avg episode reward: [(0, '26.504')]
[2024-09-29 18:47:22,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7667712. Throughput: 0: 1005.2. Samples: 415494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:47:22,180][00189] Avg episode reward: [(0, '27.014')]
[2024-09-29 18:47:26,629][16350] Updated weights for policy 0, policy_version 1876 (0.0020)
[2024-09-29 18:47:27,179][00189] Fps is (10 sec: 3685.2, 60 sec: 3822.7, 300 sec: 3859.9). Total num frames: 7684096. Throughput: 0: 945.3. Samples: 419894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:47:27,182][00189] Avg episode reward: [(0, '27.387')]
[2024-09-29 18:47:32,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.8, 300 sec: 3860.0). Total num frames: 7704576. Throughput: 0: 957.1. Samples: 426478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:47:32,183][00189] Avg episode reward: [(0, '28.864')]
[2024-09-29 18:47:35,734][16350] Updated weights for policy 0, policy_version 1886 (0.0012)
[2024-09-29 18:47:37,176][00189] Fps is (10 sec: 4507.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7729152. Throughput: 0: 984.8. Samples: 429816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:47:37,184][00189] Avg episode reward: [(0, '27.708')]
[2024-09-29 18:47:42,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 7741440. Throughput: 0: 965.1. Samples: 434776.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:47:42,183][00189] Avg episode reward: [(0, '27.442')] [2024-09-29 18:47:42,280][16336] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001891_7745536.pth... [2024-09-29 18:47:42,445][16336] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001663_6811648.pth [2024-09-29 18:47:47,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 7761920. Throughput: 0: 926.6. Samples: 439842. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:47:47,182][00189] Avg episode reward: [(0, '28.329')] [2024-09-29 18:47:48,032][16350] Updated weights for policy 0, policy_version 1896 (0.0013) [2024-09-29 18:47:52,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7782400. Throughput: 0: 950.1. Samples: 443170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:47:52,183][00189] Avg episode reward: [(0, '27.930')] [2024-09-29 18:47:57,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7798784. Throughput: 0: 980.0. Samples: 449326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:47:57,183][00189] Avg episode reward: [(0, '27.138')] [2024-09-29 18:47:58,847][16350] Updated weights for policy 0, policy_version 1906 (0.0022) [2024-09-29 18:48:02,178][00189] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3859.9). Total num frames: 7815168. Throughput: 0: 919.1. Samples: 453456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:48:02,186][00189] Avg episode reward: [(0, '26.849')] [2024-09-29 18:48:07,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7835648. Throughput: 0: 915.9. Samples: 456710. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:48:07,178][00189] Avg episode reward: [(0, '27.335')] [2024-09-29 18:48:09,272][16350] Updated weights for policy 0, policy_version 1916 (0.0012) [2024-09-29 18:48:12,176][00189] Fps is (10 sec: 4096.9, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7856128. Throughput: 0: 966.4. Samples: 463380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:48:12,178][00189] Avg episode reward: [(0, '27.352')] [2024-09-29 18:48:17,181][00189] Fps is (10 sec: 3684.5, 60 sec: 3754.3, 300 sec: 3846.0). Total num frames: 7872512. Throughput: 0: 912.9. Samples: 467562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:48:17,184][00189] Avg episode reward: [(0, '27.242')] [2024-09-29 18:48:21,390][16350] Updated weights for policy 0, policy_version 1926 (0.0012) [2024-09-29 18:48:22,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 7888896. Throughput: 0: 895.6. Samples: 470118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:48:22,184][00189] Avg episode reward: [(0, '26.358')] [2024-09-29 18:48:27,176][00189] Fps is (10 sec: 4098.1, 60 sec: 3823.1, 300 sec: 3846.1). Total num frames: 7913472. Throughput: 0: 934.8. Samples: 476842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:48:27,184][00189] Avg episode reward: [(0, '26.807')] [2024-09-29 18:48:31,456][16350] Updated weights for policy 0, policy_version 1936 (0.0013) [2024-09-29 18:48:32,176][00189] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 7929856. Throughput: 0: 943.3. Samples: 482290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:48:32,181][00189] Avg episode reward: [(0, '26.023')] [2024-09-29 18:48:37,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 7946240. Throughput: 0: 915.5. Samples: 484368. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:48:37,183][00189] Avg episode reward: [(0, '24.966')] [2024-09-29 18:48:42,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 7966720. Throughput: 0: 921.7. Samples: 490802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:48:42,182][00189] Avg episode reward: [(0, '24.764')] [2024-09-29 18:48:42,311][16350] Updated weights for policy 0, policy_version 1946 (0.0014) [2024-09-29 18:48:47,180][00189] Fps is (10 sec: 4503.7, 60 sec: 3822.7, 300 sec: 3846.0). Total num frames: 7991296. Throughput: 0: 973.1. Samples: 497248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:48:47,182][00189] Avg episode reward: [(0, '23.449')] [2024-09-29 18:48:52,178][00189] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3832.2). Total num frames: 8003584. Throughput: 0: 948.2. Samples: 499380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:48:52,182][00189] Avg episode reward: [(0, '23.100')] [2024-09-29 18:48:53,924][16350] Updated weights for policy 0, policy_version 1956 (0.0013) [2024-09-29 18:48:57,176][00189] Fps is (10 sec: 3278.1, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 8024064. Throughput: 0: 920.3. Samples: 504792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:48:57,183][00189] Avg episode reward: [(0, '22.223')] [2024-09-29 18:49:02,176][00189] Fps is (10 sec: 4506.3, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 8048640. Throughput: 0: 983.3. Samples: 511806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:49:02,179][00189] Avg episode reward: [(0, '21.502')] [2024-09-29 18:49:02,842][16350] Updated weights for policy 0, policy_version 1966 (0.0015) [2024-09-29 18:49:07,178][00189] Fps is (10 sec: 4095.3, 60 sec: 3822.8, 300 sec: 3832.2). Total num frames: 8065024. Throughput: 0: 989.9. Samples: 514664. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:49:07,181][00189] Avg episode reward: [(0, '23.871')] [2024-09-29 18:49:12,176][00189] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 8081408. Throughput: 0: 938.7. Samples: 519084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:49:12,185][00189] Avg episode reward: [(0, '24.352')] [2024-09-29 18:49:14,368][16350] Updated weights for policy 0, policy_version 1976 (0.0015) [2024-09-29 18:49:17,182][00189] Fps is (10 sec: 4094.2, 60 sec: 3891.1, 300 sec: 3846.0). Total num frames: 8105984. Throughput: 0: 969.1. Samples: 525904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:49:17,185][00189] Avg episode reward: [(0, '25.612')] [2024-09-29 18:49:22,182][00189] Fps is (10 sec: 4502.9, 60 sec: 3959.1, 300 sec: 3832.1). Total num frames: 8126464. Throughput: 0: 999.6. Samples: 529358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:49:22,185][00189] Avg episode reward: [(0, '26.076')] [2024-09-29 18:49:24,780][16350] Updated weights for policy 0, policy_version 1986 (0.0016) [2024-09-29 18:49:27,177][00189] Fps is (10 sec: 3278.4, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 8138752. Throughput: 0: 958.3. Samples: 533926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:49:27,186][00189] Avg episode reward: [(0, '26.721')] [2024-09-29 18:49:32,176][00189] Fps is (10 sec: 3278.7, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8159232. Throughput: 0: 948.4. Samples: 539922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:49:32,185][00189] Avg episode reward: [(0, '27.974')] [2024-09-29 18:49:34,939][16350] Updated weights for policy 0, policy_version 1996 (0.0015) [2024-09-29 18:49:37,176][00189] Fps is (10 sec: 4506.1, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 8183808. Throughput: 0: 977.4. Samples: 543362. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:49:37,183][00189] Avg episode reward: [(0, '27.640')] [2024-09-29 18:49:42,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8200192. Throughput: 0: 981.8. Samples: 548974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:49:42,186][00189] Avg episode reward: [(0, '27.553')] [2024-09-29 18:49:42,199][16336] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002002_8200192.pth... [2024-09-29 18:49:42,363][16336] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001777_7278592.pth [2024-09-29 18:49:46,892][16350] Updated weights for policy 0, policy_version 2006 (0.0034) [2024-09-29 18:49:47,177][00189] Fps is (10 sec: 3276.6, 60 sec: 3754.9, 300 sec: 3832.2). Total num frames: 8216576. Throughput: 0: 930.8. Samples: 553694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:49:47,179][00189] Avg episode reward: [(0, '26.497')] [2024-09-29 18:49:52,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3832.2). Total num frames: 8241152. Throughput: 0: 943.5. Samples: 557122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:49:52,179][00189] Avg episode reward: [(0, '25.506')] [2024-09-29 18:49:55,878][16350] Updated weights for policy 0, policy_version 2016 (0.0014) [2024-09-29 18:49:57,176][00189] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3818.4). Total num frames: 8257536. Throughput: 0: 990.6. Samples: 563660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:49:57,181][00189] Avg episode reward: [(0, '24.925')] [2024-09-29 18:50:02,177][00189] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 8273920. Throughput: 0: 935.0. Samples: 567976. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:50:02,179][00189] Avg episode reward: [(0, '24.243')] [2024-09-29 18:50:07,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 8294400. Throughput: 0: 929.0. Samples: 571158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:50:07,182][00189] Avg episode reward: [(0, '22.742')] [2024-09-29 18:50:07,583][16350] Updated weights for policy 0, policy_version 2026 (0.0019) [2024-09-29 18:50:12,176][00189] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 8318976. Throughput: 0: 979.8. Samples: 578014. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-29 18:50:12,179][00189] Avg episode reward: [(0, '23.888')] [2024-09-29 18:50:17,176][00189] Fps is (10 sec: 3686.3, 60 sec: 3755.0, 300 sec: 3818.3). Total num frames: 8331264. Throughput: 0: 955.9. Samples: 582938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:50:17,179][00189] Avg episode reward: [(0, '25.137')] [2024-09-29 18:50:18,975][16350] Updated weights for policy 0, policy_version 2036 (0.0018) [2024-09-29 18:50:22,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3832.2). Total num frames: 8351744. Throughput: 0: 928.0. Samples: 585120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:50:22,179][00189] Avg episode reward: [(0, '25.686')] [2024-09-29 18:50:27,176][00189] Fps is (10 sec: 4096.2, 60 sec: 3891.3, 300 sec: 3818.3). Total num frames: 8372224. Throughput: 0: 954.8. Samples: 591938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:50:27,179][00189] Avg episode reward: [(0, '26.087')] [2024-09-29 18:50:28,226][16350] Updated weights for policy 0, policy_version 2046 (0.0020) [2024-09-29 18:50:32,176][00189] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3818.4). Total num frames: 8392704. Throughput: 0: 983.2. Samples: 597938. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-29 18:50:32,179][00189] Avg episode reward: [(0, '26.602')] [2024-09-29 18:50:37,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 8404992. Throughput: 0: 951.5. Samples: 599938. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:50:37,181][00189] Avg episode reward: [(0, '27.178')] [2024-09-29 18:50:40,001][16350] Updated weights for policy 0, policy_version 2056 (0.0022) [2024-09-29 18:50:42,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8429568. Throughput: 0: 941.9. Samples: 606046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:50:42,178][00189] Avg episode reward: [(0, '25.838')] [2024-09-29 18:50:47,176][00189] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 8454144. Throughput: 0: 999.3. Samples: 612946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:50:47,178][00189] Avg episode reward: [(0, '24.839')] [2024-09-29 18:50:49,866][16350] Updated weights for policy 0, policy_version 2066 (0.0014) [2024-09-29 18:50:52,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 8466432. Throughput: 0: 977.7. Samples: 615154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:50:52,186][00189] Avg episode reward: [(0, '24.644')] [2024-09-29 18:50:57,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8486912. Throughput: 0: 937.3. Samples: 620194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:50:57,182][00189] Avg episode reward: [(0, '23.429')] [2024-09-29 18:51:00,507][16350] Updated weights for policy 0, policy_version 2076 (0.0017) [2024-09-29 18:51:02,176][00189] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8507392. Throughput: 0: 982.1. Samples: 627130. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:51:02,181][00189] Avg episode reward: [(0, '23.836')] [2024-09-29 18:51:07,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8527872. Throughput: 0: 1004.8. Samples: 630338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:51:07,178][00189] Avg episode reward: [(0, '25.016')] [2024-09-29 18:51:11,943][16350] Updated weights for policy 0, policy_version 2086 (0.0018) [2024-09-29 18:51:12,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 8544256. Throughput: 0: 950.5. Samples: 634710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:51:12,180][00189] Avg episode reward: [(0, '24.708')] [2024-09-29 18:51:17,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8564736. Throughput: 0: 964.5. Samples: 641340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 18:51:17,181][00189] Avg episode reward: [(0, '26.498')] [2024-09-29 18:51:20,926][16350] Updated weights for policy 0, policy_version 2096 (0.0015) [2024-09-29 18:51:22,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 8589312. Throughput: 0: 992.4. Samples: 644596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:51:22,180][00189] Avg episode reward: [(0, '26.574')] [2024-09-29 18:51:27,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.5). Total num frames: 8601600. Throughput: 0: 970.1. Samples: 649700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:51:27,183][00189] Avg episode reward: [(0, '27.274')] [2024-09-29 18:51:32,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8622080. Throughput: 0: 941.3. Samples: 655306. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:51:32,181][00189] Avg episode reward: [(0, '27.931')] [2024-09-29 18:51:32,609][16350] Updated weights for policy 0, policy_version 2106 (0.0026) [2024-09-29 18:51:37,176][00189] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 8646656. Throughput: 0: 970.4. Samples: 658824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:51:37,182][00189] Avg episode reward: [(0, '27.273')] [2024-09-29 18:51:42,176][00189] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8663040. Throughput: 0: 993.8. Samples: 664914. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:51:42,179][00189] Avg episode reward: [(0, '27.293')] [2024-09-29 18:51:42,190][16336] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002115_8663040.pth... [2024-09-29 18:51:42,353][16336] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000001891_7745536.pth [2024-09-29 18:51:43,085][16350] Updated weights for policy 0, policy_version 2116 (0.0013) [2024-09-29 18:51:47,176][00189] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 8679424. Throughput: 0: 938.9. Samples: 669380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:51:47,183][00189] Avg episode reward: [(0, '27.672')] [2024-09-29 18:51:52,176][00189] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8699904. Throughput: 0: 941.9. Samples: 672724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:51:52,178][00189] Avg episode reward: [(0, '27.973')] [2024-09-29 18:51:53,260][16350] Updated weights for policy 0, policy_version 2126 (0.0020) [2024-09-29 18:51:57,182][00189] Fps is (10 sec: 4502.8, 60 sec: 3959.1, 300 sec: 3832.1). Total num frames: 8724480. Throughput: 0: 998.5. Samples: 679650. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:51:57,191][00189] Avg episode reward: [(0, '27.665')] [2024-09-29 18:52:02,179][00189] Fps is (10 sec: 3685.2, 60 sec: 3822.7, 300 sec: 3832.1). Total num frames: 8736768. Throughput: 0: 949.0. Samples: 684050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:52:02,182][00189] Avg episode reward: [(0, '28.233')] [2024-09-29 18:52:04,761][16350] Updated weights for policy 0, policy_version 2136 (0.0014) [2024-09-29 18:52:07,176][00189] Fps is (10 sec: 3278.9, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8757248. Throughput: 0: 940.8. Samples: 686930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:52:07,179][00189] Avg episode reward: [(0, '27.358')] [2024-09-29 18:52:12,176][00189] Fps is (10 sec: 4507.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 8781824. Throughput: 0: 981.6. Samples: 693872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:52:12,178][00189] Avg episode reward: [(0, '26.396')] [2024-09-29 18:52:13,987][16350] Updated weights for policy 0, policy_version 2146 (0.0014) [2024-09-29 18:52:17,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8798208. Throughput: 0: 973.1. Samples: 699094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:52:17,179][00189] Avg episode reward: [(0, '26.313')] [2024-09-29 18:52:22,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 8814592. Throughput: 0: 940.6. Samples: 701152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:52:22,178][00189] Avg episode reward: [(0, '25.951')] [2024-09-29 18:52:25,549][16350] Updated weights for policy 0, policy_version 2156 (0.0012) [2024-09-29 18:52:27,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8835072. Throughput: 0: 953.7. Samples: 707830. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:52:27,178][00189] Avg episode reward: [(0, '25.999')] [2024-09-29 18:52:32,176][00189] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 8855552. Throughput: 0: 995.2. Samples: 714164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:52:32,179][00189] Avg episode reward: [(0, '24.607')] [2024-09-29 18:52:36,727][16350] Updated weights for policy 0, policy_version 2166 (0.0016) [2024-09-29 18:52:37,178][00189] Fps is (10 sec: 3685.5, 60 sec: 3754.5, 300 sec: 3832.2). Total num frames: 8871936. Throughput: 0: 966.6. Samples: 716222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:52:37,184][00189] Avg episode reward: [(0, '25.066')] [2024-09-29 18:52:42,176][00189] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 8892416. Throughput: 0: 936.0. Samples: 721762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:52:42,179][00189] Avg episode reward: [(0, '25.705')] [2024-09-29 18:52:46,113][16350] Updated weights for policy 0, policy_version 2176 (0.0014) [2024-09-29 18:52:47,176][00189] Fps is (10 sec: 4506.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 8916992. Throughput: 0: 990.5. Samples: 728618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:52:47,178][00189] Avg episode reward: [(0, '27.098')] [2024-09-29 18:52:52,178][00189] Fps is (10 sec: 4095.1, 60 sec: 3891.1, 300 sec: 3846.0). Total num frames: 8933376. Throughput: 0: 985.5. Samples: 731280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:52:52,183][00189] Avg episode reward: [(0, '26.636')] [2024-09-29 18:52:57,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3755.1, 300 sec: 3846.1). Total num frames: 8949760. Throughput: 0: 931.4. Samples: 735784. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:52:57,178][00189] Avg episode reward: [(0, '26.829')] [2024-09-29 18:52:57,917][16350] Updated weights for policy 0, policy_version 2186 (0.0029) [2024-09-29 18:53:02,176][00189] Fps is (10 sec: 3687.2, 60 sec: 3891.4, 300 sec: 3846.1). Total num frames: 8970240. Throughput: 0: 970.2. Samples: 742754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:53:02,178][00189] Avg episode reward: [(0, '27.316')] [2024-09-29 18:53:07,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 8990720. Throughput: 0: 999.2. Samples: 746118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 18:53:07,181][00189] Avg episode reward: [(0, '27.519')] [2024-09-29 18:53:07,464][16350] Updated weights for policy 0, policy_version 2196 (0.0019) [2024-09-29 18:53:12,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 9007104. Throughput: 0: 953.6. Samples: 750744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:53:12,182][00189] Avg episode reward: [(0, '28.568')] [2024-09-29 18:53:17,176][00189] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9027584. Throughput: 0: 947.4. Samples: 756798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:53:17,185][00189] Avg episode reward: [(0, '27.017')] [2024-09-29 18:53:18,415][16350] Updated weights for policy 0, policy_version 2206 (0.0015) [2024-09-29 18:53:22,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 9052160. Throughput: 0: 980.1. Samples: 760326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:53:22,178][00189] Avg episode reward: [(0, '27.919')] [2024-09-29 18:53:27,177][00189] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 9064448. Throughput: 0: 975.8. Samples: 765676. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:53:27,185][00189] Avg episode reward: [(0, '27.578')] [2024-09-29 18:53:30,236][16350] Updated weights for policy 0, policy_version 2216 (0.0024) [2024-09-29 18:53:32,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9084928. Throughput: 0: 937.5. Samples: 770804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:53:32,179][00189] Avg episode reward: [(0, '28.232')] [2024-09-29 18:53:37,176][00189] Fps is (10 sec: 4506.1, 60 sec: 3959.6, 300 sec: 3873.8). Total num frames: 9109504. Throughput: 0: 954.4. Samples: 774224. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-09-29 18:53:37,179][00189] Avg episode reward: [(0, '27.252')] [2024-09-29 18:53:39,161][16350] Updated weights for policy 0, policy_version 2226 (0.0020) [2024-09-29 18:53:42,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 9125888. Throughput: 0: 998.9. Samples: 780736. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 18:53:42,178][00189] Avg episode reward: [(0, '25.880')] [2024-09-29 18:53:42,190][16336] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002228_9125888.pth... [2024-09-29 18:53:42,339][16336] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002002_8200192.pth [2024-09-29 18:53:47,176][00189] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 9138176. Throughput: 0: 936.4. Samples: 784890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 18:53:47,183][00189] Avg episode reward: [(0, '25.980')] [2024-09-29 18:53:50,998][16350] Updated weights for policy 0, policy_version 2236 (0.0015) [2024-09-29 18:53:52,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3860.0). Total num frames: 9162752. Throughput: 0: 934.7. Samples: 788180. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 18:53:52,178][00189] Avg episode reward: [(0, '25.094')] [2024-09-29 18:53:57,176][00189] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 9183232. Throughput: 0: 982.4. Samples: 794954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:53:57,178][00189] Avg episode reward: [(0, '25.268')] [2024-09-29 18:54:01,445][16350] Updated weights for policy 0, policy_version 2246 (0.0017) [2024-09-29 18:54:02,177][00189] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 9199616. Throughput: 0: 954.4. Samples: 799748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 18:54:02,183][00189] Avg episode reward: [(0, '25.858')] [2024-09-29 18:54:07,176][00189] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9220096. Throughput: 0: 930.6. Samples: 802202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 18:54:07,182][00189] Avg episode reward: [(0, '26.706')] [2024-09-29 18:54:11,453][16350] Updated weights for policy 0, policy_version 2256 (0.0013) [2024-09-29 18:54:12,176][00189] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3846.2). Total num frames: 9240576. Throughput: 0: 968.6. Samples: 809260. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-09-29 18:54:12,179][00189] Avg episode reward: [(0, '28.267')] [2024-09-29 18:54:17,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.2). Total num frames: 9261056. Throughput: 0: 981.6. Samples: 814978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-09-29 18:54:17,181][00189] Avg episode reward: [(0, '28.796')] [2024-09-29 18:54:22,176][00189] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 9273344. Throughput: 0: 953.5. Samples: 817130. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-29 18:54:22,179][00189] Avg episode reward: [(0, '29.447')]
[2024-09-29 18:54:23,141][16350] Updated weights for policy 0, policy_version 2266 (0.0017)
[2024-09-29 18:54:27,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 9297920. Throughput: 0: 946.7. Samples: 823336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:54:27,183][00189] Avg episode reward: [(0, '29.770')]
[2024-09-29 18:54:32,014][16350] Updated weights for policy 0, policy_version 2276 (0.0012)
[2024-09-29 18:54:32,176][00189] Fps is (10 sec: 4915.3, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 9322496. Throughput: 0: 1004.4. Samples: 830088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:54:32,178][00189] Avg episode reward: [(0, '29.296')]
[2024-09-29 18:54:37,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 9334784. Throughput: 0: 978.6. Samples: 832218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:54:37,179][00189] Avg episode reward: [(0, '28.237')]
[2024-09-29 18:54:42,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9355264. Throughput: 0: 945.9. Samples: 837520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:54:42,179][00189] Avg episode reward: [(0, '29.240')]
[2024-09-29 18:54:43,420][16350] Updated weights for policy 0, policy_version 2286 (0.0025)
[2024-09-29 18:54:47,176][00189] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 9379840. Throughput: 0: 994.8. Samples: 844512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:54:47,185][00189] Avg episode reward: [(0, '29.306')]
[2024-09-29 18:54:52,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 9396224. Throughput: 0: 1008.3. Samples: 847576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:54:52,179][00189] Avg episode reward: [(0, '29.456')]
[2024-09-29 18:54:54,039][16350] Updated weights for policy 0, policy_version 2296 (0.0026)
[2024-09-29 18:54:57,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9412608. Throughput: 0: 945.8. Samples: 851820. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:54:57,178][00189] Avg episode reward: [(0, '29.235')]
[2024-09-29 18:55:02,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9437184. Throughput: 0: 970.0. Samples: 858628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:55:02,180][00189] Avg episode reward: [(0, '28.123')]
[2024-09-29 18:55:03,931][16350] Updated weights for policy 0, policy_version 2306 (0.0018)
[2024-09-29 18:55:07,182][00189] Fps is (10 sec: 4503.1, 60 sec: 3959.1, 300 sec: 3859.9). Total num frames: 9457664. Throughput: 0: 1000.5. Samples: 862160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:55:07,192][00189] Avg episode reward: [(0, '28.691')]
[2024-09-29 18:55:12,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9469952. Throughput: 0: 970.3. Samples: 866998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:55:12,185][00189] Avg episode reward: [(0, '27.323')]
[2024-09-29 18:55:15,344][16350] Updated weights for policy 0, policy_version 2316 (0.0026)
[2024-09-29 18:55:17,176][00189] Fps is (10 sec: 3688.3, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9494528. Throughput: 0: 951.9. Samples: 872924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:55:17,180][00189] Avg episode reward: [(0, '27.099')]
[2024-09-29 18:55:22,176][00189] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 9515008. Throughput: 0: 982.2. Samples: 876418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 18:55:22,179][00189] Avg episode reward: [(0, '26.390')]
[2024-09-29 18:55:24,897][16350] Updated weights for policy 0, policy_version 2326 (0.0016)
[2024-09-29 18:55:27,176][00189] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 9531392. Throughput: 0: 989.6. Samples: 882052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 18:55:27,186][00189] Avg episode reward: [(0, '26.748')]
[2024-09-29 18:55:32,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 9547776. Throughput: 0: 943.7. Samples: 886978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:55:32,179][00189] Avg episode reward: [(0, '28.301')]
[2024-09-29 18:55:35,931][16350] Updated weights for policy 0, policy_version 2336 (0.0016)
[2024-09-29 18:55:37,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9572352. Throughput: 0: 952.4. Samples: 890436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:55:37,183][00189] Avg episode reward: [(0, '27.823')]
[2024-09-29 18:55:42,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 9592832. Throughput: 0: 1010.2. Samples: 897280. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 18:55:42,184][00189] Avg episode reward: [(0, '29.135')]
[2024-09-29 18:55:42,205][16336] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002342_9592832.pth...
[2024-09-29 18:55:42,421][16336] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002115_8663040.pth
[2024-09-29 18:55:47,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 9605120. Throughput: 0: 952.2. Samples: 901478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:55:47,191][00189] Avg episode reward: [(0, '29.393')]
[2024-09-29 18:55:47,237][16350] Updated weights for policy 0, policy_version 2346 (0.0013)
[2024-09-29 18:55:52,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9629696. Throughput: 0: 943.8. Samples: 904626. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 18:55:52,178][00189] Avg episode reward: [(0, '30.303')]
[2024-09-29 18:55:52,190][16336] Saving new best policy, reward=30.303!
[2024-09-29 18:55:56,558][16350] Updated weights for policy 0, policy_version 2356 (0.0014)
[2024-09-29 18:55:57,176][00189] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 9650176. Throughput: 0: 987.2. Samples: 911424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:55:57,184][00189] Avg episode reward: [(0, '29.679')]
[2024-09-29 18:56:02,176][00189] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9666560. Throughput: 0: 966.9. Samples: 916434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:56:02,182][00189] Avg episode reward: [(0, '28.925')]
[2024-09-29 18:56:07,176][00189] Fps is (10 sec: 3276.9, 60 sec: 3755.0, 300 sec: 3860.0). Total num frames: 9682944. Throughput: 0: 939.2. Samples: 918680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:56:07,182][00189] Avg episode reward: [(0, '28.472')]
[2024-09-29 18:56:08,089][16350] Updated weights for policy 0, policy_version 2366 (0.0027)
[2024-09-29 18:56:12,176][00189] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9707520. Throughput: 0: 968.6. Samples: 925640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:56:12,182][00189] Avg episode reward: [(0, '29.587')]
[2024-09-29 18:56:17,176][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 9728000. Throughput: 0: 992.8. Samples: 931654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:56:17,178][00189] Avg episode reward: [(0, '28.900')]
[2024-09-29 18:56:18,140][16350] Updated weights for policy 0, policy_version 2376 (0.0021)
[2024-09-29 18:56:22,176][00189] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 9740288. Throughput: 0: 964.0. Samples: 933818. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 18:56:22,181][00189] Avg episode reward: [(0, '28.969')]
[2024-09-29 18:56:27,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9764864. Throughput: 0: 942.5. Samples: 939692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:56:27,181][00189] Avg episode reward: [(0, '29.108')]
[2024-09-29 18:56:28,597][16350] Updated weights for policy 0, policy_version 2386 (0.0028)
[2024-09-29 18:56:32,176][00189] Fps is (10 sec: 4915.3, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 9789440. Throughput: 0: 1005.2. Samples: 946712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:56:32,180][00189] Avg episode reward: [(0, '27.936')]
[2024-09-29 18:56:37,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9801728. Throughput: 0: 983.0. Samples: 948862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:56:37,184][00189] Avg episode reward: [(0, '27.958')]
[2024-09-29 18:56:40,235][16350] Updated weights for policy 0, policy_version 2396 (0.0027)
[2024-09-29 18:56:42,176][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 9822208. Throughput: 0: 944.7. Samples: 953936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:56:42,182][00189] Avg episode reward: [(0, '27.746')]
[2024-09-29 18:56:47,176][00189] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9842688. Throughput: 0: 990.3. Samples: 960996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:56:47,182][00189] Avg episode reward: [(0, '27.519')]
[2024-09-29 18:56:48,987][16350] Updated weights for policy 0, policy_version 2406 (0.0015)
[2024-09-29 18:56:52,177][00189] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 9863168. Throughput: 0: 1013.6. Samples: 964294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:56:52,181][00189] Avg episode reward: [(0, '26.985')]
[2024-09-29 18:56:57,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3873.9). Total num frames: 9879552. Throughput: 0: 954.3. Samples: 968584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:56:57,178][00189] Avg episode reward: [(0, '27.215')]
[2024-09-29 18:57:00,614][16350] Updated weights for policy 0, policy_version 2416 (0.0032)
[2024-09-29 18:57:02,176][00189] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9900032. Throughput: 0: 967.3. Samples: 975184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:57:02,180][00189] Avg episode reward: [(0, '29.558')]
[2024-09-29 18:57:07,176][00189] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 9924608. Throughput: 0: 996.8. Samples: 978676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 18:57:07,182][00189] Avg episode reward: [(0, '30.511')]
[2024-09-29 18:57:07,188][16336] Saving new best policy, reward=30.511!
[2024-09-29 18:57:10,998][16350] Updated weights for policy 0, policy_version 2426 (0.0018)
[2024-09-29 18:57:12,176][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9936896. Throughput: 0: 978.6. Samples: 983728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:57:12,183][00189] Avg episode reward: [(0, '30.918')]
[2024-09-29 18:57:12,203][16336] Saving new best policy, reward=30.918!
[2024-09-29 18:57:17,176][00189] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 9957376. Throughput: 0: 943.9. Samples: 989188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 18:57:17,181][00189] Avg episode reward: [(0, '31.507')]
[2024-09-29 18:57:17,186][16336] Saving new best policy, reward=31.507!
[2024-09-29 18:57:21,197][16350] Updated weights for policy 0, policy_version 2436 (0.0014)
[2024-09-29 18:57:22,176][00189] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 9981952. Throughput: 0: 972.4. Samples: 992622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:57:22,185][00189] Avg episode reward: [(0, '30.279')]
[2024-09-29 18:57:27,176][00189] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9998336. Throughput: 0: 997.2. Samples: 998810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 18:57:27,183][00189] Avg episode reward: [(0, '29.350')]
[2024-09-29 18:57:29,441][00189] Component Batcher_0 stopped!
[2024-09-29 18:57:29,441][16336] Stopping Batcher_0...
[2024-09-29 18:57:29,445][16336] Loop batcher_evt_loop terminating...
[2024-09-29 18:57:29,449][16336] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-29 18:57:29,532][16350] Weights refcount: 2 0
[2024-09-29 18:57:29,543][00189] Component InferenceWorker_p0-w0 stopped!
[2024-09-29 18:57:29,549][16350] Stopping InferenceWorker_p0-w0...
[2024-09-29 18:57:29,550][16350] Loop inference_proc0-0_evt_loop terminating...
[2024-09-29 18:57:29,634][16336] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002228_9125888.pth
[2024-09-29 18:57:29,661][16336] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-29 18:57:29,963][00189] Component LearnerWorker_p0 stopped!
[2024-09-29 18:57:29,970][16336] Stopping LearnerWorker_p0...
[2024-09-29 18:57:29,970][16336] Loop learner_proc0_evt_loop terminating...
[2024-09-29 18:57:30,174][00189] Component RolloutWorker_w7 stopped!
[2024-09-29 18:57:30,178][16357] Stopping RolloutWorker_w7...
[2024-09-29 18:57:30,192][16357] Loop rollout_proc7_evt_loop terminating...
[2024-09-29 18:57:30,207][00189] Component RolloutWorker_w3 stopped!
[2024-09-29 18:57:30,210][16353] Stopping RolloutWorker_w3...
[2024-09-29 18:57:30,210][16353] Loop rollout_proc3_evt_loop terminating...
[2024-09-29 18:57:30,225][00189] Component RolloutWorker_w5 stopped!
[2024-09-29 18:57:30,232][16355] Stopping RolloutWorker_w5...
[2024-09-29 18:57:30,233][16355] Loop rollout_proc5_evt_loop terminating...
[2024-09-29 18:57:30,281][00189] Component RolloutWorker_w4 stopped!
[2024-09-29 18:57:30,286][16354] Stopping RolloutWorker_w4...
[2024-09-29 18:57:30,287][16354] Loop rollout_proc4_evt_loop terminating...
[2024-09-29 18:57:30,291][00189] Component RolloutWorker_w6 stopped!
[2024-09-29 18:57:30,294][16356] Stopping RolloutWorker_w6...
[2024-09-29 18:57:30,303][16356] Loop rollout_proc6_evt_loop terminating...
[2024-09-29 18:57:30,308][00189] Component RolloutWorker_w0 stopped!
[2024-09-29 18:57:30,312][16349] Stopping RolloutWorker_w0...
[2024-09-29 18:57:30,317][16349] Loop rollout_proc0_evt_loop terminating...
[2024-09-29 18:57:30,340][16351] Stopping RolloutWorker_w1...
[2024-09-29 18:57:30,341][16351] Loop rollout_proc1_evt_loop terminating...
[2024-09-29 18:57:30,339][00189] Component RolloutWorker_w2 stopped!
[2024-09-29 18:57:30,341][00189] Component RolloutWorker_w1 stopped!
[2024-09-29 18:57:30,344][00189] Waiting for process learner_proc0 to stop...
[2024-09-29 18:57:30,344][16352] Stopping RolloutWorker_w2...
[2024-09-29 18:57:30,354][16352] Loop rollout_proc2_evt_loop terminating...
[2024-09-29 18:57:31,763][00189] Waiting for process inference_proc0-0 to join...
[2024-09-29 18:57:31,975][00189] Waiting for process rollout_proc0 to join...
[2024-09-29 18:57:33,353][00189] Waiting for process rollout_proc1 to join...
[2024-09-29 18:57:33,356][00189] Waiting for process rollout_proc2 to join...
[2024-09-29 18:57:33,362][00189] Waiting for process rollout_proc3 to join...
[2024-09-29 18:57:33,365][00189] Waiting for process rollout_proc4 to join...
[2024-09-29 18:57:33,369][00189] Waiting for process rollout_proc5 to join...
[2024-09-29 18:57:33,374][00189] Waiting for process rollout_proc6 to join...
[2024-09-29 18:57:33,377][00189] Waiting for process rollout_proc7 to join...
[2024-09-29 18:57:33,381][00189] Batcher 0 profile tree view:
batching: 25.8119, releasing_batches: 0.0279
[2024-09-29 18:57:33,382][00189] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0026
  wait_policy_total: 465.3649
update_model: 7.1331
  weight_update: 0.0014
one_step: 0.0024
  handle_policy_step: 534.8475
    deserialize: 14.7441, stack: 2.9110, obs_to_device_normalize: 112.9528, forward: 264.8554, send_messages: 27.6460
    prepare_outputs: 84.2702
      to_cpu: 51.8794
[2024-09-29 18:57:33,385][00189] Learner 0 profile tree view:
misc: 0.0059, prepare_batch: 16.0138
train: 75.8934
  epoch_init: 0.0055, minibatch_init: 0.0151, losses_postprocess: 0.5288, kl_divergence: 0.5843, after_optimizer: 2.8341
  calculate_losses: 23.8120
    losses_init: 0.0057, forward_head: 1.9738, bptt_initial: 14.9899, tail: 1.1221, advantages_returns: 0.2352, losses: 3.0315
    bptt: 2.1277
      bptt_forward_core: 2.0241
  update: 47.4305
    clip: 1.4289
[2024-09-29 18:57:33,386][00189] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3783, enqueue_policy_requests: 112.3786, env_step: 819.3750, overhead: 13.3659, complete_rollouts: 7.9114
save_policy_outputs: 24.0079
  split_output_tensors: 7.9281
[2024-09-29 18:57:33,388][00189] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3150, enqueue_policy_requests: 115.6216, env_step: 816.6218, overhead: 13.7407, complete_rollouts: 6.8241
save_policy_outputs: 23.0504
  split_output_tensors: 8.1288
[2024-09-29 18:57:33,389][00189] Loop Runner_EvtLoop terminating...
[2024-09-29 18:57:33,394][00189] Runner profile tree view:
main_loop: 1068.7619
[2024-09-29 18:57:33,395][00189] Collected {0: 10006528}, FPS: 3744.3
[2024-09-29 18:59:31,974][00189] Environment doom_basic already registered, overwriting...
[2024-09-29 18:59:31,976][00189] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-29 18:59:31,979][00189] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-29 18:59:31,981][00189] Environment doom_dm already registered, overwriting...
[2024-09-29 18:59:31,983][00189] Environment doom_dwango5 already registered, overwriting...
[2024-09-29 18:59:31,985][00189] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-29 18:59:31,987][00189] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-29 18:59:31,988][00189] Environment doom_my_way_home already registered, overwriting...
[2024-09-29 18:59:31,990][00189] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-29 18:59:31,991][00189] Environment doom_defend_the_center already registered, overwriting...
[2024-09-29 18:59:31,993][00189] Environment doom_defend_the_line already registered, overwriting...
[2024-09-29 18:59:31,994][00189] Environment doom_health_gathering already registered, overwriting...
[2024-09-29 18:59:31,995][00189] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-29 18:59:31,998][00189] Environment doom_battle already registered, overwriting...
[2024-09-29 18:59:31,999][00189] Environment doom_battle2 already registered, overwriting...
[2024-09-29 18:59:32,000][00189] Environment doom_duel_bots already registered, overwriting...
[2024-09-29 18:59:32,001][00189] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-29 18:59:32,002][00189] Environment doom_duel already registered, overwriting...
[2024-09-29 18:59:32,003][00189] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-29 18:59:32,004][00189] Environment doom_benchmark already registered, overwriting...
[2024-09-29 18:59:32,005][00189] register_encoder_factory:
[2024-09-29 18:59:32,048][00189] Loading existing experiment configuration from /content/train_dir/samplefactory-vizdoom-v1/config.json
[2024-09-29 18:59:32,049][00189] Overriding arg 'train_for_env_steps' with value 20000000 passed from command line
[2024-09-29 18:59:32,053][00189] Experiment dir /content/train_dir/samplefactory-vizdoom-v1 already exists!
[2024-09-29 18:59:32,055][00189] Resuming existing experiment from /content/train_dir/samplefactory-vizdoom-v1...
[2024-09-29 18:59:32,057][00189] Weights and Biases integration disabled
[2024-09-29 18:59:32,060][00189] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-09-29 18:59:33,530][00189] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=samplefactory-vizdoom-v1
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=20000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000 --experiment=samplefactory-vizdoom-v1 --restart_behavior=resume
cli_args={'env': 'doom_health_gathering_supreme', 'experiment': 'samplefactory-vizdoom-v1', 'restart_behavior': 'resume', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-29 18:59:33,532][00189] Saving configuration to /content/train_dir/samplefactory-vizdoom-v1/config.json...
[2024-09-29 18:59:33,535][00189] Rollout worker 0 uses device cpu
[2024-09-29 18:59:33,536][00189] Rollout worker 1 uses device cpu
[2024-09-29 18:59:33,538][00189] Rollout worker 2 uses device cpu
[2024-09-29 18:59:33,540][00189] Rollout worker 3 uses device cpu
[2024-09-29 18:59:33,541][00189] Rollout worker 4 uses device cpu
[2024-09-29 18:59:33,542][00189] Rollout worker 5 uses device cpu
[2024-09-29 18:59:33,543][00189] Rollout worker 6 uses device cpu
[2024-09-29 18:59:33,544][00189] Rollout worker 7 uses device cpu
[2024-09-29 18:59:33,690][00189] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:59:33,692][00189] InferenceWorker_p0-w0: min num requests: 2
[2024-09-29 18:59:33,726][00189] Starting all processes...
[2024-09-29 18:59:33,728][00189] Starting process learner_proc0
[2024-09-29 18:59:33,776][00189] Starting all processes...
[2024-09-29 18:59:33,780][00189] Starting process inference_proc0-0
[2024-09-29 18:59:33,780][00189] Starting process rollout_proc0
[2024-09-29 18:59:33,782][00189] Starting process rollout_proc1
[2024-09-29 18:59:33,782][00189] Starting process rollout_proc2
[2024-09-29 18:59:33,782][00189] Starting process rollout_proc3
[2024-09-29 18:59:33,782][00189] Starting process rollout_proc4
[2024-09-29 18:59:33,782][00189] Starting process rollout_proc5
[2024-09-29 18:59:33,782][00189] Starting process rollout_proc6
[2024-09-29 18:59:33,782][00189] Starting process rollout_proc7
[2024-09-29 18:59:45,058][21458] Worker 7 uses CPU cores [1]
[2024-09-29 18:59:45,121][21455] Worker 4 uses CPU cores [0]
[2024-09-29 18:59:45,135][21457] Worker 6 uses CPU cores [0]
[2024-09-29 18:59:45,164][21437] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:59:45,165][21437] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-29 18:59:45,176][21453] Worker 2 uses CPU cores [0]
[2024-09-29 18:59:45,192][21456] Worker 5 uses CPU cores [1]
[2024-09-29 18:59:45,213][21437] Num visible devices: 1
[2024-09-29 18:59:45,227][21454] Worker 3 uses CPU cores [1]
[2024-09-29 18:59:45,239][21437] Starting seed is not provided
[2024-09-29 18:59:45,240][21437] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:59:45,240][21437] Initializing actor-critic model on device cuda:0
[2024-09-29 18:59:45,241][21437] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:59:45,242][21437] RunningMeanStd input shape: (1,)
[2024-09-29 18:59:45,248][21451] Worker 0 uses CPU cores [0]
[2024-09-29 18:59:45,281][21450] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:59:45,281][21450] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-29 18:59:45,282][21452] Worker 1 uses CPU cores [1]
[2024-09-29 18:59:45,285][21437] ConvEncoder: input_channels=3
[2024-09-29 18:59:45,306][21450] Num visible devices: 1
[2024-09-29 18:59:45,416][21437] Conv encoder output size: 512
[2024-09-29 18:59:45,417][21437] Policy head output size: 512
[2024-09-29 18:59:45,431][21437] Created Actor Critic model with architecture:
[2024-09-29 18:59:45,432][21437] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-29 18:59:46,940][21437] Using optimizer
[2024-09-29 18:59:46,942][21437] Loading state from checkpoint /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-29 18:59:46,979][21437] Loading model from checkpoint
[2024-09-29 18:59:46,983][21437] Loaded experiment state at self.train_step=2443, self.env_steps=10006528
[2024-09-29 18:59:46,984][21437] Initialized policy 0 weights for model version 2443
[2024-09-29 18:59:46,987][21437] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 18:59:46,994][21437] LearnerWorker_p0 finished initialization!
[2024-09-29 18:59:47,061][00189] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10006528. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:59:47,110][21450] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 18:59:47,111][21450] RunningMeanStd input shape: (1,)
[2024-09-29 18:59:47,123][21450] ConvEncoder: input_channels=3
[2024-09-29 18:59:47,223][21450] Conv encoder output size: 512
[2024-09-29 18:59:47,224][21450] Policy head output size: 512
[2024-09-29 18:59:48,968][00189] Inference worker 0-0 is ready!
[2024-09-29 18:59:48,970][00189] All inference workers are ready! Signal rollout workers to start!
[2024-09-29 18:59:49,042][21458] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:59:49,044][21452] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:59:49,037][21454] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:59:49,037][21455] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:59:49,043][21453] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:59:49,043][21456] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:59:49,040][21457] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:59:49,048][21451] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 18:59:50,274][21454] Decorrelating experience for 0 frames...
[2024-09-29 18:59:51,045][21457] Decorrelating experience for 0 frames...
[2024-09-29 18:59:51,051][21453] Decorrelating experience for 0 frames...
[2024-09-29 18:59:51,058][21455] Decorrelating experience for 0 frames...
[2024-09-29 18:59:51,065][21451] Decorrelating experience for 0 frames...
[2024-09-29 18:59:51,765][21454] Decorrelating experience for 32 frames...
[2024-09-29 18:59:52,061][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10006528. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:59:52,380][21452] Decorrelating experience for 0 frames...
[2024-09-29 18:59:52,390][21458] Decorrelating experience for 0 frames...
[2024-09-29 18:59:52,799][21455] Decorrelating experience for 32 frames...
[2024-09-29 18:59:52,801][21451] Decorrelating experience for 32 frames...
[2024-09-29 18:59:52,804][21453] Decorrelating experience for 32 frames...
[2024-09-29 18:59:53,689][00189] Heartbeat connected on Batcher_0
[2024-09-29 18:59:53,693][00189] Heartbeat connected on LearnerWorker_p0
[2024-09-29 18:59:53,732][00189] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-29 18:59:53,844][21452] Decorrelating experience for 32 frames...
[2024-09-29 18:59:53,864][21458] Decorrelating experience for 32 frames...
[2024-09-29 18:59:54,306][21454] Decorrelating experience for 64 frames...
[2024-09-29 18:59:54,666][21457] Decorrelating experience for 32 frames...
[2024-09-29 18:59:54,785][21451] Decorrelating experience for 64 frames...
[2024-09-29 18:59:54,791][21455] Decorrelating experience for 64 frames...
[2024-09-29 18:59:55,231][21458] Decorrelating experience for 64 frames...
[2024-09-29 18:59:55,652][21457] Decorrelating experience for 64 frames...
[2024-09-29 18:59:55,689][21454] Decorrelating experience for 96 frames...
[2024-09-29 18:59:55,705][21455] Decorrelating experience for 96 frames...
[2024-09-29 18:59:55,895][00189] Heartbeat connected on RolloutWorker_w4
[2024-09-29 18:59:55,902][00189] Heartbeat connected on RolloutWorker_w3
[2024-09-29 18:59:56,093][21456] Decorrelating experience for 0 frames...
[2024-09-29 18:59:56,425][21452] Decorrelating experience for 64 frames...
[2024-09-29 18:59:57,062][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10006528. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 18:59:57,064][00189] Avg episode reward: [(0, '0.320')]
[2024-09-29 18:59:57,227][21458] Decorrelating experience for 96 frames...
[2024-09-29 18:59:57,669][00189] Heartbeat connected on RolloutWorker_w7
[2024-09-29 18:59:57,983][21457] Decorrelating experience for 96 frames...
[2024-09-29 18:59:57,985][21451] Decorrelating experience for 96 frames...
[2024-09-29 18:59:58,486][00189] Heartbeat connected on RolloutWorker_w0
[2024-09-29 18:59:58,518][00189] Heartbeat connected on RolloutWorker_w6
[2024-09-29 18:59:58,655][21456] Decorrelating experience for 32 frames...
[2024-09-29 18:59:58,784][21453] Decorrelating experience for 64 frames...
[2024-09-29 19:00:01,117][21437] Signal inference workers to stop experience collection...
[2024-09-29 19:00:01,133][21450] InferenceWorker_p0-w0: stopping experience collection
[2024-09-29 19:00:01,245][21452] Decorrelating experience for 96 frames...
[2024-09-29 19:00:01,316][21456] Decorrelating experience for 64 frames...
[2024-09-29 19:00:01,388][00189] Heartbeat connected on RolloutWorker_w1
[2024-09-29 19:00:01,677][21453] Decorrelating experience for 96 frames...
[2024-09-29 19:00:01,780][00189] Heartbeat connected on RolloutWorker_w2
[2024-09-29 19:00:01,803][21456] Decorrelating experience for 96 frames...
[2024-09-29 19:00:01,892][00189] Heartbeat connected on RolloutWorker_w5
[2024-09-29 19:00:02,061][00189] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10006528. Throughput: 0: 148.0. Samples: 2220. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 19:00:02,062][00189] Avg episode reward: [(0, '5.379')]
[2024-09-29 19:00:02,176][21437] Signal inference workers to resume experience collection...
[2024-09-29 19:00:02,178][21450] InferenceWorker_p0-w0: resuming experience collection
[2024-09-29 19:00:07,061][00189] Fps is (10 sec: 2048.3, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 10027008. Throughput: 0: 291.1. Samples: 5822. Policy #0 lag: (min: 0.0, avg: 0.4, max: 3.0)
[2024-09-29 19:00:07,063][00189] Avg episode reward: [(0, '6.594')]
[2024-09-29 19:00:12,061][00189] Fps is (10 sec: 3276.8, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 10039296. Throughput: 0: 314.2. Samples: 7854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:00:12,063][00189] Avg episode reward: [(0, '11.183')]
[2024-09-29 19:00:13,090][21450] Updated weights for policy 0, policy_version 2453 (0.0360)
[2024-09-29 19:00:17,061][00189] Fps is (10 sec: 3276.8, 60 sec: 1774.9, 300 sec: 1774.9). Total num frames: 10059776. Throughput: 0: 456.5. Samples: 13694. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:00:17,063][00189] Avg episode reward: [(0, '16.468')]
[2024-09-29 19:00:22,064][00189] Fps is (10 sec: 4504.0, 60 sec: 2223.3, 300 sec: 2223.3). Total num frames: 10084352. Throughput: 0: 576.0. Samples: 20162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:00:22,067][00189] Avg episode reward: [(0, '18.507')]
[2024-09-29 19:00:23,113][21450] Updated weights for policy 0, policy_version 2463 (0.0015)
[2024-09-29 19:00:27,061][00189] Fps is (10 sec: 3686.4, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 10096640. Throughput: 0: 554.5. Samples: 22182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:00:27,063][00189] Avg episode reward: [(0, '20.025')]
[2024-09-29 19:00:32,061][00189] Fps is (10 sec: 2868.2, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 10113024. Throughput: 0: 597.5. Samples: 26888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:00:32,068][00189] Avg episode reward: [(0, '22.158')]
[2024-09-29 19:00:35,107][21450] Updated weights for policy 0, policy_version 2473 (0.0019)
[2024-09-29 19:00:37,061][00189] Fps is (10 sec: 4096.0, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 10137600. Throughput: 0: 743.6. Samples: 33464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:00:37,063][00189] Avg episode reward: [(0, '24.667')]
[2024-09-29 19:00:42,061][00189] Fps is (10 sec: 4096.0, 60 sec: 2681.0, 300 sec: 2681.0). Total num frames: 10153984. Throughput: 0: 807.7. Samples: 36344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:00:42,063][00189] Avg episode reward: [(0, '25.874')]
[2024-09-29 19:00:47,061][00189] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 10166272. Throughput: 0: 850.4. Samples: 40488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:00:47,064][00189] Avg episode reward: [(0, '26.582')]
[2024-09-29 19:00:47,088][21450] Updated weights for policy 0, policy_version 2483 (0.0016)
[2024-09-29 19:00:52,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 10190848. Throughput: 0: 915.6. Samples: 47022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:00:52,063][00189] Avg episode reward: [(0, '28.578')]
[2024-09-29 19:00:56,305][21450] Updated weights for policy 0, policy_version 2493 (0.0021)
[2024-09-29 19:00:57,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3413.4, 300 sec: 2925.7). Total num frames: 10211328. Throughput: 0: 944.5. Samples: 50356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:00:57,066][00189] Avg episode reward: [(0, '30.478')]
[2024-09-29 19:01:02,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 2894.5). Total num frames: 10223616. Throughput: 0: 915.4. Samples: 54888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:01:02,068][00189] Avg episode reward: [(0, '30.459')]
[2024-09-29 19:01:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 2969.6). Total num frames: 10244096. Throughput: 0: 883.2. Samples: 59902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:01:07,068][00189] Avg episode reward: [(0, '29.821')]
[2024-09-29 19:01:09,296][21450] Updated weights for policy 0, policy_version 2503 (0.0013)
[2024-09-29 19:01:12,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 2987.7). Total num frames: 10260480. Throughput: 0: 900.5. Samples: 62704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:01:12,067][00189] Avg episode reward: [(0, '30.212')]
[2024-09-29 19:01:17,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 2958.2). Total num frames: 10272768. Throughput: 0: 904.2. Samples: 67578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:01:17,068][00189] Avg episode reward: [(0, '29.561')]
[2024-09-29 19:01:22,061][00189] Fps is (10 sec: 2457.7, 60 sec: 3345.3, 300 sec: 2931.9). Total num frames: 10285056. Throughput: 0: 839.8. Samples: 71254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:01:22,068][00189] Avg episode reward: [(0, '29.660')]
[2024-09-29 19:01:23,236][21450] Updated weights for policy 0, policy_version 2513 (0.0027)
[2024-09-29 19:01:27,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2990.1). Total num frames: 10305536. Throughput: 0: 836.4. Samples: 73980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:01:27,063][00189] Avg episode reward: [(0, '28.129')]
[2024-09-29 19:01:32,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3003.7). Total num frames: 10321920. Throughput: 0: 865.8. Samples: 79448.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:01:32,065][00189] Avg episode reward: [(0, '27.458')] [2024-09-29 19:01:32,081][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002520_10321920.pth... [2024-09-29 19:01:32,262][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002342_9592832.pth [2024-09-29 19:01:36,360][21450] Updated weights for policy 0, policy_version 2523 (0.0023) [2024-09-29 19:01:37,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2978.9). Total num frames: 10334208. Throughput: 0: 794.2. Samples: 82762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:01:37,068][00189] Avg episode reward: [(0, '27.874')] [2024-09-29 19:01:42,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2991.9). Total num frames: 10350592. Throughput: 0: 772.4. Samples: 85116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:01:42,065][00189] Avg episode reward: [(0, '27.433')] [2024-09-29 19:01:47,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3037.9). Total num frames: 10371072. Throughput: 0: 801.5. Samples: 90956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:01:47,063][00189] Avg episode reward: [(0, '29.459')] [2024-09-29 19:01:47,674][21450] Updated weights for policy 0, policy_version 2533 (0.0022) [2024-09-29 19:01:52,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3047.4). Total num frames: 10387456. Throughput: 0: 810.1. Samples: 96358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:01:52,069][00189] Avg episode reward: [(0, '30.374')] [2024-09-29 19:01:57,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3056.2). Total num frames: 10403840. Throughput: 0: 794.8. Samples: 98472. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:01:57,068][00189] Avg episode reward: [(0, '31.114')] [2024-09-29 19:01:59,260][21450] Updated weights for policy 0, policy_version 2543 (0.0020) [2024-09-29 19:02:02,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3125.1). Total num frames: 10428416. Throughput: 0: 828.9. Samples: 104878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:02:02,068][00189] Avg episode reward: [(0, '30.672')] [2024-09-29 19:02:07,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3413.3, 300 sec: 3159.8). Total num frames: 10448896. Throughput: 0: 888.6. Samples: 111242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:02:07,066][00189] Avg episode reward: [(0, '28.178')] [2024-09-29 19:02:09,751][21450] Updated weights for policy 0, policy_version 2553 (0.0016) [2024-09-29 19:02:12,062][00189] Fps is (10 sec: 3276.3, 60 sec: 3345.0, 300 sec: 3135.5). Total num frames: 10461184. Throughput: 0: 875.2. Samples: 113364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:02:12,065][00189] Avg episode reward: [(0, '27.433')] [2024-09-29 19:02:17,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3167.6). Total num frames: 10481664. Throughput: 0: 879.0. Samples: 119002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:02:17,068][00189] Avg episode reward: [(0, '24.261')] [2024-09-29 19:02:20,016][21450] Updated weights for policy 0, policy_version 2563 (0.0015) [2024-09-29 19:02:22,061][00189] Fps is (10 sec: 4506.3, 60 sec: 3686.4, 300 sec: 3223.9). Total num frames: 10506240. Throughput: 0: 958.2. Samples: 125882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:02:22,068][00189] Avg episode reward: [(0, '24.787')] [2024-09-29 19:02:27,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3225.6). Total num frames: 10522624. Throughput: 0: 964.6. Samples: 128522. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:02:27,068][00189] Avg episode reward: [(0, '23.522')] [2024-09-29 19:02:31,892][21450] Updated weights for policy 0, policy_version 2573 (0.0012) [2024-09-29 19:02:32,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3227.2). Total num frames: 10539008. Throughput: 0: 932.5. Samples: 132920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:02:32,063][00189] Avg episode reward: [(0, '23.883')] [2024-09-29 19:02:37,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3252.7). Total num frames: 10559488. Throughput: 0: 959.8. Samples: 139550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:02:37,063][00189] Avg episode reward: [(0, '25.250')] [2024-09-29 19:02:41,156][21450] Updated weights for policy 0, policy_version 2583 (0.0018) [2024-09-29 19:02:42,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 10579968. Throughput: 0: 988.3. Samples: 142944. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:02:42,063][00189] Avg episode reward: [(0, '25.485')] [2024-09-29 19:02:47,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 10596352. Throughput: 0: 942.6. Samples: 147294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:02:47,063][00189] Avg episode reward: [(0, '24.850')] [2024-09-29 19:02:52,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3298.9). Total num frames: 10616832. Throughput: 0: 931.8. Samples: 153174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:02:52,066][00189] Avg episode reward: [(0, '23.798')] [2024-09-29 19:02:52,808][21450] Updated weights for policy 0, policy_version 2593 (0.0029) [2024-09-29 19:02:57,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3319.9). Total num frames: 10637312. Throughput: 0: 955.9. Samples: 156376. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:02:57,063][00189] Avg episode reward: [(0, '25.108')] [2024-09-29 19:03:02,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3297.8). Total num frames: 10649600. Throughput: 0: 932.9. Samples: 160984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:03:02,068][00189] Avg episode reward: [(0, '26.259')] [2024-09-29 19:03:06,465][21450] Updated weights for policy 0, policy_version 2603 (0.0031) [2024-09-29 19:03:07,061][00189] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 10661888. Throughput: 0: 866.4. Samples: 164868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:03:07,065][00189] Avg episode reward: [(0, '26.356')] [2024-09-29 19:03:12,061][00189] Fps is (10 sec: 3276.9, 60 sec: 3686.5, 300 sec: 3296.8). Total num frames: 10682368. Throughput: 0: 871.1. Samples: 167720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:03:12,064][00189] Avg episode reward: [(0, '27.206')] [2024-09-29 19:03:17,062][00189] Fps is (10 sec: 3686.0, 60 sec: 3618.1, 300 sec: 3296.3). Total num frames: 10698752. Throughput: 0: 896.8. Samples: 173278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:03:17,068][00189] Avg episode reward: [(0, '27.060')] [2024-09-29 19:03:18,337][21450] Updated weights for policy 0, policy_version 2613 (0.0018) [2024-09-29 19:03:22,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 10711040. Throughput: 0: 827.6. Samples: 176794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:03:22,064][00189] Avg episode reward: [(0, '26.727')] [2024-09-29 19:03:27,061][00189] Fps is (10 sec: 2867.5, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 10727424. Throughput: 0: 809.2. Samples: 179360. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:03:27,066][00189] Avg episode reward: [(0, '27.493')] [2024-09-29 19:03:30,976][21450] Updated weights for policy 0, policy_version 2623 (0.0015) [2024-09-29 19:03:32,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3295.0). Total num frames: 10747904. Throughput: 0: 834.4. Samples: 184844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:03:32,071][00189] Avg episode reward: [(0, '27.466')] [2024-09-29 19:03:32,081][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002624_10747904.pth... [2024-09-29 19:03:32,201][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002443_10006528.pth [2024-09-29 19:03:37,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 10760192. Throughput: 0: 803.1. Samples: 189314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:03:37,065][00189] Avg episode reward: [(0, '25.856')] [2024-09-29 19:03:42,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 10776576. Throughput: 0: 779.2. Samples: 191440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:03:42,068][00189] Avg episode reward: [(0, '26.982')] [2024-09-29 19:03:43,380][21450] Updated weights for policy 0, policy_version 2633 (0.0025) [2024-09-29 19:03:47,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3310.9). Total num frames: 10801152. Throughput: 0: 824.0. Samples: 198062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:03:47,066][00189] Avg episode reward: [(0, '26.232')] [2024-09-29 19:03:52,061][00189] Fps is (10 sec: 4095.9, 60 sec: 3345.0, 300 sec: 3310.2). Total num frames: 10817536. Throughput: 0: 873.9. Samples: 204192. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:03:52,064][00189] Avg episode reward: [(0, '26.913')] [2024-09-29 19:03:53,933][21450] Updated weights for policy 0, policy_version 2643 (0.0018) [2024-09-29 19:03:57,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3309.6). Total num frames: 10833920. Throughput: 0: 855.2. Samples: 206202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:03:57,064][00189] Avg episode reward: [(0, '27.887')] [2024-09-29 19:04:02,061][00189] Fps is (10 sec: 3686.6, 60 sec: 3413.3, 300 sec: 3325.0). Total num frames: 10854400. Throughput: 0: 855.6. Samples: 211780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:04:02,068][00189] Avg episode reward: [(0, '27.965')] [2024-09-29 19:04:04,520][21450] Updated weights for policy 0, policy_version 2653 (0.0018) [2024-09-29 19:04:07,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3339.8). Total num frames: 10874880. Throughput: 0: 925.3. Samples: 218434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:04:07,066][00189] Avg episode reward: [(0, '27.701')] [2024-09-29 19:04:12,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3338.6). Total num frames: 10891264. Throughput: 0: 922.7. Samples: 220882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:04:12,064][00189] Avg episode reward: [(0, '26.966')] [2024-09-29 19:04:16,459][21450] Updated weights for policy 0, policy_version 2663 (0.0016) [2024-09-29 19:04:17,061][00189] Fps is (10 sec: 3276.9, 60 sec: 3481.7, 300 sec: 3337.5). Total num frames: 10907648. Throughput: 0: 897.8. Samples: 225246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:04:17,063][00189] Avg episode reward: [(0, '27.117')] [2024-09-29 19:04:22,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3366.2). Total num frames: 10932224. Throughput: 0: 946.9. Samples: 231926. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:04:22,068][00189] Avg episode reward: [(0, '25.993')] [2024-09-29 19:04:26,306][21450] Updated weights for policy 0, policy_version 2673 (0.0013) [2024-09-29 19:04:27,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3364.6). Total num frames: 10948608. Throughput: 0: 971.3. Samples: 235148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:04:27,063][00189] Avg episode reward: [(0, '25.341')] [2024-09-29 19:04:32,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3348.7). Total num frames: 10960896. Throughput: 0: 904.4. Samples: 238760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:04:32,067][00189] Avg episode reward: [(0, '25.448')] [2024-09-29 19:04:37,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3347.4). Total num frames: 10977280. Throughput: 0: 876.8. Samples: 243648. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:04:37,066][00189] Avg episode reward: [(0, '25.057')] [2024-09-29 19:04:39,705][21450] Updated weights for policy 0, policy_version 2683 (0.0025) [2024-09-29 19:04:42,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3360.1). Total num frames: 10997760. Throughput: 0: 896.2. Samples: 246530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:04:42,065][00189] Avg episode reward: [(0, '26.025')] [2024-09-29 19:04:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 11010048. Throughput: 0: 875.6. Samples: 251180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:04:47,064][00189] Avg episode reward: [(0, '27.011')] [2024-09-29 19:04:52,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 11026432. Throughput: 0: 823.6. Samples: 255494. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:04:52,067][00189] Avg episode reward: [(0, '27.702')] [2024-09-29 19:04:53,187][21450] Updated weights for policy 0, policy_version 2693 (0.0027) [2024-09-29 19:04:57,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 11042816. Throughput: 0: 831.0. Samples: 258278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:04:57,068][00189] Avg episode reward: [(0, '28.662')] [2024-09-29 19:05:02,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 11059200. Throughput: 0: 855.7. Samples: 263752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:05:02,065][00189] Avg episode reward: [(0, '30.593')] [2024-09-29 19:05:05,511][21450] Updated weights for policy 0, policy_version 2703 (0.0017) [2024-09-29 19:05:07,062][00189] Fps is (10 sec: 3276.3, 60 sec: 3345.0, 300 sec: 3512.8). Total num frames: 11075584. Throughput: 0: 797.2. Samples: 267802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:05:07,067][00189] Avg episode reward: [(0, '30.850')] [2024-09-29 19:05:12,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 11096064. Throughput: 0: 794.9. Samples: 270918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:05:12,064][00189] Avg episode reward: [(0, '30.066')] [2024-09-29 19:05:15,511][21450] Updated weights for policy 0, policy_version 2713 (0.0013) [2024-09-29 19:05:17,061][00189] Fps is (10 sec: 4096.7, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 11116544. Throughput: 0: 867.2. Samples: 277786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:05:17,065][00189] Avg episode reward: [(0, '29.321')] [2024-09-29 19:05:22,063][00189] Fps is (10 sec: 3685.4, 60 sec: 3344.9, 300 sec: 3512.8). Total num frames: 11132928. Throughput: 0: 867.1. Samples: 282668. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:05:22,067][00189] Avg episode reward: [(0, '29.547')] [2024-09-29 19:05:27,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 11149312. Throughput: 0: 851.6. Samples: 284854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:05:27,064][00189] Avg episode reward: [(0, '29.175')] [2024-09-29 19:05:27,272][21450] Updated weights for policy 0, policy_version 2723 (0.0022) [2024-09-29 19:05:32,061][00189] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 11173888. Throughput: 0: 899.9. Samples: 291674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:05:32,069][00189] Avg episode reward: [(0, '28.659')] [2024-09-29 19:05:32,077][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002728_11173888.pth... [2024-09-29 19:05:32,199][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002520_10321920.pth [2024-09-29 19:05:36,998][21450] Updated weights for policy 0, policy_version 2733 (0.0023) [2024-09-29 19:05:37,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 11194368. Throughput: 0: 932.4. Samples: 297450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:05:37,066][00189] Avg episode reward: [(0, '28.462')] [2024-09-29 19:05:42,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 11206656. Throughput: 0: 917.3. Samples: 299558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:05:42,066][00189] Avg episode reward: [(0, '28.873')] [2024-09-29 19:05:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 11227136. Throughput: 0: 927.6. Samples: 305492. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:05:47,065][00189] Avg episode reward: [(0, '29.590')] [2024-09-29 19:05:48,128][21450] Updated weights for policy 0, policy_version 2743 (0.0012) [2024-09-29 19:05:52,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3526.7). Total num frames: 11251712. Throughput: 0: 988.3. Samples: 312274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:05:52,066][00189] Avg episode reward: [(0, '28.931')] [2024-09-29 19:05:57,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3540.6). Total num frames: 11268096. Throughput: 0: 970.4. Samples: 314586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:05:57,068][00189] Avg episode reward: [(0, '28.541')] [2024-09-29 19:05:59,922][21450] Updated weights for policy 0, policy_version 2753 (0.0012) [2024-09-29 19:06:02,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3526.7). Total num frames: 11284480. Throughput: 0: 923.8. Samples: 319356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:06:02,064][00189] Avg episode reward: [(0, '28.350')] [2024-09-29 19:06:07,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3554.5). Total num frames: 11309056. Throughput: 0: 967.7. Samples: 326210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:06:07,067][00189] Avg episode reward: [(0, '27.386')] [2024-09-29 19:06:08,857][21450] Updated weights for policy 0, policy_version 2763 (0.0015) [2024-09-29 19:06:12,061][00189] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3568.4). Total num frames: 11325440. Throughput: 0: 991.4. Samples: 329468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:06:12,065][00189] Avg episode reward: [(0, '27.214')] [2024-09-29 19:06:17,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 11337728. Throughput: 0: 930.6. Samples: 333550. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:06:17,066][00189] Avg episode reward: [(0, '26.636')] [2024-09-29 19:06:20,576][21450] Updated weights for policy 0, policy_version 2773 (0.0023) [2024-09-29 19:06:22,061][00189] Fps is (10 sec: 3686.6, 60 sec: 3823.1, 300 sec: 3582.3). Total num frames: 11362304. Throughput: 0: 948.3. Samples: 340122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:06:22,062][00189] Avg episode reward: [(0, '27.198')] [2024-09-29 19:06:27,061][00189] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3610.0). Total num frames: 11386880. Throughput: 0: 977.8. Samples: 343560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:06:27,065][00189] Avg episode reward: [(0, '28.702')] [2024-09-29 19:06:31,049][21450] Updated weights for policy 0, policy_version 2783 (0.0017) [2024-09-29 19:06:32,065][00189] Fps is (10 sec: 3684.7, 60 sec: 3754.4, 300 sec: 3610.0). Total num frames: 11399168. Throughput: 0: 958.1. Samples: 348610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:06:32,068][00189] Avg episode reward: [(0, '28.663')] [2024-09-29 19:06:37,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 11419648. Throughput: 0: 927.9. Samples: 354028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:06:37,063][00189] Avg episode reward: [(0, '29.015')] [2024-09-29 19:06:41,304][21450] Updated weights for policy 0, policy_version 2793 (0.0025) [2024-09-29 19:06:42,061][00189] Fps is (10 sec: 4097.9, 60 sec: 3891.2, 300 sec: 3623.9). Total num frames: 11440128. Throughput: 0: 952.4. Samples: 357442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:06:42,063][00189] Avg episode reward: [(0, '29.917')] [2024-09-29 19:06:47,061][00189] Fps is (10 sec: 4095.7, 60 sec: 3891.1, 300 sec: 3637.8). Total num frames: 11460608. Throughput: 0: 981.7. Samples: 363534. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:06:47,067][00189] Avg episode reward: [(0, '29.649')] [2024-09-29 19:06:52,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 11472896. Throughput: 0: 927.2. Samples: 367932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:06:52,068][00189] Avg episode reward: [(0, '29.128')] [2024-09-29 19:06:53,083][21450] Updated weights for policy 0, policy_version 2803 (0.0016) [2024-09-29 19:06:57,061][00189] Fps is (10 sec: 3277.1, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 11493376. Throughput: 0: 930.5. Samples: 371340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:06:57,065][00189] Avg episode reward: [(0, '27.929')] [2024-09-29 19:07:02,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3610.0). Total num frames: 11513856. Throughput: 0: 969.4. Samples: 377172. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-29 19:07:02,068][00189] Avg episode reward: [(0, '28.443')] [2024-09-29 19:07:04,403][21450] Updated weights for policy 0, policy_version 2813 (0.0025) [2024-09-29 19:07:07,062][00189] Fps is (10 sec: 3276.3, 60 sec: 3618.0, 300 sec: 3610.0). Total num frames: 11526144. Throughput: 0: 902.9. Samples: 380754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:07:07,067][00189] Avg episode reward: [(0, '29.373')] [2024-09-29 19:07:12,061][00189] Fps is (10 sec: 2867.1, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 11542528. Throughput: 0: 877.4. Samples: 383042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:07:12,064][00189] Avg episode reward: [(0, '29.430')] [2024-09-29 19:07:16,903][21450] Updated weights for policy 0, policy_version 2823 (0.0013) [2024-09-29 19:07:17,061][00189] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 11563008. Throughput: 0: 893.3. Samples: 388806. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:07:17,063][00189] Avg episode reward: [(0, '28.894')] [2024-09-29 19:07:22,061][00189] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 11575296. Throughput: 0: 876.6. Samples: 393476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:07:22,063][00189] Avg episode reward: [(0, '28.404')] [2024-09-29 19:07:27,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3568.4). Total num frames: 11591680. Throughput: 0: 838.2. Samples: 395162. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:07:27,067][00189] Avg episode reward: [(0, '28.274')] [2024-09-29 19:07:30,237][21450] Updated weights for policy 0, policy_version 2833 (0.0017) [2024-09-29 19:07:32,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3554.5). Total num frames: 11608064. Throughput: 0: 825.3. Samples: 400674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:07:32,069][00189] Avg episode reward: [(0, '28.666')] [2024-09-29 19:07:32,080][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002834_11608064.pth... [2024-09-29 19:07:32,226][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002624_10747904.pth [2024-09-29 19:07:37,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 11628544. Throughput: 0: 867.3. Samples: 406962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:07:37,066][00189] Avg episode reward: [(0, '28.100')] [2024-09-29 19:07:41,833][21450] Updated weights for policy 0, policy_version 2843 (0.0016) [2024-09-29 19:07:42,064][00189] Fps is (10 sec: 3685.1, 60 sec: 3413.1, 300 sec: 3554.5). Total num frames: 11644928. Throughput: 0: 835.5. Samples: 408940. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:07:42,066][00189] Avg episode reward: [(0, '27.308')] [2024-09-29 19:07:47,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3554.5). Total num frames: 11665408. Throughput: 0: 823.3. Samples: 414222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:07:47,063][00189] Avg episode reward: [(0, '27.174')] [2024-09-29 19:07:51,462][21450] Updated weights for policy 0, policy_version 2853 (0.0015) [2024-09-29 19:07:52,061][00189] Fps is (10 sec: 4097.5, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 11685888. Throughput: 0: 897.1. Samples: 421122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:07:52,063][00189] Avg episode reward: [(0, '26.635')] [2024-09-29 19:07:57,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 11702272. Throughput: 0: 908.5. Samples: 423924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:07:57,065][00189] Avg episode reward: [(0, '27.857')] [2024-09-29 19:08:02,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3582.3). Total num frames: 11718656. Throughput: 0: 878.4. Samples: 428332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:08:02,066][00189] Avg episode reward: [(0, '27.094')] [2024-09-29 19:08:03,033][21450] Updated weights for policy 0, policy_version 2863 (0.0012) [2024-09-29 19:08:07,063][00189] Fps is (10 sec: 4095.0, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 11743232. Throughput: 0: 929.8. Samples: 435318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:08:07,068][00189] Avg episode reward: [(0, '27.879')] [2024-09-29 19:08:12,063][00189] Fps is (10 sec: 4504.7, 60 sec: 3686.3, 300 sec: 3610.0). Total num frames: 11763712. Throughput: 0: 963.7. Samples: 438532. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:08:12,070][00189] Avg episode reward: [(0, '28.135')]
[2024-09-29 19:08:13,078][21450] Updated weights for policy 0, policy_version 2873 (0.0027)
[2024-09-29 19:08:17,061][00189] Fps is (10 sec: 3277.5, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 11776000. Throughput: 0: 938.0. Samples: 442882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:08:17,065][00189] Avg episode reward: [(0, '28.643')]
[2024-09-29 19:08:22,061][00189] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 11796480. Throughput: 0: 929.9. Samples: 448806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:08:22,067][00189] Avg episode reward: [(0, '28.190')]
[2024-09-29 19:08:23,959][21450] Updated weights for policy 0, policy_version 2883 (0.0013)
[2024-09-29 19:08:27,061][00189] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3637.8). Total num frames: 11821056. Throughput: 0: 961.8. Samples: 452218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:08:27,068][00189] Avg episode reward: [(0, '27.788')]
[2024-09-29 19:08:32,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 11837440. Throughput: 0: 972.0. Samples: 457962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:08:32,066][00189] Avg episode reward: [(0, '29.280')]
[2024-09-29 19:08:35,827][21450] Updated weights for policy 0, policy_version 2893 (0.0015)
[2024-09-29 19:08:37,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 11853824. Throughput: 0: 917.2. Samples: 462394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:08:37,068][00189] Avg episode reward: [(0, '27.686')]
[2024-09-29 19:08:42,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3754.9, 300 sec: 3623.9). Total num frames: 11870208. Throughput: 0: 918.6. Samples: 465262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:08:42,068][00189] Avg episode reward: [(0, '28.587')]
[2024-09-29 19:08:47,062][00189] Fps is (10 sec: 3276.4, 60 sec: 3686.3, 300 sec: 3623.9). Total num frames: 11886592. Throughput: 0: 943.8. Samples: 470806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:08:47,065][00189] Avg episode reward: [(0, '27.781')]
[2024-09-29 19:08:47,345][21450] Updated weights for policy 0, policy_version 2903 (0.0014)
[2024-09-29 19:08:52,061][00189] Fps is (10 sec: 2867.1, 60 sec: 3549.8, 300 sec: 3610.0). Total num frames: 11898880. Throughput: 0: 870.4. Samples: 474484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:08:52,064][00189] Avg episode reward: [(0, '27.066')]
[2024-09-29 19:08:57,061][00189] Fps is (10 sec: 3277.2, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 11919360. Throughput: 0: 858.2. Samples: 477148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:08:57,067][00189] Avg episode reward: [(0, '25.574')]
[2024-09-29 19:08:59,856][21450] Updated weights for policy 0, policy_version 2913 (0.0026)
[2024-09-29 19:09:02,061][00189] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 11939840. Throughput: 0: 889.1. Samples: 482890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:09:02,066][00189] Avg episode reward: [(0, '24.322')]
[2024-09-29 19:09:07,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3481.7, 300 sec: 3596.1). Total num frames: 11952128. Throughput: 0: 853.0. Samples: 487190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:09:07,067][00189] Avg episode reward: [(0, '24.134')]
[2024-09-29 19:09:12,061][00189] Fps is (10 sec: 2457.6, 60 sec: 3345.2, 300 sec: 3582.3). Total num frames: 11964416. Throughput: 0: 814.3. Samples: 488860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:09:12,063][00189] Avg episode reward: [(0, '24.658')]
[2024-09-29 19:09:13,057][21450] Updated weights for policy 0, policy_version 2923 (0.0020)
[2024-09-29 19:09:17,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3549.8, 300 sec: 3582.3). Total num frames: 11988992. Throughput: 0: 834.3. Samples: 495504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:09:17,066][00189] Avg episode reward: [(0, '24.304')]
[2024-09-29 19:09:22,064][00189] Fps is (10 sec: 4504.0, 60 sec: 3549.7, 300 sec: 3596.1). Total num frames: 12009472. Throughput: 0: 871.0. Samples: 501590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:09:22,071][00189] Avg episode reward: [(0, '26.394')]
[2024-09-29 19:09:23,254][21450] Updated weights for policy 0, policy_version 2933 (0.0018)
[2024-09-29 19:09:27,061][00189] Fps is (10 sec: 3277.0, 60 sec: 3345.1, 300 sec: 3596.1). Total num frames: 12021760. Throughput: 0: 853.2. Samples: 503654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:09:27,066][00189] Avg episode reward: [(0, '27.384')]
[2024-09-29 19:09:32,061][00189] Fps is (10 sec: 3687.7, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 12046336. Throughput: 0: 862.3. Samples: 509608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:09:32,063][00189] Avg episode reward: [(0, '28.096')]
[2024-09-29 19:09:32,072][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002941_12046336.pth...
[2024-09-29 19:09:32,201][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002728_11173888.pth
[2024-09-29 19:09:33,588][21450] Updated weights for policy 0, policy_version 2943 (0.0013)
[2024-09-29 19:09:37,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 12066816. Throughput: 0: 935.5. Samples: 516582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:09:37,067][00189] Avg episode reward: [(0, '27.273')]
[2024-09-29 19:09:42,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 12083200. Throughput: 0: 923.4. Samples: 518700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:09:42,063][00189] Avg episode reward: [(0, '27.370')]
[2024-09-29 19:09:45,450][21450] Updated weights for policy 0, policy_version 2953 (0.0029)
[2024-09-29 19:09:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 12099584. Throughput: 0: 904.3. Samples: 523582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:09:47,065][00189] Avg episode reward: [(0, '26.619')]
[2024-09-29 19:09:52,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 12124160. Throughput: 0: 959.3. Samples: 530358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:09:52,067][00189] Avg episode reward: [(0, '25.246')]
[2024-09-29 19:09:54,412][21450] Updated weights for policy 0, policy_version 2963 (0.0013)
[2024-09-29 19:09:57,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 12140544. Throughput: 0: 986.3. Samples: 533244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:09:57,064][00189] Avg episode reward: [(0, '25.160')]
[2024-09-29 19:10:02,061][00189] Fps is (10 sec: 2867.0, 60 sec: 3549.8, 300 sec: 3651.7). Total num frames: 12152832. Throughput: 0: 917.2. Samples: 536776. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 19:10:02,065][00189] Avg episode reward: [(0, '25.783')]
[2024-09-29 19:10:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 12173312. Throughput: 0: 903.9. Samples: 542262. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 19:10:07,064][00189] Avg episode reward: [(0, '26.193')]
[2024-09-29 19:10:07,843][21450] Updated weights for policy 0, policy_version 2973 (0.0026)
[2024-09-29 19:10:12,061][00189] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3637.8). Total num frames: 12189696. Throughput: 0: 918.5. Samples: 544988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:10:12,068][00189] Avg episode reward: [(0, '27.203')]
[2024-09-29 19:10:17,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3624.0). Total num frames: 12201984. Throughput: 0: 883.2. Samples: 549352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:10:17,065][00189] Avg episode reward: [(0, '27.749')]
[2024-09-29 19:10:21,582][21450] Updated weights for policy 0, policy_version 2983 (0.0025)
[2024-09-29 19:10:22,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 3623.9). Total num frames: 12218368. Throughput: 0: 828.2. Samples: 553850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:10:22,063][00189] Avg episode reward: [(0, '28.270')]
[2024-09-29 19:10:27,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 12238848. Throughput: 0: 847.0. Samples: 556816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:10:27,063][00189] Avg episode reward: [(0, '28.501')]
[2024-09-29 19:10:32,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3596.1). Total num frames: 12255232. Throughput: 0: 857.1. Samples: 562150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:10:32,065][00189] Avg episode reward: [(0, '28.396')]
[2024-09-29 19:10:33,401][21450] Updated weights for policy 0, policy_version 2993 (0.0012)
[2024-09-29 19:10:37,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3596.1). Total num frames: 12267520. Throughput: 0: 806.3. Samples: 566642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:10:37,066][00189] Avg episode reward: [(0, '29.464')]
[2024-09-29 19:10:42,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3610.0). Total num frames: 12292096. Throughput: 0: 816.5. Samples: 569988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:10:42,065][00189] Avg episode reward: [(0, '30.474')]
[2024-09-29 19:10:43,640][21450] Updated weights for policy 0, policy_version 3003 (0.0016)
[2024-09-29 19:10:47,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 12312576. Throughput: 0: 888.9. Samples: 576774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:10:47,063][00189] Avg episode reward: [(0, '32.683')]
[2024-09-29 19:10:47,068][21437] Saving new best policy, reward=32.683!
[2024-09-29 19:10:52,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3596.1). Total num frames: 12328960. Throughput: 0: 861.4. Samples: 581024. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:10:52,065][00189] Avg episode reward: [(0, '31.066')]
[2024-09-29 19:10:55,370][21450] Updated weights for policy 0, policy_version 3013 (0.0018)
[2024-09-29 19:10:57,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3596.2). Total num frames: 12345344. Throughput: 0: 860.6. Samples: 583714. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:10:57,068][00189] Avg episode reward: [(0, '30.052')]
[2024-09-29 19:11:02,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3596.1). Total num frames: 12369920. Throughput: 0: 913.5. Samples: 590458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:11:02,068][00189] Avg episode reward: [(0, '31.990')]
[2024-09-29 19:11:04,937][21450] Updated weights for policy 0, policy_version 3023 (0.0014)
[2024-09-29 19:11:07,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 12386304. Throughput: 0: 933.7. Samples: 595866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:11:07,067][00189] Avg episode reward: [(0, '31.798')]
[2024-09-29 19:11:12,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 12402688. Throughput: 0: 916.0. Samples: 598034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:11:12,066][00189] Avg episode reward: [(0, '31.782')]
[2024-09-29 19:11:16,283][21450] Updated weights for policy 0, policy_version 3033 (0.0018)
[2024-09-29 19:11:17,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 12423168. Throughput: 0: 941.4. Samples: 604514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:11:17,063][00189] Avg episode reward: [(0, '30.029')]
[2024-09-29 19:11:22,064][00189] Fps is (10 sec: 4504.0, 60 sec: 3822.7, 300 sec: 3596.1). Total num frames: 12447744. Throughput: 0: 987.3. Samples: 611076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:11:22,068][00189] Avg episode reward: [(0, '30.112')]
[2024-09-29 19:11:27,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 12460032. Throughput: 0: 952.0. Samples: 612830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:11:27,063][00189] Avg episode reward: [(0, '29.105')]
[2024-09-29 19:11:28,341][21450] Updated weights for policy 0, policy_version 3043 (0.0015)
[2024-09-29 19:11:32,061][00189] Fps is (10 sec: 2868.2, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 12476416. Throughput: 0: 902.4. Samples: 617384. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:11:32,063][00189] Avg episode reward: [(0, '28.785')]
[2024-09-29 19:11:32,077][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003046_12476416.pth...
[2024-09-29 19:11:32,213][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002834_11608064.pth
[2024-09-29 19:11:37,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3582.3). Total num frames: 12496896. Throughput: 0: 936.4. Samples: 623164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:11:37,063][00189] Avg episode reward: [(0, '28.359')]
[2024-09-29 19:11:39,336][21450] Updated weights for policy 0, policy_version 3053 (0.0017)
[2024-09-29 19:11:42,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 12509184. Throughput: 0: 932.4. Samples: 625674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:11:42,066][00189] Avg episode reward: [(0, '28.161')]
[2024-09-29 19:11:47,061][00189] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 12521472. Throughput: 0: 862.6. Samples: 629274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:11:47,068][00189] Avg episode reward: [(0, '27.151')]
[2024-09-29 19:11:52,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 12541952. Throughput: 0: 869.7. Samples: 635002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-29 19:11:52,064][00189] Avg episode reward: [(0, '25.698')]
[2024-09-29 19:11:52,576][21450] Updated weights for policy 0, policy_version 3063 (0.0014)
[2024-09-29 19:11:57,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 12562432. Throughput: 0: 888.3. Samples: 638008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:11:57,070][00189] Avg episode reward: [(0, '26.616')]
[2024-09-29 19:12:02,066][00189] Fps is (10 sec: 3274.9, 60 sec: 3413.0, 300 sec: 3554.4). Total num frames: 12574720. Throughput: 0: 840.2. Samples: 642330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:12:02,069][00189] Avg episode reward: [(0, '27.216')]
[2024-09-29 19:12:04,741][21450] Updated weights for policy 0, policy_version 3073 (0.0022)
[2024-09-29 19:12:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 12595200. Throughput: 0: 826.7. Samples: 648276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:12:07,062][00189] Avg episode reward: [(0, '26.885')]
[2024-09-29 19:12:12,061][00189] Fps is (10 sec: 4508.2, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 12619776. Throughput: 0: 865.6. Samples: 651780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:12:12,065][00189] Avg episode reward: [(0, '26.605')]
[2024-09-29 19:12:13,855][21450] Updated weights for policy 0, policy_version 3083 (0.0013)
[2024-09-29 19:12:17,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 12636160. Throughput: 0: 891.6. Samples: 657506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:12:17,066][00189] Avg episode reward: [(0, '27.800')]
[2024-09-29 19:12:22,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3413.5, 300 sec: 3596.1). Total num frames: 12652544. Throughput: 0: 871.0. Samples: 662358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:12:22,067][00189] Avg episode reward: [(0, '28.970')]
[2024-09-29 19:12:25,267][21450] Updated weights for policy 0, policy_version 3093 (0.0016)
[2024-09-29 19:12:27,063][00189] Fps is (10 sec: 4095.0, 60 sec: 3618.0, 300 sec: 3623.9). Total num frames: 12677120. Throughput: 0: 892.2. Samples: 665826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:12:27,069][00189] Avg episode reward: [(0, '29.565')]
[2024-09-29 19:12:32,062][00189] Fps is (10 sec: 4505.1, 60 sec: 3686.3, 300 sec: 3623.9). Total num frames: 12697600. Throughput: 0: 965.0. Samples: 672700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:12:32,071][00189] Avg episode reward: [(0, '30.081')]
[2024-09-29 19:12:36,152][21450] Updated weights for policy 0, policy_version 3103 (0.0018)
[2024-09-29 19:12:37,063][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.7, 300 sec: 3610.0). Total num frames: 12709888. Throughput: 0: 933.9. Samples: 677028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:12:37,066][00189] Avg episode reward: [(0, '30.262')]
[2024-09-29 19:12:42,061][00189] Fps is (10 sec: 3686.7, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 12734464. Throughput: 0: 939.4. Samples: 680280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:12:42,063][00189] Avg episode reward: [(0, '30.692')]
[2024-09-29 19:12:45,624][21450] Updated weights for policy 0, policy_version 3113 (0.0025)
[2024-09-29 19:12:47,061][00189] Fps is (10 sec: 4506.8, 60 sec: 3891.2, 300 sec: 3623.9). Total num frames: 12754944. Throughput: 0: 994.9. Samples: 687094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:12:47,063][00189] Avg episode reward: [(0, '29.772')]
[2024-09-29 19:12:52,065][00189] Fps is (10 sec: 3684.7, 60 sec: 3822.6, 300 sec: 3623.9). Total num frames: 12771328. Throughput: 0: 977.1. Samples: 692248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:12:52,068][00189] Avg episode reward: [(0, '29.415')]
[2024-09-29 19:12:57,038][21450] Updated weights for policy 0, policy_version 3123 (0.0015)
[2024-09-29 19:12:57,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3637.8). Total num frames: 12791808. Throughput: 0: 948.0. Samples: 694438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:12:57,066][00189] Avg episode reward: [(0, '29.256')]
[2024-09-29 19:13:02,061][00189] Fps is (10 sec: 4097.9, 60 sec: 3959.8, 300 sec: 3623.9). Total num frames: 12812288. Throughput: 0: 976.7. Samples: 701456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:13:02,065][00189] Avg episode reward: [(0, '29.086')]
[2024-09-29 19:13:06,336][21450] Updated weights for policy 0, policy_version 3133 (0.0020)
[2024-09-29 19:13:07,065][00189] Fps is (10 sec: 4094.2, 60 sec: 3959.2, 300 sec: 3623.9). Total num frames: 12832768. Throughput: 0: 1004.7. Samples: 707574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:13:07,067][00189] Avg episode reward: [(0, '28.874')]
[2024-09-29 19:13:12,062][00189] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3637.8). Total num frames: 12849152. Throughput: 0: 974.2. Samples: 709664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:13:12,067][00189] Avg episode reward: [(0, '28.437')]
[2024-09-29 19:13:17,061][00189] Fps is (10 sec: 3687.9, 60 sec: 3891.2, 300 sec: 3637.8). Total num frames: 12869632. Throughput: 0: 951.5. Samples: 715516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:13:17,068][00189] Avg episode reward: [(0, '27.493')]
[2024-09-29 19:13:17,739][21450] Updated weights for policy 0, policy_version 3143 (0.0015)
[2024-09-29 19:13:22,061][00189] Fps is (10 sec: 4096.4, 60 sec: 3959.5, 300 sec: 3623.9). Total num frames: 12890112. Throughput: 0: 1006.6. Samples: 722324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:13:22,068][00189] Avg episode reward: [(0, '28.733')]
[2024-09-29 19:13:27,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3823.1, 300 sec: 3623.9). Total num frames: 12906496. Throughput: 0: 983.1. Samples: 724518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:13:27,065][00189] Avg episode reward: [(0, '28.758')]
[2024-09-29 19:13:29,355][21450] Updated weights for policy 0, policy_version 3153 (0.0026)
[2024-09-29 19:13:32,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3637.8). Total num frames: 12926976. Throughput: 0: 940.8. Samples: 729428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:13:32,068][00189] Avg episode reward: [(0, '29.599')]
[2024-09-29 19:13:32,076][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003156_12926976.pth...
[2024-09-29 19:13:32,192][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000002941_12046336.pth
[2024-09-29 19:13:37,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3651.7). Total num frames: 12947456. Throughput: 0: 979.6. Samples: 736324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:13:37,069][00189] Avg episode reward: [(0, '28.348')]
[2024-09-29 19:13:38,335][21450] Updated weights for policy 0, policy_version 3163 (0.0012)
[2024-09-29 19:13:42,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3665.6). Total num frames: 12967936. Throughput: 0: 1001.6. Samples: 739512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:13:42,067][00189] Avg episode reward: [(0, '29.860')]
[2024-09-29 19:13:47,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3665.6). Total num frames: 12980224. Throughput: 0: 937.3. Samples: 743636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-29 19:13:47,064][00189] Avg episode reward: [(0, '30.203')]
[2024-09-29 19:13:50,335][21450] Updated weights for policy 0, policy_version 3173 (0.0018)
[2024-09-29 19:13:52,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3823.2, 300 sec: 3665.6). Total num frames: 13000704. Throughput: 0: 946.0. Samples: 750138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:13:52,063][00189] Avg episode reward: [(0, '31.440')]
[2024-09-29 19:13:57,061][00189] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3679.5). Total num frames: 13025280. Throughput: 0: 969.1. Samples: 753274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:13:57,063][00189] Avg episode reward: [(0, '31.362')]
[2024-09-29 19:14:01,086][21450] Updated weights for policy 0, policy_version 3183 (0.0020)
[2024-09-29 19:14:02,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3679.5). Total num frames: 13037568. Throughput: 0: 949.9. Samples: 758262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:14:02,067][00189] Avg episode reward: [(0, '31.869')]
[2024-09-29 19:14:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3707.2). Total num frames: 13058048. Throughput: 0: 914.7. Samples: 763486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:14:07,065][00189] Avg episode reward: [(0, '29.519')]
[2024-09-29 19:14:11,546][21450] Updated weights for policy 0, policy_version 3193 (0.0028)
[2024-09-29 19:14:12,061][00189] Fps is (10 sec: 4096.2, 60 sec: 3823.0, 300 sec: 3693.3). Total num frames: 13078528. Throughput: 0: 941.2. Samples: 766874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:14:12,067][00189] Avg episode reward: [(0, '30.184')]
[2024-09-29 19:14:17,062][00189] Fps is (10 sec: 3685.8, 60 sec: 3754.6, 300 sec: 3679.5). Total num frames: 13094912. Throughput: 0: 961.7. Samples: 772708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:14:17,066][00189] Avg episode reward: [(0, '28.500')]
[2024-09-29 19:14:22,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 13111296. Throughput: 0: 903.6. Samples: 776984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:14:22,063][00189] Avg episode reward: [(0, '29.544')]
[2024-09-29 19:14:23,666][21450] Updated weights for policy 0, policy_version 3203 (0.0020)
[2024-09-29 19:14:27,061][00189] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 13131776. Throughput: 0: 903.8. Samples: 780184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:14:27,063][00189] Avg episode reward: [(0, '28.451')]
[2024-09-29 19:14:32,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 13152256. Throughput: 0: 954.2. Samples: 786576. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:14:32,067][00189] Avg episode reward: [(0, '28.476')]
[2024-09-29 19:14:34,138][21450] Updated weights for policy 0, policy_version 3213 (0.0021)
[2024-09-29 19:14:37,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 13164544. Throughput: 0: 903.3. Samples: 790788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:14:37,068][00189] Avg episode reward: [(0, '28.604')]
[2024-09-29 19:14:42,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 13185024. Throughput: 0: 886.5. Samples: 793168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:14:42,063][00189] Avg episode reward: [(0, '30.214')]
[2024-09-29 19:14:45,658][21450] Updated weights for policy 0, policy_version 3223 (0.0022)
[2024-09-29 19:14:47,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 13205504. Throughput: 0: 918.4. Samples: 799590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:14:47,063][00189] Avg episode reward: [(0, '31.731')]
[2024-09-29 19:14:52,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 13221888. Throughput: 0: 918.1. Samples: 804802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:14:52,065][00189] Avg episode reward: [(0, '31.781')]
[2024-09-29 19:14:57,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 13238272. Throughput: 0: 888.3. Samples: 806848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:14:57,064][00189] Avg episode reward: [(0, '31.296')]
[2024-09-29 19:14:57,876][21450] Updated weights for policy 0, policy_version 3233 (0.0017)
[2024-09-29 19:15:02,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 13258752. Throughput: 0: 894.7. Samples: 812968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:15:02,063][00189] Avg episode reward: [(0, '30.493')]
[2024-09-29 19:15:07,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 13279232. Throughput: 0: 940.6. Samples: 819312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:15:07,066][00189] Avg episode reward: [(0, '28.403')]
[2024-09-29 19:15:07,726][21450] Updated weights for policy 0, policy_version 3243 (0.0016)
[2024-09-29 19:15:12,062][00189] Fps is (10 sec: 3276.5, 60 sec: 3549.8, 300 sec: 3693.3). Total num frames: 13291520. Throughput: 0: 912.6. Samples: 821250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:15:12,064][00189] Avg episode reward: [(0, '27.262')]
[2024-09-29 19:15:17,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3707.2). Total num frames: 13312000. Throughput: 0: 878.8. Samples: 826120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:15:17,063][00189] Avg episode reward: [(0, '26.396')]
[2024-09-29 19:15:19,624][21450] Updated weights for policy 0, policy_version 3253 (0.0015)
[2024-09-29 19:15:22,061][00189] Fps is (10 sec: 4096.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 13332480. Throughput: 0: 931.6. Samples: 832712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:15:22,063][00189] Avg episode reward: [(0, '27.614')]
[2024-09-29 19:15:27,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 13348864. Throughput: 0: 938.9. Samples: 835418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:15:27,071][00189] Avg episode reward: [(0, '27.065')]
[2024-09-29 19:15:31,952][21450] Updated weights for policy 0, policy_version 3263 (0.0037)
[2024-09-29 19:15:32,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 13365248. Throughput: 0: 884.4. Samples: 839386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:15:32,069][00189] Avg episode reward: [(0, '28.038')]
[2024-09-29 19:15:32,080][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003263_13365248.pth...
[2024-09-29 19:15:32,214][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003046_12476416.pth
[2024-09-29 19:15:37,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 13385728. Throughput: 0: 911.6. Samples: 845824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:15:37,070][00189] Avg episode reward: [(0, '26.786')]
[2024-09-29 19:15:41,271][21450] Updated weights for policy 0, policy_version 3273 (0.0026)
[2024-09-29 19:15:42,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 13406208. Throughput: 0: 938.7. Samples: 849090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:15:42,068][00189] Avg episode reward: [(0, '28.051')]
[2024-09-29 19:15:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 13418496. Throughput: 0: 903.5. Samples: 853624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:15:47,062][00189] Avg episode reward: [(0, '27.911')]
[2024-09-29 19:15:52,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 13438976. Throughput: 0: 884.4. Samples: 859108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:15:52,063][00189] Avg episode reward: [(0, '27.528')]
[2024-09-29 19:15:53,522][21450] Updated weights for policy 0, policy_version 3283 (0.0019)
[2024-09-29 19:15:57,061][00189] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 13459456. Throughput: 0: 915.1. Samples: 862430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:15:57,064][00189] Avg episode reward: [(0, '26.218')]
[2024-09-29 19:16:02,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 13479936. Throughput: 0: 936.6. Samples: 868266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:16:02,063][00189] Avg episode reward: [(0, '26.684')]
[2024-09-29 19:16:04,836][21450] Updated weights for policy 0, policy_version 3293 (0.0014)
[2024-09-29 19:16:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 13492224. Throughput: 0: 890.0. Samples: 872760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:16:07,063][00189] Avg episode reward: [(0, '27.339')]
[2024-09-29 19:16:12,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 13516800. Throughput: 0: 903.9. Samples: 876092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:16:12,069][00189] Avg episode reward: [(0, '27.898')]
[2024-09-29 19:16:14,630][21450] Updated weights for policy 0, policy_version 3303 (0.0013)
[2024-09-29 19:16:17,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3693.4). Total num frames: 13537280. Throughput: 0: 961.7. Samples: 882662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:16:17,067][00189] Avg episode reward: [(0, '27.222')]
[2024-09-29 19:16:22,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 13549568. Throughput: 0: 913.8. Samples: 886944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:16:22,066][00189] Avg episode reward: [(0, '27.007')]
[2024-09-29 19:16:26,548][21450] Updated weights for policy 0, policy_version 3313 (0.0029)
[2024-09-29 19:16:27,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 13570048. Throughput: 0: 901.0. Samples: 889634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:16:27,063][00189] Avg episode reward: [(0, '27.352')]
[2024-09-29 19:16:32,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 13594624. Throughput: 0: 953.6. Samples: 896534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:16:32,066][00189] Avg episode reward: [(0, '27.863')]
[2024-09-29 19:16:36,848][21450] Updated weights for policy 0, policy_version 3323 (0.0012)
[2024-09-29 19:16:37,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 13611008. Throughput: 0: 948.3. Samples: 901780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:16:37,071][00189] Avg episode reward: [(0, '27.177')]
[2024-09-29 19:16:42,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 13627392. Throughput: 0: 920.3. Samples: 903844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:16:42,063][00189] Avg episode reward: [(0, '28.310')]
[2024-09-29 19:16:47,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 13647872. Throughput: 0: 930.7. Samples: 910146. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:16:47,063][00189] Avg episode reward: [(0, '29.649')]
[2024-09-29 19:16:47,677][21450] Updated weights for policy 0, policy_version 3333 (0.0020)
[2024-09-29 19:16:52,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 13668352. Throughput: 0: 968.8. Samples: 916358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:16:52,065][00189] Avg episode reward: [(0, '30.630')]
[2024-09-29 19:16:57,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3749.0). Total num frames: 13680640. Throughput: 0: 939.8. Samples: 918384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:16:57,063][00189] Avg episode reward: [(0, '31.049')]
[2024-09-29 19:16:59,586][21450] Updated weights for policy 0, policy_version 3343 (0.0012)
[2024-09-29 19:17:02,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 13701120. Throughput: 0: 915.0. Samples: 923836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:17:02,063][00189] Avg episode reward: [(0, '30.950')]
[2024-09-29 19:17:07,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 13725696. Throughput: 0: 968.4. Samples: 930524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:17:07,066][00189] Avg episode reward: [(0, '31.072')]
[2024-09-29 19:17:09,148][21450] Updated weights for policy 0, policy_version 3353 (0.0015)
[2024-09-29 19:17:12,062][00189] Fps is (10 sec: 3686.1, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 13737984. Throughput: 0: 967.7. Samples: 933182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:17:12,069][00189] Avg episode reward: [(0, '27.862')]
[2024-09-29 19:17:17,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 13754368. Throughput: 0: 911.6. Samples: 937558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:17:17,063][00189] Avg episode reward: [(0, '27.589')]
[2024-09-29 19:17:20,739][21450] Updated weights for policy 0, policy_version 3363 (0.0017)
[2024-09-29 19:17:22,061][00189] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 13778944. Throughput: 0: 940.5. Samples: 944102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:17:22,066][00189] Avg episode reward: [(0, '26.564')]
[2024-09-29 19:17:27,064][00189] Fps is (10 sec: 4504.0, 60 sec: 3822.7, 300 sec: 3735.0). Total num frames: 13799424. Throughput: 0: 970.2. Samples: 947506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:17:27,067][00189] Avg episode reward: [(0, '25.039')]
[2024-09-29 19:17:32,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 13811712. Throughput: 0: 927.6. Samples: 951890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:17:32,063][00189] Avg episode reward: [(0, '24.654')]
[2024-09-29 19:17:32,081][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003372_13811712.pth...
[2024-09-29 19:17:32,214][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003156_12926976.pth
[2024-09-29 19:17:32,367][21450] Updated weights for policy 0, policy_version 3373 (0.0027)
[2024-09-29 19:17:37,061][00189] Fps is (10 sec: 3687.7, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 13836288. Throughput: 0: 921.4. Samples: 957822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:17:37,065][00189] Avg episode reward: [(0, '23.850')]
[2024-09-29 19:17:41,772][21450] Updated weights for policy 0, policy_version 3383 (0.0018)
[2024-09-29 19:17:42,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 13856768. Throughput: 0: 951.1. Samples: 961184. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:17:42,068][00189] Avg episode reward: [(0, '26.675')]
[2024-09-29 19:17:47,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3735.1). Total num frames: 13873152. Throughput: 0: 951.4. Samples: 966648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-29 19:17:47,064][00189] Avg episode reward: [(0, '24.745')]
[2024-09-29 19:17:52,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 13889536. Throughput: 0: 909.9. Samples: 971468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:17:52,063][00189] Avg episode reward: [(0, '25.036')]
[2024-09-29 19:17:53,698][21450] Updated weights for policy 0, policy_version 3393 (0.0024)
[2024-09-29 19:17:57,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 13910016. Throughput: 0: 927.5. Samples: 974918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:17:57,069][00189] Avg episode reward: [(0, '25.498')]
[2024-09-29 19:18:02,061][00189] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3721.2). Total num frames: 13930496. Throughput: 0: 973.7. Samples: 981374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:18:02,063][00189] Avg episode reward: [(0, '25.947')]
[2024-09-29 19:18:04,125][21450] Updated weights for policy 0, policy_version 3403 (0.0012)
[2024-09-29 19:18:07,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 13946880. Throughput: 0: 920.0. Samples: 985502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:18:07,063][00189] Avg episode reward: [(0, '25.829')]
[2024-09-29 19:18:12,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3823.0, 300 sec: 3721.1). Total num frames: 13967360. Throughput: 0: 913.3. Samples: 988600. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:18:12,067][00189] Avg episode reward: [(0, '25.317')]
[2024-09-29 19:18:14,747][21450] Updated weights for policy 0, policy_version 3413 (0.0026)
[2024-09-29 19:18:17,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 13987840. Throughput: 0: 965.2. Samples: 995326.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:18:17,066][00189] Avg episode reward: [(0, '26.805')] [2024-09-29 19:18:22,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 14004224. Throughput: 0: 938.4. Samples: 1000050. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:18:22,063][00189] Avg episode reward: [(0, '26.207')] [2024-09-29 19:18:26,815][21450] Updated weights for policy 0, policy_version 3423 (0.0017) [2024-09-29 19:18:27,062][00189] Fps is (10 sec: 3276.5, 60 sec: 3686.6, 300 sec: 3707.2). Total num frames: 14020608. Throughput: 0: 910.4. Samples: 1002152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:18:27,068][00189] Avg episode reward: [(0, '27.064')] [2024-09-29 19:18:32,064][00189] Fps is (10 sec: 3685.3, 60 sec: 3822.7, 300 sec: 3707.2). Total num frames: 14041088. Throughput: 0: 937.4. Samples: 1008834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:18:32,070][00189] Avg episode reward: [(0, '25.743')] [2024-09-29 19:18:36,612][21450] Updated weights for policy 0, policy_version 3433 (0.0013) [2024-09-29 19:18:37,061][00189] Fps is (10 sec: 4096.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 14061568. Throughput: 0: 960.4. Samples: 1014684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:18:37,063][00189] Avg episode reward: [(0, '26.312')] [2024-09-29 19:18:42,061][00189] Fps is (10 sec: 3277.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 14073856. Throughput: 0: 927.7. Samples: 1016666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:18:42,067][00189] Avg episode reward: [(0, '25.936')] [2024-09-29 19:18:47,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 14098432. Throughput: 0: 915.5. Samples: 1022572. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:18:47,064][00189] Avg episode reward: [(0, '25.627')] [2024-09-29 19:18:47,857][21450] Updated weights for policy 0, policy_version 3443 (0.0032) [2024-09-29 19:18:52,061][00189] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 14118912. Throughput: 0: 972.8. Samples: 1029280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:18:52,072][00189] Avg episode reward: [(0, '26.228')] [2024-09-29 19:18:57,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 14135296. Throughput: 0: 952.4. Samples: 1031456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:18:57,067][00189] Avg episode reward: [(0, '24.652')] [2024-09-29 19:18:59,633][21450] Updated weights for policy 0, policy_version 3453 (0.0019) [2024-09-29 19:19:02,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 14151680. Throughput: 0: 908.3. Samples: 1036200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:19:02,068][00189] Avg episode reward: [(0, '25.873')] [2024-09-29 19:19:07,063][00189] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3721.1). Total num frames: 14176256. Throughput: 0: 956.0. Samples: 1043070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:19:07,071][00189] Avg episode reward: [(0, '25.557')] [2024-09-29 19:19:08,882][21450] Updated weights for policy 0, policy_version 3463 (0.0012) [2024-09-29 19:19:12,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 14192640. Throughput: 0: 979.5. Samples: 1046228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:19:12,068][00189] Avg episode reward: [(0, '26.039')] [2024-09-29 19:19:17,061][00189] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 14209024. Throughput: 0: 924.0. Samples: 1050412. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:19:17,063][00189] Avg episode reward: [(0, '26.503')] [2024-09-29 19:19:20,734][21450] Updated weights for policy 0, policy_version 3473 (0.0023) [2024-09-29 19:19:22,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 14229504. Throughput: 0: 935.8. Samples: 1056796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:19:22,063][00189] Avg episode reward: [(0, '25.858')] [2024-09-29 19:19:27,064][00189] Fps is (10 sec: 4094.8, 60 sec: 3822.8, 300 sec: 3721.1). Total num frames: 14249984. Throughput: 0: 968.8. Samples: 1060264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:19:27,066][00189] Avg episode reward: [(0, '26.837')] [2024-09-29 19:19:31,616][21450] Updated weights for policy 0, policy_version 3483 (0.0019) [2024-09-29 19:19:32,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 14266368. Throughput: 0: 947.2. Samples: 1065196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:19:32,063][00189] Avg episode reward: [(0, '26.017')] [2024-09-29 19:19:32,083][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003483_14266368.pth... [2024-09-29 19:19:32,254][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003263_13365248.pth [2024-09-29 19:19:37,061][00189] Fps is (10 sec: 3687.5, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 14286848. Throughput: 0: 918.3. Samples: 1070604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:19:37,064][00189] Avg episode reward: [(0, '27.354')] [2024-09-29 19:19:41,615][21450] Updated weights for policy 0, policy_version 3493 (0.0014) [2024-09-29 19:19:42,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 14307328. Throughput: 0: 944.0. Samples: 1073934. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:19:42,063][00189] Avg episode reward: [(0, '26.604')]
[2024-09-29 19:19:47,061][00189] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 14323712. Throughput: 0: 969.0. Samples: 1079804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:19:47,064][00189] Avg episode reward: [(0, '27.236')]
[2024-09-29 19:19:52,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 14340096. Throughput: 0: 912.8. Samples: 1084144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:19:52,063][00189] Avg episode reward: [(0, '26.507')]
[2024-09-29 19:19:53,503][21450] Updated weights for policy 0, policy_version 3503 (0.0019)
[2024-09-29 19:19:57,061][00189] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 14360576. Throughput: 0: 918.4. Samples: 1087558. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:19:57,063][00189] Avg episode reward: [(0, '26.499')]
[2024-09-29 19:20:02,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 14385152. Throughput: 0: 973.7. Samples: 1094230. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-29 19:20:02,063][00189] Avg episode reward: [(0, '27.582')]
[2024-09-29 19:20:03,105][21450] Updated weights for policy 0, policy_version 3513 (0.0019)
[2024-09-29 19:20:07,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3748.9). Total num frames: 14397440. Throughput: 0: 928.6. Samples: 1098584. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 19:20:07,068][00189] Avg episode reward: [(0, '27.796')]
[2024-09-29 19:20:12,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 14417920. Throughput: 0: 909.7. Samples: 1101196. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 19:20:12,063][00189] Avg episode reward: [(0, '27.409')]
[2024-09-29 19:20:14,618][21450] Updated weights for policy 0, policy_version 3523 (0.0017)
[2024-09-29 19:20:17,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 14438400. Throughput: 0: 951.4. Samples: 1108010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:20:17,063][00189] Avg episode reward: [(0, '27.957')]
[2024-09-29 19:20:22,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 14454784. Throughput: 0: 949.1. Samples: 1113316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:20:22,066][00189] Avg episode reward: [(0, '28.636')]
[2024-09-29 19:20:26,667][21450] Updated weights for policy 0, policy_version 3533 (0.0030)
[2024-09-29 19:20:27,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3686.6, 300 sec: 3748.9). Total num frames: 14471168. Throughput: 0: 921.5. Samples: 1115402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:20:27,068][00189] Avg episode reward: [(0, '29.275')]
[2024-09-29 19:20:32,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 14491648. Throughput: 0: 930.2. Samples: 1121662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:20:32,067][00189] Avg episode reward: [(0, '28.911')]
[2024-09-29 19:20:35,743][21450] Updated weights for policy 0, policy_version 3543 (0.0016)
[2024-09-29 19:20:37,061][00189] Fps is (10 sec: 4505.8, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 14516224. Throughput: 0: 975.1. Samples: 1128024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-29 19:20:37,065][00189] Avg episode reward: [(0, '28.612')]
[2024-09-29 19:20:42,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 14528512. Throughput: 0: 944.6. Samples: 1130066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:20:42,062][00189] Avg episode reward: [(0, '26.950')]
[2024-09-29 19:20:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 14548992. Throughput: 0: 914.5. Samples: 1135384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:20:47,068][00189] Avg episode reward: [(0, '27.624')]
[2024-09-29 19:20:47,575][21450] Updated weights for policy 0, policy_version 3553 (0.0012)
[2024-09-29 19:20:52,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 14569472. Throughput: 0: 966.7. Samples: 1142084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:20:52,063][00189] Avg episode reward: [(0, '28.180')]
[2024-09-29 19:20:57,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 14585856. Throughput: 0: 968.7. Samples: 1144788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:20:57,068][00189] Avg episode reward: [(0, '29.950')]
[2024-09-29 19:20:59,051][21450] Updated weights for policy 0, policy_version 3563 (0.0016)
[2024-09-29 19:21:02,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 14602240. Throughput: 0: 910.5. Samples: 1148982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:21:02,063][00189] Avg episode reward: [(0, '29.888')]
[2024-09-29 19:21:07,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 14626816. Throughput: 0: 943.2. Samples: 1155760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:21:07,064][00189] Avg episode reward: [(0, '30.694')]
[2024-09-29 19:21:08,805][21450] Updated weights for policy 0, policy_version 3573 (0.0015)
[2024-09-29 19:21:12,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 14647296. Throughput: 0: 970.9. Samples: 1159090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:21:12,063][00189] Avg episode reward: [(0, '32.197')]
[2024-09-29 19:21:17,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 14659584. Throughput: 0: 932.8. Samples: 1163636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:21:17,065][00189] Avg episode reward: [(0, '33.656')]
[2024-09-29 19:21:17,071][21437] Saving new best policy, reward=33.656!
[2024-09-29 19:21:20,617][21450] Updated weights for policy 0, policy_version 3583 (0.0031)
[2024-09-29 19:21:22,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 14680064. Throughput: 0: 918.2. Samples: 1169344. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:21:22,068][00189] Avg episode reward: [(0, '31.227')]
[2024-09-29 19:21:27,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 14704640. Throughput: 0: 949.5. Samples: 1172794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:21:27,067][00189] Avg episode reward: [(0, '30.542')]
[2024-09-29 19:21:30,611][21450] Updated weights for policy 0, policy_version 3593 (0.0022)
[2024-09-29 19:21:32,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 14716928. Throughput: 0: 953.1. Samples: 1178272. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:21:32,068][00189] Avg episode reward: [(0, '29.816')]
[2024-09-29 19:21:32,081][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003593_14716928.pth...
[2024-09-29 19:21:32,230][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003372_13811712.pth
[2024-09-29 19:21:37,061][00189] Fps is (10 sec: 2867.1, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 14733312. Throughput: 0: 908.2. Samples: 1182954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:21:37,063][00189] Avg episode reward: [(0, '30.166')]
[2024-09-29 19:21:42,019][21450] Updated weights for policy 0, policy_version 3603 (0.0026)
[2024-09-29 19:21:42,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 14757888. Throughput: 0: 921.5. Samples: 1186256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:21:42,067][00189] Avg episode reward: [(0, '29.536')]
[2024-09-29 19:21:47,064][00189] Fps is (10 sec: 4504.2, 60 sec: 3822.7, 300 sec: 3762.7). Total num frames: 14778368. Throughput: 0: 972.5. Samples: 1192750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:21:47,066][00189] Avg episode reward: [(0, '27.625')]
[2024-09-29 19:21:52,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 14790656. Throughput: 0: 911.5. Samples: 1196776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:21:52,068][00189] Avg episode reward: [(0, '29.532')]
[2024-09-29 19:21:54,001][21450] Updated weights for policy 0, policy_version 3613 (0.0030)
[2024-09-29 19:21:57,061][00189] Fps is (10 sec: 3277.9, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 14811136. Throughput: 0: 903.6. Samples: 1199750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:21:57,067][00189] Avg episode reward: [(0, '29.953')]
[2024-09-29 19:22:02,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 14831616. Throughput: 0: 951.4. Samples: 1206448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:22:02,068][00189] Avg episode reward: [(0, '31.394')]
[2024-09-29 19:22:03,252][21450] Updated weights for policy 0, policy_version 3623 (0.0020)
[2024-09-29 19:22:07,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 14848000. Throughput: 0: 936.3. Samples: 1211478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:22:07,063][00189] Avg episode reward: [(0, '32.794')]
[2024-09-29 19:22:12,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 14864384. Throughput: 0: 907.7. Samples: 1213640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:22:12,063][00189] Avg episode reward: [(0, '31.632')]
[2024-09-29 19:22:15,039][21450] Updated weights for policy 0, policy_version 3633 (0.0013)
[2024-09-29 19:22:17,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 14888960. Throughput: 0: 932.0. Samples: 1220210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:22:17,063][00189] Avg episode reward: [(0, '32.304')]
[2024-09-29 19:22:22,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 14905344. Throughput: 0: 962.9. Samples: 1226282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:22:22,066][00189] Avg episode reward: [(0, '32.614')]
[2024-09-29 19:22:26,641][21450] Updated weights for policy 0, policy_version 3643 (0.0014)
[2024-09-29 19:22:27,061][00189] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 14921728. Throughput: 0: 933.1. Samples: 1228248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:22:27,064][00189] Avg episode reward: [(0, '32.176')]
[2024-09-29 19:22:32,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 14942208. Throughput: 0: 910.4. Samples: 1233714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:22:32,069][00189] Avg episode reward: [(0, '30.598')]
[2024-09-29 19:22:36,300][21450] Updated weights for policy 0, policy_version 3653 (0.0013)
[2024-09-29 19:22:37,061][00189] Fps is (10 sec: 4096.2, 60 sec: 3823.0, 300 sec: 3748.9). Total num frames: 14962688. Throughput: 0: 969.3. Samples: 1240396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:22:37,069][00189] Avg episode reward: [(0, '28.966')]
[2024-09-29 19:22:42,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 14979072. Throughput: 0: 959.1. Samples: 1242908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:22:42,063][00189] Avg episode reward: [(0, '29.418')]
[2024-09-29 19:22:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3748.9). Total num frames: 14995456. Throughput: 0: 910.7. Samples: 1247430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:22:47,065][00189] Avg episode reward: [(0, '29.092')]
[2024-09-29 19:22:48,184][21450] Updated weights for policy 0, policy_version 3663 (0.0025)
[2024-09-29 19:22:52,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 15020032. Throughput: 0: 949.1. Samples: 1254186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:22:52,063][00189] Avg episode reward: [(0, '28.704')]
[2024-09-29 19:22:57,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 15040512. Throughput: 0: 973.3. Samples: 1257438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:22:57,063][00189] Avg episode reward: [(0, '28.691')]
[2024-09-29 19:22:58,595][21450] Updated weights for policy 0, policy_version 3673 (0.0019)
[2024-09-29 19:23:02,061][00189] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 15052800. Throughput: 0: 923.9. Samples: 1261788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:23:02,065][00189] Avg episode reward: [(0, '28.325')]
[2024-09-29 19:23:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 15073280. Throughput: 0: 923.6. Samples: 1267842.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:23:07,063][00189] Avg episode reward: [(0, '28.398')]
[2024-09-29 19:23:09,269][21450] Updated weights for policy 0, policy_version 3683 (0.0023)
[2024-09-29 19:23:12,061][00189] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 15097856. Throughput: 0: 956.3. Samples: 1271280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:23:12,064][00189] Avg episode reward: [(0, '27.772')]
[2024-09-29 19:23:17,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 15110144. Throughput: 0: 952.7. Samples: 1276586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:23:17,066][00189] Avg episode reward: [(0, '27.398')]
[2024-09-29 19:23:20,918][21450] Updated weights for policy 0, policy_version 3693 (0.0015)
[2024-09-29 19:23:22,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 15130624. Throughput: 0: 919.5. Samples: 1281774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:23:22,068][00189] Avg episode reward: [(0, '28.063')]
[2024-09-29 19:23:27,061][00189] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 15151104. Throughput: 0: 934.8. Samples: 1284972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:23:27,068][00189] Avg episode reward: [(0, '28.278')]
[2024-09-29 19:23:30,078][21450] Updated weights for policy 0, policy_version 3703 (0.0016)
[2024-09-29 19:23:32,061][00189] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 15171584. Throughput: 0: 981.1. Samples: 1291582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:23:32,067][00189] Avg episode reward: [(0, '30.033')]
[2024-09-29 19:23:32,081][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003704_15171584.pth...
[2024-09-29 19:23:32,230][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003483_14266368.pth
[2024-09-29 19:23:37,061][00189] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 15183872. Throughput: 0: 922.8. Samples: 1295712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:23:37,068][00189] Avg episode reward: [(0, '31.874')]
[2024-09-29 19:23:41,899][21450] Updated weights for policy 0, policy_version 3713 (0.0014)
[2024-09-29 19:23:42,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 15208448. Throughput: 0: 921.2. Samples: 1298892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:23:42,068][00189] Avg episode reward: [(0, '30.930')]
[2024-09-29 19:23:47,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 15228928. Throughput: 0: 977.0. Samples: 1305754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:23:47,072][00189] Avg episode reward: [(0, '30.692')]
[2024-09-29 19:23:52,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 15245312. Throughput: 0: 952.5. Samples: 1310706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:23:52,068][00189] Avg episode reward: [(0, '29.961')]
[2024-09-29 19:23:52,528][21450] Updated weights for policy 0, policy_version 3723 (0.0016)
[2024-09-29 19:23:57,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 15261696. Throughput: 0: 925.4. Samples: 1312922. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-09-29 19:23:57,064][00189] Avg episode reward: [(0, '30.667')]
[2024-09-29 19:24:02,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 15286272. Throughput: 0: 957.4. Samples: 1319670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:24:02,065][00189] Avg episode reward: [(0, '30.074')]
[2024-09-29 19:24:02,581][21450] Updated weights for policy 0, policy_version 3733 (0.0024)
[2024-09-29 19:24:07,061][00189] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 15306752. Throughput: 0: 981.6. Samples: 1325948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:24:07,065][00189] Avg episode reward: [(0, '28.206')]
[2024-09-29 19:24:12,061][00189] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 15319040. Throughput: 0: 956.8. Samples: 1328028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:24:12,067][00189] Avg episode reward: [(0, '28.056')]
[2024-09-29 19:24:14,183][21450] Updated weights for policy 0, policy_version 3743 (0.0014)
[2024-09-29 19:24:17,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 15343616. Throughput: 0: 941.5. Samples: 1333950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:24:17,065][00189] Avg episode reward: [(0, '29.286')]
[2024-09-29 19:24:22,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 15364096. Throughput: 0: 1005.5. Samples: 1340960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:24:22,072][00189] Avg episode reward: [(0, '30.993')]
[2024-09-29 19:24:23,327][21450] Updated weights for policy 0, policy_version 3753 (0.0023)
[2024-09-29 19:24:27,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 15380480. Throughput: 0: 983.3. Samples: 1343142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:24:27,067][00189] Avg episode reward: [(0, '30.138')]
[2024-09-29 19:24:32,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 15400960. Throughput: 0: 939.0. Samples: 1348010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:24:32,063][00189] Avg episode reward: [(0, '29.939')]
[2024-09-29 19:24:34,734][21450] Updated weights for policy 0, policy_version 3763 (0.0012)
[2024-09-29 19:24:37,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 15421440. Throughput: 0: 980.8. Samples: 1354844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:24:37,063][00189] Avg episode reward: [(0, '29.027')]
[2024-09-29 19:24:42,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 15441920. Throughput: 0: 1006.0. Samples: 1358190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:24:42,065][00189] Avg episode reward: [(0, '28.987')]
[2024-09-29 19:24:45,875][21450] Updated weights for policy 0, policy_version 3773 (0.0025)
[2024-09-29 19:24:47,067][00189] Fps is (10 sec: 3274.7, 60 sec: 3754.3, 300 sec: 3776.6). Total num frames: 15454208. Throughput: 0: 950.0. Samples: 1362426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:24:47,070][00189] Avg episode reward: [(0, '29.065')]
[2024-09-29 19:24:52,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 15478784. Throughput: 0: 955.1. Samples: 1368928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:24:52,068][00189] Avg episode reward: [(0, '30.128')]
[2024-09-29 19:24:55,437][21450] Updated weights for policy 0, policy_version 3783 (0.0015)
[2024-09-29 19:24:57,061][00189] Fps is (10 sec: 4508.5, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 15499264. Throughput: 0: 981.9. Samples: 1372212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:24:57,063][00189] Avg episode reward: [(0, '29.349')]
[2024-09-29 19:25:02,064][00189] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3790.5). Total num frames: 15515648. Throughput: 0: 965.3. Samples: 1377394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:25:02,067][00189] Avg episode reward: [(0, '30.729')]
[2024-09-29 19:25:07,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3776.6). Total num frames: 15532032. Throughput: 0: 929.6. Samples: 1382794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:25:07,065][00189] Avg episode reward: [(0, '31.629')]
[2024-09-29 19:25:07,200][21450] Updated weights for policy 0, policy_version 3793 (0.0030)
[2024-09-29 19:25:12,061][00189] Fps is (10 sec: 4097.5, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 15556608. Throughput: 0: 956.1. Samples: 1386166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:25:12,067][00189] Avg episode reward: [(0, '33.066')]
[2024-09-29 19:25:17,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 15572992. Throughput: 0: 986.2. Samples: 1392388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:25:17,064][00189] Avg episode reward: [(0, '31.900')]
[2024-09-29 19:25:17,131][21450] Updated weights for policy 0, policy_version 3803 (0.0027)
[2024-09-29 19:25:22,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 15589376. Throughput: 0: 930.9. Samples: 1396734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:25:22,063][00189] Avg episode reward: [(0, '31.231')]
[2024-09-29 19:25:27,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 15613952. Throughput: 0: 931.6. Samples: 1400110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:25:27,069][00189] Avg episode reward: [(0, '30.920')]
[2024-09-29 19:25:28,027][21450] Updated weights for policy 0, policy_version 3813 (0.0013)
[2024-09-29 19:25:32,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 15634432. Throughput: 0: 984.4. Samples: 1406720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:25:32,067][00189] Avg episode reward: [(0, '30.372')]
[2024-09-29 19:25:32,086][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003817_15634432.pth...
[2024-09-29 19:25:32,246][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003593_14716928.pth
[2024-09-29 19:25:37,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 15646720. Throughput: 0: 936.8. Samples: 1411082. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 19:25:37,067][00189] Avg episode reward: [(0, '29.872')]
[2024-09-29 19:25:40,186][21450] Updated weights for policy 0, policy_version 3823 (0.0019)
[2024-09-29 19:25:42,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 15667200. Throughput: 0: 919.1. Samples: 1413570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:25:42,069][00189] Avg episode reward: [(0, '29.061')]
[2024-09-29 19:25:47,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.6, 300 sec: 3790.5). Total num frames: 15687680. Throughput: 0: 956.0. Samples: 1420410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:25:47,069][00189] Avg episode reward: [(0, '27.431')]
[2024-09-29 19:25:49,074][21450] Updated weights for policy 0, policy_version 3833 (0.0018)
[2024-09-29 19:25:52,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 15708160. Throughput: 0: 959.6. Samples: 1425974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:25:52,069][00189] Avg episode reward: [(0, '28.658')]
[2024-09-29 19:25:57,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 15720448. Throughput: 0: 928.2. Samples: 1427936. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:25:57,063][00189] Avg episode reward: [(0, '28.582')]
[2024-09-29 19:26:01,168][21450] Updated weights for policy 0, policy_version 3843 (0.0034)
[2024-09-29 19:26:02,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3776.6). Total num frames: 15740928. Throughput: 0: 922.4. Samples: 1433894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:26:02,063][00189] Avg episode reward: [(0, '28.674')]
[2024-09-29 19:26:07,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 15765504. Throughput: 0: 973.1. Samples: 1440522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:26:07,066][00189] Avg episode reward: [(0, '27.018')]
[2024-09-29 19:26:12,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 15777792. Throughput: 0: 943.5. Samples: 1442568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:26:12,067][00189] Avg episode reward: [(0, '25.811')]
[2024-09-29 19:26:12,332][21450] Updated weights for policy 0, policy_version 3853 (0.0026)
[2024-09-29 19:26:17,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 15798272. Throughput: 0: 911.2. Samples: 1447726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:26:17,063][00189] Avg episode reward: [(0, '27.623')]
[2024-09-29 19:26:22,036][21450] Updated weights for policy 0, policy_version 3863 (0.0013)
[2024-09-29 19:26:22,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 15822848. Throughput: 0: 966.8. Samples: 1454588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:26:22,063][00189] Avg episode reward: [(0, '28.812')]
[2024-09-29 19:26:27,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 15835136. Throughput: 0: 974.0. Samples: 1457402.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:26:27,066][00189] Avg episode reward: [(0, '27.688')] [2024-09-29 19:26:32,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 15851520. Throughput: 0: 913.6. Samples: 1461520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:26:32,063][00189] Avg episode reward: [(0, '27.490')] [2024-09-29 19:26:34,131][21450] Updated weights for policy 0, policy_version 3873 (0.0012) [2024-09-29 19:26:37,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 15876096. Throughput: 0: 936.4. Samples: 1468112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:26:37,069][00189] Avg episode reward: [(0, '27.209')] [2024-09-29 19:26:42,061][00189] Fps is (10 sec: 4505.3, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 15896576. Throughput: 0: 968.7. Samples: 1471530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:26:42,064][00189] Avg episode reward: [(0, '26.907')] [2024-09-29 19:26:44,122][21450] Updated weights for policy 0, policy_version 3883 (0.0021) [2024-09-29 19:26:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 15908864. Throughput: 0: 942.8. Samples: 1476318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:26:47,068][00189] Avg episode reward: [(0, '25.454')] [2024-09-29 19:26:52,061][00189] Fps is (10 sec: 3277.0, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 15929344. Throughput: 0: 922.2. Samples: 1482022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:26:52,063][00189] Avg episode reward: [(0, '24.058')] [2024-09-29 19:26:54,940][21450] Updated weights for policy 0, policy_version 3893 (0.0015) [2024-09-29 19:26:57,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 15953920. Throughput: 0: 951.8. Samples: 1485398. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:26:57,065][00189] Avg episode reward: [(0, '24.950')] [2024-09-29 19:27:02,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 15970304. Throughput: 0: 960.9. Samples: 1490968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:27:02,065][00189] Avg episode reward: [(0, '25.671')] [2024-09-29 19:27:06,936][21450] Updated weights for policy 0, policy_version 3903 (0.0020) [2024-09-29 19:27:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 15986688. Throughput: 0: 913.3. Samples: 1495686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:27:07,063][00189] Avg episode reward: [(0, '24.280')] [2024-09-29 19:27:12,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 16007168. Throughput: 0: 927.1. Samples: 1499122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:27:12,063][00189] Avg episode reward: [(0, '25.156')] [2024-09-29 19:27:15,955][21450] Updated weights for policy 0, policy_version 3913 (0.0020) [2024-09-29 19:27:17,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 16027648. Throughput: 0: 985.1. Samples: 1505850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:27:17,067][00189] Avg episode reward: [(0, '27.380')] [2024-09-29 19:27:22,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 16044032. Throughput: 0: 932.4. Samples: 1510072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:27:22,064][00189] Avg episode reward: [(0, '29.233')] [2024-09-29 19:27:27,063][00189] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3804.4). Total num frames: 16064512. Throughput: 0: 920.5. Samples: 1512952. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:27:27,065][00189] Avg episode reward: [(0, '29.483')] [2024-09-29 19:27:27,939][21450] Updated weights for policy 0, policy_version 3923 (0.0023) [2024-09-29 19:27:32,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16084992. Throughput: 0: 962.0. Samples: 1519608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:27:32,066][00189] Avg episode reward: [(0, '28.243')] [2024-09-29 19:27:32,077][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003927_16084992.pth... [2024-09-29 19:27:32,193][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003704_15171584.pth [2024-09-29 19:27:37,061][00189] Fps is (10 sec: 3687.3, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 16101376. Throughput: 0: 946.3. Samples: 1524606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:27:37,065][00189] Avg episode reward: [(0, '29.245')] [2024-09-29 19:27:39,361][21450] Updated weights for policy 0, policy_version 3933 (0.0015) [2024-09-29 19:27:42,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 16117760. Throughput: 0: 919.6. Samples: 1526780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:27:42,063][00189] Avg episode reward: [(0, '29.789')] [2024-09-29 19:27:47,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16142336. Throughput: 0: 941.7. Samples: 1533344. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:27:47,066][00189] Avg episode reward: [(0, '28.551')] [2024-09-29 19:27:48,730][21450] Updated weights for policy 0, policy_version 3943 (0.0012) [2024-09-29 19:27:52,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 16158720. Throughput: 0: 975.8. Samples: 1539598. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:27:52,063][00189] Avg episode reward: [(0, '27.584')] [2024-09-29 19:27:57,063][00189] Fps is (10 sec: 3276.0, 60 sec: 3686.2, 300 sec: 3804.4). Total num frames: 16175104. Throughput: 0: 945.3. Samples: 1541662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:27:57,066][00189] Avg episode reward: [(0, '27.613')] [2024-09-29 19:28:00,673][21450] Updated weights for policy 0, policy_version 3953 (0.0014) [2024-09-29 19:28:02,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 16195584. Throughput: 0: 918.8. Samples: 1547198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:28:02,063][00189] Avg episode reward: [(0, '28.679')] [2024-09-29 19:28:07,061][00189] Fps is (10 sec: 4506.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16220160. Throughput: 0: 977.8. Samples: 1554074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:28:07,071][00189] Avg episode reward: [(0, '30.378')] [2024-09-29 19:28:10,986][21450] Updated weights for policy 0, policy_version 3963 (0.0027) [2024-09-29 19:28:12,065][00189] Fps is (10 sec: 3684.7, 60 sec: 3754.4, 300 sec: 3804.4). Total num frames: 16232448. Throughput: 0: 966.8. Samples: 1556458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:28:12,067][00189] Avg episode reward: [(0, '30.117')] [2024-09-29 19:28:17,061][00189] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 16252928. Throughput: 0: 924.2. Samples: 1561196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:28:17,065][00189] Avg episode reward: [(0, '29.625')] [2024-09-29 19:28:21,405][21450] Updated weights for policy 0, policy_version 3973 (0.0015) [2024-09-29 19:28:22,061][00189] Fps is (10 sec: 4097.9, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 16273408. Throughput: 0: 965.3. Samples: 1568046. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:28:22,065][00189] Avg episode reward: [(0, '29.130')] [2024-09-29 19:28:27,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3804.4). Total num frames: 16293888. Throughput: 0: 990.7. Samples: 1571360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:28:27,063][00189] Avg episode reward: [(0, '29.463')] [2024-09-29 19:28:32,064][00189] Fps is (10 sec: 3275.6, 60 sec: 3686.2, 300 sec: 3804.4). Total num frames: 16306176. Throughput: 0: 937.2. Samples: 1575522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:28:32,067][00189] Avg episode reward: [(0, '29.686')] [2024-09-29 19:28:33,350][21450] Updated weights for policy 0, policy_version 3983 (0.0013) [2024-09-29 19:28:37,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 16330752. Throughput: 0: 938.1. Samples: 1581814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:28:37,062][00189] Avg episode reward: [(0, '30.950')] [2024-09-29 19:28:42,061][00189] Fps is (10 sec: 4507.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16351232. Throughput: 0: 970.0. Samples: 1585310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:28:42,064][00189] Avg episode reward: [(0, '30.875')] [2024-09-29 19:28:42,266][21450] Updated weights for policy 0, policy_version 3993 (0.0025) [2024-09-29 19:28:47,063][00189] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3804.4). Total num frames: 16367616. Throughput: 0: 962.0. Samples: 1590492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:28:47,068][00189] Avg episode reward: [(0, '31.589')] [2024-09-29 19:28:52,061][00189] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 16388096. Throughput: 0: 929.9. Samples: 1595918. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:28:52,063][00189] Avg episode reward: [(0, '31.080')] [2024-09-29 19:28:53,884][21450] Updated weights for policy 0, policy_version 4003 (0.0019) [2024-09-29 19:28:57,061][00189] Fps is (10 sec: 4097.1, 60 sec: 3891.4, 300 sec: 3804.4). Total num frames: 16408576. Throughput: 0: 951.3. Samples: 1599262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:28:57,063][00189] Avg episode reward: [(0, '30.910')] [2024-09-29 19:29:02,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16429056. Throughput: 0: 982.1. Samples: 1605390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:29:02,066][00189] Avg episode reward: [(0, '32.135')] [2024-09-29 19:29:04,901][21450] Updated weights for policy 0, policy_version 4013 (0.0014) [2024-09-29 19:29:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 16441344. Throughput: 0: 928.0. Samples: 1609806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:29:07,066][00189] Avg episode reward: [(0, '31.685')] [2024-09-29 19:29:12,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3804.4). Total num frames: 16465920. Throughput: 0: 931.1. Samples: 1613260. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:29:12,062][00189] Avg episode reward: [(0, '30.656')] [2024-09-29 19:29:14,473][21450] Updated weights for policy 0, policy_version 4023 (0.0021) [2024-09-29 19:29:17,061][00189] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16486400. Throughput: 0: 995.1. Samples: 1620300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:29:17,063][00189] Avg episode reward: [(0, '33.100')] [2024-09-29 19:29:22,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 16502784. Throughput: 0: 957.4. Samples: 1624896. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:29:22,067][00189] Avg episode reward: [(0, '32.179')] [2024-09-29 19:29:26,004][21450] Updated weights for policy 0, policy_version 4033 (0.0018) [2024-09-29 19:29:27,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 16523264. Throughput: 0: 938.9. Samples: 1627560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:29:27,068][00189] Avg episode reward: [(0, '31.279')] [2024-09-29 19:29:32,061][00189] Fps is (10 sec: 4095.9, 60 sec: 3959.7, 300 sec: 3804.4). Total num frames: 16543744. Throughput: 0: 973.2. Samples: 1634284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:29:32,065][00189] Avg episode reward: [(0, '30.670')] [2024-09-29 19:29:32,078][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004039_16543744.pth... [2024-09-29 19:29:32,204][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003817_15634432.pth [2024-09-29 19:29:35,600][21450] Updated weights for policy 0, policy_version 4043 (0.0017) [2024-09-29 19:29:37,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16564224. Throughput: 0: 976.8. Samples: 1639876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:29:37,065][00189] Avg episode reward: [(0, '30.536')] [2024-09-29 19:29:42,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.5). Total num frames: 16576512. Throughput: 0: 947.0. Samples: 1641878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:29:42,063][00189] Avg episode reward: [(0, '30.681')] [2024-09-29 19:29:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3790.5). Total num frames: 16596992. Throughput: 0: 944.0. Samples: 1647870. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:29:47,065][00189] Avg episode reward: [(0, '29.738')] [2024-09-29 19:29:47,218][21450] Updated weights for policy 0, policy_version 4053 (0.0015) [2024-09-29 19:29:52,061][00189] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16621568. Throughput: 0: 994.1. Samples: 1654542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:29:52,067][00189] Avg episode reward: [(0, '28.569')] [2024-09-29 19:29:57,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 16633856. Throughput: 0: 963.7. Samples: 1656626. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:29:57,066][00189] Avg episode reward: [(0, '29.408')] [2024-09-29 19:29:58,848][21450] Updated weights for policy 0, policy_version 4063 (0.0023) [2024-09-29 19:30:02,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 16654336. Throughput: 0: 921.0. Samples: 1661744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:30:02,063][00189] Avg episode reward: [(0, '27.878')] [2024-09-29 19:30:07,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 16674816. Throughput: 0: 971.8. Samples: 1668626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:30:07,070][00189] Avg episode reward: [(0, '28.797')] [2024-09-29 19:30:07,836][21450] Updated weights for policy 0, policy_version 4073 (0.0039) [2024-09-29 19:30:12,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 16695296. Throughput: 0: 979.2. Samples: 1671622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:30:12,067][00189] Avg episode reward: [(0, '27.373')] [2024-09-29 19:30:17,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 16711680. Throughput: 0: 922.1. Samples: 1675780. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:30:17,066][00189] Avg episode reward: [(0, '28.188')] [2024-09-29 19:30:19,755][21450] Updated weights for policy 0, policy_version 4083 (0.0014) [2024-09-29 19:30:22,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 16732160. Throughput: 0: 949.5. Samples: 1682604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:30:22,064][00189] Avg episode reward: [(0, '27.364')] [2024-09-29 19:30:27,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 16752640. Throughput: 0: 980.6. Samples: 1686004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:30:27,064][00189] Avg episode reward: [(0, '27.855')] [2024-09-29 19:30:29,976][21450] Updated weights for policy 0, policy_version 4093 (0.0027) [2024-09-29 19:30:32,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 16769024. Throughput: 0: 952.0. Samples: 1690710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:30:32,063][00189] Avg episode reward: [(0, '27.602')] [2024-09-29 19:30:37,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 16789504. Throughput: 0: 928.7. Samples: 1696332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:30:37,064][00189] Avg episode reward: [(0, '29.305')] [2024-09-29 19:30:40,645][21450] Updated weights for policy 0, policy_version 4103 (0.0013) [2024-09-29 19:30:42,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16809984. Throughput: 0: 957.1. Samples: 1699696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:30:42,063][00189] Avg episode reward: [(0, '29.764')] [2024-09-29 19:30:47,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 16826368. Throughput: 0: 976.4. Samples: 1705684. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:30:47,064][00189] Avg episode reward: [(0, '28.375')] [2024-09-29 19:30:52,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 16842752. Throughput: 0: 927.2. Samples: 1710352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:30:52,069][00189] Avg episode reward: [(0, '28.737')] [2024-09-29 19:30:52,310][21450] Updated weights for policy 0, policy_version 4113 (0.0024) [2024-09-29 19:30:57,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 16867328. Throughput: 0: 936.0. Samples: 1713742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:30:57,063][00189] Avg episode reward: [(0, '28.690')] [2024-09-29 19:31:01,498][21450] Updated weights for policy 0, policy_version 4123 (0.0015) [2024-09-29 19:31:02,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16887808. Throughput: 0: 992.8. Samples: 1720456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:31:02,065][00189] Avg episode reward: [(0, '28.179')] [2024-09-29 19:31:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 16900096. Throughput: 0: 934.2. Samples: 1724644. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:31:07,066][00189] Avg episode reward: [(0, '26.599')] [2024-09-29 19:31:12,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 16920576. Throughput: 0: 922.4. Samples: 1727512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:31:12,068][00189] Avg episode reward: [(0, '26.278')] [2024-09-29 19:31:13,065][21450] Updated weights for policy 0, policy_version 4133 (0.0014) [2024-09-29 19:31:17,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 16945152. Throughput: 0: 971.1. Samples: 1734410. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:31:17,062][00189] Avg episode reward: [(0, '27.020')] [2024-09-29 19:31:22,062][00189] Fps is (10 sec: 4095.3, 60 sec: 3822.8, 300 sec: 3818.3). Total num frames: 16961536. Throughput: 0: 966.1. Samples: 1739808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:31:22,070][00189] Avg episode reward: [(0, '28.681')] [2024-09-29 19:31:24,081][21450] Updated weights for policy 0, policy_version 4143 (0.0014) [2024-09-29 19:31:27,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 16977920. Throughput: 0: 938.7. Samples: 1741936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:31:27,068][00189] Avg episode reward: [(0, '29.064')] [2024-09-29 19:31:32,061][00189] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17002496. Throughput: 0: 954.4. Samples: 1748632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-29 19:31:32,067][00189] Avg episode reward: [(0, '29.419')] [2024-09-29 19:31:32,080][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004151_17002496.pth... [2024-09-29 19:31:32,209][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000003927_16084992.pth [2024-09-29 19:31:33,863][21450] Updated weights for policy 0, policy_version 4153 (0.0020) [2024-09-29 19:31:37,061][00189] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17022976. Throughput: 0: 985.9. Samples: 1754718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:31:37,063][00189] Avg episode reward: [(0, '29.347')] [2024-09-29 19:31:42,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 17035264. Throughput: 0: 956.4. Samples: 1756780. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 19:31:42,064][00189] Avg episode reward: [(0, '28.959')] [2024-09-29 19:31:45,565][21450] Updated weights for policy 0, policy_version 4163 (0.0024) [2024-09-29 19:31:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 17055744. Throughput: 0: 931.4. Samples: 1762370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:31:47,068][00189] Avg episode reward: [(0, '30.995')] [2024-09-29 19:31:52,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 17080320. Throughput: 0: 991.5. Samples: 1769262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:31:52,068][00189] Avg episode reward: [(0, '29.979')] [2024-09-29 19:31:55,304][21450] Updated weights for policy 0, policy_version 4173 (0.0012) [2024-09-29 19:31:57,061][00189] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 17096704. Throughput: 0: 987.2. Samples: 1771936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:31:57,068][00189] Avg episode reward: [(0, '28.551')] [2024-09-29 19:32:02,064][00189] Fps is (10 sec: 3275.7, 60 sec: 3754.5, 300 sec: 3818.3). Total num frames: 17113088. Throughput: 0: 933.4. Samples: 1776414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:32:02,067][00189] Avg episode reward: [(0, '30.970')] [2024-09-29 19:32:06,261][21450] Updated weights for policy 0, policy_version 4183 (0.0026) [2024-09-29 19:32:07,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17133568. Throughput: 0: 960.8. Samples: 1783042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:32:07,064][00189] Avg episode reward: [(0, '29.452')] [2024-09-29 19:32:12,068][00189] Fps is (10 sec: 4094.3, 60 sec: 3890.7, 300 sec: 3818.2). Total num frames: 17154048. Throughput: 0: 988.7. Samples: 1786436. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:32:12,072][00189] Avg episode reward: [(0, '29.899')] [2024-09-29 19:32:17,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 17170432. Throughput: 0: 937.2. Samples: 1790804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:32:17,070][00189] Avg episode reward: [(0, '29.669')] [2024-09-29 19:32:18,177][21450] Updated weights for policy 0, policy_version 4193 (0.0030) [2024-09-29 19:32:22,061][00189] Fps is (10 sec: 3689.2, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 17190912. Throughput: 0: 939.7. Samples: 1797006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:32:22,063][00189] Avg episode reward: [(0, '29.245')] [2024-09-29 19:32:27,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17211392. Throughput: 0: 970.2. Samples: 1800438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:32:27,063][00189] Avg episode reward: [(0, '29.409')] [2024-09-29 19:32:27,100][21450] Updated weights for policy 0, policy_version 4203 (0.0013) [2024-09-29 19:32:32,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 17227776. Throughput: 0: 966.1. Samples: 1805846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:32:32,064][00189] Avg episode reward: [(0, '28.420')] [2024-09-29 19:32:37,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 17244160. Throughput: 0: 922.5. Samples: 1810774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:32:37,071][00189] Avg episode reward: [(0, '28.675')] [2024-09-29 19:32:38,931][21450] Updated weights for policy 0, policy_version 4213 (0.0012) [2024-09-29 19:32:42,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17268736. Throughput: 0: 938.0. Samples: 1814148. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:32:42,070][00189] Avg episode reward: [(0, '29.162')]
[2024-09-29 19:32:47,062][00189] Fps is (10 sec: 4505.0, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 17289216. Throughput: 0: 983.5. Samples: 1820668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:32:47,064][00189] Avg episode reward: [(0, '27.546')]
[2024-09-29 19:32:49,798][21450] Updated weights for policy 0, policy_version 4223 (0.0015)
[2024-09-29 19:32:52,064][00189] Fps is (10 sec: 3275.6, 60 sec: 3686.2, 300 sec: 3818.3). Total num frames: 17301504. Throughput: 0: 929.4. Samples: 1824868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:32:52,072][00189] Avg episode reward: [(0, '27.990')]
[2024-09-29 19:32:57,061][00189] Fps is (10 sec: 3277.3, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 17321984. Throughput: 0: 925.5. Samples: 1828078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:32:57,065][00189] Avg episode reward: [(0, '28.401')]
[2024-09-29 19:33:00,074][21450] Updated weights for policy 0, policy_version 4233 (0.0017)
[2024-09-29 19:33:02,061][00189] Fps is (10 sec: 4507.2, 60 sec: 3891.4, 300 sec: 3818.3). Total num frames: 17346560. Throughput: 0: 973.8. Samples: 1834626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:33:02,066][00189] Avg episode reward: [(0, '30.361')]
[2024-09-29 19:33:07,062][00189] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3818.4). Total num frames: 17358848. Throughput: 0: 944.0. Samples: 1839486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:33:07,064][00189] Avg episode reward: [(0, '30.539')]
[2024-09-29 19:33:11,987][21450] Updated weights for policy 0, policy_version 4243 (0.0017)
[2024-09-29 19:33:12,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3755.1, 300 sec: 3818.3). Total num frames: 17379328. Throughput: 0: 914.2. Samples: 1841576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:33:12,064][00189] Avg episode reward: [(0, '30.531')]
[2024-09-29 19:33:17,061][00189] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 17399808. Throughput: 0: 946.7. Samples: 1848446. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:33:17,069][00189] Avg episode reward: [(0, '30.271')]
[2024-09-29 19:33:21,497][21450] Updated weights for policy 0, policy_version 4253 (0.0015)
[2024-09-29 19:33:22,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 17420288. Throughput: 0: 968.7. Samples: 1854364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:33:22,064][00189] Avg episode reward: [(0, '30.458')]
[2024-09-29 19:33:27,063][00189] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3818.3). Total num frames: 17432576. Throughput: 0: 939.3. Samples: 1856418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:33:27,069][00189] Avg episode reward: [(0, '30.939')]
[2024-09-29 19:33:32,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 17457152. Throughput: 0: 925.4. Samples: 1862310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:33:32,068][00189] Avg episode reward: [(0, '29.023')]
[2024-09-29 19:33:32,082][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004262_17457152.pth...
[2024-09-29 19:33:32,224][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004039_16543744.pth
[2024-09-29 19:33:32,808][21450] Updated weights for policy 0, policy_version 4263 (0.0012)
[2024-09-29 19:33:37,061][00189] Fps is (10 sec: 4506.6, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17477632. Throughput: 0: 980.8. Samples: 1869002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:33:37,063][00189] Avg episode reward: [(0, '28.389')]
[2024-09-29 19:33:42,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 17494016. Throughput: 0: 960.3. Samples: 1871290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:33:42,064][00189] Avg episode reward: [(0, '27.920')]
[2024-09-29 19:33:44,163][21450] Updated weights for policy 0, policy_version 4273 (0.0026)
[2024-09-29 19:33:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3804.4). Total num frames: 17510400. Throughput: 0: 921.9. Samples: 1876112. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 19:33:47,070][00189] Avg episode reward: [(0, '28.236')]
[2024-09-29 19:33:52,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3818.3). Total num frames: 17534976. Throughput: 0: 963.6. Samples: 1882848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:33:52,063][00189] Avg episode reward: [(0, '28.677')]
[2024-09-29 19:33:53,558][21450] Updated weights for policy 0, policy_version 4283 (0.0012)
[2024-09-29 19:33:57,061][00189] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17555456. Throughput: 0: 989.6. Samples: 1886110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:33:57,064][00189] Avg episode reward: [(0, '27.572')]
[2024-09-29 19:34:02,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 17567744. Throughput: 0: 928.4. Samples: 1890226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:34:02,062][00189] Avg episode reward: [(0, '27.587')]
[2024-09-29 19:34:05,849][21450] Updated weights for policy 0, policy_version 4293 (0.0012)
[2024-09-29 19:34:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 17588224. Throughput: 0: 932.4. Samples: 1896322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:34:07,069][00189] Avg episode reward: [(0, '27.759')]
[2024-09-29 19:34:12,063][00189] Fps is (10 sec: 4095.2, 60 sec: 3822.8, 300 sec: 3804.4). Total num frames: 17608704. Throughput: 0: 961.8. Samples: 1899700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:34:12,067][00189] Avg episode reward: [(0, '27.236')]
[2024-09-29 19:34:16,362][21450] Updated weights for policy 0, policy_version 4303 (0.0015)
[2024-09-29 19:34:17,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 17625088. Throughput: 0: 947.2. Samples: 1904934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:34:17,063][00189] Avg episode reward: [(0, '27.841')]
[2024-09-29 19:34:22,061][00189] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 17645568. Throughput: 0: 922.0. Samples: 1910492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:34:22,062][00189] Avg episode reward: [(0, '28.109')]
[2024-09-29 19:34:26,215][21450] Updated weights for policy 0, policy_version 4313 (0.0017)
[2024-09-29 19:34:27,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 17666048. Throughput: 0: 949.1. Samples: 1914000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:34:27,069][00189] Avg episode reward: [(0, '28.705')]
[2024-09-29 19:34:32,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 17686528. Throughput: 0: 980.4. Samples: 1920230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:34:32,069][00189] Avg episode reward: [(0, '29.063')]
[2024-09-29 19:34:37,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 17698816. Throughput: 0: 925.3. Samples: 1924486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:34:37,063][00189] Avg episode reward: [(0, '29.182')]
[2024-09-29 19:34:38,048][21450] Updated weights for policy 0, policy_version 4323 (0.0023)
[2024-09-29 19:34:42,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 17723392. Throughput: 0: 928.8. Samples: 1927906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:34:42,069][00189] Avg episode reward: [(0, '29.363')]
[2024-09-29 19:34:46,913][21450] Updated weights for policy 0, policy_version 4333 (0.0016)
[2024-09-29 19:34:47,061][00189] Fps is (10 sec: 4915.1, 60 sec: 3959.4, 300 sec: 3818.3). Total num frames: 17747968. Throughput: 0: 992.6. Samples: 1934894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:34:47,072][00189] Avg episode reward: [(0, '28.111')]
[2024-09-29 19:34:52,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 17760256. Throughput: 0: 963.7. Samples: 1939688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:34:52,067][00189] Avg episode reward: [(0, '29.174')]
[2024-09-29 19:34:57,064][00189] Fps is (10 sec: 3275.7, 60 sec: 3754.4, 300 sec: 3818.3). Total num frames: 17780736. Throughput: 0: 941.9. Samples: 1942086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:34:57,067][00189] Avg episode reward: [(0, '29.787')]
[2024-09-29 19:34:58,582][21450] Updated weights for policy 0, policy_version 4343 (0.0016)
[2024-09-29 19:35:02,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17801216. Throughput: 0: 981.7. Samples: 1949112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:35:02,068][00189] Avg episode reward: [(0, '30.421')]
[2024-09-29 19:35:07,061][00189] Fps is (10 sec: 4097.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17821696. Throughput: 0: 985.8. Samples: 1954852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:35:07,066][00189] Avg episode reward: [(0, '29.975')]
[2024-09-29 19:35:09,262][21450] Updated weights for policy 0, policy_version 4353 (0.0015)
[2024-09-29 19:35:12,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3818.3). Total num frames: 17838080. Throughput: 0: 956.1. Samples: 1957026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:35:12,063][00189] Avg episode reward: [(0, '30.625')]
[2024-09-29 19:35:17,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17858560. Throughput: 0: 953.6. Samples: 1963140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:35:17,067][00189] Avg episode reward: [(0, '29.275')]
[2024-09-29 19:35:19,007][21450] Updated weights for policy 0, policy_version 4363 (0.0020)
[2024-09-29 19:35:22,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 17883136. Throughput: 0: 1013.6. Samples: 1970098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:35:22,068][00189] Avg episode reward: [(0, '29.818')]
[2024-09-29 19:35:27,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 17895424. Throughput: 0: 985.9. Samples: 1972270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:35:27,063][00189] Avg episode reward: [(0, '29.013')]
[2024-09-29 19:35:30,651][21450] Updated weights for policy 0, policy_version 4373 (0.0018)
[2024-09-29 19:35:32,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 17915904. Throughput: 0: 941.7. Samples: 1977268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:35:32,064][00189] Avg episode reward: [(0, '28.515')]
[2024-09-29 19:35:32,072][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004374_17915904.pth...
[2024-09-29 19:35:32,205][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004151_17002496.pth
[2024-09-29 19:35:37,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 17936384. Throughput: 0: 985.7. Samples: 1984046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:35:37,062][00189] Avg episode reward: [(0, '28.503')]
[2024-09-29 19:35:40,029][21450] Updated weights for policy 0, policy_version 4383 (0.0016)
[2024-09-29 19:35:42,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 17956864. Throughput: 0: 1001.8. Samples: 1987162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:35:42,065][00189] Avg episode reward: [(0, '30.125')]
[2024-09-29 19:35:47,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 17973248. Throughput: 0: 940.2. Samples: 1991420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:35:47,069][00189] Avg episode reward: [(0, '31.064')]
[2024-09-29 19:35:51,522][21450] Updated weights for policy 0, policy_version 4393 (0.0019)
[2024-09-29 19:35:52,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 17993728. Throughput: 0: 959.9. Samples: 1998046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:35:52,064][00189] Avg episode reward: [(0, '31.623')]
[2024-09-29 19:35:57,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.7, 300 sec: 3832.2). Total num frames: 18018304. Throughput: 0: 984.0. Samples: 2001306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:35:57,067][00189] Avg episode reward: [(0, '32.648')]
[2024-09-29 19:36:02,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 18030592. Throughput: 0: 953.6. Samples: 2006050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:36:02,063][00189] Avg episode reward: [(0, '32.973')]
[2024-09-29 19:36:03,059][21450] Updated weights for policy 0, policy_version 4403 (0.0015)
[2024-09-29 19:36:07,061][00189] Fps is (10 sec: 2867.1, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 18046976. Throughput: 0: 915.3. Samples: 2011288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:36:07,063][00189] Avg episode reward: [(0, '32.804')]
[2024-09-29 19:36:12,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 18071552. Throughput: 0: 940.5. Samples: 2014590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:36:12,063][00189] Avg episode reward: [(0, '33.557')]
[2024-09-29 19:36:12,855][21450] Updated weights for policy 0, policy_version 4413 (0.0016)
[2024-09-29 19:36:17,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 18087936. Throughput: 0: 958.5. Samples: 2020400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:36:17,064][00189] Avg episode reward: [(0, '32.564')]
[2024-09-29 19:36:22,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 18100224. Throughput: 0: 906.6. Samples: 2024842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:36:22,062][00189] Avg episode reward: [(0, '33.366')]
[2024-09-29 19:36:24,851][21450] Updated weights for policy 0, policy_version 4423 (0.0013)
[2024-09-29 19:36:27,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 18124800. Throughput: 0: 913.6. Samples: 2028274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:36:27,072][00189] Avg episode reward: [(0, '31.658')]
[2024-09-29 19:36:32,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 18145280. Throughput: 0: 970.3. Samples: 2035082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:36:32,063][00189] Avg episode reward: [(0, '28.876')]
[2024-09-29 19:36:35,761][21450] Updated weights for policy 0, policy_version 4433 (0.0020)
[2024-09-29 19:36:37,064][00189] Fps is (10 sec: 3275.8, 60 sec: 3686.2, 300 sec: 3804.4). Total num frames: 18157568. Throughput: 0: 917.3. Samples: 2039326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:36:37,066][00189] Avg episode reward: [(0, '28.963')]
[2024-09-29 19:36:42,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 18178048. Throughput: 0: 903.2. Samples: 2041950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:36:42,063][00189] Avg episode reward: [(0, '29.386')]
[2024-09-29 19:36:45,666][21450] Updated weights for policy 0, policy_version 4443 (0.0014)
[2024-09-29 19:36:47,061][00189] Fps is (10 sec: 4507.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 18202624. Throughput: 0: 951.4. Samples: 2048864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:36:47,062][00189] Avg episode reward: [(0, '29.036')]
[2024-09-29 19:36:52,061][00189] Fps is (10 sec: 4095.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 18219008. Throughput: 0: 950.1. Samples: 2054042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:36:52,066][00189] Avg episode reward: [(0, '28.124')]
[2024-09-29 19:36:57,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.5). Total num frames: 18235392. Throughput: 0: 921.5. Samples: 2056056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:36:57,069][00189] Avg episode reward: [(0, '28.256')]
[2024-09-29 19:36:57,991][21450] Updated weights for policy 0, policy_version 4453 (0.0017)
[2024-09-29 19:37:02,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 18255872. Throughput: 0: 927.4. Samples: 2062134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:37:02,063][00189] Avg episode reward: [(0, '29.913')]
[2024-09-29 19:37:07,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3804.5). Total num frames: 18276352. Throughput: 0: 970.4. Samples: 2068508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:37:07,063][00189] Avg episode reward: [(0, '31.007')]
[2024-09-29 19:37:07,789][21450] Updated weights for policy 0, policy_version 4463 (0.0014)
[2024-09-29 19:37:12,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 18288640. Throughput: 0: 936.5. Samples: 2070418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:37:12,063][00189] Avg episode reward: [(0, '31.438')]
[2024-09-29 19:37:17,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 18309120. Throughput: 0: 900.8. Samples: 2075618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:37:17,063][00189] Avg episode reward: [(0, '30.691')]
[2024-09-29 19:37:19,235][21450] Updated weights for policy 0, policy_version 4473 (0.0013)
[2024-09-29 19:37:22,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 18333696. Throughput: 0: 957.2. Samples: 2082396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:37:22,067][00189] Avg episode reward: [(0, '31.770')]
[2024-09-29 19:37:27,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 18350080. Throughput: 0: 960.8. Samples: 2085186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:37:27,063][00189] Avg episode reward: [(0, '31.015')]
[2024-09-29 19:37:31,068][21450] Updated weights for policy 0, policy_version 4483 (0.0023)
[2024-09-29 19:37:32,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 18366464. Throughput: 0: 901.3. Samples: 2089424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:37:32,068][00189] Avg episode reward: [(0, '30.820')]
[2024-09-29 19:37:32,082][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004484_18366464.pth...
[2024-09-29 19:37:32,201][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004262_17457152.pth
[2024-09-29 19:37:37,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3823.1, 300 sec: 3790.5). Total num frames: 18386944. Throughput: 0: 932.2. Samples: 2095992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:37:37,070][00189] Avg episode reward: [(0, '31.268')]
[2024-09-29 19:37:40,776][21450] Updated weights for policy 0, policy_version 4493 (0.0022)
[2024-09-29 19:37:42,065][00189] Fps is (10 sec: 3684.7, 60 sec: 3754.4, 300 sec: 3776.6). Total num frames: 18403328. Throughput: 0: 953.7. Samples: 2098978. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:37:42,070][00189] Avg episode reward: [(0, '32.921')]
[2024-09-29 19:37:47,061][00189] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3790.6). Total num frames: 18419712. Throughput: 0: 917.1. Samples: 2103402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:37:47,066][00189] Avg episode reward: [(0, '33.092')]
[2024-09-29 19:37:52,067][00189] Fps is (10 sec: 3685.6, 60 sec: 3686.0, 300 sec: 3790.5). Total num frames: 18440192. Throughput: 0: 906.4. Samples: 2109304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:37:52,071][00189] Avg episode reward: [(0, '33.220')]
[2024-09-29 19:37:52,681][21450] Updated weights for policy 0, policy_version 4503 (0.0027)
[2024-09-29 19:37:57,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 18460672. Throughput: 0: 938.5. Samples: 2112650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:37:57,065][00189] Avg episode reward: [(0, '34.699')]
[2024-09-29 19:37:57,119][21437] Saving new best policy, reward=34.699!
[2024-09-29 19:38:02,061][00189] Fps is (10 sec: 3688.9, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 18477056. Throughput: 0: 946.3. Samples: 2118202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:38:02,062][00189] Avg episode reward: [(0, '35.154')]
[2024-09-29 19:38:02,072][21437] Saving new best policy, reward=35.154!
[2024-09-29 19:38:04,176][21450] Updated weights for policy 0, policy_version 4513 (0.0018)
[2024-09-29 19:38:07,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 18493440. Throughput: 0: 904.0. Samples: 2123074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:38:07,063][00189] Avg episode reward: [(0, '36.367')]
[2024-09-29 19:38:07,071][21437] Saving new best policy, reward=36.367!
[2024-09-29 19:38:12,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 18518016. Throughput: 0: 915.6. Samples: 2126390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:38:12,071][00189] Avg episode reward: [(0, '35.781')]
[2024-09-29 19:38:13,629][21450] Updated weights for policy 0, policy_version 4523 (0.0012)
[2024-09-29 19:38:17,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 18538496. Throughput: 0: 968.7. Samples: 2133014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:38:17,068][00189] Avg episode reward: [(0, '34.741')]
[2024-09-29 19:38:22,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3790.6). Total num frames: 18550784. Throughput: 0: 918.0. Samples: 2137302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:38:22,064][00189] Avg episode reward: [(0, '35.744')]
[2024-09-29 19:38:25,379][21450] Updated weights for policy 0, policy_version 4533 (0.0017)
[2024-09-29 19:38:27,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 18571264. Throughput: 0: 918.9. Samples: 2140324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:38:27,063][00189] Avg episode reward: [(0, '36.032')]
[2024-09-29 19:38:32,061][00189] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 18595840. Throughput: 0: 976.4. Samples: 2147340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:38:32,063][00189] Avg episode reward: [(0, '34.194')]
[2024-09-29 19:38:34,703][21450] Updated weights for policy 0, policy_version 4543 (0.0018)
[2024-09-29 19:38:37,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 18612224. Throughput: 0: 959.3. Samples: 2152464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:38:37,067][00189] Avg episode reward: [(0, '32.806')]
[2024-09-29 19:38:42,063][00189] Fps is (10 sec: 3276.2, 60 sec: 3754.8, 300 sec: 3790.5). Total num frames: 18628608. Throughput: 0: 932.4. Samples: 2154612. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:38:42,069][00189] Avg episode reward: [(0, '32.062')]
[2024-09-29 19:38:45,735][21450] Updated weights for policy 0, policy_version 4553 (0.0014)
[2024-09-29 19:38:47,062][00189] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3790.5). Total num frames: 18653184. Throughput: 0: 959.4. Samples: 2161378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:38:47,070][00189] Avg episode reward: [(0, '32.466')]
[2024-09-29 19:38:52,061][00189] Fps is (10 sec: 4506.4, 60 sec: 3891.6, 300 sec: 3790.5). Total num frames: 18673664. Throughput: 0: 985.7. Samples: 2167430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:38:52,067][00189] Avg episode reward: [(0, '31.369')]
[2024-09-29 19:38:57,061][00189] Fps is (10 sec: 3277.2, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 18685952. Throughput: 0: 959.9. Samples: 2169584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:38:57,064][00189] Avg episode reward: [(0, '31.815')]
[2024-09-29 19:38:57,579][21450] Updated weights for policy 0, policy_version 4563 (0.0015)
[2024-09-29 19:39:02,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 18710528. Throughput: 0: 946.0. Samples: 2175584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:39:02,070][00189] Avg episode reward: [(0, '30.221')]
[2024-09-29 19:39:06,219][21450] Updated weights for policy 0, policy_version 4573 (0.0012)
[2024-09-29 19:39:07,061][00189] Fps is (10 sec: 4915.1, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 18735104. Throughput: 0: 1006.0. Samples: 2182574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:39:07,063][00189] Avg episode reward: [(0, '30.485')]
[2024-09-29 19:39:12,065][00189] Fps is (10 sec: 3684.7, 60 sec: 3822.6, 300 sec: 3804.4). Total num frames: 18747392. Throughput: 0: 989.8. Samples: 2184870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:39:12,072][00189] Avg episode reward: [(0, '31.121')]
[2024-09-29 19:39:17,061][00189] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 18767872. Throughput: 0: 943.1. Samples: 2189780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:39:17,062][00189] Avg episode reward: [(0, '29.211')]
[2024-09-29 19:39:17,936][21450] Updated weights for policy 0, policy_version 4583 (0.0017)
[2024-09-29 19:39:22,061][00189] Fps is (10 sec: 4097.9, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 18788352. Throughput: 0: 987.2. Samples: 2196888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:39:22,063][00189] Avg episode reward: [(0, '28.795')]
[2024-09-29 19:39:27,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 18808832. Throughput: 0: 1013.4. Samples: 2200212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:39:27,066][00189] Avg episode reward: [(0, '28.960')]
[2024-09-29 19:39:28,013][21450] Updated weights for policy 0, policy_version 4593 (0.0029)
[2024-09-29 19:39:32,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 18821120. Throughput: 0: 958.4. Samples: 2204504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:39:32,072][00189] Avg episode reward: [(0, '29.784')]
[2024-09-29 19:39:32,094][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004596_18825216.pth...
[2024-09-29 19:39:32,215][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004374_17915904.pth
[2024-09-29 19:39:37,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 18845696. Throughput: 0: 965.8. Samples: 2210890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-29 19:39:37,070][00189] Avg episode reward: [(0, '29.105')]
[2024-09-29 19:39:38,339][21450] Updated weights for policy 0, policy_version 4603 (0.0019)
[2024-09-29 19:39:42,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3790.5). Total num frames: 18866176. Throughput: 0: 995.5. Samples: 2214382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:39:42,067][00189] Avg episode reward: [(0, '28.357')]
[2024-09-29 19:39:47,065][00189] Fps is (10 sec: 3684.7, 60 sec: 3822.7, 300 sec: 3804.4). Total num frames: 18882560. Throughput: 0: 969.1. Samples: 2219200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:39:47,068][00189] Avg episode reward: [(0, '29.175')]
[2024-09-29 19:39:50,262][21450] Updated weights for policy 0, policy_version 4613 (0.0015)
[2024-09-29 19:39:52,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 18898944. Throughput: 0: 931.8. Samples: 2224504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:39:52,062][00189] Avg episode reward: [(0, '30.763')]
[2024-09-29 19:39:57,061][00189] Fps is (10 sec: 4097.9, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 18923520. Throughput: 0: 955.8. Samples: 2227878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:39:57,068][00189] Avg episode reward: [(0, '31.336')]
[2024-09-29 19:39:59,705][21450] Updated weights for policy 0, policy_version 4623 (0.0012)
[2024-09-29 19:40:02,061][00189] Fps is (10 sec: 4095.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 18939904. Throughput: 0: 980.0. Samples: 2233882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:40:02,065][00189] Avg episode reward: [(0, '31.044')]
[2024-09-29 19:40:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 18956288. Throughput: 0: 924.4. Samples: 2238486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:40:07,063][00189] Avg episode reward: [(0, '30.756')]
[2024-09-29 19:40:11,200][21450] Updated weights for policy 0, policy_version 4633 (0.0028)
[2024-09-29 19:40:12,061][00189] Fps is (10 sec: 3686.6, 60 sec: 3823.2, 300 sec: 3790.5). Total num frames: 18976768. Throughput: 0: 926.8. Samples: 2241916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:40:12,063][00189] Avg episode reward: [(0, '30.767')]
[2024-09-29 19:40:17,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 19001344. Throughput: 0: 982.3. Samples: 2248708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:40:17,068][00189] Avg episode reward: [(0, '31.251')]
[2024-09-29 19:40:22,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 19013632. Throughput: 0: 936.9. Samples: 2253052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:40:22,063][00189] Avg episode reward: [(0, '31.174')]
[2024-09-29 19:40:22,363][21450] Updated weights for policy 0, policy_version 4643 (0.0014)
[2024-09-29 19:40:27,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 19034112. Throughput: 0: 920.7. Samples: 2255814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:40:27,063][00189] Avg episode reward: [(0, '31.636')]
[2024-09-29 19:40:31,775][21450] Updated weights for policy 0, policy_version 4653 (0.0014)
[2024-09-29 19:40:32,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 19058688. Throughput: 0: 966.0. Samples: 2262664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:40:32,063][00189] Avg episode reward: [(0, '30.608')]
[2024-09-29 19:40:37,061][00189] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 19075072. Throughput: 0: 966.1. Samples: 2267978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:40:37,070][00189] Avg episode reward: [(0, '29.823')]
[2024-09-29 19:40:42,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 19091456. Throughput: 0: 940.3. Samples: 2270192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:40:42,063][00189] Avg episode reward: [(0, '28.964')]
[2024-09-29 19:40:43,627][21450] Updated weights for policy 0, policy_version 4663 (0.0017)
[2024-09-29 19:40:47,061][00189] Fps is (10 sec: 3686.6, 60 sec: 3823.2, 300 sec: 3790.5). Total num frames: 19111936. Throughput: 0: 949.4. Samples: 2276606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:40:47,065][00189] Avg episode reward: [(0, '30.317')]
[2024-09-29 19:40:52,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 19136512. Throughput: 0: 991.1. Samples: 2283084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:40:52,069][00189] Avg episode reward: [(0, '30.526')]
[2024-09-29 19:40:53,497][21450] Updated weights for policy 0, policy_version 4673 (0.0013)
[2024-09-29 19:40:57,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 19148800. Throughput: 0: 962.4. Samples: 2285226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 19:40:57,062][00189] Avg episode reward: [(0, '29.539')]
[2024-09-29 19:41:02,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 19169280. Throughput: 0: 936.4. Samples: 2290848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-29 19:41:02,063][00189] Avg episode reward: [(0, '29.234')]
[2024-09-29 19:41:04,120][21450] Updated weights for policy 0, policy_version 4683 (0.0027)
[2024-09-29 19:41:07,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 19193856. Throughput: 0: 994.4. Samples: 2297800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:41:07,069][00189] Avg episode reward: [(0, '30.435')]
[2024-09-29 19:41:12,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 19210240. Throughput: 0: 990.8. Samples: 2300402. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:41:12,065][00189] Avg episode reward: [(0, '31.653')]
[2024-09-29 19:41:15,800][21450] Updated weights for policy 0, policy_version 4693 (0.0014)
[2024-09-29 19:41:17,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 19226624. Throughput: 0: 934.6. Samples: 2304720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 19:41:17,062][00189] Avg episode reward: [(0, '30.813')]
[2024-09-29 19:41:22,062][00189] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 19247104. Throughput: 0: 972.1. Samples: 2311722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:41:22,065][00189] Avg episode reward: [(0, '30.224')]
[2024-09-29 19:41:24,618][21450] Updated weights for policy 0, policy_version 4703 (0.0013)
[2024-09-29 19:41:27,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 19271680. Throughput: 0: 1000.5. Samples: 2315216. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:41:27,064][00189] Avg episode reward: [(0, '29.533')]
[2024-09-29 19:41:32,061][00189] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 19283968. Throughput: 0: 960.3. Samples: 2319818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:41:32,063][00189] Avg episode reward: [(0, '30.015')]
[2024-09-29 19:41:32,078][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004708_19283968.pth...
[2024-09-29 19:41:32,260][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004484_18366464.pth
[2024-09-29 19:41:36,463][21450] Updated weights for policy 0, policy_version 4713 (0.0013)
[2024-09-29 19:41:37,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 19304448. Throughput: 0: 949.6. Samples: 2325816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:41:37,063][00189] Avg episode reward: [(0, '29.013')]
[2024-09-29 19:41:42,061][00189] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3818.3). Total num frames: 19329024. Throughput: 0: 976.3. Samples: 2329160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 19:41:42,064][00189] Avg episode reward: [(0, '29.407')]
[2024-09-29 19:41:47,067][00189] Fps is (10 sec: 3684.0, 60 sec: 3822.5, 300 sec: 3804.3). Total num frames: 19341312. Throughput: 0: 967.6. Samples: 2334396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:41:47,070][00189] Avg episode reward: [(0, '30.272')]
[2024-09-29 19:41:47,215][21450] Updated weights for policy 0, policy_version 4723 (0.0020)
[2024-09-29 19:41:52,061][00189] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 19361792. Throughput: 0: 926.0. Samples: 2339468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:41:52,067][00189] Avg episode reward: [(0, '29.805')]
[2024-09-29 19:41:57,061][00189] Fps is (10 sec: 4098.7, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 19382272. Throughput: 0: 941.1. Samples: 2342752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:41:57,063][00189] Avg episode reward: [(0, '30.965')]
[2024-09-29 19:41:57,410][21450] Updated weights for policy 0, policy_version 4733 (0.0029)
[2024-09-29 19:42:02,061][00189] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 19402752. Throughput: 0: 989.6. Samples: 2349252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:42:02,064][00189] Avg episode reward: [(0, '29.874')]
[2024-09-29 19:42:07,062][00189] Fps is (10 sec: 3276.3, 60 sec: 3686.3, 300 sec: 3818.3). Total num frames: 19415040. Throughput: 0: 927.3. Samples: 2353450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 19:42:07,070][00189] Avg episode reward: [(0, '29.911')]
[2024-09-29 19:42:09,102][21450] Updated weights for policy 0, policy_version 4743 (0.0016)
[2024-09-29 19:42:12,061][00189] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 19439616. Throughput: 0: 922.3. Samples: 2356720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:42:12,063][00189] Avg episode reward: [(0, '32.034')]
[2024-09-29 19:42:17,061][00189] Fps is (10 sec: 4096.6, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 19456000. Throughput: 0: 952.4. Samples: 2362678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:42:17,064][00189] Avg episode reward: [(0, '31.952')]
[2024-09-29 19:42:21,507][21450] Updated weights for policy 0, policy_version 4753 (0.0014)
[2024-09-29 19:42:22,063][00189] Fps is (10 sec: 2866.4, 60 sec: 3686.3, 300 sec: 3790.5). Total num frames: 19468288. Throughput: 0: 901.4. Samples: 2366382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 19:42:22,066][00189] Avg episode reward: [(0, '33.051')]
[2024-09-29 19:42:27,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 19488768. Throughput: 0: 882.2. Samples: 2368858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:42:27,067][00189] Avg episode reward: [(0, '31.989')]
[2024-09-29 19:42:31,451][21450] Updated weights for policy 0, policy_version 4763 (0.0016)
[2024-09-29 19:42:32,062][00189] Fps is (10 sec: 4096.7, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 19509248. Throughput: 0: 918.8. Samples: 2375736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 19:42:32,066][00189] Avg episode reward: [(0, '33.435')]
[2024-09-29 19:42:37,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3818.4). Total num frames: 19529728. Throughput: 0: 930.5. Samples: 2381342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 19:42:37,065][00189] Avg episode reward: [(0, '33.516')]
[2024-09-29 19:42:42,061][00189] Fps is (10 sec: 3277.1, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 19542016. Throughput: 0: 903.6. Samples: 2383416.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:42:42,070][00189] Avg episode reward: [(0, '34.715')] [2024-09-29 19:42:43,547][21450] Updated weights for policy 0, policy_version 4773 (0.0012) [2024-09-29 19:42:47,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.8, 300 sec: 3804.5). Total num frames: 19562496. Throughput: 0: 887.3. Samples: 2389178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:42:47,067][00189] Avg episode reward: [(0, '33.818')] [2024-09-29 19:42:52,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 19587072. Throughput: 0: 944.7. Samples: 2395960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:42:52,065][00189] Avg episode reward: [(0, '33.551')] [2024-09-29 19:42:53,027][21450] Updated weights for policy 0, policy_version 4783 (0.0013) [2024-09-29 19:42:57,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 19599360. Throughput: 0: 920.5. Samples: 2398142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:42:57,066][00189] Avg episode reward: [(0, '32.457')] [2024-09-29 19:43:02,061][00189] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 19619840. Throughput: 0: 899.6. Samples: 2403160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:43:02,071][00189] Avg episode reward: [(0, '32.656')] [2024-09-29 19:43:04,423][21450] Updated weights for policy 0, policy_version 4793 (0.0019) [2024-09-29 19:43:07,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 19644416. Throughput: 0: 968.3. Samples: 2409952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:43:07,070][00189] Avg episode reward: [(0, '32.512')] [2024-09-29 19:43:12,061][00189] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 19660800. Throughput: 0: 980.6. Samples: 2412984. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:43:12,064][00189] Avg episode reward: [(0, '32.508')] [2024-09-29 19:43:16,163][21450] Updated weights for policy 0, policy_version 4803 (0.0019) [2024-09-29 19:43:17,061][00189] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 19673088. Throughput: 0: 921.4. Samples: 2417198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:43:17,063][00189] Avg episode reward: [(0, '32.774')] [2024-09-29 19:43:22,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3818.3). Total num frames: 19697664. Throughput: 0: 945.0. Samples: 2423868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:43:22,064][00189] Avg episode reward: [(0, '34.491')] [2024-09-29 19:43:25,162][21450] Updated weights for policy 0, policy_version 4813 (0.0020) [2024-09-29 19:43:27,063][00189] Fps is (10 sec: 4914.3, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 19722240. Throughput: 0: 975.1. Samples: 2427298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:43:27,065][00189] Avg episode reward: [(0, '33.602')] [2024-09-29 19:43:32,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 19734528. Throughput: 0: 956.7. Samples: 2432228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:43:32,064][00189] Avg episode reward: [(0, '34.026')] [2024-09-29 19:43:32,076][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004818_19734528.pth... [2024-09-29 19:43:32,257][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004596_18825216.pth [2024-09-29 19:43:37,032][21450] Updated weights for policy 0, policy_version 4823 (0.0012) [2024-09-29 19:43:37,061][00189] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 19755008. Throughput: 0: 926.8. Samples: 2437664. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:43:37,065][00189] Avg episode reward: [(0, '34.239')] [2024-09-29 19:43:42,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 19775488. Throughput: 0: 954.6. Samples: 2441098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-29 19:43:42,070][00189] Avg episode reward: [(0, '35.277')] [2024-09-29 19:43:47,061][00189] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 19791872. Throughput: 0: 977.2. Samples: 2447132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:43:47,064][00189] Avg episode reward: [(0, '36.262')] [2024-09-29 19:43:47,252][21450] Updated weights for policy 0, policy_version 4833 (0.0018) [2024-09-29 19:43:52,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 19808256. Throughput: 0: 922.8. Samples: 2451476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 19:43:52,071][00189] Avg episode reward: [(0, '34.742')] [2024-09-29 19:43:57,061][00189] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 19832832. Throughput: 0: 930.7. Samples: 2454864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:43:57,071][00189] Avg episode reward: [(0, '33.657')] [2024-09-29 19:43:57,824][21450] Updated weights for policy 0, policy_version 4843 (0.0013) [2024-09-29 19:44:02,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 19853312. Throughput: 0: 990.1. Samples: 2461752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 19:44:02,068][00189] Avg episode reward: [(0, '34.682')] [2024-09-29 19:44:07,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 19865600. Throughput: 0: 940.5. Samples: 2466190. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:44:07,063][00189] Avg episode reward: [(0, '34.768')] [2024-09-29 19:44:09,639][21450] Updated weights for policy 0, policy_version 4853 (0.0017) [2024-09-29 19:44:12,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 19886080. Throughput: 0: 921.7. Samples: 2468772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 19:44:12,067][00189] Avg episode reward: [(0, '35.227')] [2024-09-29 19:44:17,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 19906560. Throughput: 0: 961.3. Samples: 2475488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:44:17,065][00189] Avg episode reward: [(0, '35.479')] [2024-09-29 19:44:18,877][21450] Updated weights for policy 0, policy_version 4863 (0.0014) [2024-09-29 19:44:22,061][00189] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 19927040. Throughput: 0: 961.0. Samples: 2480910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 19:44:22,064][00189] Avg episode reward: [(0, '32.458')] [2024-09-29 19:44:27,061][00189] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3790.5). Total num frames: 19939328. Throughput: 0: 930.1. Samples: 2482952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-09-29 19:44:27,063][00189] Avg episode reward: [(0, '33.352')] [2024-09-29 19:44:30,767][21450] Updated weights for policy 0, policy_version 4873 (0.0014) [2024-09-29 19:44:32,061][00189] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 19963904. Throughput: 0: 930.6. Samples: 2489010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:44:32,070][00189] Avg episode reward: [(0, '35.254')] [2024-09-29 19:44:37,061][00189] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 19984384. Throughput: 0: 973.7. Samples: 2495292. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 19:44:37,063][00189] Avg episode reward: [(0, '34.354')] [2024-09-29 19:44:42,064][00189] Fps is (10 sec: 3275.7, 60 sec: 3686.2, 300 sec: 3776.7). Total num frames: 19996672. Throughput: 0: 942.9. Samples: 2497296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 19:44:42,070][00189] Avg episode reward: [(0, '33.769')] [2024-09-29 19:44:42,756][21450] Updated weights for policy 0, policy_version 4883 (0.0019) [2024-09-29 19:44:43,899][21437] Stopping Batcher_0... [2024-09-29 19:44:43,902][21437] Loop batcher_evt_loop terminating... [2024-09-29 19:44:43,903][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004884_20004864.pth... [2024-09-29 19:44:43,907][00189] Component Batcher_0 stopped! [2024-09-29 19:44:43,959][21450] Weights refcount: 2 0 [2024-09-29 19:44:43,967][21450] Stopping InferenceWorker_p0-w0... [2024-09-29 19:44:43,968][21450] Loop inference_proc0-0_evt_loop terminating... [2024-09-29 19:44:43,966][00189] Component InferenceWorker_p0-w0 stopped! [2024-09-29 19:44:44,060][21437] Removing /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004708_19283968.pth [2024-09-29 19:44:44,070][21437] Saving /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004884_20004864.pth... [2024-09-29 19:44:44,258][00189] Component LearnerWorker_p0 stopped! [2024-09-29 19:44:44,264][21437] Stopping LearnerWorker_p0... [2024-09-29 19:44:44,265][21437] Loop learner_proc0_evt_loop terminating... [2024-09-29 19:44:44,438][00189] Component RolloutWorker_w1 stopped! [2024-09-29 19:44:44,441][21452] Stopping RolloutWorker_w1... [2024-09-29 19:44:44,447][21452] Loop rollout_proc1_evt_loop terminating... [2024-09-29 19:44:44,459][00189] Component RolloutWorker_w5 stopped! [2024-09-29 19:44:44,469][00189] Component RolloutWorker_w3 stopped! [2024-09-29 19:44:44,465][21456] Stopping RolloutWorker_w5... 
[2024-09-29 19:44:44,478][21456] Loop rollout_proc5_evt_loop terminating...
[2024-09-29 19:44:44,471][21454] Stopping RolloutWorker_w3...
[2024-09-29 19:44:44,479][21454] Loop rollout_proc3_evt_loop terminating...
[2024-09-29 19:44:44,493][00189] Component RolloutWorker_w7 stopped!
[2024-09-29 19:44:44,496][21458] Stopping RolloutWorker_w7...
[2024-09-29 19:44:44,501][21458] Loop rollout_proc7_evt_loop terminating...
[2024-09-29 19:44:44,524][00189] Component RolloutWorker_w4 stopped!
[2024-09-29 19:44:44,535][21457] Stopping RolloutWorker_w6...
[2024-09-29 19:44:44,535][21457] Loop rollout_proc6_evt_loop terminating...
[2024-09-29 19:44:44,535][00189] Component RolloutWorker_w2 stopped!
[2024-09-29 19:44:44,525][21455] Stopping RolloutWorker_w4...
[2024-09-29 19:44:44,530][21453] Stopping RolloutWorker_w2...
[2024-09-29 19:44:44,537][00189] Component RolloutWorker_w6 stopped!
[2024-09-29 19:44:44,541][21453] Loop rollout_proc2_evt_loop terminating...
[2024-09-29 19:44:44,538][21455] Loop rollout_proc4_evt_loop terminating...
[2024-09-29 19:44:44,545][00189] Component RolloutWorker_w0 stopped!
[2024-09-29 19:44:44,546][00189] Waiting for process learner_proc0 to stop...
[2024-09-29 19:44:44,545][21451] Stopping RolloutWorker_w0...
[2024-09-29 19:44:44,572][21451] Loop rollout_proc0_evt_loop terminating...
[2024-09-29 19:44:45,760][00189] Waiting for process inference_proc0-0 to join...
[2024-09-29 19:44:45,854][00189] Waiting for process rollout_proc0 to join...
[2024-09-29 19:44:47,209][00189] Waiting for process rollout_proc1 to join...
[2024-09-29 19:44:47,221][00189] Waiting for process rollout_proc2 to join...
[2024-09-29 19:44:47,229][00189] Waiting for process rollout_proc3 to join...
[2024-09-29 19:44:47,230][00189] Waiting for process rollout_proc4 to join...
[2024-09-29 19:44:47,238][00189] Waiting for process rollout_proc5 to join...
[2024-09-29 19:44:47,242][00189] Waiting for process rollout_proc6 to join...
[2024-09-29 19:44:47,245][00189] Waiting for process rollout_proc7 to join...
[2024-09-29 19:44:47,249][00189] Batcher 0 profile tree view:
batching: 65.7236, releasing_batches: 0.0603
[2024-09-29 19:44:47,251][00189] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0014
  wait_policy_total: 1169.7114
update_model: 19.2669
  weight_update: 0.0019
one_step: 0.0055
  handle_policy_step: 1397.2669
    deserialize: 37.3570, stack: 7.4911, obs_to_device_normalize: 291.6853, forward: 700.3479, send_messages: 71.5384
    prepare_outputs: 217.6446
      to_cpu: 134.4986
[2024-09-29 19:44:47,253][00189] Learner 0 profile tree view:
misc: 0.0168, prepare_batch: 31.8270
train: 189.5459
  epoch_init: 0.0265, minibatch_init: 0.0263, losses_postprocess: 1.3943, kl_divergence: 1.5858, after_optimizer: 6.0618
  calculate_losses: 59.5410
    losses_init: 0.0114, forward_head: 4.2241, bptt_initial: 37.8147, tail: 2.8096, advantages_returns: 0.8909, losses: 7.1561
    bptt: 5.7476
      bptt_forward_core: 5.5384
  update: 119.2694
    clip: 3.6954
[2024-09-29 19:44:47,255][00189] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.9244, enqueue_policy_requests: 293.7519, env_step: 2104.3976, overhead: 36.9787, complete_rollouts: 18.6073
save_policy_outputs: 64.9199
  split_output_tensors: 22.0489
[2024-09-29 19:44:47,256][00189] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.8825, enqueue_policy_requests: 297.9886, env_step: 2096.3496, overhead: 37.0615, complete_rollouts: 17.7977
save_policy_outputs: 63.9550
  split_output_tensors: 21.0123
[2024-09-29 19:44:47,260][00189] Loop Runner_EvtLoop terminating...
[2024-09-29 19:44:47,262][00189] Runner profile tree view:
main_loop: 2713.5359
[2024-09-29 19:44:47,262][00189] Collected {0: 20004864}, FPS: 3684.6
[2024-09-29 19:49:35,568][00189] Loading existing experiment configuration from /content/train_dir/samplefactory-vizdoom-v1/config.json
[2024-09-29 19:49:35,570][00189] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-29 19:49:35,572][00189] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-29 19:49:35,574][00189] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-29 19:49:35,576][00189] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-29 19:49:35,577][00189] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-29 19:49:35,579][00189] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-29 19:49:35,580][00189] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-29 19:49:35,581][00189] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-29 19:49:35,582][00189] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-29 19:49:35,583][00189] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-29 19:49:35,584][00189] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-29 19:49:35,585][00189] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-29 19:49:35,586][00189] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-29 19:49:35,587][00189] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-29 19:49:35,605][00189] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 19:49:35,607][00189] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 19:49:35,609][00189] RunningMeanStd input shape: (1,)
[2024-09-29 19:49:35,627][00189] ConvEncoder: input_channels=3
[2024-09-29 19:49:35,748][00189] Conv encoder output size: 512
[2024-09-29 19:49:35,750][00189] Policy head output size: 512
[2024-09-29 19:49:37,318][00189] Loading state from checkpoint /content/train_dir/samplefactory-vizdoom-v1/checkpoint_p0/checkpoint_000004884_20004864.pth...
[2024-09-29 19:49:38,190][00189] Num frames 100...
[2024-09-29 19:49:38,309][00189] Num frames 200...
[2024-09-29 19:49:38,434][00189] Num frames 300...
[2024-09-29 19:49:38,555][00189] Num frames 400...
[2024-09-29 19:49:38,676][00189] Num frames 500...
[2024-09-29 19:49:38,798][00189] Num frames 600...
[2024-09-29 19:49:38,916][00189] Num frames 700...
[2024-09-29 19:49:39,036][00189] Num frames 800...
[2024-09-29 19:49:39,172][00189] Num frames 900...
[2024-09-29 19:49:39,343][00189] Num frames 1000...
[2024-09-29 19:49:39,514][00189] Num frames 1100...
[2024-09-29 19:49:39,676][00189] Num frames 1200...
[2024-09-29 19:49:39,847][00189] Num frames 1300...
[2024-09-29 19:49:40,007][00189] Num frames 1400...
[2024-09-29 19:49:40,166][00189] Num frames 1500...
[2024-09-29 19:49:40,336][00189] Num frames 1600...
[2024-09-29 19:49:40,504][00189] Num frames 1700...
[2024-09-29 19:49:40,680][00189] Num frames 1800...
[2024-09-29 19:49:40,855][00189] Num frames 1900...
[2024-09-29 19:49:41,026][00189] Num frames 2000...
[2024-09-29 19:49:41,185][00189] Avg episode rewards: #0: 54.589, true rewards: #0: 20.590
[2024-09-29 19:49:41,186][00189] Avg episode reward: 54.589, avg true_objective: 20.590
[2024-09-29 19:49:41,260][00189] Num frames 2100...
[2024-09-29 19:49:41,428][00189] Num frames 2200...
[2024-09-29 19:49:41,608][00189] Num frames 2300...
[2024-09-29 19:49:41,729][00189] Num frames 2400...
[2024-09-29 19:49:41,850][00189] Num frames 2500...
[2024-09-29 19:49:41,973][00189] Num frames 2600...
[2024-09-29 19:49:42,092][00189] Num frames 2700...
[2024-09-29 19:49:42,210][00189] Num frames 2800...
[2024-09-29 19:49:42,328][00189] Num frames 2900...
[2024-09-29 19:49:42,447][00189] Num frames 3000...
[2024-09-29 19:49:42,569][00189] Num frames 3100...
[2024-09-29 19:49:42,696][00189] Num frames 3200...
[2024-09-29 19:49:42,822][00189] Num frames 3300...
[2024-09-29 19:49:42,941][00189] Num frames 3400...
[2024-09-29 19:49:43,040][00189] Avg episode rewards: #0: 44.174, true rewards: #0: 17.175
[2024-09-29 19:49:43,043][00189] Avg episode reward: 44.174, avg true_objective: 17.175
[2024-09-29 19:49:43,119][00189] Num frames 3500...
[2024-09-29 19:49:43,233][00189] Num frames 3600...
[2024-09-29 19:49:43,351][00189] Num frames 3700...
[2024-09-29 19:49:43,466][00189] Num frames 3800...
[2024-09-29 19:49:43,580][00189] Num frames 3900...
[2024-09-29 19:49:43,706][00189] Num frames 4000...
[2024-09-29 19:49:43,775][00189] Avg episode rewards: #0: 32.703, true rewards: #0: 13.370
[2024-09-29 19:49:43,777][00189] Avg episode reward: 32.703, avg true_objective: 13.370
[2024-09-29 19:49:43,887][00189] Num frames 4100...
[2024-09-29 19:49:44,006][00189] Num frames 4200...
[2024-09-29 19:49:44,123][00189] Num frames 4300...
[2024-09-29 19:49:44,246][00189] Num frames 4400...
[2024-09-29 19:49:44,364][00189] Num frames 4500...
[2024-09-29 19:49:44,484][00189] Num frames 4600...
[2024-09-29 19:49:44,601][00189] Num frames 4700...
[2024-09-29 19:49:44,730][00189] Num frames 4800...
[2024-09-29 19:49:44,855][00189] Num frames 4900...
[2024-09-29 19:49:44,975][00189] Num frames 5000...
[2024-09-29 19:49:45,093][00189] Num frames 5100...
[2024-09-29 19:49:45,211][00189] Num frames 5200...
[2024-09-29 19:49:45,331][00189] Num frames 5300...
[2024-09-29 19:49:45,451][00189] Num frames 5400...
[2024-09-29 19:49:45,568][00189] Num frames 5500...
[2024-09-29 19:49:45,704][00189] Num frames 5600...
[2024-09-29 19:49:45,831][00189] Num frames 5700...
[2024-09-29 19:49:45,933][00189] Avg episode rewards: #0: 34.097, true rewards: #0: 14.347
[2024-09-29 19:49:45,935][00189] Avg episode reward: 34.097, avg true_objective: 14.347
[2024-09-29 19:49:46,009][00189] Num frames 5800...
[2024-09-29 19:49:46,127][00189] Num frames 5900...
[2024-09-29 19:49:46,243][00189] Num frames 6000...
[2024-09-29 19:49:46,364][00189] Num frames 6100...
[2024-09-29 19:49:46,482][00189] Num frames 6200...
[2024-09-29 19:49:46,604][00189] Avg episode rewards: #0: 29.508, true rewards: #0: 12.508
[2024-09-29 19:49:46,605][00189] Avg episode reward: 29.508, avg true_objective: 12.508
[2024-09-29 19:49:46,663][00189] Num frames 6300...
[2024-09-29 19:49:46,794][00189] Num frames 6400...
[2024-09-29 19:49:46,914][00189] Num frames 6500...
[2024-09-29 19:49:47,034][00189] Num frames 6600...
[2024-09-29 19:49:47,149][00189] Num frames 6700...
[2024-09-29 19:49:47,269][00189] Num frames 6800...
[2024-09-29 19:49:47,388][00189] Num frames 6900...
[2024-09-29 19:49:47,478][00189] Avg episode rewards: #0: 26.543, true rewards: #0: 11.543
[2024-09-29 19:49:47,480][00189] Avg episode reward: 26.543, avg true_objective: 11.543
[2024-09-29 19:49:47,566][00189] Num frames 7000...
[2024-09-29 19:49:47,683][00189] Num frames 7100...
[2024-09-29 19:49:47,814][00189] Num frames 7200...
[2024-09-29 19:49:47,937][00189] Num frames 7300...
[2024-09-29 19:49:48,055][00189] Num frames 7400...
[2024-09-29 19:49:48,174][00189] Num frames 7500...
[2024-09-29 19:49:48,290][00189] Num frames 7600...
[2024-09-29 19:49:48,413][00189] Num frames 7700...
[2024-09-29 19:49:48,532][00189] Num frames 7800...
[2024-09-29 19:49:48,674][00189] Num frames 7900...
[2024-09-29 19:49:48,816][00189] Num frames 8000...
[2024-09-29 19:49:48,938][00189] Num frames 8100...
[2024-09-29 19:49:49,057][00189] Num frames 8200...
[2024-09-29 19:49:49,174][00189] Num frames 8300...
[2024-09-29 19:49:49,232][00189] Avg episode rewards: #0: 28.574, true rewards: #0: 11.860
[2024-09-29 19:49:49,234][00189] Avg episode reward: 28.574, avg true_objective: 11.860
[2024-09-29 19:49:49,353][00189] Num frames 8400...
[2024-09-29 19:49:49,473][00189] Num frames 8500...
[2024-09-29 19:49:49,597][00189] Num frames 8600...
[2024-09-29 19:49:49,715][00189] Num frames 8700...
[2024-09-29 19:49:49,849][00189] Num frames 8800...
[2024-09-29 19:49:49,966][00189] Num frames 8900...
[2024-09-29 19:49:50,084][00189] Num frames 9000...
[2024-09-29 19:49:50,222][00189] Avg episode rewards: #0: 27.087, true rewards: #0: 11.337
[2024-09-29 19:49:50,224][00189] Avg episode reward: 27.087, avg true_objective: 11.337
[2024-09-29 19:49:50,262][00189] Num frames 9100...
[2024-09-29 19:49:50,380][00189] Num frames 9200...
[2024-09-29 19:49:50,499][00189] Num frames 9300...
[2024-09-29 19:49:50,614][00189] Num frames 9400...
[2024-09-29 19:49:50,772][00189] Avg episode rewards: #0: 24.762, true rewards: #0: 10.540
[2024-09-29 19:49:50,777][00189] Avg episode reward: 24.762, avg true_objective: 10.540
[2024-09-29 19:49:50,803][00189] Num frames 9500...
[2024-09-29 19:49:50,919][00189] Num frames 9600...
[2024-09-29 19:49:51,041][00189] Num frames 9700...
[2024-09-29 19:49:51,159][00189] Num frames 9800...
[2024-09-29 19:49:51,275][00189] Num frames 9900...
[2024-09-29 19:49:51,395][00189] Num frames 10000...
[2024-09-29 19:49:51,519][00189] Num frames 10100...
[2024-09-29 19:49:51,643][00189] Num frames 10200...
[2024-09-29 19:49:51,823][00189] Num frames 10300...
[2024-09-29 19:49:51,992][00189] Num frames 10400...
[2024-09-29 19:49:52,154][00189] Num frames 10500...
[2024-09-29 19:49:52,315][00189] Num frames 10600...
[2024-09-29 19:49:52,483][00189] Num frames 10700...
[2024-09-29 19:49:52,649][00189] Num frames 10800...
[2024-09-29 19:49:52,808][00189] Num frames 10900...
[2024-09-29 19:49:52,990][00189] Num frames 11000...
[2024-09-29 19:49:53,156][00189] Num frames 11100...
[2024-09-29 19:49:53,338][00189] Num frames 11200...
[2024-09-29 19:49:53,511][00189] Num frames 11300...
[2024-09-29 19:49:53,695][00189] Num frames 11400...
[2024-09-29 19:49:53,883][00189] Num frames 11500...
[2024-09-29 19:49:54,101][00189] Avg episode rewards: #0: 28.286, true rewards: #0: 11.586
[2024-09-29 19:49:54,103][00189] Avg episode reward: 28.286, avg true_objective: 11.586
[2024-09-29 19:51:02,795][00189] Replay video saved to /content/train_dir/samplefactory-vizdoom-v1/replay.mp4!
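A log like the one above can be mined for a learning curve (frames vs. average episode reward). The sketch below is my own addition, not part of Sample Factory: the regex patterns are inferred from the status lines shown here, and the field names (`fps_10s`, `frames`, `reward`) are arbitrary choices for illustration.

```python
import re

# Patterns inferred from the Sample Factory status lines in this log; not an official API.
FPS_RE = re.compile(
    r"Fps is \(10 sec: (-?[\d.]+), 60 sec: (-?[\d.]+), 300 sec: (-?[\d.]+)\)\. "
    r"Total num frames: (\d+)"
)
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '(-?[\d.]+)'\)\]")

def parse_line(line):
    """Return a dict with whatever metrics this log line contains (may be empty)."""
    out = {}
    m = FPS_RE.search(line)
    if m:
        out["fps_10s"], out["fps_60s"], out["fps_300s"] = map(float, m.group(1, 2, 3))
        out["frames"] = int(m.group(4))
    m = REWARD_RE.search(line)
    if m:
        out["reward"] = float(m.group(1))
    return out

def learning_curve(lines):
    """Pair each reward report with the most recently reported frame count."""
    frames, curve = None, []
    for line in lines:
        rec = parse_line(line)
        if "frames" in rec:
            frames = rec["frames"]
        if "reward" in rec and frames is not None:
            curve.append((frames, rec["reward"]))
    return curve
```

Feeding the training section of the log (one entry per line) to `learning_curve` yields (frames, reward) pairs ready for plotting; the evaluation lines at the end use a different format (`Avg episode rewards: #0: ...`) and are deliberately not matched.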